---
license: mit
library_name: xgboost
pipeline_tag: tabular-regression
tags:
  - tabular-regression
  - ens
  - ethereum
  - web3
  - domain-names
  - price-prediction
  - nft
datasets:
  - quantumly/ens-appraiser-data
base_model: sentence-transformers/all-mpnet-base-v2
metrics:
  - r_squared
  - mape
  - rmse
model-index:
  - name: ENS Appraiser v0.2
    results:
      - task:
          type: tabular-regression
          name: ENS Domain Price Prediction
        dataset:
          name: ENS Appraiser Multi-source Training Data
          type: quantumly/ens-appraiser-data
        metrics:
          - type: r_squared
            value: 0.3081
            name: R² (log USD, test)
          - type: median_ape
            value: 1.383
            name: Median APE (test)
          - type: rmse
            value: 1.5469
            name: RMSE (log USD, test)
---

# ENS Appraiser v0.2

A gradient-boosted regressor that predicts the USD sale price of an
ENS (`.eth`) domain name from on-chain history, semantic embeddings of the
label, and macro-market context.

This is the **v0 baseline** — handcrafted features + mpnet PCA + KNN
comparable-sale aggregates. Built to establish an honest, leakage-free
floor that future versions improve on.

## Quick numbers

Trained on ~265k ENS secondary sales (Jan 2022 – Sep 2023), evaluated on
2,744 sales in **Q1–Q2 2024** (held out by date, never seen during training):

| Split | n      | R² (log USD) | RMSE (log USD) | Median APE | Bias  |
|-------|--------|--------------|----------------|------------|-------|
| Train | 265,240 | 0.7700      | 0.7744         | 32.5%      | +0.000 |
| Val   | 3,545   | 0.6602      | 1.0678         | 57.0%      | +0.203 |
| Test  | 2,744   | **0.3081**  | 1.5469         | 138.3%     | +0.732 |

**Plain-English read:** for a typical mid-tier name in test, the model is
within ~2× of the actual sale price. The long tail — celebrity names,
3-letter premiums, regime shifts — is where it misses, often by 100×+ in
either direction.

## What's good

- **Mid-tier names, $50–$5,000 range:** usually within 2× of actual.
- **Length and character composition:** strong signals captured well.
  The model knows 3-letter ASCII names are premium and 12-letter random
  handles are cheap.
- **Wordlist hits:** matches against Wikipedia, GeoNames, US first names,
  stock tickers, and SEC EDGAR are picked up correctly. `paris.eth` is
  flagged as a city, `nike.eth` as a brand.
- **Comparable-sale anchoring:** the top two features are `knn_mean_log`
  and `knn_p90_log` — the model leans heavily on "what did similar names
  sell for recently?" which is the right intuition for valuation.

## What's not

- **Celebrity / brand premium:** a name's value to a known buyer
  (Coinbase wanting `coinbase.eth`, a luxury brand wanting their mark)
  is invisible to this model. It can detect that `nike.eth` is a brand
  word, but not that the sale price reflects Nike's interest specifically.
- **3-letter premium tail:** names like `mph.eth`, `uma.eth` sold for
  $20k–$40k in test; the model predicted $100–$200. The training set
  underweights short premiums because most sales there are 5+ letters.
- **Regime shift on test:** test set median price is ~4× higher than
  training median due to the 2023 → 2024 ENS market shift. Recency-weighted
  training (1-year half-life) helps but doesn't fully close the gap.
- **Bidirectional errors:** worst predictions split roughly evenly
  between under-prediction (hot names the model didn't recognize) and
  over-prediction (cold names that just didn't move). A 138% median APE is
  honest but uncomfortable.

## How it's built

| Component | Detail |
|---|---|
| Algorithm | XGBoost regressor (170 boosted trees, max_depth=7) |
| Target | `log(sale_price_usd)` |
| Features | 146 total |
| Training data | 265,240 sales, Jan 2022 – Sep 2023 |
| Training time | ~10 min on a single A100 |
| Model size | 3.3 MB |
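
For orientation, here's a minimal training sketch consistent with the table
above. The stand-in data, learning rate, and early-stopping patience are
assumptions; only the objective, depth, tree count, and seed come from this
card:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
# Stand-in data: the real X has the 146 columns listed in v0_metadata.json
# and y is log(sale_price_usd); w holds the recency weights described below.
X_train, X_val = rng.normal(size=(1000, 146)), rng.normal(size=(200, 146))
y_train, y_val = rng.normal(size=1000), rng.normal(size=200)
w_train = np.ones(1000)

dtrain = xgb.DMatrix(X_train, label=y_train, weight=w_train)
dval = xgb.DMatrix(X_val, label=y_val)

params = {
    "objective": "reg:squarederror",  # squared error on the log-USD target
    "max_depth": 7,                   # per the table above
    "eta": 0.05,                      # assumed; not stated in this card
    "seed": 42,
}
# 170 trees is the documented final size; in the real run, early stopping
# on the validation set (see "Reproducibility") picks the round count.
booster = xgb.train(params, dtrain, num_boost_round=170,
                    evals=[(dval, "val")], early_stopping_rounds=25)
```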

### Feature breakdown

- **Handcrafted (15):** length, n_digits, n_letters, n_special, palindrome,
  is_all_digits, is_all_letters, is_ascii, has_unicode, starts/ends_digit,
  max_char_run, n_unique_chars
- **Wordlist hits (8):** Wikipedia titles, GeoNames cities, US first names,
  ISO 3166 countries, stock tickers, SEC EDGAR companies, Wiktionary EN,
  plus a `wordlist_hits` total
- **Grails clubs (~45):** binary membership in each curated `.eth` club
  (`999club`, `pre-punks`, `palindromes`, `pokemon_gen1`, etc.)
- **Trademark conflict (1):** active USPTO mark in Nice classes 9, 35, 36,
  38, 41, 42, 45 with matching `mark_text_norm`
- **Holder behavior (2):** `name_age_days`, `prior_transfer_count`
  (leakage-safe — only counts transfers strictly before the sale block)
- **Macro context (5):** Fear & Greed Index, ETH chain TVL, ETH stablecoin
  market cap, ETH DEX volume, total NFT marketplace fees on the sale day
- **mpnet PCA (64):** 768-dim `all-mpnet-base-v2` embeddings of the label,
  PCA-reduced to 64 dims (95% explained variance)
- **KNN comparable sales (8):** for each label, FAISS-retrieve top-50
  semantic neighbors (HNSW index), filter near-duplicates (sim > 0.999),
  take the most-recent prior sale of each, aggregate as `knn_count`,
  `knn_mean_log`, `knn_median_log`, `knn_p90_log`, `knn_max_sim`,
  `knn_min_sim`, `knn_log_max`, `knn_log_min`. **Strict leakage prevention:**
  only neighbors with sales **before** the current sale's date count.
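
A minimal sketch of that leakage-safe comp aggregation for one query label.
The real pipeline retrieves neighbors with a FAISS HNSW index; brute-force
cosine similarity stands in here, and each candidate is assumed to carry a
single representative prior sale (a simplification of the real per-neighbor
most-recent-sale lookup):

```python
import numpy as np

def knn_comp_features(q_emb, q_date, emb, sale_date, sale_log_usd, k=50):
    """Comparable-sale aggregates for one query label.

    emb rows are L2-normalized mpnet embeddings; sale_date / sale_log_usd
    hold each candidate's prior sale. Feature names follow the list above.
    """
    sims = emb @ q_emb                      # cosine similarity (normalized)
    order = np.argsort(-sims)[:k]           # top-k semantic neighbors
    keep = [i for i in order
            if sims[i] <= 0.999             # drop near-duplicate labels
            and sale_date[i] < q_date]      # strict leakage guard
    if not keep:
        return {"knn_count": 0}
    logs, s = sale_log_usd[keep], sims[keep]
    return {
        "knn_count": len(keep),
        "knn_mean_log": logs.mean(),
        "knn_median_log": float(np.median(logs)),
        "knn_p90_log": float(np.percentile(logs, 90)),
        "knn_max_sim": s.max(), "knn_min_sim": s.min(),
        "knn_log_max": logs.max(), "knn_log_min": logs.min(),
    }
```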

### Top 10 features by gain

| Rank | Feature | Gain |
|---:|---|---:|
| 1 | `knn_mean_log` | 1,714 |
| 2 | `knn_p90_log` | 1,613 |
| 3 | `len` | 1,364 |
| 4 | `in_wikipedia` | 1,052 |
| 5 | `is_all_digits` | 944 |
| 6 | `knn_median_log` | 604 |
| 7 | `n_digits` | 338 |
| 8 | `pca_000` | 289 |
| 9 | `n_clubs` | 282 |
| 10 | `ends_digit` | 277 |

Four of the top ten are KNN-comp or PCA features, which means the
embedding pipeline is doing real work — it's not just paying for itself,
it's the dominant signal alongside length.

## Training data + leakage controls

Built from the [`quantumly/ens-appraiser-data`](https://huggingface.co/datasets/quantumly/ens-appraiser-data)
dataset:

- **Sales labels:** Alchemy `getNFTSales` for ENS BaseRegistrar + NameWrapper
  contracts. Wei amounts converted to USD via CoinGecko hourly OHLC at
  the sale's block timestamp (join sketched after this list).
  **Coverage gap:** Alchemy `getNFTSales` v2
  truncates at block 19,768,978 (May 2024) and does not index Blur
  marketplace sales. v0 ships with this gap; closing it is a v1 priority.
- **Registrations + transfers:** The Graph's [ENS subgraph](https://thegraph.com/explorer/subgraphs/5XqPmWe6gjyrJtFn9cLy237i4cWw2j9HcUJEXsP5qGtH).
- **Wordlists:** Wiktionary dumps, Wikipedia EN article titles, GeoNames
  `cities500`, US Census baby names, NASDAQ Trader ticker dumps,
  SEC EDGAR company tickers, ISO 3166 country list.
- **Macro:** alternative.me Fear & Greed Index, DefiLlama (TVL, stablecoin
  mcap, DEX volume, NFT marketplace fees).
- **Trademarks:** USPTO Trademark Case Files Dataset (annual research dump).
- **Embeddings:** `sentence-transformers/all-mpnet-base-v2`, encoded once
  for all 3.5M ENS labels in the dataset.
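
A hedged sketch of the wei-to-USD join mentioned above, on toy frames; the
column names are illustrative, not the dataset repo's actual schema:

```python
import pandas as pd

# Toy stand-ins; real rows come from the sales and OHLC parquets.
sales = pd.DataFrame({
    "block_ts": pd.to_datetime(["2023-05-01 14:37", "2023-05-02 09:12"]),
    "price_wei": [5e17, 2e18],
})
eth_hourly = pd.DataFrame({
    "ts": pd.date_range("2023-05-01", periods=48, freq="h"),
    "close": 1900.0,  # flat ETH/USD price for the toy example
})

# Match each sale to the most recent hourly candle at or before its
# block timestamp, then convert wei -> ETH -> USD.
priced = pd.merge_asof(sales.sort_values("block_ts"),
                       eth_hourly.sort_values("ts"),
                       left_on="block_ts", right_on="ts",
                       direction="backward")
priced["sale_price_usd"] = priced["price_wei"] / 1e18 * priced["close"]
```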

### Leakage controls

The first version of this model accidentally leaked future information
through `lifetime_transfer_count` (it counted *all* transfers ever for a
labelhash, including transfers that happened *after* the sale being
predicted). The leaky model showed **train R² 0.81 / test R² −0.29** — the
classic catastrophic-overfit signature where the model collapses to
predicting the population mean on held-out data.

The current model uses `prior_transfer_count`, which only counts transfers
where `transfer_block < sale_block` per row. The safe feature drops to
rank #11 by gain (the leaky version had been #1, with a 3.3× margin over
the runner-up). KNN comparable-sale features have a
similar safeguard: a neighbor's sale only counts if it happened strictly
before the sale being predicted.
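
A minimal sketch of that per-row rule, with toy frames and illustrative
column names:

```python
import pandas as pd

transfers = pd.DataFrame({
    "labelhash": ["0xabc", "0xabc", "0xabc", "0xdef"],
    "block":     [100, 250, 900, 50],
})
sales = pd.DataFrame({
    "labelhash":  ["0xabc", "0xdef"],
    "sale_block": [500, 400],
})

def prior_transfer_count(row):
    # Leakage-safe: only transfers strictly before the sale block count,
    # so a transfer that happens after the sale can never leak in.
    mask = ((transfers["labelhash"] == row["labelhash"])
            & (transfers["block"] < row["sale_block"]))
    return int(mask.sum())

sales["prior_transfer_count"] = sales.apply(prior_transfer_count, axis=1)
# 0xabc: blocks 100 and 250 count, 900 does not -> 2; 0xdef -> 1
```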

### Train/Val/Test split

Fixed-window temporal split:

- **Train:** sales with `sale_date < 2023-10-01`
- **Val:** sales 2023-10-01 → 2023-12-31
- **Test:** sales 2024-01-01 onwards

This prevents the v0.1 mistake of training on 2022 prices and asking the
model to extrapolate to a 2024 market regime that's ~4× more expensive
on average. Val and test are in the same regime so val RMSE is a
meaningful proxy for test.

Training rows are weighted with an exponential recency decay (1-year
half-life, normalized to mean=1.0) so the model leans on 2023 dynamics
without throwing away the older data entirely.
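
The split and the decay weights together, as a sketch; the decay's reference
date is assumed to be the training cutoff, which the card doesn't state
explicitly:

```python
import pandas as pd

# df: one row per sale with a sale_date column (toy frame shown)
df = pd.DataFrame({"sale_date": pd.to_datetime(
    ["2022-06-01", "2023-05-15", "2023-11-20", "2024-02-10"])})

train = df[df["sale_date"] < "2023-10-01"]
val = df[(df["sale_date"] >= "2023-10-01") & (df["sale_date"] <= "2023-12-31")]
test = df[df["sale_date"] >= "2024-01-01"]

# Exponential recency decay: weight halves for every extra year of age,
# then normalize so the weights average to 1.0.
age_days = (train["sale_date"].max() - train["sale_date"]).dt.days
w = 0.5 ** (age_days / 365.25)
w = w / w.mean()
```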

## Intended use

This model is intended for **research and analytics**, not as a price
oracle and not for live trading.

**Reasonable uses:**

- Bulk valuation of mid-tier ENS portfolios for tax/accounting purposes
- Identifying obviously over- or under-listed names on secondary markets
- Sanity-checking a listing price before posting
- Producing comparable-sale ranges for negotiation context

**Out of scope:**

- Pricing 3-letter, 1-2 letter, or otherwise-premium names with confidence
- Pricing celebrity / known-brand names where the buyer pool is concentrated
- Predicting prices for names in the post-May-2024 marketplace mix
  (Blur dominance, marketplace fee changes)
- Any high-stakes financial decision based on a single point estimate

## Limitations

- **Sales coverage**: Jan 2022 – May 2024 only, no Blur. ~2 years of recent
  sales (mid-2024 onwards) are missing entirely from training. Closing
  this gap requires either a new sales source (Reservoir/SimpleHash both
  defunct as of 2024–2025) or direct `eth_getLogs` decoding of Seaport,
  Blur, X2Y2, LooksRare events, planned for v1.
- **Celebrity premium**: there's no feature here for "is this a famous
  person/place/thing?" beyond Wikipedia-title matching. v1 adds
  LLM-derived structured features (`fame_score`, `name_kind`,
  `crypto_relevance`, `brand_collision_risk`) which should close most
  of this gap.
- **Out-of-distribution labels**: pure-digit labels (`0001`),
  punycode/emoji, and l33tspeak get less benefit from mpnet embeddings
  since they're out of distribution for the pretrained model. Length and
  charset features partially compensate.
- **Time drift**: the ENS market shifts noticeably every 6–12 months as
  marketplace dominance, fee structures, and DAO actions move. Predictions
  on names sold "right now" will lag any regime shift since the training
  cutoff.
- **Test-set thinness**: only 2,744 sales meet the $10 floor and post-Jan-2024
  cutoff. The reported test R² has roughly ±0.08 95% CI — useful as a
  ballpark, not a precise number.
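
For intuition, one way to put a CI on a test-set R² is a percentile
bootstrap over the rows. The sketch below uses stand-in arrays and is not
necessarily how the ±0.08 figure was computed:

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
# Stand-ins for the 2,744 test rows: log-USD targets and predictions.
y_true = rng.normal(size=2744)
y_pred = 0.55 * y_true + rng.normal(scale=0.8, size=2744)

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
    boot.append(r2_score(y_true[idx], y_pred[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])            # 95% percentile interval
print(f"R2 95% CI: [{lo:.3f}, {hi:.3f}]")
```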

## How to use

```python
from huggingface_hub import hf_hub_download
import xgboost as xgb
import pickle

model_path = hf_hub_download(
    repo_id="quantumly/ens-appraiser",
    filename="v0_appraiser_xgb.json",
)
pca_path = hf_hub_download(
    repo_id="quantumly/ens-appraiser",
    filename="v0_pca_mpnet.pkl",
)

booster = xgb.Booster()
booster.load_model(model_path)
with open(pca_path, "rb") as f:
    pca = pickle.load(f)

# Inference also requires:
#  1. mpnet embedding for the label (sentence-transformers/all-mpnet-base-v2)
#  2. Handcrafted/wordlist/club/trademark/holder/macro features
#  3. KNN comp lookup against the dataset repo's FAISS index
#
# A self-contained inference notebook is planned in the dataset repo.
```

The 146 features expected by the booster are listed in `v0_metadata.json`
under `feature_cols`, in the exact order required by `xgb.DMatrix`.
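
Continuing the snippet above, a sketch of assembling one prediction row in
that order. The `features` dict is hypothetical and stands in for the full
feature pipeline; the metadata layout (a JSON object with a `feature_cols`
list) and the natural-log target are assumptions consistent with this card:

```python
import json
import numpy as np

meta_path = hf_hub_download(
    repo_id="quantumly/ens-appraiser",
    filename="v0_metadata.json",
)
with open(meta_path) as f:
    feature_cols = json.load(f)["feature_cols"]

# features: name -> value for one label, produced by the pipeline above;
# what matters is that the columns appear in feature_cols order.
row = np.array([[features[c] for c in feature_cols]], dtype=np.float32)
pred = booster.predict(xgb.DMatrix(row, feature_names=feature_cols))
price_usd = float(np.exp(pred[0]))  # assuming a natural-log target
```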

## Reproducibility

The training notebook ([`v0_appraiser_v2.ipynb`](https://huggingface.co/datasets/quantumly/ens-appraiser-data/blob/main/notebooks/v0_appraiser_v2.ipynb))
runs end-to-end on a Colab A100 high-RAM instance in ~25 minutes:

1. Downloads all source parquets from the dataset repo
2. Reconstructs USD prices via CoinGecko hourly OHLC join
3. Resolves labels for both BaseRegistrar and NameWrapper sales
4. Computes all features
5. Builds HNSW index for KNN
6. Trains XGBoost with early stopping
7. Saves model + metadata + diagnostics
8. Uploads to this model repo

All randomness is seeded (`seed=42` for XGBoost, PCA, sample weights).

## Roadmap

**v1 priorities** (in expected R² delta order):

1. **LLM-derived features** — Llama 3.1 8B local inference over all 3.5M
   labels, extracting `fame_score`, `name_kind`, `cultural_origin`,
   `crypto_relevance`, `brand_collision_risk`, plus a description-embedding.
   Expected delta: +0.05–0.10 test R².
2. **Recent sales backfill** via direct `eth_getLogs` decoding of
   Seaport / Blur / Wyvern / X2Y2 / LooksRare events. Closes the
   May 2024 → present coverage gap and adds Blur. Expected delta:
   +0.03–0.06 test R² and a much bigger test set.
3. **Multi-embedding ensemble** — concatenate mpnet with `bge-base-en-v1.5`
   and `e5-base-v2`, PCA the joint space. Expected delta: +0.02–0.04.
4. **Cross-encoder reranker** for KNN comps. Expected delta: +0.02–0.03.
5. **Contrastive fine-tuning** of mpnet on price-similarity triplets.
   Expected delta: +0.03–0.05.

## Citation

```bibtex
@misc{ens_appraiser_2026,
  author    = {Drobnič, Nejc},
  title     = {ENS Appraiser v0.2},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/quantumly/ens-appraiser}
}
```

## License + contact

MIT. Questions, corrections, pull requests: nejc@nejc.dev