SDG SFT Round-2 LoRA Adapter (v0.1)

A second-round LoRA adapter on Qwen/Qwen3.5-9B-Base, trained on a corpus generated by the first-round adapter zndx/sdg-sft-r1. It is released specifically to demonstrate a negative result for naive iterated self-distillation: overall reward continues to rise, but only because the floor on hard scenarios rises. The ceiling on easy scenarios saturates, and AUC discrimination regresses.

Status: v0.1, peer-review preview. Curator: @zndx

Headline result

Held-out 50-scenario evaluation, mean R across 4 generations per scenario:

Stage	overall mean R	good_mean	bad_mean	R_A pass rate	AUC
Base	0.208	0.205	0.210	0.55	0.478
SFT-r1	0.289	0.311	0.268	0.68	0.590
SFT-r2 (this adapter)	0.318	0.309	0.327	0.76	0.475
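
The card does not state how the AUC column is computed. One common reading, assumed here, is the probability that a randomly chosen good-scenario reward exceeds a randomly chosen bad-scenario reward, with ties counting half, so that 0.5 means no discrimination. A minimal sketch of that reading on made-up rewards (the function name and the numbers are illustrative, not the evaluation code or data):

```python
from itertools import product


def pairwise_auc(good_rewards, bad_rewards):
    """Probability that a good-scenario reward beats a bad-scenario reward.

    Ties count as half a win. A value of 0.5 means the rewards carry no
    information about scenario quality; below 0.5 the ordering is inverted.
    """
    if not good_rewards or not bad_rewards:
        raise ValueError("both groups need at least one reward")
    wins = 0.0
    for g, b in product(good_rewards, bad_rewards):
        if g > b:
            wins += 1.0
        elif g == b:
            wins += 0.5
    return wins / (len(good_rewards) * len(bad_rewards))


# Hypothetical rewards, for illustration only.
separated = pairwise_auc([0.6, 0.7, 0.8], [0.1, 0.2, 0.3])
mixed = pairwise_auc([0.2, 0.4], [0.2, 0.4])
print(separated, mixed)
```

Under this reading, a stage whose bad_mean exceeds its good_mean would be expected to land near or below 0.5.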

The +10 % round-2 gain in overall mean R (0.289 → 0.318) comes entirely from bad-scenario reward, which rises 22 % (0.268 → 0.327). Good-scenario reward is effectively tied between r1 and r2 (0.311 vs 0.309). AUC falls from 0.590 to 0.475, below the 0.5 chance level and roughly back at the base model's 0.478, so the model no longer distinguishes scenarios by quality. R_A pass rate continues climbing (0.55 → 0.68 → 0.76).
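
The relative changes quoted above can be rechecked from the table values. This is plain arithmetic on the reported means; the helper name is illustrative:

```python
# Values copied from the headline table (SFT-r1 and SFT-r2 rows).
r1 = {"overall": 0.289, "good": 0.311, "bad": 0.268}
r2 = {"overall": 0.318, "good": 0.309, "bad": 0.327}


def rel_change_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


changes = {key: round(rel_change_pct(r1[key], r2[key]), 1) for key in r1}
print(changes)  # {'overall': 10.0, 'good': -0.6, 'bad': 22.0}
```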

Why this happens (mode collapse)

Round-2 corpus statistics show the mechanism directly:

Corpus	Source policy	n samples	R mean	Unique template_ids
v2	base	665	0.522	171 / 540
v3 (used here)	SFT-r1	740	0.513	88 / 540

The SFT-r1 policy strongly prefers a narrower set of catalog templates. When that policy generates the round-2 corpus, the new training distribution covers 88 of the 540 catalog templates, about half the 171 covered by the round-1 corpus, even though it contains more samples (740 vs 665). SFT-r2 then over-specialises on that narrower subset, raising its average reward on samples it has seen while losing the flexibility to generalise.
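
A diversity check of this kind can be sketched as follows. It assumes each corpus record is a JSON object with a `template_id` field, which the card implies but does not document; the function names and the entropy metric are illustrative additions, not part of the released pipeline.

```python
import json
import math
from collections import Counter

CATALOG_SIZE = 540  # catalog size reported in the corpus table


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def template_diversity(records, catalog_size=CATALOG_SIZE):
    """Summarise how widely a corpus spreads over catalog templates.

    Assumes every record carries a `template_id` key (an assumption).
    """
    counts = Counter(rec["template_id"] for rec in records)
    total = sum(counts.values())
    # Shannon entropy in nats: low when a few templates dominate.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return {
        "unique": len(counts),
        "coverage": len(counts) / catalog_size,
        "entropy_nats": entropy,
    }


# Toy corpus, for illustration only.
toy = [{"template_id": t} for t in ["a", "a", "b", "c"]]
stats = template_diversity(toy, catalog_size=4)
print(stats)
```

Applied to the counts in the table, coverage is 171/540 ≈ 0.32 for v2 and 88/540 ≈ 0.16 for v3. Entropy adds information that the unique count misses, since a corpus can touch many templates while still concentrating most samples on a few.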

This is a clean experimental demonstration of why naive iterated self-distillation requires explicit diversity preservation, for example mixing in round-1 samples, adding anti-clustering penalties to the reward, or raising the round-2 sampling temperature.
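
As one illustration of the first option, the sketch below blends round-1 samples into a round-2 corpus and caps how many round-2 samples any single template may contribute. It is not the author's method and has not been evaluated here; `mix_corpora`, `mix_fraction`, `max_per_template`, and the `template_id` and `source` fields are all hypothetical.

```python
import random
from collections import defaultdict


def mix_corpora(round1, round2, mix_fraction=0.3, max_per_template=5, seed=0):
    """Blend round-1 samples into a round-2 corpus with a per-template cap.

    mix_fraction is the target share of round-1 samples in the result.
    Assumes each record has a `template_id` key (an assumption).
    """
    if not 0.0 <= mix_fraction < 1.0:
        raise ValueError("mix_fraction must be in [0, 1)")
    rng = random.Random(seed)

    # Group round-2 samples by template and keep at most
    # max_per_template of each, chosen at random.
    by_template = defaultdict(list)
    for rec in round2:
        by_template[rec["template_id"]].append(rec)
    capped = []
    for recs in by_template.values():
        rng.shuffle(recs)
        capped.extend(recs[:max_per_template])

    # Number of round-1 samples needed to reach the target share,
    # limited by how many round-1 samples exist.
    n_r1 = round(len(capped) * mix_fraction / (1.0 - mix_fraction))
    n_r1 = min(n_r1, len(round1))

    mixed = capped + rng.sample(round1, n_r1)
    rng.shuffle(mixed)
    return mixed


# Toy corpora, for illustration only: round 2 has collapsed onto one template.
toy_r1 = [{"template_id": f"t{i}", "source": "r1"} for i in range(10)]
toy_r2 = [{"template_id": "t0", "source": "r2"} for _ in range(20)]
mixed = mix_corpora(toy_r1, toy_r2, mix_fraction=0.5, max_per_template=5)
print(len(mixed), len({rec["template_id"] for rec in mixed}))
```

The cap limits how far a dominant template can crowd out the rest, while the round-1 mix-in reintroduces templates the round-1 policy stopped producing. Both knobs would need tuning against the held-out evaluation.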

Training details

Identical hyperparameters to SFT-r1 except for the input corpus:

Hyperparameter	Value
Base model	`Qwen/Qwen3.5-9B-Base`
Source corpus	`rejection_samples_v3.jsonl` (740 samples from SFT-r1)
Trainable params	29.1M / 8.98B (0.32 %)
LoRA rank `r`	16
Epochs	2
Total grad steps	94
Final train loss	0.139 (vs SFT-r1's 0.216 — 36 % lower)
Final token accuracy	96.4 %
Final entropy	0.126
Wall time	77.8 min on 2× RTX 4090

Lower final train loss is consistent with mode collapse: the corpus is more self-similar, so SFT can fit it more tightly.
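
The derived figures in the table can be cross-checked with a few lines of arithmetic. The effective batch size is not stated in the card; the sketch infers which sizes are consistent with 94 steps, assuming all 740 samples are used for training, no sample packing, and a final partial batch that still counts as a step.

```python
import math

# Values copied from the training table.
n_samples = 740
epochs = 2
total_steps = 94

steps_per_epoch = total_steps // epochs  # 47

# Effective batch sizes (per-device batch x devices x accumulation) that
# would give 47 optimizer steps per epoch under the stated assumptions.
consistent_batch_sizes = [
    b for b in range(1, 129) if math.ceil(n_samples / b) == steps_per_epoch
]

trainable_pct = round(100 * 29.1e6 / 8.98e9, 2)        # share of trainable params
loss_drop_pct = round(100 * (0.216 - 0.139) / 0.216)   # r2 loss vs r1 loss

print(consistent_batch_sizes, trainable_pct, loss_drop_pct)  # [16] 0.32 36
```

Under those assumptions, an effective batch size of 16 is the only value that reproduces the reported step count.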

When to use this vs SFT-r1

For most generation tasks: use SFT-r1 (zndx/sdg-sft-r1). It generalises better and the AUC discrimination is meaningful.
For research on iterated self-distillation / mode collapse: use SFT-r2 to reproduce the negative result, or as a "before" baseline for a diversity-preserving variant.
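
Either adapter can be attached to the base model with the `peft` library. The sketch below is a minimal loading example, assuming the base checkpoint loads through `AutoModelForCausalLM` and that `transformers` and `peft` are installed; dtype and device placement are left at their defaults and would normally be set for a 9B model.

```python
BASE_MODEL = "Qwen/Qwen3.5-9B-Base"
ADAPTER_R1 = "zndx/sdg-sft-r1"  # recommended for most generation tasks
ADAPTER_R2 = "zndx/sdg-sft-r2"  # this adapter, for mode-collapse research


def load_adapter(adapter_id, base_id=BASE_MODEL):
    """Load the base model and attach a LoRA adapter from the Hub.

    Assumes the base checkpoint is compatible with AutoModelForCausalLM.
    """
    # Imported here so the constants above can be used without the
    # heavy dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model


# Example (downloads the full base model, so it is not run here):
# tokenizer, model = load_adapter(ADAPTER_R2)
```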

Related artifacts

zndx/sdg-bertopic-correspondence-v0.1 — the corpus that fed both adapters.
zndx/sdg-sft-r1 — the first-round adapter (recommended for downstream use).

Citation

@misc{sdg-sft-r2-v01,
  title  = {SDG SFT Round-2 LoRA Adapter (v0.1) — Iterated Self-Distillation Mode-Collapse Baseline},
  author = {Hill, Ryan and contributors},
  year   = {2026},
  url    = {https://huggingface.co/zndx/sdg-sft-r2}
}
