How to use zndx/sdg-sft-r2 with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the round-2 LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B-Base")
model = PeftModel.from_pretrained(base_model, "zndx/sdg-sft-r2")
```
SDG SFT Round-2 LoRA Adapter (v0.1)
A second-round LoRA adapter on Qwen/Qwen3.5-9B-Base, trained on a
corpus generated by zndx/sdg-sft-r1.
Released specifically to demonstrate a negative result on naive iterated
self-distillation: overall reward continues to rise, but only by
raising the floor on hard scenarios. The ceiling on easy scenarios
saturates and AUC discrimination regresses.
Status: v0.1, peer-review preview. Curator: @zndx
Headline result
Held-out 50-scenario evaluation, mean R across 4 generations per scenario:
| Stage | overall mean R | good_mean | bad_mean | R_A pass rate | AUC |
|---|---|---|---|---|---|
| Base | 0.208 | 0.205 | 0.210 | 0.55 | 0.478 |
| SFT-r1 | 0.289 | 0.311 | 0.268 | 0.68 | 0.590 |
| SFT-r2 (this adapter) | 0.318 | 0.309 | 0.327 | 0.76 | 0.475 |
The +10 % round-2 gain comes entirely from bad-scenario reward (+22 %). Good-scenario reward is effectively tied between r1 and r2 (0.311 vs 0.309). AUC discrimination regresses to ~0.48: the model no longer distinguishes scenarios by quality. R_A pass rate continues climbing (0.55 → 0.68 → 0.76).
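The percentages above can be re-derived from the table. The following is a minimal sketch that only re-does the arithmetic on the reported means; the dictionary layout and the `relative_change` helper are illustrative names, not part of the released evaluation code.

```python
# Values copied from the headline table (held-out 50-scenario evaluation).
results = {
    "Base":   {"overall": 0.208, "good": 0.205, "bad": 0.210},
    "SFT-r1": {"overall": 0.289, "good": 0.311, "bad": 0.268},
    "SFT-r2": {"overall": 0.318, "good": 0.309, "bad": 0.327},
}


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    return 100.0 * (after - before) / before


r1, r2 = results["SFT-r1"], results["SFT-r2"]

# Round-1 -> round-2 change for each reported mean.
gains = {key: relative_change(r1[key], r2[key]) for key in ("overall", "good", "bad")}

for key, pct in gains.items():
    print(f"{key:>7}: {pct:+.1f} %")
```

The good-scenario change comes out slightly negative and well under 1 % in magnitude, which is why the text treats r1 and r2 as tied on that column.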
Why this happens (mode collapse)
Round-2 corpus statistics show the mechanism directly:
| Corpus | Source policy | n samples | R mean | Unique template_ids |
|---|---|---|---|---|
| v2 | base | 665 | 0.522 | 171 / 540 |
| v3 (used here) | SFT-r1 | 740 | 0.513 | 88 / 540 |
The SFT-r1 policy strongly prefers a narrower set of catalog templates. When that policy generates the round-2 corpus, the new training distribution is half as diverse as the round-1 corpus. SFT-r2 then over-specialises on that narrower subset, raising its average reward on samples it has seen while losing generalisation flexibility.
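As a rough illustration of the diversity statistic, the sketch below counts unique template IDs in a list and expresses them as catalog coverage, then compares the two reported corpora. The `template_coverage` helper and the toy IDs are hypothetical; only the counts 171, 88, and 540 come from the table.

```python
from collections import Counter

CATALOG_SIZE = 540  # total templates in the catalog, per the corpus table


def template_coverage(template_ids, catalog_size=CATALOG_SIZE):
    """Return (number of unique template IDs, fraction of the catalog covered)."""
    counts = Counter(template_ids)
    unique = len(counts)
    return unique, unique / catalog_size


# Toy example with made-up IDs: three distinct templates out of a catalog of ten.
toy_ids = ["t1", "t2", "t2", "t3", "t3", "t3"]
unique, coverage = template_coverage(toy_ids, catalog_size=10)

# Reported unique-template counts for the two corpora.
v2_unique, v3_unique = 171, 88
diversity_ratio = v3_unique / v2_unique  # round-2 diversity relative to round-1

print(f"v2 coverage: {v2_unique / CATALOG_SIZE:.1%}")
print(f"v3 coverage: {v3_unique / CATALOG_SIZE:.1%}")
print(f"v3 / v2 unique templates: {diversity_ratio:.2f}")
```

The ratio lands just above one half, which is the sense in which the round-2 corpus is "half as diverse".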
This is a clean experimental demonstration of why naive iterated self-distillation requires explicit diversity preservation: mix-in of round-1 samples, anti-clustering penalties in the reward, or higher round-2 sampling temperature.
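One way the first mitigation (mixing round-1 samples back in) might look is sketched below. This is not the method used for this adapter, which was trained on the round-2 corpus alone. The function name, the `template_id` record key, the 30 % mix fraction, and the per-template cap are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def mix_with_template_cap(round1, round2, r1_fraction=0.3, max_per_template=5, seed=0):
    """Blend round-2 samples with round-1 samples, capping samples per template.

    Each sample is a dict with a "template_id" key (hypothetical schema).
    The cap is applied to each corpus separately, not to the final mix.
    """
    rng = random.Random(seed)

    def cap(samples):
        # Keep at most `max_per_template` samples for any one template.
        kept, seen = [], defaultdict(int)
        shuffled = samples[:]
        rng.shuffle(shuffled)
        for sample in shuffled:
            if seen[sample["template_id"]] < max_per_template:
                seen[sample["template_id"]] += 1
                kept.append(sample)
        return kept

    capped_r2 = cap(round2)
    capped_r1 = cap(round1)
    # Choose the round-1 count so it makes up roughly `r1_fraction` of the mix.
    n_r1 = round(len(capped_r2) * r1_fraction / (1.0 - r1_fraction))
    mixed = capped_r2 + capped_r1[:n_r1]
    rng.shuffle(mixed)
    return mixed


# Toy corpora: round 1 is broad (20 templates), round 2 is collapsed (4 templates).
round1 = [{"template_id": t, "round": 1} for t in range(20) for _ in range(2)]
round2 = [{"template_id": t, "round": 2} for t in range(4) for _ in range(10)]

mixed = mix_with_template_cap(round1, round2)
n_from_r1 = sum(1 for s in mixed if s["round"] == 1)
print(len(mixed), n_from_r1, len({s["template_id"] for s in mixed}))
```

The cap limits how far any single over-represented template can dominate, and the round-1 mix-in reintroduces templates that the SFT-r1 policy stopped sampling. Whether this actually restores AUC is untested here.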
Training details
Identical hyperparameters to SFT-r1 except for the input corpus:
| Hyperparameter | Value |
|---|---|
| Base model | Qwen/Qwen3.5-9B-Base |
| Source corpus | rejection_samples_v3.jsonl (740 samples from SFT-r1) |
| Trainable params | 29.1M / 8.98B (0.32 %) |
| LoRA rank r | 16 |
| Epochs | 2 |
| Total grad steps | 94 |
| Final train loss | 0.139 (vs SFT-r1's 0.216; 36 % lower) |
| Final token accuracy | 96.4 % |
| Final entropy | 0.126 |
| Wall time | 77.8 min on 2× RTX 4090 |
Lower final train loss is consistent with mode collapse: the corpus is more self-similar, so SFT can fit it more tightly.
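The figures in the training table can be cross-checked against each other. The sketch below recomputes the trainable-parameter share and the loss reduction, and then infers which effective batch sizes would be consistent with 94 steps over 2 epochs of 740 samples. The batch size is not reported in this card, so that last part is an inference under the assumption of one optimizer step per batch and no dropped final batch.

```python
import math

# Reported values from the training table.
trainable_params = 29.1e6
total_params = 8.98e9
loss_r1, loss_r2 = 0.216, 0.139
n_samples, epochs, total_steps = 740, 2, 94
wall_minutes = 77.8

trainable_pct = 100.0 * trainable_params / total_params
loss_reduction_pct = 100.0 * (loss_r1 - loss_r2) / loss_r1

steps_per_epoch = total_steps // epochs

# Effective batch sizes b for which ceil(n_samples / b) matches the step count.
# Assumption: one optimizer step per batch, final partial batch kept.
consistent_batch_sizes = [
    b for b in range(1, 65) if math.ceil(n_samples / b) == steps_per_epoch
]

seconds_per_step = wall_minutes * 60 / total_steps

print(f"trainable share: {trainable_pct:.2f} %")
print(f"loss reduction:  {loss_reduction_pct:.0f} %")
print(f"consistent effective batch sizes: {consistent_batch_sizes}")
print(f"seconds per step: {seconds_per_step:.1f}")
```

Under those assumptions only one effective batch size fits the reported step count; sequence packing or a different accumulation scheme would change that inference.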
When to use this vs SFT-r1
- For most generation tasks: use SFT-r1 (zndx/sdg-sft-r1). It generalises better and the AUC discrimination is meaningful.
- For research on iterated self-distillation / mode collapse: use SFT-r2 to reproduce the negative result, or as a "before" baseline for a diversity-preserving variant.
Related artifacts
- zndx/sdg-bertopic-correspondence-v0.1: the corpus that fed both adapters.
- zndx/sdg-sft-r1: the first-round adapter (recommended for downstream use).
Citation
```bibtex
@misc{sdg-sft-r2-v01,
  title = {SDG SFT Round-2 LoRA Adapter (v0.1): Iterated Self-Distillation Mode-Collapse Baseline},
  author = {Hill, Ryan and contributors},
  year = {2026},
  url = {https://huggingface.co/zndx/sdg-sft-r2}
}
```