Post Training Versions - Qwen 0.6B - a AIPlans Collection

AIPlans 's Collections

Post Training Versions - Qwen 0.6B

updated 19 days ago

Different versions of Qwen 0.6b, where the only difference is the post training method used. The post training database should be the hh rlhf dataset.

Upvote

AIPlans/qwen3-8b-ipo-hh-rlhf

Text Generation • Updated Jul 17 • 4
AIPlans/qwen3-0.6b-dpo-lora

Text Generation • 0.6B • Updated Sep 18 • 4 • 1
AIPlans/qwen3-0.6b-hh-rlhf-sft

0.6B • Updated 19 days ago • 19

Note Full SFT on HH-RLHF dataset (Helpful, Harmless, Honest). All 600M parameters fine-tuned for 3 epochs.

Upvote

Collection guide
Browse collections