AIPlans/qwen3-8b-ipo-hh-rlhf
Text Generation
Versions of Qwen 0.6B that differ only in the post-training method used. The post-training dataset should be HH-RLHF in every case.
Note: Full SFT on the HH-RLHF dataset (Helpful and Harmless). All 600M parameters were fine-tuned for 3 epochs.