Issues with Fine Tuning

#37
by rirv938 - opened

Hi. Great work on the models. Qwen team always produces great models.

I am running into an issue when fine-tuning this model with the Transformers Trainer: with either DeepSpeed stage 3 or FSDP, training hangs on the first step.

It may be that I am misconfiguring something during training (though I use the same script for many other models), so an example fine-tuning script would be useful here. This is similar to GPT-OSS, which provides an example script: "https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers".

Thanks for any help here.

Robert.

@rirv938 , as far as I know, DeepSpeed stage 3 is not supported for MoE models, so you can try training with stage 2.
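For reference, a minimal ZeRO stage 2 DeepSpeed config sketch that could be passed to the Transformers Trainer (the filename and exact option values are assumptions, not from this thread; `"auto"` values are filled in by the HF integration from `TrainingArguments`):

```json
{
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": "auto" },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

Saved as e.g. `ds_zero2.json` (hypothetical name), it would be wired in via `TrainingArguments(deepspeed="ds_zero2.json", ...)` in your existing training script.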
