Issues with Fine Tuning

#37
by rirv938 - opened

Hi. Great work on the models. Qwen team always produces great models.

I am running into an issue when fine-tuning this model with the Transformers Trainer: with either DeepSpeed stage 3 or FSDP, training hangs on the first step.

It may be that I am misconfiguring something during training (though I use the same script for many other models), so an example fine-tuning script would be useful here. This is similar to GPT-OSS, which provides an example script: "https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers".

Thanks for any help here.

Robert.

@rirv938 , as far as I know, DeepSpeed stage 3 is not supported for MoE models, so you can try training with stage 2.
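For reference, a minimal ZeRO stage 2 DeepSpeed config sketch that could be passed to the Transformers Trainer (the filename and exact option values are assumptions, not from this thread; `"auto"` values are filled in by the HF integration from `TrainingArguments`):

```json
{
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": "auto" },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

Saved as e.g. `ds_zero2.json` (hypothetical name), it would be wired in via `TrainingArguments(deepspeed="ds_zero2.json", ...)` in your existing training script.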
