deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec4

This model was trained using VERL (Volcano Engine Reinforcement Learning).

Training Details

  • Experiment Name: qwen3-4b-instruct-grpo-lora-eagle3-spec4
  • Final Step: 21
  • Base Model: /cmlscratch/anirudhs/Qwen3-4B
  • Training Method: PPO/GRPO with LoRA

Model Card

This is a fine-tuned language model trained with reinforcement learning.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("asatheesh/deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec4")
tokenizer = AutoTokenizer.from_pretrained("asatheesh/deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec4")

# Your inference code here

Citation

If you use this model, please cite:

@software{verl,
  title = {VERL: Volcano Engine Reinforcement Learning},
  url = {https://github.com/volcengine/verl}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Video Preview
loading