deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec4
This model was trained using VERL (Volcano Engine Reinforcement Learning).
Training Details
- Experiment Name: qwen3-4b-instruct-grpo-lora-eagle3-spec4
- Final Step: 21
- Base Model: /cmlscratch/anirudhs/Qwen3-4B
- Training Method: PPO/GRPO with LoRA
Model Card
This is a fine-tuned language model trained with reinforcement learning.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("asatheesh/deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec4")
tokenizer = AutoTokenizer.from_pretrained("asatheesh/deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec4")
# Your inference code here
Citation
If you use this model, please cite:
@software{verl,
title = {VERL: Volcano Engine Reinforcement Learning},
url = {https://github.com/volcengine/verl}
}