deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec4

This model was trained using VERL (Volcano Engine Reinforcement Learning).

Training Details

Experiment Name: qwen3-4b-instruct-grpo-lora-eagle3-spec4
Final Step: 21
Base Model: /cmlscratch/anirudhs/Qwen3-4B
Training Method: PPO/GRPO with LoRA

Model Card

This is a fine-tuned language model trained with reinforcement learning.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("asatheesh/deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec4")
tokenizer = AutoTokenizer.from_pretrained("asatheesh/deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec4")

# Your inference code here

Citation

If you use this model, please cite:

@software{verl,
  title = {VERL: Volcano Engine Reinforcement Learning},
  url = {https://github.com/volcengine/verl}
}

Downloads last month: -; Downloads are not tracked for this model. How to track

Video Preview

Reinforcement Learning