jaygala24's Collections
RL post-training
Updated about 4 hours ago
jaygala24/Qwen3-4B-GRPO-KL-math-reasoning • Text Generation • 4B • Updated 6 days ago • 1.01k
jaygala24/Qwen3-4B-GRPO-math-reasoning • Text Generation • 4B • Updated 6 days ago • 853
jaygala24/Qwen3-4B-ReMax-math-reasoning • Text Generation • 4B • Updated 6 days ago • 800
jaygala24/Qwen3-1.7B-GRPO-KL-math-reasoning • Text Generation • 2B • Updated 6 days ago • 804
jaygala24/Qwen3-1.7B-GRPO-math-reasoning • Text Generation • 2B • Updated 6 days ago • 812
jaygala24/Qwen3-1.7B-ReMax-math-reasoning • Text Generation • 2B • Updated 6 days ago • 857
jaygala24/Qwen2.5-3B-GRPO-KL-math-reasoning • Text Generation • 3B • Updated 6 days ago • 767
jaygala24/Qwen2.5-3B-GRPO-math-reasoning • Text Generation • 3B • Updated 6 days ago • 792
jaygala24/Qwen2.5-3B-ReMax-math-reasoning • Text Generation • 3B • Updated 6 days ago • 438
jaygala24/Qwen2.5-1.5B-GRPO-KL-math-reasoning • Text Generation • 2B • Updated 6 days ago • 502
jaygala24/Qwen2.5-1.5B-GRPO-math-reasoning • Text Generation • 2B • Updated 6 days ago • 553
jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning • Text Generation • 2B • Updated 6 days ago • 424
jaygala24/Qwen2.5-0.5B-GRPO-KL-math-reasoning • Text Generation • 0.5B • Updated 6 days ago • 523
jaygala24/Qwen2.5-0.5B-GRPO-math-reasoning • Text Generation • 0.5B • Updated 6 days ago • 552
jaygala24/Qwen2.5-0.5B-ReMax-math-reasoning • Text Generation • 0.5B • Updated 6 days ago • 434
jaygala24/Qwen3-1.7B-RLOO-math-reasoning • Text Generation • 2B • Updated about 2 hours ago • 508
jaygala24/Qwen2.5-3B-RLOO-math-reasoning • Text Generation • 3B • Updated about 2 hours ago • 462
jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning • Text Generation • 2B • Updated about 2 hours ago • 432
jaygala24/Qwen2.5-0.5B-RLOO-math-reasoning • Text Generation • 0.5B • Updated about 2 hours ago • 380
jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning • Text Generation • 0.5B • Updated about 2 hours ago • 378
jaygala24/Qwen3-1.7B-DAPO-math-reasoning • Text Generation • 2B • Updated about 2 hours ago • 371
jaygala24/Qwen2.5-3B-DAPO-math-reasoning • Text Generation • 3B • Updated about 2 hours ago • 359
jaygala24/Qwen2.5-1.5B-DAPO-math-reasoning • Text Generation • 2B • Updated about 2 hours ago • 358