Active filters: grpo
mradermacher/Poe-8B-GLM5-Opus4.6-Sonnet4.5-Kimi-Grok-Gemini-3-pro-preview-HERETIC-GGUF
8B • Updated
• 2.06k
• 2
trentmkelly/Qwen3-14B-ZeroGPT-beta
codelion/Qwen3-4B-execution-world-model-lora
Text Generation
• Updated
• 5
• 6
debashis/llama-1b-tool-router-grpo
Text Generation
• 1B • Updated
• 1
• 3
lokahq/Trinity-Mini-DrugProt-Think
Text Generation
• Updated
• 52
• 5
Jackrong/Qwen3-4B-Thinking-2507-GLM-4.7-Distilled-GGUF
Text Generation
• 4B • Updated
• 895
• 2
Text Generation
• 8B • Updated
• 15
• 1
mradermacher/DECS_7B-GGUF
8B • Updated
• 579
• 1
mradermacher/DECS_7B-i1-GGUF
8B • Updated
• 3.69k
• 2
mradermacher/Poe-8B-GLM5-Opus4.6-Sonnet4.5-Kimi-Grok-Gemini-3-pro-preview-HERETIC-i1-GGUF
8B • Updated
• 5.33k
• 1
Shreyansh327/Qwen3-1.7B-grpo-gsm8k
Reinforcement Learning
• 2B • Updated
• 88
• 1
Text Generation
• 0.1B • Updated
• 4
8B • Updated
sergiopaniego/Qwen2-0.5B-GRPO-test
Updated
Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF
1B • Updated
• 73
• 4
nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora
Updated
sergiopaniego/Qwen2-0.5B-GRPO
Updated
philschmid/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated
• 9
• 8
spinech/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated
• 2
Dongwei/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 2B • Updated
• 4
• 1
spinech/qwen2.5-3b-r1-rearc-stage1
Text Generation
• 3B • Updated
• 12
Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO
Text Generation
• 8B • Updated
• 9
• 1
MasterControlAIML/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured
Text Generation
• 2B • Updated
• 9
• 5
mradermacher/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured-GGUF
2B • Updated
• 127
• 2
hyunw3/qwen-2.5-0.5b-r1-countdown
Text Generation
• 0.5B • Updated
hyunw3/qwen-2.5-0.5b-r1-countdown_lr1.0e-6
Text Generation
• 0.5B • Updated
• 2
mgaimm/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated
• 1
MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b-Latest-Unstructured-To-Structured
Text Generation
• Updated
• 11
• 5