Models (37)

Active filters: verl

junnyu/Qwen2.5-7B-Instruct-1M-GRPO_logic_KK_5PPL

Text Generation • 8B • Updated Feb 13 • 7

sonyashijin/qwen3-32b-verilog-lora

Updated Jul 1 • 10

LichengLiu03/Qwen2.5-3B-UFO

Text Generation • 3B • Updated Jul 23 • 79 • 2

LichengLiu03/Qwen2.5-3B-UFO-1turn

Text Generation • 3B • Updated Jul 10 • 10 • 2

mradermacher/Qwen2.5-3B-UFO-GGUF

3B • Updated Jul 4 • 334 • 1

mradermacher/Qwen2.5-3B-UFO-1turn-GGUF

3B • Updated Jul 4 • 98 • 1

alphadl/ppo-gsm8k-0.5b

Text Generation • 0.6B • Updated Aug 4 • 98 • 2

Jasaxion/MathSmith-HC-Problem-Synthesizer-Qwen3-8B

8B • Updated Nov 11 • 7 • 1

Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B

8B • Updated Nov 11 • 40 • 1

thejaminator/grpo-feature-vector-step-1

Updated Aug 27 • 3

MindIntLab/Psyche-R1

Text Generation • 8B • Updated Sep 4 • 432 • 9

GMagoLi/test-upload

Text Generation • Updated Sep 18

karthik/verl-qwen2.5-0.5b-gsm8k-ppo-step360

Text Generation • 0.5B • Updated Sep 21 • 13

samhitha2601/llama3.2-3b-ppo

Reinforcement Learning • Updated Oct 23 • 4

samhitha2601/llama3.2-3b-ppo-critic

Reinforcement Learning • Updated Oct 23 • 6

mradermacher/MathSmith-HC-Problem-Synthesizer-Qwen3-8B-GGUF

8B • Updated Nov 12 • 1.26k • 1

mradermacher/MathSmith-HC-Problem-Synthesizer-Qwen3-8B-i1-GGUF

8B • Updated 21 days ago • 1.3k • 1

mradermacher/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B-GGUF

8B • Updated Nov 12 • 104 • 1

mradermacher/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B-i1-GGUF

8B • Updated 20 days ago • 1.21k • 1

asatheesh/deepmath-qwen3-4b-instruct-grpo-lora

Reinforcement Learning • Updated 9 days ago

asatheesh/deepmath-qwen3-4b-instruct-drgrpo-lora

Reinforcement Learning • Updated 9 days ago

asatheesh/deepmath-qwen3-4b-instruct-rloo-lora

Reinforcement Learning • Updated 9 days ago

asatheesh/deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec2

Reinforcement Learning • Updated 9 days ago

asatheesh/deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec4

Reinforcement Learning • Updated 9 days ago

asatheesh/deepmath-qwen3-4b-instruct-grpo-lora-ngram-spec4

Reinforcement Learning • Updated 9 days ago

asatheesh/deepmath-qwen3-4b-instruct-rloo-lora-eagle3-spec5

Reinforcement Learning • Updated 9 days ago

asatheesh/deepmath-qwen3-4b-instruct-drgrpo-lora-eagle3-spec5

Reinforcement Learning • Updated 9 days ago

asatheesh/deepmath-qwen3-4b-instruct-grpo-lora-eagle3-spec5

Reinforcement Learning • Updated 9 days ago

asatheesh/deepmath-qwen3-4b-instruct-rloo-lora-ngram-spec5

Reinforcement Learning • Updated 9 days ago

asatheesh/deepmath-qwen3-4b-instruct-drgrpo-lora-ngram-spec5

Reinforcement Learning • Updated 9 days ago