-
nvidia/Nemotron-Cascade-8B
Text Generation • 8B • Updated • 2.41k • 43 -
nvidia/Nemotron-Cascade-8B-Thinking
Text Generation • 8B • Updated • 1.19k • 26 -
nvidia/Nemotron-Cascade-14B-Thinking
Text Generation • 15B • Updated • 2.41k • 44 -
nvidia/Nemotron-Cascade-8B-Intermediate-ckpts
Text Generation • Updated • 7
Collections
Collections including paper arxiv:2512.13607
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 106 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 500 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 30
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 195 -
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper • 2508.14444 • Published • 39 -
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 64 -
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 273
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 7.37k • 1.23k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 349 • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
Adversarial Flow Models
Paper • 2511.22475 • Published • 22 -
DiP: Taming Diffusion Models in Pixel Space
Paper • 2511.18822 • Published • 28 -
Asking like Socrates: Socrates helps VLMs understand remote sensing images
Paper • 2511.22396 • Published • 4 -
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 16
-
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper • 2510.00938 • Published • 58 -
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
Paper • 2509.19284 • Published • 22 -
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
Paper • 2509.25810 • Published • 5 -
Agent Learning via Early Experience
Paper • 2510.08558 • Published • 269
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 463 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper • 2501.04682 • Published • 99 -
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Paper • 2501.04686 • Published • 53 -
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Paper • 2512.13607 • Published • 26