- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 39
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 46
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 37
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 47
Collections
Collections including paper arxiv:2503.07536

- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
  Paper • 2503.10615 • Published • 17
- UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
  Paper • 2503.10630 • Published • 6
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88

- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88
- Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
  Paper • 2503.07703 • Published • 37
- Gemini Embedding: Generalizable Embeddings from Gemini
  Paper • 2503.07891 • Published • 45
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
  Paper • 2503.07572 • Published • 47

- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
  Paper • 2503.04724 • Published • 72
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
  Paper • 2503.07572 • Published • 47

- InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
  Paper • 2502.11573 • Published • 9
- Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
  Paper • 2502.02339 • Published • 22
- video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
  Paper • 2502.11775 • Published • 9
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 39

- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 142

- VLM-Reasoner/LMM-R1-MGT-PerceReason
  Visual Question Answering • 4B • Updated • 601 • 4
- VLM-Reasoner/VerMulti
  Viewer • Updated • 34.4k • 178 • 3
- VLM-Reasoner/deepscaler
  Viewer • Updated • 40.3k • 43 • 3
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88

- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88
- Large Language Model Agent: A Survey on Methodology, Applications and Challenges
  Paper • 2503.21460 • Published • 83
- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
  Paper • 2504.01990 • Published • 299
- SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
  Paper • 2507.01001 • Published • 47

- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
  Paper • 2502.14768 • Published • 47
- S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
  Paper • 2502.12853 • Published • 29
- Diverse Inference and Verification for Advanced Reasoning
  Paper • 2502.09955 • Published • 18
- Distillation Scaling Laws
  Paper • 2502.08606 • Published • 47

- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 61
- LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
  Paper • 2502.07374 • Published • 40
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 152
- S*: Test Time Scaling for Code Generation
  Paper • 2502.14382 • Published • 63