Collections
Collections including paper arxiv:2509.08827
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 276
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 262
- GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
  Paper • 2507.01006 • Published • 240
- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 259

- A Survey of Direct Preference Optimization
  Paper • 2503.11701 • Published
- Reinforcement Learning in Vision: A Survey
  Paper • 2508.08189 • Published • 29
- A Technical Survey of Reinforcement Learning Techniques for Large Language Models
  Paper • 2507.04136 • Published
- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 189

- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 189
- wsqstar/bert-finetuned-weibo-luobokuaipao
  Text Classification • 0.1B • Updated • 43 • 1
- brivil1/lithuanian-sentiment-analysis-DistilBERT
  Text Classification • 0.1B • Updated • 23
- brivil1/lithuanian-sentiment-analysis-ByT5
  0.3B • Updated • 9

- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 189
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 225
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 101
- Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
  Paper • 2509.08721 • Published • 660

- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 189
- ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
  Paper • 2510.27492 • Published • 81
- MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
  Paper • 2511.09611 • Published • 68

- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 225
- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 189
- A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
  Paper • 2508.21148 • Published • 140

- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 72
- rStar2-Agent: Agentic Reasoning Technical Report
  Paper • 2508.20722 • Published • 116
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 53
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
  Paper • 2509.12201 • Published • 104