-
Monitored Markov Decision Processes
Paper • 2402.06819 • Published -
Generalization in Monitored Markov Decision Processes (Mon-MDPs)
Paper • 2505.08988 • Published -
Bayesian Risk Markov Decision Processes
Paper • 2106.02558 • Published -
Sotopia-RL: Reward Design for Social Intelligence
Paper • 2508.03905 • Published • 23
Collections including paper arxiv:2603.17187
-
ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning
Paper • 2603.10160 • Published • 26 -
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
Paper • 2603.12262 • Published • 31 -
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings
Paper • 2603.13594 • Published • 149 -
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Paper • 2603.17187 • Published • 139
-
FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use
Paper • 2603.08262 • Published • 42 -
On-Policy Context Distillation for Language Models
Paper • 2602.12275 • Published • 4 -
Online Experiential Learning for Language Models
Paper • 2603.16856 • Published • 59 -
Mixture-of-Depths Attention
Paper • 2603.15619 • Published • 80
-
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Paper • 2603.17187 • Published • 139 -
Attention Residuals
Paper • 2603.15031 • Published • 184 -
MOSS-TTS Technical Report
Paper • 2603.18090 • Published • 13 -
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
Paper • 2603.23516 • Published • 49
-
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
Paper • 2602.12099 • Published • 62 -
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
Paper • 2602.10560 • Published • 31 -
G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design
Paper • 2602.08253 • Published • 27 -
ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression
Paper • 2602.11008 • Published • 18
-
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
Paper • 2602.23008 • Published • 37 -
SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards
Paper • 2602.21158 • Published • 1 -
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Paper • 2603.17187 • Published • 139 -
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
Paper • 2602.08234 • Published • 76
-
Self-Supervised Prompt Optimization
Paper • 2502.06855 • Published • 18 -
Context Learning for Multi-Agent Discussion
Paper • 2602.02350 • Published • 4 -
XSkill: Continual Learning from Experience and Skills in Multimodal Agents
Paper • 2603.12056 • Published • 33 -
Online Experiential Learning for Language Models
Paper • 2603.16856 • Published • 59