Collections

Collections including paper arxiv:2604.27085

Co-Evolving Policy Distillation

Paper • 2604.27083 • Published 18 days ago • 64
Efficient Training on Multiple Consumer GPUs with RoundPipe

Paper • 2604.27085 • Published 18 days ago • 40
Leveraging Verifier-Based Reinforcement Learning in Image Editing

Paper • 2604.27505 • Published 17 days ago • 57

Agent Collaborations

Efficient Optimizer Live

Running • 3
Dashboard for the Efficient Optimizer challenge

ml-intern-explorers/efficient-optimizer-collab • 590 kB

Parameter Golf Live

Running • 1
Live chat + leaderboard for the Parameter Golf challenge

ml-intern-explorers/parameter-golf-collab • 922 kB

Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

Paper • 2510.03259 • Published Sep 26, 2025 • 57
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Paper • 2510.07242 • Published Oct 8, 2025 • 30
First Try Matters: Revisiting the Role of Reflection in Reasoning Models

Paper • 2510.08308 • Published Oct 9, 2025 • 24
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3, 2025 • 76

Efficient Training on Multiple Consumer GPUs with RoundPipe

Paper • 2604.27085 • Published 18 days ago • 40
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Paper • 2604.04921 • Published Apr 6 • 113

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 326
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Paper • 2512.23988 • Published Dec 30, 2025 • 19
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

Paper • 2512.25075 • Published Dec 31, 2025 • 16
Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Paper • 2512.24176 • Published Dec 30, 2025 • 8

When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Paper • 2402.17193 • Published Feb 27, 2024 • 26
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

Paper • 2410.23743 • Published Oct 31, 2024 • 64
Direct Preference Optimization Using Sparse Feature-Level Constraints

Paper • 2411.07618 • Published Nov 12, 2024 • 17
Transformer^2: Self-adaptive LLMs

Paper • 2501.06252 • Published Jan 9, 2025 • 55
