arxiv:2603.20691

SWE-Next: Scalable Real-World Software Engineering Tasks for Agents

Published on Mar 21

Abstract

SWE-Next is a framework that efficiently collects scalable software engineering tasks by executing commit pairs from real pull requests and reusing repository environments to reduce costs.

AI-generated summary

Executable software engineering data is valuable for training SWE agents, but scaling it remains difficult for two reasons: only a small fraction of real repository changes yield verifiable, high-signal task instances, and naively building repository-specific environments quickly becomes the dominant systems cost. We present SWE-Next, an execution-grounded framework for scalable SWE task and trajectory collection. On the data side, SWE-Next mines real merged pull requests, executes candidate base/merged commit pairs, and retains only those that produce strict test improvements without regressions, yielding self-verifying instances. It also applies strict submission gating so that collected trajectories remain evidence-driven rather than speculative. On the systems side, SWE-Next introduces reusable repo-quarter profiles, which reuse the same environment across commits that are close in time while keeping each task run separate and reproducible. Using only 30 hours and 639 GB of environment storage, SWE-Next processes 3,971 seed repositories and 102,582 candidate commit pairs mined from real merged PRs to construct a dataset of 2,308 self-verifying instances. Experiments show that SWE-Next improves downstream pass@1 with fewer or comparable training trajectories, indicating that its gains come not from a stronger trajectory generator, but from higher-signal execution-grounded supervision and more efficient data collection.

Community

🚀 Introducing SWE-Next: a scalable, execution-grounded framework for building SWE training data from real merged PRs. SWE-Next processes 3,971 repositories and 102K commit pairs to construct 2,308 verified instances, and collecting the full dataset takes just 30 hours and 639 GB.

🧩 Key idea: repo-quarter profiles reuse a single environment across temporally nearby commits, cutting storage from over 30 TB to just 639 GB.

📈 SFT results: with only 3K+ high-quality trajectories, our models reach 17.4% on SWE-Bench Verified at 7B and 30.0% at 14B.

Building executable SWE environments is expensive:
โŒ One Docker image per commit = storage explodes at scale
โŒ Most real PRs don't yield verifiable training signal (~74.5% don't improve tests)
โŒ Leaky prompts + weak submission gating โ†’ low-quality trajectories

🧩 Repo-quarter profiles: Instead of building a new environment per commit, we map each commit to a (repo, quarter) profile, a shared, reusable Docker image for that repo's dependency regime in that time window.
The image caches system packages + a venv but never bakes in source code.
At runtime: mount the commit snapshot → copy-on-start → run tests in isolation.
One image. Many commits. No rebuilding.
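The profile lookup above amounts to bucketing commits by repo and calendar quarter so that all commits in the same bucket share one cached image. A minimal sketch, assuming the key is simply (repo, year-quarter); the paper's actual keying scheme may differ:

```python
from datetime import datetime

def profile_key(repo: str, commit_date: datetime) -> tuple:
    """Map a commit to its (repo, quarter) profile key.

    All commits landing in the same repository and calendar quarter
    resolve to the same key, and therefore to the same cached Docker
    image. Hypothetical key scheme for illustration only.
    """
    quarter = (commit_date.month - 1) // 3 + 1  # months 1-3 -> Q1, etc.
    return (repo, f"{commit_date.year}Q{quarter}")
```

Because the image caches only system packages and a venv, two commits months apart within the same quarter can share it safely; the source tree is mounted fresh per run.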

Everything is open
๐Ÿ—‚๏ธ Paper: arxiv.org/abs/2603.20691
๐Ÿค— Dataset: huggingface.co/datasets/TIGER-Lab/SWE-Next
๐Ÿš€ SFT Trajectories: huggingface.co/datasets/TIGER-Lab/SWE-Next-SFT-Trajectories
๐Ÿค– SWE-Next-7B: huggingface.co/TIGER-Lab/SWE-Next-7B
๐Ÿค– SWE-Next-14B: huggingface.co/TIGER-Lab/SWE-Next-14B
๐Ÿ’ป Code: github.com/TIGER-AI-Lab/SWE-Next

