Instructions to use inclusionAI/Ring-2.6-1T with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use inclusionAI/Ring-2.6-1T with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="inclusionAI/Ring-2.6-1T", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("inclusionAI/Ring-2.6-1T", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use inclusionAI/Ring-2.6-1T with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "inclusionAI/Ring-2.6-1T"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "inclusionAI/Ring-2.6-1T",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/inclusionAI/Ring-2.6-1T

SGLang

How to use inclusionAI/Ring-2.6-1T with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "inclusionAI/Ring-2.6-1T" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "inclusionAI/Ring-2.6-1T",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "inclusionAI/Ring-2.6-1T" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "inclusionAI/Ring-2.6-1T",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use inclusionAI/Ring-2.6-1T with Docker Model Runner:
```
docker model run hf.co/inclusionAI/Ring-2.6-1T
```

🤗 Hugging Face | 🤖 ModelScope | 🐙 ling.tbox.cn

Ring-2.6-1T

Ring-2.6-1T is a trillion-parameter flagship reasoning model designed for complex real-world tasks. It is available to developers, researchers, and enterprises for validation, adaptation, and further development.

The goal of Ring-2.6-1T is not simply to pursue larger parameter scale, but to address the real production environments that large models are entering: agent workflows, engineering development, scientific research analysis, complex business systems, and enterprise automation processes. In these scenarios, models must not only "answer questions" but also understand context, plan steps, invoke tools, execute continuously, and remain stable over long-horizon tasks.

Ring-2.6-1T delivers key upgrades in three areas:

Comprehensively enhanced Agent execution capability: Moving from "being able to answer" to "being able to execute," with more stable performance in multi-step tasks, tool collaboration, contextual planning, and advancing complex workflows.
Reasoning Effort mechanism: Supporting two reasoning intensity levels, high and xhigh, allowing developers to flexibly adjust the depth of thinking according to task complexity, achieving a better balance among effectiveness, speed, and cost.
Innovative asynchronous reinforcement learning training paradigm: Leveraging an Async RL architecture combined with the IcePop algorithm to improve the training efficiency and stability of long-horizon reinforcement learning for trillion-parameter models, providing foundational support for agent capabilities and complex reasoning.

Model Downloads

You can download Ring-2.6-1T from the following table. If you are located in mainland China, we also provide the model on ModelScope to speed up the download process.

Model	Context Length	Download
Ring-2.6-1T	128K -> 256K (YaRN)	🤗 HuggingFace 🤖 ModelScope

Note: If you are interested in previous versions, please visit the past model collections on Hugging Face or ModelScope.

Agent Capability: From "Understanding Tasks" to "Continuously Executing Tasks"

In real business systems, models often face not isolated Q&A, but continuous, multi-turn, complex tasks that require tool collaboration. Ring-2.6-1T has been specifically enhanced for such scenarios, enabling more stable task decomposition, step planning, tool invocation, error correction, and context continuation.

On benchmarks that evaluate real-world task execution, Ring-2.6-1T high performs strongly. It scores 87.60 on PinchBench, notably higher than GPT-5.4 xHigh and Gemini-3.1-Pro high; 63.82 on ClawEval, ranking among the top comparable models; and 95.32 on Tau2-Bench in the Telecom scenario, less than 1 point behind the highest-scoring model. These results indicate stable execution in complex business processes, tool collaboration, and industry-specific tasks.

This means that Ring-2.6-1T not only understands user intent but can also keep driving tasks forward in real workflows. In personal assistant agents, in enterprise process automation, and in coding agent scenarios that involve code generation, task decomposition, and engineering collaboration, Ring-2.6-1T functions more like a workflow engine that is executable, responsive to feedback, and capable of iteration.

Reasoning Effort: High and xHigh Configurations — Fast When Needed, Deep When Required

In practice, not all tasks require the same level of reasoning resources. A format conversion or information organization task has entirely different demands on the model's depth of thinking compared to a math competition problem or a complex system analysis.

To address this, Ring-2.6-1T introduces an adjustable Reasoning Effort mechanism, supporting two reasoning effort levels: high and xhigh.

high is designed for high-frequency agent workflows, suitable for multi-turn interactions, tool collaboration, task decomposition, and production-grade default invocation. It maintains a high task completion rate while reducing unnecessary reasoning token overhead, making the model faster, more stable, and more cost-effective in real-world workflows.
xhigh is tailored for high-difficulty tasks such as mathematics, scientific research, complex logical analysis, and multi-path exploration, granting the model more extensive reasoning space. In challenging reasoning benchmarks, Ring-2.6-1T xhigh demonstrates strong capability ceilings: scoring 66.18 on ARC-AGI-V2, surpassing Gemini-3.1-Pro high and Claude-Opus-4.7 xhigh; achieving 95.83 on AIME 26, on par with multiple leading models; and reaching 88.27 on GPQA Diamond, reflecting robust professional knowledge comprehension and complex reasoning capabilities.

With the high and xhigh configuration options, developers can dynamically allocate reasoning resources based on task characteristics: use high for everyday workflows to achieve greater efficiency, and switch to xhigh for complex reasoning tasks to unlock the model's full capability ceiling.
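The allocation rule described above can be sketched as a small routing helper. This is only an illustration under stated assumptions: the task categories, the helper names `choose_effort` and `build_payload`, and the `reasoning_effort` request field are all hypothetical. The model card does not say how the effort level is passed to a server, so check your deployment for the actual parameter.

```python
# Hypothetical routing helper. The task categories and the request field
# name below are assumptions for illustration, not a documented API.
DEEP_REASONING_TASKS = {
    "math",
    "scientific_research",
    "logical_analysis",
    "multi_path_exploration",
}


def choose_effort(task_kind: str) -> str:
    """Return "xhigh" for high-difficulty task kinds, else the "high" default."""
    return "xhigh" if task_kind in DEEP_REASONING_TASKS else "high"


def build_payload(prompt: str, task_kind: str) -> dict:
    """Build an OpenAI-style chat payload with an assumed effort field."""
    return {
        "model": "inclusionAI/Ring-2.6-1T",
        "messages": [{"role": "user", "content": prompt}],
        # Assumed field name; not confirmed by the model card.
        "reasoning_effort": choose_effort(task_kind),
    }
```

With this rule, a format-conversion request would be routed to high, while a competition math problem would be routed to xhigh.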

Async RL Training + IcePop Algorithm: Supporting Stable Reinforcement Learning for Trillion-Parameter Models

Conducting reinforcement learning training on trillion-parameter models is itself an enormous engineering challenge. In traditional synchronous RL training, policy generation (rollout) and gradient updates are tightly coupled, leading to:

GPU waiting: Low GPU resource utilization, with substantial computational power wasted on synchronization waits;
Insufficient training throughput: Prolonged training cycles with limited iteration speed;
Instability in long-horizon training: Prone to policy collapse or reward signal degradation during extended training.

Ring-2.6-1T adopts an asynchronous (Async) reinforcement learning training architecture, decoupling policy sampling and parameter updates into independent pipelines, achieving:

Significantly improved training throughput and resource utilization: Sampling and updates execute in parallel, dramatically increasing GPU utilization and boosting training efficiency by several times;
Support for longer training cycles: The decoupled architecture is inherently suited for large-scale, long-duration continuous training, eliminating training interruptions caused by synchronization bottlenecks.
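The decoupling described above can be illustrated with a toy producer/consumer loop. This is a minimal sketch and not the training system used for Ring-2.6-1T: the function `run_async_rl`, the bounded queue, and the integer "policy version" standing in for model weights are all assumptions made for the example.

```python
import queue
import threading


def run_async_rl(num_rollouts: int = 8, queue_size: int = 4) -> dict:
    """Toy decoupled loop: one sampler thread, one learner thread, one queue."""
    rollouts: queue.Queue = queue.Queue(maxsize=queue_size)
    state = {"policy_version": 0, "updates": 0, "max_staleness": 0}
    lock = threading.Lock()

    def sampler() -> None:
        for i in range(num_rollouts):
            with lock:
                version = state["policy_version"]  # snapshot of current policy
            # Blocks only when the queue is full, never on a learner step.
            rollouts.put({"id": i, "version": version})
        rollouts.put(None)  # sentinel: sampling has finished

    def learner() -> None:
        while True:
            item = rollouts.get()
            if item is None:
                break
            with lock:
                # How many updates happened since this rollout was sampled.
                staleness = state["policy_version"] - item["version"]
                state["max_staleness"] = max(state["max_staleness"], staleness)
                state["policy_version"] += 1  # stand-in for a gradient update
                state["updates"] += 1

    threads = [threading.Thread(target=sampler), threading.Thread(target=learner)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

The `max_staleness` counter records how far the policy moved between sampling a rollout and training on it. In a synchronous loop this gap is always zero; in an asynchronous one it can grow, which is one commonly cited reason asynchronous training needs extra care to stay stable.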

Building on this, we apply the IcePop algorithm from Ring-1T to the async RL training process to address training instability. This change to the training paradigm lets us run sufficient and stable reinforcement learning optimization on trillion-parameter models, pushing both agent execution and reasoning capabilities to new ceilings. We will release the details of the Stick-Breaking algorithm combined with the Async architecture in our upcoming technical report.

Quickstart

🚀 Try Online

https://ling.tbox.cn/chat

Deployment

SGLang

Environment Preparation

We will submit our model to the official SGLang release later. For now, prepare the environment with the following steps:

git clone -b ling_2_5 git@github.com:antgroup/sglang.git
cd sglang

# Install the python packages
pip install --upgrade pip
pip install -e "python"

Run Inference

SGLang now supports both BF16 and FP8 models. Which one is used depends on the dtype of the model in ${MODEL_PATH}.

The following example runs Ring-2.6-1T across multiple GPU nodes, where the master node IP is ${MASTER_IP} and the server port is ${PORT}:

Start server:

# Node 0:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 0 
# Node 1:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 1 
# Node 2:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 2 
# Node 3:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 3

# This is only an example. Please adjust arguments according to your actual environment.
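The four commands above differ only in --node-rank, so they can be generated rather than typed by hand. The helper below is a hypothetical convenience sketch: `launch_commands` is not part of SGLang, and it only reuses the flags already shown in the commands above.

```python
def launch_commands(
    model_path: str,
    master_ip: str,
    port: int,
    nnodes: int = 4,
    tp_size: int = 8,
    pp_size: int = 4,
    dist_port: int = 2345,
) -> list[str]:
    """Build one sglang.launch_server command line per node rank."""
    return [
        "python -m sglang.launch_server"
        f" --model-path {model_path}"
        f" --tp-size {tp_size} --pp-size {pp_size} --dp-size 1"
        " --trust-remote-code"
        f" --dist-init-addr {master_ip}:{dist_port}"
        f" --port {port} --nnodes {nnodes} --node-rank {rank}"
        for rank in range(nnodes)
    ]


if __name__ == "__main__":
    # Placeholder values; replace with your own model path, IP, and port.
    for cmd in launch_commands("/path/to/model", "10.0.0.1", 30000):
        print(cmd)
```

Each printed line is meant to be run on the node whose rank it names.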

Client:

curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
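
The same request can be sent from Python using only the standard library. This is a minimal sketch: the helper names `build_chat_request` and `chat` are made up for the example, the timeout value is arbitrary, and the response parsing assumes the server returns the usual OpenAI-compatible `choices[0].message.content` shape.

```python
import json
import urllib.request


def build_chat_request(master_ip: str, port: int, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    url = f"http://{master_ip}:{port}/v1/chat/completions"
    body = {"model": "auto", "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(master_ip: str, port: int, prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the assistant text.

    Assumes an OpenAI-compatible response body.
    """
    request = build_chat_request(master_ip, port, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

Once the server is running, `chat(master_ip, port, "What is the capital of France?")` would return the model's reply as a string.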