Instructions for using WithinUsAI/Infinite.Code.III with libraries, inference servers, and local apps.

Libraries

How to use WithinUsAI/Infinite.Code.III with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WithinUsAI/Infinite.Code.III", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("WithinUsAI/Infinite.Code.III", trust_remote_code=True, dtype="auto")
```


vLLM

How to use WithinUsAI/Infinite.Code.III with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (the repository ships custom model code, so remote code must be trusted):
vllm serve "WithinUsAI/Infinite.Code.III" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WithinUsAI/Infinite.Code.III",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
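The same completion request can be sent from Python. The sketch below uses only the standard library and assumes a server started as above is listening on localhost:8000 (use port 30000 for the SGLang examples); `build_completion_request` and `complete` are hypothetical helper names, not part of vLLM or SGLang.

```python
import json
import urllib.request

# Assumptions: an OpenAI-compatible server is running locally on this port and
# serves the model under this ID. Switch to port 30000 for the SGLang server.
BASE_URL = "http://localhost:8000"
MODEL_ID = "WithinUsAI/Infinite.Code.III"


def build_completion_request(prompt, max_tokens=512, temperature=0.5):
    """Build the same POST /v1/completions request that the curl example sends."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice in the response."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the generated continuation as a string.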

Use Docker (Docker Model Runner)

```shell
docker model run hf.co/WithinUsAI/Infinite.Code.III
```

SGLang

How to use WithinUsAI/Infinite.Code.III with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (the repository ships custom model code, so remote code must be trusted):
python3 -m sglang.launch_server \
    --model-path "WithinUsAI/Infinite.Code.III" \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WithinUsAI/Infinite.Code.III",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WithinUsAI/Infinite.Code.III" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WithinUsAI/Infinite.Code.III",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```


You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Infinite.Code.III — Recursive Language Model

"Not a Large Language Model. A Recursive Mind."

Overview

Infinite.Code.III is a 1.210B-parameter Recursive Language Model (RLM) built from scratch as a unified Hybrid Mind architecture. Unlike standard LLMs, which run a fixed stack of transformer blocks in a single forward pass, Infinite.Code.III integrates Self-Automated (S.A.) learning systems as architectural primitives: they are not separate pipeline steps but are woven into every decoder layer.

| Property | Value |
|---|---|
| Parameters | 1.210B |
| Context Window | 1,000,000 tokens |
| Architecture | Recursive Language Model (RLM) |
| Attention | Grouped-Query Attention (GQA), 10 query heads / 5 key-value heads |
| Positional Encoding | RoPE (θ = 500,000, scaled for long context) |
| FFN | Alternating Dense / Mixture-of-Experts (8 experts, top-2) |
| Vocabulary | 65,536 BPE tokens |
| Layers | 20 |
| Hidden Size | 1280 |
| Weight Format | safetensors (trained in bfloat16, saved in float32) |
| Modalities | Text · Image · Audio · Video |
| License | Apache 2.0 |
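The table is enough for rough memory estimates. The sketch below assumes (this is not stated in the card) that the attention head size is hidden size divided by query heads (1280 / 10 = 128), that all 20 layers keep a standard GQA key-value cache, and it ignores any extra state from RewriteAttention or HybridMemory.

```python
# Back-of-the-envelope memory estimates from the property table.
# Assumptions: head_dim = hidden_size / query_heads, every layer caches K and V
# for the 5 KV heads, and RewriteAttention / HybridMemory state is ignored.
PARAMS = 1.210e9
LAYERS = 20
HIDDEN = 1280
Q_HEADS, KV_HEADS = 10, 5
HEAD_DIM = HIDDEN // Q_HEADS  # 128 under the assumption above


def weight_bytes(bytes_per_param):
    """Raw weight storage: 4 bytes/param for float32, 2 for bfloat16."""
    return PARAMS * bytes_per_param


def kv_cache_bytes(tokens, bytes_per_value=2):
    """KV cache for one sequence: K and V per layer, each (tokens, KV_HEADS, HEAD_DIM)."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * tokens
```

Under these assumptions the float32 checkpoint is about 4.84 GB (about 2.42 GB in bfloat16), and the bfloat16 KV cache costs 51,200 bytes per token, roughly 51.2 GB for a single sequence at the full 1M-token context.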

S.A. System Architecture

S.A. Meta Learning

Each layer has a learnable adaptive_alpha scalar (sigmoid-gated) that blends the transformed output with the layer's top-of-layer residual. This is the meta-learning channel — it learns how much each transformation contributes per layer.
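A minimal sketch of such a gate, assuming a convex mix weighted by sigmoid(adaptive_alpha); the card only says the scalar is sigmoid-gated and blends the two tensors, so the exact formula and the helper name `adaptive_blend` are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_blend(transformed, residual, adaptive_alpha):
    """Blend a layer's transformed output with its residual, element-wise.

    Assumed form: a = sigmoid(adaptive_alpha); out = a * transformed + (1 - a) * residual.
    """
    a = sigmoid(adaptive_alpha)
    return [a * t + (1.0 - a) * r for t, r in zip(transformed, residual)]
```

An alpha of 0 gives an even 50/50 mix; large positive values pass the transformed output through almost unchanged, and large negative values keep the residual.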

S.A. Reinforcement Learning

RewardHead (D → 512 → 1 scalar) attaches to the final hidden states. During RL fine-tuning (RLHF / GRPO), this head provides the value signal. Pass output_reward=True during rollout collection.

S.A. Continual Learning

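HybridMemory's LTM uses an exponential-moving-average write-back (0.95 × old + 0.05 × new), so each write blends new content into the store instead of replacing it. Knowledge accumulates across forward passes while older content decays gradually (by a factor of 0.95 per write), which is intended to resist catastrophic forgetting.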
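The write rule can be illustrated directly. This sketch assumes every slot is written on each pass; the card does not describe how slots are addressed, and `ema_write` / `retained_fraction` are hypothetical names.

```python
DECAY = 0.95  # weight kept on the old slot content at every write


def ema_write(old_slots, new_slots, decay=DECAY):
    """Return decay * old + (1 - decay) * new, element-wise for every slot.

    Assumption: all slots are written each pass (slot addressing is not specified).
    """
    return [
        [decay * o + (1.0 - decay) * n for o, n in zip(old_row, new_row)]
        for old_row, new_row in zip(old_slots, new_slots)
    ]


def retained_fraction(num_writes, decay=DECAY):
    """Share of a slot's original content left after num_writes EMA writes."""
    return decay ** num_writes
```

With a decay of 0.95, a slot still holds just over half of its original content after 13 writes and slightly under half after 14, which quantifies how slowly the LTM forgets.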

S.A. Adaptive Learning

The per-layer adaptive_alpha gate is trained end-to-end, self-calibrating each layer's write strength to the residual stream.

S.A. Rewriting Learning

Every 3rd layer runs RewriteAttention — a 4-head causal self-attention pass that lets the model revise its own intermediate token representations within a single forward pass.

S.A. NLP + S.A. Problem Solving

At the decoder output, MetaOutputMixer uses NLPGate to apply a 3-way soft gate (language / code / math-logic). The final representation is a content-adaptive weighted mixture of three parallel projections.
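A sketch of a 3-way soft gate, assuming the gate is a softmax over three logits and is applied per vector; whether NLPGate gates per token or per sequence, and how its logits are computed, is not described, so `meta_output_mix` is a hypothetical stand-in.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def meta_output_mix(gate_logits, projections):
    """Weighted mixture of three parallel projections.

    gate_logits: 3 scores (language, code, math-logic), assumed to come from a gate like NLPGate.
    projections: 3 equal-length vectors, one per branch.
    """
    weights = softmax(gate_logits)
    dim = len(projections[0])
    return [sum(w * p[i] for w, p in zip(weights, projections)) for i in range(dim)]
```

Equal logits average the three branches; a dominant logit makes the output track one branch, which is what "content-adaptive" amounts to under this assumption.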

S.A. Innovation Learning

Odd-numbered layers use MoELayer: 8 experts with top-2 routing, each expert a SwiGLU FFN with a 2048-dim intermediate size.
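Top-2 routing and a SwiGLU expert can be sketched in plain Python. The renormalization (a softmax over only the two selected router logits, as in Mixtral-style routers) is an assumption; the card states only "8 experts, top-2". Function names are hypothetical.

```python
import math


def top2_route(router_logits):
    """Pick the two highest-scoring experts and renormalise their weights (assumed softmax)."""
    chosen = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:2]
    m = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def silu(x):
    # x * sigmoid(x), written to avoid overflow for large negative x
    return x / (1.0 + math.exp(-x)) if x >= 0 else x * math.exp(x) / (1.0 + math.exp(x))


def swiglu_expert(x, w_gate, w_up, w_down):
    """SwiGLU FFN: W_down(SiLU(W_gate x) * (W_up x)); rows of w_gate = intermediate size (2048 in the card)."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)


def moe_forward(x, router_logits, experts):
    """Weighted sum of the two selected experts' outputs (each expert maps D -> D)."""
    out = [0.0] * len(x)
    for idx, weight in top2_route(router_logits):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out
```

Only two of the eight experts run per token, so per-token FFN compute in an MoE layer is roughly twice that of a single expert, while total parameters scale with all eight.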

S.A. Debugging

DebugHookManager is a gradient-hook registry. Set debug_mode: true in the config to log the mean absolute gradient of the embedding and of any registered tensor. It adds no overhead when disabled.

S.A. Advanced Long/Short-Term Memory

HybridMemory (every 4th layer):

STM: 512-slot soft-attention read buffer (refreshed each pass)
LTM: 2048-slot persistent EMA key-value store (continual write-back)
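A soft-attention read over memory slots can be sketched as scaled dot-product attention between a query and the slot keys. The scaling by sqrt(d) and the key/value layout are assumptions; the card only calls the 512-slot STM a "soft-attention read buffer". `soft_attention_read` is a hypothetical name.

```python
import math


def soft_attention_read(query, keys, values):
    """Read from a slot memory: softmax(q . k / sqrt(d)) weighted sum of slot values.

    keys: one key vector per slot; values: one value vector per slot (assumed layout).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # stabilise the softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
```

Because the read is a weighted average rather than a hard lookup, every slot contributes a little and the whole read stays differentiable.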

S.A. Recursive Seed Learning

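Every layer contains a RecursiveSeedGate that runs a depth-4 intra-layer recursion: at each step it derives a 256-dim seed vector from the hidden state h, projects it back to the full hidden size D, applies a sigmoid gate to update h, and re-seeds from the updated h. This creates feedback loops within a single layer.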
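The per-layer placement rules above (RecursiveSeedGate on every layer, MoELayer on odd-numbered layers, RewriteAttention on every 3rd, HybridMemory on every 4th) can be tabulated for the 20 layers. This sketch assumes layers are numbered 1 to 20; with 0-based indexing the sets shift. It records only which modules a layer contains, not their order inside the layer.

```python
NUM_LAYERS = 20


def layer_modules(n):
    """Modules present in layer n, assuming 1-based layer numbering."""
    mods = ["GQA attention", "RecursiveSeedGate"]         # every layer
    mods.append("MoELayer" if n % 2 == 1 else "dense FFN")  # alternating FFN
    if n % 3 == 0:
        mods.append("RewriteAttention")                    # every 3rd layer
    if n % 4 == 0:
        mods.append("HybridMemory")                        # every 4th layer
    return mods


schedule = {n: layer_modules(n) for n in range(1, NUM_LAYERS + 1)}
```

Under this numbering there are 10 MoE layers, RewriteAttention runs on layers 3, 6, 9, 12, 15 and 18, HybridMemory on layers 4, 8, 12, 16 and 20, and layer 12 hosts both.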

Multimodal Inputs

| Modality | Projector | Input Shape |
|---|---|---|
| Image | `ImageProjector`: Linear(1024→2560→1280) | `(B, N_patches, 1024)` |
| Audio | `AudioProjector`: GRU(80→512) + Linear | `(B, T_frames, 80)` |
| Video | `VideoProjector`: Linear + TransformerEncoderLayer | `(B, F_frames, 1024)` |

Fine-Tuning

SFT Recommended Hyperparameters

| Setting | Value |
|---|---|
| Learning Rate | 2e-5 |
| LR Schedule | cosine, with 100-step warmup |
| Batch Size | 1–4 per GPU, gradient accumulation ×8 |
| Max Seq Length | start at 8,192; scale up to 1M |
| Precision | bfloat16 |
| Optimizer | AdamW (β₁ = 0.9, β₂ = 0.95, ε = 1e-8, weight decay = 0.1) |
| Grad Clip | 1.0 |
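The schedule and batch settings can be checked numerically. This sketch assumes linear warmup from 0 and cosine decay to 0 by the final step (similar in shape to `get_cosine_schedule_with_warmup` in Transformers); the card does not state a minimum learning rate or the total step count, so `total_steps` and `min_lr` are inputs you choose.

```python
import math

PEAK_LR = 2e-5
WARMUP_STEPS = 100
GRAD_ACCUM = 8


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, min_lr=0.0):
    """Learning rate at an optimizer step: linear warmup, then cosine decay (assumed shape)."""
    if step < warmup:
        return peak_lr * step / warmup  # ramps from 0 to peak_lr over the warmup
    progress = min(1.0, (step - warmup) / max(1, total_steps - warmup))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def effective_batch(per_gpu, num_gpus=1, grad_accum=GRAD_ACCUM):
    """Sequences contributing to one optimizer step."""
    return per_gpu * num_gpus * grad_accum
```

With 1–4 sequences per GPU and ×8 accumulation, each optimizer step sees 8–32 sequences per GPU; on 8 GPUs at 4 per GPU that is 256 sequences per step.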

RLHF / GRPO

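The reward_head is the built-in scoring head: pass output_reward=True during rollout to obtain its differentiable scalar. GRPO itself is critic-free (it normalizes rewards within each group of sampled completions rather than learning a value baseline), so with TRL's GRPOTrainer the head's score would be wrapped as a reward function.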
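For reference, this is how GRPO turns per-completion scalar rewards (for example, reward_head scores for several completions sampled from one prompt) into advantages. The population standard deviation and the epsilon value are choices made here; implementations differ in both. `group_relative_advantages` is a hypothetical name.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantages for one prompt's group of sampled completions.

    A_i = (r_i - mean(r)) / (std(r) + eps); std here is the population std (an assumption).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

If every completion in a group receives the same reward, all advantages are zero and that group contributes no policy-gradient signal.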

Citation

```bibtex
@misc{infinite_code_iii_2025,
  title   = {Infinite.Code.III: A Recursive Language Model with Self-Automated Learning},
  author  = {GODsStrongestSoldier},
  year    = {2025},
  url     = {https://huggingface.co/GODsStrongestSoldier/Infinite.Code.III},
  note    = {1.210B Recursive Language Model, 1M context window}
}
```


Safetensors · Model size: 1B params · Tensor type: F32

Collection including WithinUsAI/Infinite.Code.III: "WithIn Us AI" (Recursive Models)

Recursive Language Models designed by WithIn Us AI. According to the collection description there are 11 RLMs in total, all ready to use as base models for pre-training; the collection currently lists 9 items.