Instructions to use WithinUsAI/Infinite.Code.III with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use WithinUsAI/Infinite.Code.III with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WithinUsAI/Infinite.Code.III", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("WithinUsAI/Infinite.Code.III", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use WithinUsAI/Infinite.Code.III with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "WithinUsAI/Infinite.Code.III"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "WithinUsAI/Infinite.Code.III",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```bash
docker model run hf.co/WithinUsAI/Infinite.Code.III
```
- SGLang
How to use WithinUsAI/Infinite.Code.III with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "WithinUsAI/Infinite.Code.III" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "WithinUsAI/Infinite.Code.III",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "WithinUsAI/Infinite.Code.III" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "WithinUsAI/Infinite.Code.III",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use WithinUsAI/Infinite.Code.III with Docker Model Runner:
```bash
docker model run hf.co/WithinUsAI/Infinite.Code.III
```
Infinite.Code.III — Recursive Language Model
"Not a Large Language Model. A Recursive Mind."
Overview
Infinite.Code.III is a 1.210B-parameter Recursive Language Model (RLM) built from scratch as a unified Hybrid Mind architecture. Unlike standard LLMs that apply a fixed forward-pass transformer, Infinite.Code.III integrates Self-Automated (S.A.) learning systems as architectural primitives — they are not pipeline steps; they are woven into every decoder layer.
| Property | Value |
|---|---|
| Parameters | 1.210B |
| Context Window | 1,000,000 tokens |
| Architecture | Recursive Language Model (RLM) |
| Attention | Grouped-Query Attention (GQA) 10/5 heads |
| Positional Encoding | RoPE (θ = 500,000, long-ctx scaled) |
| FFN | Alternating Dense / Mixture-of-Experts (8 experts, top-2) |
| Vocabulary | 65,536 BPE tokens |
| Layers | 20 |
| Hidden Size | 1280 |
| Weight Format | safetensors (bfloat16 trained, float32 saved) |
| Modalities | Text · Image · Audio · Video |
| License | Apache 2.0 |
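To give a feel for what the table implies at long context, here is a rough back-of-the-envelope estimate of key/value cache size. It is only a sketch under several assumptions that the card does not state: "10/5 heads" is read as 10 query heads and 5 key/value heads, the head dimension is taken as hidden size divided by query heads (1280 / 10 = 128), every one of the 20 layers keeps a standard KV cache, and cached values are stored in bfloat16 (2 bytes). The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(seq_len, layers=20, kv_heads=5, head_dim=1280 // 10, bytes_per_value=2):
    """Estimate KV cache size in bytes for one sequence.

    Assumptions (not confirmed by the model card): 5 KV heads, head_dim = 128,
    a standard per-layer KV cache, and bfloat16 storage.
    """
    # Factor 2 accounts for storing both keys and values.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


per_token = kv_cache_bytes(1)          # bytes of cache added per token
full_context = kv_cache_bytes(1_000_000)  # bytes at the advertised 1M-token window
print(per_token, full_context / 1e9, "GB")
```

Under these assumptions the cache grows by about 51 KB per token, so a full 1,000,000-token window would need roughly 51 GB for the cache alone, before weights and activations.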
S.A. System Architecture
S.A. Meta Learning
Each layer has a learnable adaptive_alpha scalar (sigmoid-gated) that blends the
transformed output with the layer's top-of-layer residual. This is the meta-learning
channel — it learns how much each transformation contributes per layer.
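The blend described above can be sketched in a few lines. This assumes the gate forms a convex combination, `sigmoid(alpha) * transformed + (1 - sigmoid(alpha)) * residual`; the card only says the scalar is sigmoid-gated, so the exact formula and the helper name `blend_with_residual` are illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def blend_with_residual(transformed, residual, adaptive_alpha):
    """Mix a layer's transformed output with its top-of-layer residual.

    Assumed form: a convex combination controlled by sigmoid(adaptive_alpha).
    """
    gate = sigmoid(adaptive_alpha)
    return [gate * t + (1.0 - gate) * r for t, r in zip(transformed, residual)]


# With adaptive_alpha = 0 the gate is 0.5, so both inputs contribute equally.
mixed = blend_with_residual([2.0, 4.0], [0.0, 0.0], adaptive_alpha=0.0)
print(mixed)
```

A large positive `adaptive_alpha` would let the transformation dominate, while a large negative value would pass the residual through almost unchanged.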
S.A. Reinforcement Learning
RewardHead (D → 512 → 1 scalar) attaches to the final hidden states.
During RL fine-tuning (RLHF / GRPO), this head provides the value signal.
Pass output_reward=True during rollout collection.
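For scale, the size of a D → 512 → 1 head can be worked out directly. The sketch below takes D = 1280 from the hidden size in the table and assumes both linear layers carry bias terms, which the card does not confirm; `mlp_param_count` is a hypothetical helper.

```python
def mlp_param_count(dims, bias=True):
    """Count parameters of a stack of linear layers with the given widths."""
    total = 0
    for d_in, d_out in zip(dims, dims[1:]):
        total += d_in * d_out          # weight matrix
        if bias:
            total += d_out             # bias vector (assumed present)
    return total


reward_head_params = mlp_param_count([1280, 512, 1])
print(reward_head_params)
```

Under these assumptions the head adds well under a million parameters, a small fraction of the 1.210B total.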
S.A. Continual Learning
HybridMemory LTM uses exponential moving average write-back
(0.95 × old + 0.05 × new) — knowledge accumulates across forward passes
without overwriting, resisting catastrophic forgetting.
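The write-back rule above is a standard exponential moving average. A minimal sketch, assuming each memory slot is a plain vector and using a hypothetical helper name `ema_write_back`:

```python
def ema_write_back(old_slot, new_value, decay=0.95):
    """Return decay * old + (1 - decay) * new, element by element."""
    return [decay * o + (1.0 - decay) * n for o, n in zip(old_slot, new_value)]


# Repeated writes move the slot toward the new value only gradually.
slot = [1.0, 0.0]
for _ in range(3):
    slot = ema_write_back(slot, [0.0, 1.0])
print(slot)
```

After three writes the old content still carries a weight of 0.95³ ≈ 0.857, which illustrates why a single forward pass cannot overwrite a slot.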
S.A. Adaptive Learning
The per-layer adaptive_alpha gate is trained end-to-end, self-calibrating
each layer's write strength to the residual stream.
S.A. Rewriting Learning
Every 3rd layer runs RewriteAttention — a 4-head causal self-attention
pass that lets the model revise its own intermediate token representations
within a single forward pass.
S.A. NLP + S.A. Problem Solving
MetaOutputMixer at decoder output applies a 3-way soft gate
(language / code / math-logic) via NLPGate. The final representation
is a content-adaptive weighted mixture of three parallel projections.
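The 3-way mixture can be illustrated as follows. This sketch assumes the soft gate is a softmax over three logits and that the three projections have already been computed; the function names are hypothetical and the real NLPGate may differ.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_outputs(gate_logits, language, code, math_logic):
    """Weighted sum of three parallel projections (softmax gate is an assumption)."""
    w_lang, w_code, w_math = softmax(gate_logits)
    return [w_lang * a + w_code * b + w_math * c
            for a, b, c in zip(language, code, math_logic)]


# Equal logits give each branch a weight of one third.
mixed = mix_outputs([0.0, 0.0, 0.0], [3.0], [6.0], [9.0])
print(mixed)
```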
S.A. Innovation Learning
Odd-numbered layers use MoELayer — 8 experts, top-2 routing,
each a SwiGLU FFN with 2048-dim intermediate.
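Top-2 routing over 8 experts can be sketched like this. The card does not say how the two selected scores are normalized; the version below assumes a softmax over just the two chosen logits, which is one common convention. `top2_route` is a hypothetical name.

```python
import math


def top2_route(router_logits):
    """Pick the two highest-scoring experts and normalize their weights.

    Assumption: weights come from a softmax over the two selected logits.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top2 = order[:2]
    m = max(router_logits[i] for i in top2)
    exps = [math.exp(router_logits[i] - m) for i in top2]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top2, exps)]


# Eight router logits, one per expert; experts 1 and 4 tie for the top score.
routes = top2_route([0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.0])
print(routes)
```

The token's FFN output would then be the weighted sum of the two selected experts' outputs, so only 2 of the 8 expert FFNs run per token.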
S.A. DeBugging
DebugHookManager is a gradient hook registry. Set debug_mode: true in the config to
activate mean-absolute-gradient logging on the embedding and any registered tensor.
It adds no overhead when disabled.
S.A. Advanced Long/Short-Term Memory
HybridMemory (every 4th layer):
- STM: 512-slot soft-attention read buffer (refreshed each pass)
- LTM: 2048-slot persistent EMA key-value store (continual write-back)
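The placement rules in this section (MoE on odd-numbered layers, RewriteAttention on every 3rd layer, HybridMemory on every 4th layer) can be combined into a per-layer plan. The sketch below assumes layers are numbered from 1 and that "every Nth layer" means layer numbers divisible by N; the card does not specify the indexing, so treat the output as illustrative.

```python
def layer_plan(num_layers=20):
    """Build a per-layer component plan (1-based indexing is an assumption)."""
    plan = []
    for idx in range(1, num_layers + 1):
        plan.append({
            "layer": idx,
            "ffn": "moe" if idx % 2 == 1 else "dense",  # odd-numbered layers use MoE
            "rewrite_attention": idx % 3 == 0,          # every 3rd layer
            "hybrid_memory": idx % 4 == 0,              # every 4th layer
        })
    return plan


plan = layer_plan()
for entry in plan:
    print(entry)
```

Under this reading, 10 layers use MoE, 6 run RewriteAttention, and 5 carry HybridMemory; layer 12 is the only one that has both RewriteAttention and HybridMemory.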
S.A. Recursive Seed Learning
RecursiveSeedGate runs on every layer as a depth-4 intra-layer recursion:
it seeds a 256-dim vector from h, projects it to the full hidden size D, gates it
with a sigmoid, and re-seeds from the updated h. This creates feedback loops within a single layer.
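One possible reading of the recursion is sketched below with toy dimensions (D = 8, seed size 4 instead of 1280 and 256). Several details are assumptions that the card does not spell out: the gate is computed from the seed, the gated update is added residually to h, and the same weights are reused at each of the 4 steps. All names are hypothetical.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def recursive_seed_gate(h, w_seed, w_up, w_gate, depth=4):
    for _ in range(depth):
        seed = matvec(w_seed, h)       # compress h (size D) into a small seed
        update = matvec(w_up, seed)    # project the seed back to size D
        gate = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w_gate, seed)]
        # Assumed update rule: gated residual add, then re-seed from the new h.
        h = [hi + gi * ui for hi, gi, ui in zip(h, gate, update)]
    return h


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
D, SEED_DIM = 8, 4
h0 = [1.0] * D
h_out = recursive_seed_gate(h0, rand_matrix(SEED_DIM, D),
                            rand_matrix(D, SEED_DIM), rand_matrix(D, SEED_DIM))
print(h_out)
```

Because each step's seed is derived from the h produced by the previous step, the four iterations form a feedback loop rather than four independent updates.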
Multimodal Inputs
| Modality | Projector | Input Shape |
|---|---|---|
| Image | ImageProjector Linear(1024→2560→1280) | (B, N_patches, 1024) |
| Audio | AudioProjector GRU(80→512) + Linear | (B, T_frames, 80) |
| Video | VideoProjector Linear + TransformerEncoderLayer | (B, F_frames, 1024) |
Fine-Tuning
SFT Recommended Hyperparameters
| Setting | Value |
|---|---|
| Learning Rate | 2e-5 |
| LR Schedule | cosine + 100-step warmup |
| Batch Size | 1–4 per GPU + grad accumulation ×8 |
| Max Seq Length | start at 8192, scale to 1M |
| Precision | bfloat16 |
| Optimizer | AdamW (β₁=0.9, β₂=0.95, ε=1e-8, wd=0.1) |
| Grad Clip | 1.0 |
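The schedule and batch settings in the table can be made concrete with a small sketch. It assumes the warmup is linear and that the cosine decays to zero by the final step, neither of which the table states; `lr_at_step` and `effective_batch_size` are hypothetical helpers.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_steps=100, min_lr=0.0):
    """Linear warmup (assumed) followed by cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def effective_batch_size(per_gpu_batch, grad_accum=8, num_gpus=1):
    """Sequences contributing to each optimizer step."""
    return per_gpu_batch * grad_accum * num_gpus


print(lr_at_step(0, 1000), lr_at_step(100, 1000), lr_at_step(1000, 1000))
print(effective_batch_size(4))
```

With a per-GPU batch of 4 and ×8 accumulation on a single GPU, each optimizer step sees 32 sequences.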
RLHF / GRPO
The reward_head is the built-in value model. Pass output_reward=True
during rollout. The scalar is differentiable — plug directly into TRL GRPOTrainer.
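As background for the GRPO step, the group-relative advantage that GRPO-style training computes from per-completion rewards can be sketched as below. This is a generic illustration, not the TRL implementation: it assumes one scalar reward per sampled completion (for example, a value such as the reward_head is described as producing) and uses the population standard deviation with a small epsilon. The function name is hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the choice of estimator is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three completions for one prompt, scored 1.0, 2.0 and 3.0.
adv = group_relative_advantages([1.0, 2.0, 3.0])
print(adv)
```

Completions scoring above the group mean receive positive advantages and those below receive negative ones; a group with identical rewards yields all zeros and therefore no learning signal.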
Citation
```bibtex
@misc{infinite_code_iii_2025,
  title  = {Infinite.Code.III: A Recursive Language Model with Self-Automated Learning},
  author = {GODsStrongestSoldier},
  year   = {2025},
  url    = {https://huggingface.co/GODsStrongestSoldier/Infinite.Code.III},
  note   = {1.210B Recursive Language Model, 1M context window}
}
```