Alpamayo-R1-10B Text-Only
This is a text-only extraction of nvidia/Alpamayo-R1-10B, also known as Alpamayo 1.
The original checkpoint is a vision-language-action model with:
- a Qwen3-VL/Cosmos-style VLM backbone,
- a vision tower,
- a diffusion/action expert,
- trajectory/action projection modules.
This repository keeps only the language backbone from vlm.model.language_model.* plus vlm.lm_head.weight, and saves it as a standalone Hugging Face Qwen3ForCausalLM checkpoint.
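The key selection described above can be sketched roughly as follows. This is a minimal illustration, not the script used to build this repository. The helper names `remap_key` and `split_keys` are hypothetical, and the target names `model.*` and `lm_head.weight` are an assumption based on the usual Qwen3ForCausalLM state-dict layout.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Source names come from the description above; target names are an
# assumption based on the usual Qwen3ForCausalLM state-dict layout.
SRC_BACKBONE_PREFIX = "vlm.model.language_model."
SRC_LM_HEAD = "vlm.lm_head.weight"
DST_BACKBONE_PREFIX = "model."
DST_LM_HEAD = "lm_head.weight"


def remap_key(name: str) -> Optional[str]:
    """Return the standalone key for a source tensor name, or None to drop it."""
    if name == SRC_LM_HEAD:
        return DST_LM_HEAD
    if name.startswith(SRC_BACKBONE_PREFIX):
        return DST_BACKBONE_PREFIX + name[len(SRC_BACKBONE_PREFIX):]
    return None


def split_keys(names: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Partition source names into a kept {old: new} map and a dropped list."""
    kept: Dict[str, str] = {}
    dropped: List[str] = []
    for name in names:
        new_name = remap_key(name)
        if new_name is None:
            dropped.append(name)
        else:
            kept[name] = new_name
    return kept, dropped
```

Applied to the full list of tensor names in the source checkpoint, a partition like this should yield the kept and dropped counts reported in this card, provided the renaming assumption holds.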
What Changed
- Source model: nvidia/Alpamayo-R1-10B
- Output architecture: Qwen3ForCausalLM
- Output model_type: qwen3
- Kept tensors: 399
- Dropped tensors: 767
- Output weights: 4 safetensors shards
- Removed components include vlm.model.visual.*, expert.*, action_in_proj.*, action_out_proj.*, and action_space.*
The source repository does not include tokenizer files. The tokenizer here is based on Qwen/Qwen3-VL-8B-Instruct and extended with Alpamayo placeholder special tokens up to the model vocabulary size 155697.
For GGUF conversion compatibility, the tokenizer config stores the Alpamayo placeholder tokens in additional_special_tokens, and the BPE vocab.json / merges.txt files are included alongside tokenizer.json.
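Padding a base tokenizer up to the model vocabulary size can be sketched as below. This is an illustration under stated assumptions, not the procedure used for this repository: the function name `placeholder_tokens` and the token naming pattern `<|alpamayo_placeholder_N|>` are hypothetical, and the base tokenizer length is left as a parameter rather than assumed.

```python
from typing import List

TARGET_VOCAB_SIZE = 155697  # model vocabulary size stated in this card


def placeholder_tokens(base_len: int, target: int = TARGET_VOCAB_SIZE) -> List[str]:
    """Build placeholder token strings for ids base_len .. target - 1.

    The naming pattern is hypothetical; only the count matters for
    matching the tokenizer length to the model vocabulary size.
    """
    if base_len > target:
        raise ValueError(
            f"base tokenizer length {base_len} exceeds target vocabulary {target}"
        )
    return [f"<|alpamayo_placeholder_{i}|>" for i in range(base_len, target)]
```

The resulting list could then be registered on the base tokenizer as special tokens, for example through its `add_special_tokens` method, after which `len(tokenizer)` should equal 155697.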
Validation
Validated locally with:
- torch 2.12.1+cpu
- transformers 5.12.1
- safetensors 0.8.0
Checks performed:
- AutoConfig.from_pretrained(...) loads as Qwen3Config
- AutoTokenizer.from_pretrained(...) loads as Qwen2Tokenizer
- Tokenizer length is 155697
- AutoTokenizer.from_pretrained(...) loads without extra_special_tokens compatibility errors in current Transformers
- AutoModelForCausalLM.from_pretrained(...) loads as Qwen3ForCausalLM
- Forward pass succeeds on a short text prompt
- Output logits shape: (1, 10, 155697)
- No visual, vision, projector, language_model, expert, action_*, or vlm.* tensor names remain in the exported checkpoint
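The tensor-name check in the last item can be approximated with a simple scan. This is a sketch, not the validation script itself: the helper name `find_leftover_names` is hypothetical, and treating each listed name as a plain substring match is an assumption about how the check was done.

```python
from typing import Iterable, List

# Patterns taken from the check listed above; substring matching is an
# assumption, the original check may have used prefixes or globs instead.
FORBIDDEN_PATTERNS = (
    "visual",
    "vision",
    "projector",
    "language_model",
    "expert",
    "action_",
    "vlm.",
)


def find_leftover_names(names: Iterable[str]) -> List[str]:
    """Return tensor names that still contain any forbidden pattern."""
    return [
        name
        for name in names
        if any(pattern in name for pattern in FORBIDDEN_PATTERNS)
    ]
```

An empty result for the names in every exported shard would correspond to the check passing. The names could be collected, for instance, by opening each shard with `safetensors.safe_open` and reading its keys.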
Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "path/to/alpamayo_r1_10b_text_only"

tokenizer = AutoTokenizer.from_pretrained(model_dir, fix_mistral_regex=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("Explain a safe driving decision at a busy intersection.", return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
Limitations
This checkpoint is text-only. It does not include the original vision tower, robotics/action expert, diffusion trajectory decoder, multimodal processors, or trajectory decoding logic.
This is an unofficial derived checkpoint and is not released by NVIDIA.
License
The source model states that its weights are released under a non-commercial license. Use of this derived checkpoint must comply with the original model license and any applicable terms.