Alpamayo-R1-10B Text-Only

This is a text-only extraction of nvidia/Alpamayo-R1-10B, also known as Alpamayo 1.

The original checkpoint is a vision-language-action model with:

  • a Qwen3-VL/Cosmos-style VLM backbone,
  • a vision tower,
  • a diffusion/action expert,
  • trajectory/action projection modules.

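This repository keeps only the language backbone, that is the vlm.model.language_model.* tensors plus vlm.lm_head.weight. It renames them to the standard Qwen3ForCausalLM layout (model.* and lm_head.weight) and saves the result as a standalone Hugging Face Qwen3ForCausalLM checkpoint.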

What Changed

  • Source model: nvidia/Alpamayo-R1-10B
  • Output architecture: Qwen3ForCausalLM
  • Output model_type: qwen3
  • Kept tensors: 399
  • Dropped tensors: 767
  • Output weights: 4 safetensors shards
  • Removed components include vlm.model.visual.*, expert.*, action_in_proj.*, action_out_proj.*, and action_space.*
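The keep/drop rule above can be sketched as a pure name-mapping step. This is a minimal sketch, assuming the conversion strips the vlm.model.language_model. prefix down to model. and renames vlm.lm_head.weight to lm_head.weight. That matches the Qwen3ForCausalLM layout but is not taken from the actual export script. remap_key and split_keys are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Source prefixes described in this card. The target names follow the usual
# Qwen3ForCausalLM layout (model.* and lm_head.weight); this mapping is an
# assumption, not the script used to build this checkpoint.
LM_PREFIX = "vlm.model.language_model."
LM_HEAD = "vlm.lm_head.weight"


def remap_key(name: str) -> Optional[str]:
    """Return the Qwen3ForCausalLM tensor name, or None if the tensor is dropped."""
    if name.startswith(LM_PREFIX):
        # e.g. vlm.model.language_model.layers.0... -> model.layers.0...
        return "model." + name[len(LM_PREFIX):]
    if name == LM_HEAD:
        return "lm_head.weight"
    # Everything else (vlm.model.visual.*, expert.*, action_*) is discarded.
    return None


def split_keys(names: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Partition source tensor names into {old: new} renames and a dropped list."""
    kept: Dict[str, str] = {}
    dropped: List[str] = []
    for name in names:
        new_name = remap_key(name)
        if new_name is None:
            dropped.append(name)
        else:
            kept[name] = new_name
    return kept, dropped
```

In a real conversion the names would come from each shard via safetensors.safe_open(path, framework="pt").keys(), and the kept tensors would be written back with safetensors.torch.save_file. If this rule matches the one actually used, kept would end up with the 399 tensors and dropped with the 767 reported above.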

The source repository does not include tokenizer files. The tokenizer here is based on Qwen/Qwen3-VL-8B-Instruct and is extended with Alpamayo placeholder special tokens until its length matches the model vocabulary size of 155697.

For GGUF conversion compatibility, the tokenizer config lists the Alpamayo placeholder tokens under additional_special_tokens, and the BPE vocab.json and merges.txt files are included alongside tokenizer.json.
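The padding step amounts to adding exactly enough placeholder tokens to close the gap between the base tokenizer and the model vocabulary. A minimal sketch: the base tokenizer length is not stated in this card, so it is a parameter here, and the placeholder naming template is hypothetical. The real placeholder strings in this repository may differ.

```python
from typing import List

# Vocabulary size of the exported model, as stated in this card.
MODEL_VOCAB_SIZE = 155697


def placeholder_tokens(base_len: int,
                       target_size: int = MODEL_VOCAB_SIZE,
                       template: str = "<|alpamayo_placeholder_{}|>") -> List[str]:
    """Build the extra tokens needed to grow a base tokenizer to target_size.

    The template is a hypothetical naming scheme used only for illustration.
    """
    if base_len > target_size:
        raise ValueError("base tokenizer already exceeds the model vocabulary size")
    missing = target_size - base_len
    # One unique placeholder per missing vocabulary slot.
    return [template.format(i) for i in range(missing)]
```

The resulting list is what would be stored under additional_special_tokens in the tokenizer config, so that base_len + len(tokens) equals 155697.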

Validation

Validated locally with:

  • torch 2.12.1+cpu
  • transformers 5.12.1
  • safetensors 0.8.0

Checks performed:

  • AutoConfig.from_pretrained(...) loads as Qwen3Config
  • AutoTokenizer.from_pretrained(...) loads as Qwen2Tokenizer
  • tokenizer length is 155697
  • AutoTokenizer.from_pretrained(...) loads without extra_special_tokens compatibility errors in the Transformers version listed above
  • AutoModelForCausalLM.from_pretrained(...) loads as Qwen3ForCausalLM
  • Forward pass succeeds on a short text prompt
  • Output logits shape for a 10-token prompt: (1, 10, 155697), i.e. (batch, sequence length, vocabulary size)
  • No visual, vision, projector, language_model, expert, action_*, or vlm.* tensor names remain in the exported checkpoint
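The name and shape checks listed above reduce to simple assertions. A minimal sketch, assuming the forbidden patterns are plain substring matches plus a vlm. prefix check; leftover_names and logits_shape_ok are hypothetical helper names, not part of any library.

```python
from typing import Iterable, List, Sequence

# Substrings that, per the checks above, must not appear in any exported tensor name.
FORBIDDEN_SUBSTRINGS = ("visual", "vision", "projector", "language_model", "expert", "action_")
VOCAB_SIZE = 155697


def leftover_names(names: Iterable[str]) -> List[str]:
    """Return tensor names that still look like multimodal or action components."""
    return [
        n for n in names
        if n.startswith("vlm.") or any(s in n for s in FORBIDDEN_SUBSTRINGS)
    ]


def logits_shape_ok(shape: Sequence[int], batch: int, seq_len: int) -> bool:
    """Check that logits are (batch, seq_len, vocab) with the exported vocabulary size."""
    return tuple(shape) == (batch, seq_len, VOCAB_SIZE)
```

With a loaded model, the names could come from model.state_dict().keys() and the shape from tuple(model(**inputs).logits.shape).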

Usage

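import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local directory (or Hub repo id) containing this text-only checkpoint
model_dir = "path/to/alpamayo_r1_10b_text_only"

tokenizer = AutoTokenizer.from_pretrained(model_dir, fix_mistral_regex=True)
# Load in the stored dtype (BF16) and place the weights on the available GPU(s) or CPU
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Explain a safe driving decision at a busy intersection."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Inference only: disable gradient tracking while generating up to 128 new tokens
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# output_ids[0] holds the prompt tokens followed by the generated continuation
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))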

Limitations

This checkpoint is text-only. It does not include the original vision tower, robotics/action expert, diffusion trajectory decoder, multimodal processors, or trajectory decoding logic.

This is an unofficial derived checkpoint and is not released by NVIDIA.

License

The source model states that its weights are released under a non-commercial license. Use of this derived checkpoint must comply with the original model license and any applicable terms.

Checkpoint Details

  • Repository: sasa2000/Alpamayo-R1-10B-Text-Only
  • Weights format: safetensors
  • Model size: 8B params
  • Tensor type: BF16