GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Fine-tuned from nvidia/GR00T-N1.7-3B on the conflict_maniskill color_object task split.
Paper: https://arxiv.org/abs/2503.14734 Project: https://github.com/NVIDIA/Isaac-GR00T
GR00T N1.7 is NVIDIA's generalist robot foundation model (3B parameters). It uses a Qwen3-VL-2B vision-language backbone with a diffusion-based action head. The model outputs action chunks of 16 steps via denoising diffusion.
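To make "action chunks of 16 steps via denoising" concrete, here is a toy sketch of the general idea: start from Gaussian noise shaped like one action chunk and refine it over a few steps. It is not GR00T's action head. `toy_velocity` is a hypothetical stand-in for the learned network that simply points at a fixed target chunk, and `ACTION_DIM = 7` is chosen only for illustration.

```python
import random

CHUNK_LEN = 16        # action steps per chunk (from the model card)
ACTION_DIM = 7        # illustrative action dimension (assumption)
DENOISING_STEPS = 4   # inference-time denoising steps (from the model card)


def toy_velocity(sample, target):
    """Hypothetical stand-in for the learned network: the direction
    from the current sample toward a fixed target chunk."""
    return [[t - x for x, t in zip(row_x, row_t)]
            for row_x, row_t in zip(sample, target)]


def sample_chunk(target, steps=DENOISING_STEPS, seed=0):
    """Refine Gaussian noise into an action chunk in `steps` updates."""
    rng = random.Random(seed)
    # Start from pure noise with the shape of one action chunk.
    x = [[rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
         for _ in range(CHUNK_LEN)]
    for k in range(steps):
        v = toy_velocity(x, target)
        # Step size 1 / (remaining steps): the final update lands on the target.
        step = 1.0 / (steps - k)
        x = [[xi + step * vi for xi, vi in zip(row_x, row_v)]
             for row_x, row_v in zip(x, v)]
    return x


# Toy target: a simple ramp over the 16 time steps.
target = [[0.1 * t] * ACTION_DIM for t in range(CHUNK_LEN)]
chunk = sample_chunk(target)
```

In a real model the direction comes from a network conditioned on images, robot state and the language instruction, so the result is a prediction rather than a known target; only the loop structure carries over.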
checkpoints/
    color_object/
        model-00001-of-00002.safetensors    (4.7 GB)
        model-00002-of-00002.safetensors    (1.8 GB)
        model.safetensors.index.json
        config.json
        processor/
        experiment_cfg/
gr00t/                          # model, data, training library
scripts/                        # finetune launch scripts, data prep, eval
    finetune_jobs/              # per-split SLURM scripts
    conflict_panda_config.py    # modality config used for training
    launch_finetune_local.py    # main finetune entrypoint
    prepare_conflict_data.py    # data conversion script
finetune.md                     # detailed fine-tuning guide
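The checkpoint above is split into two safetensors shards plus an index file. As a minimal sketch, assuming `model.safetensors.index.json` follows the usual Hugging Face sharded-checkpoint layout (a `weight_map` dictionary from tensor name to shard filename), the helper below counts how many tensors each shard holds. The function name `tensors_per_shard` is introduced here for illustration only.

```python
import json
from collections import Counter
from pathlib import Path


def tensors_per_shard(checkpoint_dir):
    """Count tensors per shard file, based on the safetensors index.

    Assumes the index JSON contains a "weight_map" mapping
    tensor name -> shard filename.
    """
    index_path = Path(checkpoint_dir) / "model.safetensors.index.json"
    with index_path.open() as f:
        index = json.load(f)
    return Counter(index["weight_map"].values())
```

Calling `tensors_per_shard("checkpoints/color_object")` before loading is a quick way to confirm that both shard files named in the index are the ones present on disk.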
| Parameter | Value |
|---|---|
| Base model | nvidia/GR00T-N1.7-3B |
| Steps | 10,000 |
| GPUs | 4x A100 |
| Global batch size | 32 |
| Learning rate | 1e-4 |
| Warmup ratio | 0.05 |
| Tuned modules | projector + diffusion head (LLM and visual encoder frozen) |
| Action chunking | 16 steps |
| Denoising steps | 4 (inference) |
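The quantities implied by the table can be worked out directly. The sketch below assumes no gradient accumulation (so the global batch is split evenly across GPUs) and a linear warmup to the peak learning rate; the table does not state either, and the schedule after warmup is not specified, so it is not modeled here.

```python
TOTAL_STEPS = 10_000
NUM_GPUS = 4
GLOBAL_BATCH_SIZE = 32
PEAK_LR = 1e-4
WARMUP_RATIO = 0.05

# Assumes no gradient accumulation: each GPU sees an equal share per step.
per_gpu_batch = GLOBAL_BATCH_SIZE // NUM_GPUS
warmup_steps = round(TOTAL_STEPS * WARMUP_RATIO)
samples_seen = TOTAL_STEPS * GLOBAL_BATCH_SIZE


def warmup_lr(step):
    """Learning rate under an assumed linear warmup to PEAK_LR.

    After warmup this returns PEAK_LR; any later decay is not
    described in the table and is deliberately left out.
    """
    return PEAK_LR * min(1.0, step / warmup_steps)


print(per_gpu_batch, warmup_steps, samples_seen)
```

Under these assumptions each GPU processes 8 samples per step, warmup lasts 500 steps, and training consumes 320,000 samples in total.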
import torch

from gr00t.model.gr00t_n1 import GR00TPolicy
from gr00t.data.schema import EmbodimentTag
from scripts.conflict_panda_config import MODALITY_CONFIG

# Load the fine-tuned checkpoint with the modality config used for training.
policy = GR00TPolicy.from_pretrained(
    "checkpoints/color_object",
    modality_config=MODALITY_CONFIG,
    embodiment_tag=EmbodimentTag.NEW_EMBODIMENT,
    denoising_steps=4,
    torch_dtype=torch.bfloat16,
)
policy.eval().cuda()

# image_tensor, wrist_tensor, arm_state and gripper_state must be supplied by the caller.
obs = {
    "video.image": image_tensor,          # (1, 1, H, W, 3) uint8
    "video.wrist_image": wrist_tensor,    # (1, 1, H, W, 3) uint8
    "state.arm": arm_state,               # (1, 1, 7) float32
    "state.gripper": gripper_state,       # (1, 1, 1) float32
    "annotation.human.task_description": ["pick up the color-matching object"],
}

# Inference only: no gradients are needed.
with torch.no_grad():
    actions = policy.get_action(obs)

# actions["action.arm"]: (1, 16, 7)
# actions["action.gripper"]: (1, 16, 1)
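The policy returns the arm and gripper parts of a chunk separately, each with a leading batch dimension. One possible way to turn that output into a sequence of per-step commands is sketched below. The helper names `to_nested_list` and `chunk_to_commands` are hypothetical, and the code relies only on the output shapes listed above, not on any GR00T API.

```python
def to_nested_list(x):
    """Accept tensors or arrays (anything with .tolist()) or plain lists."""
    return x.tolist() if hasattr(x, "tolist") else x


def chunk_to_commands(actions, batch_index=0):
    """Merge arm and gripper chunks into one command per time step.

    Expects actions["action.arm"] with shape (B, T, 7) and
    actions["action.gripper"] with shape (B, T, 1); returns a list of
    T commands, each a flat list of 8 floats (arm values, then gripper).
    """
    arm = to_nested_list(actions["action.arm"])[batch_index]
    gripper = to_nested_list(actions["action.gripper"])[batch_index]
    if len(arm) != len(gripper):
        raise ValueError("arm and gripper chunks differ in length")
    return [list(a) + list(g) for a, g in zip(arm, gripper)]
```

A control loop would typically send these commands to the robot in order and then query the policy again with a fresh observation. How many of the 16 steps to execute before re-querying is a deployment choice that this card does not specify.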
See finetune.md for the complete fine-tuning guide.
@article{bjorck2025gr00t,
  title   = {GR00T N1: An Open Foundation Model for Generalist Humanoid Robots},
  author  = {Bjorck, Johan and others},
  journal = {arXiv preprint arXiv:2503.14734},
  year    = {2025}
}