GR00T N1.7 -- color_object Checkpoint

Fine-tuned from nvidia/GR00T-N1.7-3B on the conflict_maniskill color_object task split.

Paper: https://arxiv.org/abs/2503.14734
Project: https://github.com/NVIDIA/Isaac-GR00T

What is GR00T N1.7

GR00T N1.7 is NVIDIA's generalist robot foundation model (3B parameters). It uses a Qwen3-VL-2B vision-language backbone with a diffusion-based action head. The model outputs action chunks of 16 steps via denoising diffusion.
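
To give a feel for what "denoising an action chunk" means, here is a toy sketch, not the model's actual sampler. It starts from Gaussian noise shaped like one chunk and refines it with a few Euler steps. The `toy_velocity` function is a hypothetical stand-in for the learned action head, and the straight-line update rule is an assumption chosen for illustration; the real head is a neural network conditioned on the vision-language features.

```python
import numpy as np

CHUNK_LEN = 16    # action steps per chunk (from this card)
ACTION_DIM = 8    # 7 arm + 1 gripper, matching the shapes used in this card
NUM_DENOISE = 4   # denoising steps used at inference (from this card)


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned action head.

    Returns the velocity that carries x to `target` along a straight line
    arriving at t = 1. A real head would predict this from observations.
    """
    return (target - x) / (1.0 - t)


def sample_chunk(target, rng):
    """Refine Gaussian noise into an action chunk with a few Euler steps."""
    x = rng.standard_normal((CHUNK_LEN, ACTION_DIM))
    dt = 1.0 / NUM_DENOISE
    for k in range(NUM_DENOISE):
        t = k * dt  # t takes the values 0, 0.25, 0.5, 0.75
        x = x + dt * toy_velocity(x, t, target)
    return x


rng = np.random.default_rng(0)
target = np.zeros((CHUNK_LEN, ACTION_DIM))  # illustrative "clean" chunk
chunk = sample_chunk(target, rng)
```

With this idealized velocity the four steps land exactly on the target; a learned head only approximates that, which is why the number of denoising steps trades speed against quality.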

Repository Structure

checkpoints/
  color_object/
    model-00001-of-00002.safetensors   (4.7 GB)
    model-00002-of-00002.safetensors   (1.8 GB)
    model.safetensors.index.json
    config.json
    processor/
    experiment_cfg/
gr00t/              # model, data, training library
scripts/            # finetune launch scripts, data prep, eval
  finetune_jobs/    # per-split SLURM scripts
  conflict_panda_config.py   # modality config used for training
  launch_finetune_local.py   # main finetune entrypoint
  prepare_conflict_data.py   # data conversion script
finetune.md         # detailed fine-tuning guide
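
The checkpoint is split across two safetensors shards, and `model.safetensors.index.json` records which shard holds each tensor. A minimal sketch for inspecting that mapping, assuming the index follows the usual Hugging Face layout with a top-level `weight_map` dictionary; the parameter names in the example are made up for illustration.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_index(checkpoint_dir):
    """Read model.safetensors.index.json from a checkpoint directory."""
    path = Path(checkpoint_dir) / "model.safetensors.index.json"
    with path.open() as f:
        return json.load(f)


def tensors_per_shard(index):
    """Group parameter names by the shard file that stores them."""
    shards = defaultdict(list)
    for name, shard_file in index["weight_map"].items():
        shards[shard_file].append(name)
    return dict(shards)


# Illustrative index; the parameter names are hypothetical.
example_index = {
    "metadata": {"total_size": 0},
    "weight_map": {
        "backbone.layer0.weight": "model-00001-of-00002.safetensors",
        "backbone.layer1.weight": "model-00001-of-00002.safetensors",
        "action_head.proj.weight": "model-00002-of-00002.safetensors",
    },
}
grouped = tensors_per_shard(example_index)
```

On the real checkpoint, `tensors_per_shard(load_index("checkpoints/color_object"))` should show how the weights are distributed over the two shard files.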

Training Details

Parameter          Value
Base model         nvidia/GR00T-N1.7-3B
Steps              10,000
GPUs               4x A100
Global batch size  32
Learning rate      1e-4
Warmup ratio       0.05
Tuned modules      projector + diffusion head (LLM and visual encoder frozen)
Action chunking    16 steps
Denoising steps    4 (inference)
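
A few derived quantities follow directly from the values above. The sketch below works them out under two assumptions that this card does not state: no gradient accumulation, and a linear warmup to the peak learning rate. The schedule after warmup is not specified here, so it is left out.

```python
TOTAL_STEPS = 10_000
NUM_GPUS = 4
GLOBAL_BATCH = 32
PEAK_LR = 1e-4
WARMUP_RATIO = 0.05

# Assumes no gradient accumulation, so the global batch splits evenly per GPU.
per_gpu_batch = GLOBAL_BATCH // NUM_GPUS

# Number of optimizer steps spent warming up.
warmup_steps = round(TOTAL_STEPS * WARMUP_RATIO)

# Total training samples consumed over the run.
samples_seen = TOTAL_STEPS * GLOBAL_BATCH


def warmup_lr(step):
    """Learning rate during warmup, assuming a linear ramp to PEAK_LR."""
    return PEAK_LR * min(1.0, step / warmup_steps)
```

Under these assumptions each GPU processes 8 samples per step, warmup lasts 500 steps, and the run sees 320,000 samples in total.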

Quick Inference

import torch
from gr00t.model.gr00t_n1 import GR00TPolicy
from gr00t.data.schema import EmbodimentTag
from scripts.conflict_panda_config import MODALITY_CONFIG

policy = GR00TPolicy.from_pretrained(
    "checkpoints/color_object",
    modality_config=MODALITY_CONFIG,
    embodiment_tag=EmbodimentTag.NEW_EMBODIMENT,
    denoising_steps=4,
    torch_dtype=torch.bfloat16,
)
policy.eval().cuda()

obs = {
    "video.image": image_tensor,           # (1, 1, H, W, 3) uint8
    "video.wrist_image": wrist_tensor,     # (1, 1, H, W, 3) uint8
    "state.arm": arm_state,               # (1, 1, 7) float32
    "state.gripper": gripper_state,       # (1, 1, 1) float32
    "annotation.human.task_description": ["pick up the color-matching object"],
}
with torch.no_grad():
    actions = policy.get_action(obs)
# actions["action.arm"]: (1, 16, 7)
# actions["action.gripper"]: (1, 16, 1)
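
The returned chunk holds 16 future steps, so a controller typically unpacks it into per-step commands. A minimal sketch, assuming the outputs can be converted to NumPy arrays with the shapes noted above; `chunk_to_commands` is a hypothetical helper, not part of the gr00t library, and GPU or bfloat16 tensors would first need to be moved to the CPU and cast to float32.

```python
import numpy as np


def chunk_to_commands(actions):
    """Concatenate arm and gripper actions into per-step command vectors.

    Expects arrays shaped (1, T, 7) and (1, T, 1); returns shape (T, 8).
    """
    arm = np.asarray(actions["action.arm"])[0]          # drop batch dim
    gripper = np.asarray(actions["action.gripper"])[0]  # drop batch dim
    return np.concatenate([arm, gripper], axis=-1)


# Dummy output with the documented shapes, standing in for policy.get_action(obs).
dummy_actions = {
    "action.arm": np.zeros((1, 16, 7), dtype=np.float32),
    "action.gripper": np.ones((1, 16, 1), dtype=np.float32),
}
commands = chunk_to_commands(dummy_actions)

for step, command in enumerate(commands):
    # Replace with the environment or robot call, e.g. env.step(command).
    pass
```

Whether to execute all 16 steps before querying the policy again, or to re-plan after a shorter horizon, is a design choice that depends on the control loop.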

See finetune.md for the complete fine-tuning guide.

Citation

@article{bjorck2025gr00t,
  title   = {GR00T N1: An Open Foundation Model for Generalist Humanoid Robots},
  author  = {Bjorck, Johan and others},
  journal = {arXiv preprint arXiv:2503.14734},
  year    = {2025}
}