SimLingo-QwenVL3-2B

SimLingo-QwenVL3-2B is a vision-language-action model for closed-loop autonomous driving in CARLA. This release is based on the SimLingo codebase and uses a Qwen3-VL-2B-Instruct backbone inside the SimLingo driving stack.

This repository provides two checkpoints, named by zero-based epoch index:

  • Epoch 13 (saved after 14 epochs of training)
  • Epoch 14 (saved after 15 epochs of training)

Each checkpoint is released in two formats:

  • checkpoints/epoch=013.ckpt: PyTorch Lightning checkpoint for training resumption.
  • checkpoints/epoch=013.pt: exported weights for SimLingo evaluation and inference.
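
The naming scheme above can be sketched as a small helper. This is only an illustration: it assumes the epoch number in the file name is zero-based (consistent with "Epoch 13" corresponding to 14 epochs of training) and that the epoch 14 files follow the same `epoch=NNN` pattern as the epoch 13 files listed here. The function name `checkpoint_files` is hypothetical and is not part of the SimLingo codebase.

```python
from pathlib import PurePosixPath

# Directory name taken from the file list above.
CHECKPOINT_DIR = PurePosixPath("checkpoints")


def checkpoint_files(epoch_index: int) -> dict:
    """Return the assumed file names for a zero-based epoch index.

    Hypothetical helper: the naming pattern is inferred from the
    epoch=013 entries and may not hold for other epochs.
    """
    stem = f"epoch={epoch_index:03d}"
    return {
        # Zero-based index, so epoch 13 means 14 completed epochs.
        "epochs_trained": epoch_index + 1,
        # PyTorch Lightning checkpoint, for resuming training.
        "lightning_ckpt": str(CHECKPOINT_DIR / f"{stem}.ckpt"),
        # Exported weights, for SimLingo evaluation and inference.
        "exported_weights": str(CHECKPOINT_DIR / f"{stem}.pt"),
    }


for idx in (13, 14):
    print(checkpoint_files(idx))
```

Use the `.pt` entry for evaluation and the `.ckpt` entry only when resuming training.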

Model Overview

Evaluation

The checkpoint epoch=013.ckpt was evaluated on Bench2Drive.

Benchmark     Checkpoint        Driving Score (DS)    Success Rate (%)
Bench2Drive   epoch=013.ckpt    63.94                 30.00

Bench2Drive is a CARLA closed-loop driving benchmark with 220 short routes and one safety-critical scenario per route. The corresponding .pt file in this repo is exported from the same epoch and is intended for use with the SimLingo evaluation code.
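
To make the success rate concrete, the arithmetic below converts the reported percentage into a route count. It is a sketch under an assumption not stated in this card: that all 220 Bench2Drive routes were run and counted in the reported figure. The function name is hypothetical.

```python
def implied_successful_routes(success_rate_percent: float, total_routes: int) -> int:
    """Convert a success rate in percent into a whole number of routes."""
    return round(success_rate_percent / 100.0 * total_routes)


# Reported success rate and benchmark size from the text above.
# Assumption: every one of the 220 routes was evaluated.
print(implied_successful_routes(30.00, 220))  # 66
```

Under that assumption, a 30.00% success rate corresponds to 66 of the 220 routes completed successfully.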

Usage

This model is not intended to be loaded with vanilla Transformers alone. It depends on the SimLingo repository for preprocessing, control prediction, and CARLA agent execution.

Typical closed-loop evaluation uses the exported .pt file together with the SimLingo codebase:

python /path/to/simlingo/Bench2Drive/leaderboard/leaderboard/leaderboard_evaluator.py \
  --agent /path/to/simlingo/team_code/agent_simlingo.py \
  --agent-config /path/to/checkpoints/epoch=013.pt
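
When scripting several evaluations, the same command can be assembled in Python. The sketch below only reproduces the two flags shown above (`--agent` and `--agent-config`) with the same placeholder paths. The helper name `build_eval_command` is hypothetical, and any additional arguments your setup needs are left out because the command above does not list them. The command is built and printed, not executed.

```python
import shlex


def build_eval_command(simlingo_root: str, weights_path: str) -> list:
    """Assemble the evaluator invocation as an argument list.

    Hypothetical helper: it mirrors the command shown above and
    adds no flags beyond the ones that appear there.
    """
    evaluator = (
        f"{simlingo_root}/Bench2Drive/leaderboard/leaderboard/"
        "leaderboard_evaluator.py"
    )
    return [
        "python",
        evaluator,
        "--agent", f"{simlingo_root}/team_code/agent_simlingo.py",
        "--agent-config", weights_path,
    ]


# Placeholder paths, to be replaced with real locations.
cmd = build_eval_command(
    "/path/to/simlingo",
    "/path/to/checkpoints/epoch=013.pt",
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the evaluation. An argument list avoids shell quoting problems with paths that contain spaces.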

For cluster-based evaluation, refer to start_eval_simlingo.py in the SimLingo repository.

Training Context

Both checkpoints come from the same training run:

  • Run name: 2026_04_18_18_01_31_simlingo_qwen3
  • Max epochs: 15
  • Batch size: 96
  • Learning rate: 3e-5
  • Precision: bf16-mixed
  • Training seed: 9876

The associated Hydra config uses:

  • vision backbone variant: Qwen3-VL-2B-Instruct
  • language backbone variant: Qwen3-VL-2B-Instruct
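
For reference, the settings listed above can be collected into plain Python dictionaries. This is a sketch, not the actual Hydra config: the key names `batch_size`, `learning_rate`, `vision_backbone`, and `language_backbone` are hypothetical, and this card does not show how the SimLingo training code consumes these values. `max_epochs` and `precision` are named after the PyTorch Lightning `Trainer` arguments, on the assumption that the run used Lightning (as the `.ckpt` format suggests).

```python
# Values copied from the lists above; key names are illustrative only.
TRAINER_KWARGS = {
    "max_epochs": 15,           # "Max epochs"
    "precision": "bf16-mixed",  # "Precision"
}

RUN_SETTINGS = {
    "run_name": "2026_04_18_18_01_31_simlingo_qwen3",
    "batch_size": 96,      # hypothetical key name
    "learning_rate": 3e-5, # hypothetical key name
    "seed": 9876,
    "vision_backbone": "Qwen3-VL-2B-Instruct",    # hypothetical key name
    "language_backbone": "Qwen3-VL-2B-Instruct",  # hypothetical key name
}


def last_epoch_index(max_epochs: int) -> int:
    """Zero-based index of the final epoch for a given epoch budget."""
    return max_epochs - 1


print(last_epoch_index(TRAINER_KWARGS["max_epochs"]))  # 14
```

With a budget of 15 epochs, the final zero-based epoch index is 14, which matches the later of the two released checkpoints.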

Intended Use

This model is intended for:

  • research on closed-loop driving in CARLA
  • benchmarking on Bench2Drive
  • studying language-conditioned driving behavior in simulation

This model is not intended for:

  • real-world driving deployment
  • safety-critical use outside simulation

Limitations

  • Results are from simulation and do not imply real-world safety.
  • Performance depends on the exact SimLingo code, CARLA version, and benchmark setup.
  • Closed-loop metrics can vary if evaluation infrastructure, GPU mapping, or simulator stability differ across machines.

Citation

If you use this checkpoint, please cite SimLingo and Bench2Drive.

@InProceedings{Renz2025cvpr,
  title={SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment},
  author={Renz, Katrin and Chen, Long and Arani, Elahe and Sinavski, Oleg},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

@inproceedings{Jia2024NeurIPS,
  title={Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving},
  author={Xiaosong Jia and Zhenjie Yang and Qifeng Li and Zhiyuan Zhang and Junchi Yan},
  booktitle={NeurIPS 2024 Datasets and Benchmarks Track},
  year={2024}
}