SimLingo-Qwen3-VL-2B

SimLingo-Qwen3-VL-2B is a vision-language-action model for closed-loop autonomous driving in the CARLA simulator. It is built on the SimLingo codebase and uses Qwen3-VL-2B-Instruct as the vision-language backbone of the SimLingo driving stack.

This repository provides two checkpoints. Epoch indices are zero-based, as in PyTorch Lightning:

Epoch 13 (trained for 14 epochs)
Epoch 14 (trained for 15 epochs)

Each checkpoint is released in two formats, shown here for epoch 13:

checkpoints/epoch=013.ckpt: PyTorch Lightning checkpoint, for resuming training.
checkpoints/epoch=013.pt: exported weights, for evaluation and inference with the SimLingo code.
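The zero-based epoch numbering and the two file formats can be summarized with a small helper. This is a minimal sketch, not part of the SimLingo code: `ReleaseFiles` and `release_files` are hypothetical names, and the epoch 14 file names are assumed to follow the same `epoch=NNN` pattern as epoch 13.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseFiles:
    """Hypothetical description of one released checkpoint."""
    epochs_trained: int  # zero-based epoch index + 1
    resume_ckpt: str     # PyTorch Lightning checkpoint, for resuming training
    eval_weights: str    # exported weights, for SimLingo evaluation


def release_files(epoch_index: int) -> ReleaseFiles:
    # Assumes the three-digit, zero-padded "epoch=NNN" naming used in this repo.
    stem = f"checkpoints/epoch={epoch_index:03d}"
    return ReleaseFiles(
        epochs_trained=epoch_index + 1,
        resume_ckpt=f"{stem}.ckpt",
        eval_weights=f"{stem}.pt",
    )


for idx in (13, 14):
    print(release_files(idx))
```

For epoch index 13 this yields 14 epochs trained, `checkpoints/epoch=013.ckpt` for resumption, and `checkpoints/epoch=013.pt` for evaluation.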

Model Overview

Model family: SimLingo
Backbone: Qwen3-VL-2B-Instruct
Modality: front-camera vision + route / control conditioning
Primary use case: closed-loop autonomous driving research in simulation
Codebase: https://github.com/RenzKa/simlingo
Paper: https://arxiv.org/abs/2503.09594

Evaluation

The checkpoint epoch=013.ckpt was evaluated on Bench2Drive.

| Benchmark | Checkpoint | Driving Score (DS) | Success Rate (%) |
|---|---|---|---|
| Bench2Drive | `epoch=013.ckpt` | 63.94 | 30.00 |

Bench2Drive is a CARLA closed-loop driving benchmark with 220 short routes and one safety-critical scenario per route. The corresponding .pt file in this repo is exported from the same epoch and is intended for use with the SimLingo evaluation code.
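Bench2Drive reports CARLA-leaderboard-style metrics. As a minimal sketch under stated assumptions: each route's score is its route completion (in percent) multiplied by a multiplicative infraction penalty, the Driving Score is the mean over routes, and a route counts as a success only when it is fully completed with no infraction penalty. That success criterion is an assumption here; check the Bench2Drive evaluation scripts for the exact rule. `RouteResult`, `driving_score`, and `success_rate` are hypothetical names, and the four routes below are toy numbers, not results from this model.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RouteResult:
    route_completion: float    # percent of the route driven, 0 to 100
    infraction_penalty: float  # multiplicative penalty in (0, 1]; 1.0 means no infractions


def driving_score(results: Sequence[RouteResult]) -> float:
    # Per-route score = route completion x infraction penalty; DS is the mean over routes.
    return sum(r.route_completion * r.infraction_penalty for r in results) / len(results)


def success_rate(results: Sequence[RouteResult]) -> float:
    # Assumption: success = route fully completed with no infraction penalty.
    successes = sum(
        1 for r in results if r.route_completion >= 100.0 and r.infraction_penalty == 1.0
    )
    return 100.0 * successes / len(results)


# Toy data for illustration only.
toy = [
    RouteResult(100.0, 1.0),  # clean, complete route: score 100
    RouteResult(100.0, 0.6),  # complete, but penalized: score 60
    RouteResult(50.0, 1.0),   # stopped halfway: score 50
    RouteResult(80.0, 0.5),   # partial and penalized: score 40
]
print(driving_score(toy))  # 62.5
print(success_rate(toy))   # 25.0

# Assuming the reported 30.00% is over all 220 routes, that is 66 successful routes.
print(30 * 220 // 100)     # 66
```

Because the penalty multiplies route completion, a route driven to the end with infractions can score lower than a clean route that stops early, which is why DS and success rate can diverge.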

Usage

This model is not intended to be loaded with vanilla Transformers alone. It depends on the SimLingo repository for preprocessing, control prediction, and CARLA agent execution.

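Typical closed-loop evaluation passes the exported .pt file as the agent config to the Bench2Drive leaderboard evaluator included with the SimLingo codebase. A minimal invocation looks like this (additional evaluator arguments depend on your setup):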

python /path/to/simlingo/Bench2Drive/leaderboard/leaderboard/leaderboard_evaluator.py \
  --agent /path/to/simlingo/team_code/agent_simlingo.py \
  --agent-config /path/to/checkpoints/epoch=013.pt

For cluster-based evaluation, refer to start_eval_simlingo.py in the SimLingo repository.
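If you launch evaluations from Python (for example, from your own job scripts), the command above can be assembled programmatically. This is a minimal sketch: `build_eval_command` is a hypothetical helper, the paths are the same placeholders as in the command above, and `extra_args` is where any additional evaluator arguments your setup requires would go. Actually launching it assumes a working SimLingo/Bench2Drive environment with a reachable CARLA server.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def build_eval_command(
    simlingo_root: str, checkpoint: str, extra_args: Sequence[str] = ()
) -> List[str]:
    """Return the leaderboard evaluator argv for a SimLingo .pt checkpoint (hypothetical helper)."""
    root = Path(simlingo_root)
    return [
        "python",
        str(root / "Bench2Drive/leaderboard/leaderboard/leaderboard_evaluator.py"),
        "--agent", str(root / "team_code/agent_simlingo.py"),
        "--agent-config", str(checkpoint),  # the exported .pt file, not the .ckpt
        *extra_args,                        # any further evaluator arguments
    ]


cmd = build_eval_command("/path/to/simlingo", "/path/to/checkpoints/epoch=013.pt")
print(shlex.join(cmd))
# To launch (requires the environment described above): subprocess.run(cmd, check=True)
```

Keeping the command as a list of arguments, rather than one shell string, passes each path to `subprocess.run` intact without needing `shell=True`.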

Training Context

The checkpoints in this repository come from the training run:

Run name: 2026_04_18_18_01_31_simlingo_qwen3
Max epochs: 15
Batch size: 96
Learning rate: 3e-5
Precision: bf16-mixed
Training seed: 9876

The associated Hydra config uses:

vision backbone variant: Qwen3-VL-2B-Instruct
language backbone variant: Qwen3-VL-2B-Instruct

Intended Use

This model is intended for:

research on closed-loop driving in CARLA
benchmarking on Bench2Drive
studying language-conditioned driving behavior in simulation

This model is not intended for:

real-world driving deployment
safety-critical use outside simulation

Limitations

Results are from simulation and do not imply real-world safety.
Performance depends on the exact SimLingo code, CARLA version, and benchmark setup.
Closed-loop metrics can vary if evaluation infrastructure, GPU mapping, or simulator stability differ across machines.

Citation

If you use this checkpoint, please cite SimLingo and Bench2Drive.

@InProceedings{Renz2025cvpr,
  title={SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment},
  author={Renz, Katrin and Chen, Long and Arani, Elahe and Sinavski, Oleg},
  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

@inproceedings{Jia2024NeurIPS,
  title={Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving},
  author={Xiaosong Jia and Zhenjie Yang and Qifeng Li and Zhiyuan Zhang and Junchi Yan},
  booktitle={NeurIPS 2024 Datasets and Benchmarks Track},
  year={2024}
}

Model tree: ZhanqiuG/Simlingo-Qwen3-VL-2B is fine-tuned from Qwen/Qwen3-VL-2B-Instruct.

Paper: SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment (arXiv 2503.09594, published Mar 12, 2025).