SimLingo-QwenVL3-2B
SimLingo-QwenVL3-2B is a vision-language-action model for closed-loop autonomous driving in CARLA. This release is based on the SimLingo codebase and uses a Qwen3-VL-2B-Instruct backbone inside the SimLingo driving stack.
This repository provides two checkpoints:
- Epoch 13 (trained for 14 epochs)
- Epoch 14 (trained for 15 epochs)
Each checkpoint is released in two formats:
- checkpoints/epoch=013.ckpt: PyTorch Lightning checkpoint for training resumption.
- checkpoints/epoch=013.pt: exported weights for SimLingo evaluation and inference.
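The epoch numbers in the file names read as zero-based indices, which is why "Epoch 13" corresponds to 14 completed training epochs. The sketch below spells out that mapping. It assumes the second checkpoint follows the same `epoch=NNN` naming pattern as the `epoch=013` files listed above; the helper `checkpoint_files` is hypothetical and not part of SimLingo.

```python
from pathlib import PurePosixPath


def checkpoint_files(epoch_index: int, root: str = "checkpoints") -> dict:
    """Map a zero-based epoch index to the two released file formats.

    Assumption: every checkpoint follows the `epoch=NNN` pattern of the
    `epoch=013` files named in this card.
    """
    stem = f"epoch={epoch_index:03d}"
    base = PurePosixPath(root)
    return {
        "epochs_trained": epoch_index + 1,         # index 13 -> 14 epochs
        "lightning": str(base / f"{stem}.ckpt"),   # for resuming training
        "exported": str(base / f"{stem}.pt"),      # for evaluation / inference
    }


if __name__ == "__main__":
    for idx in (13, 14):
        print(checkpoint_files(idx))
```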
Model Overview
- Model family: SimLingo
- Backbone: Qwen3-VL-2B-Instruct
- Modality: front-camera vision + route / control conditioning
- Primary use case: closed-loop autonomous driving research in simulation
- Codebase: https://github.com/RenzKa/simlingo
- Paper: https://arxiv.org/abs/2503.09594
Evaluation
The checkpoint epoch=013.ckpt was evaluated on Bench2Drive.
| Benchmark | Checkpoint | Driving Score (DS) | Success Rate (%) |
|---|---|---|---|
| Bench2Drive | epoch=013.ckpt | 63.94 | 30.00 |
Bench2Drive is a CARLA closed-loop driving benchmark with 220 short routes and one safety-critical scenario per route. The corresponding .pt file in this repo is exported from the same epoch and is intended for use with the SimLingo evaluation code.
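To make the two metric columns concrete: over 220 routes, a 30.00% success rate corresponds to 66 successful routes. The sketch below shows one plausible way to turn per-route results into the two reported numbers. The `RouteResult` record and its fields are hypothetical, and the aggregation (mean of per-route scores, share of routes marked successful) is an assumption about how such metrics are typically computed, not a copy of the Bench2Drive tooling.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class RouteResult:
    """Hypothetical per-route record; field names are illustrative."""
    route_id: str
    driving_score: float  # per-route score on a 0-100 scale
    success: bool         # True if the route counts as successful


def aggregate(results: Iterable[RouteResult]) -> Tuple[float, float]:
    """Return (mean driving score, success rate in percent)."""
    results = list(results)
    if not results:
        raise ValueError("no route results to aggregate")
    mean_ds = sum(r.driving_score for r in results) / len(results)
    success_rate = 100.0 * sum(r.success for r in results) / len(results)
    return mean_ds, success_rate


def successful_routes(success_rate_pct: float, n_routes: int = 220) -> int:
    """Convert a success rate in percent back to a route count."""
    return round(success_rate_pct / 100.0 * n_routes)
```

Calling `successful_routes(30.00)` gives the route count behind the success rate in the table.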
Usage
This model is not intended to be loaded with vanilla Transformers alone. It depends on the SimLingo repository for preprocessing, control prediction, and CARLA agent execution.
Typical closed-loop evaluation uses the exported .pt file together with the SimLingo codebase:
python /path/to/simlingo/Bench2Drive/leaderboard/leaderboard/leaderboard_evaluator.py \
--agent /path/to/simlingo/team_code/agent_simlingo.py \
--agent-config /path/to/checkpoints/epoch=013.pt
For cluster-based evaluation, refer to start_eval_simlingo.py in the SimLingo repository.
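When scripting several evaluations, the command above can be assembled programmatically. This is a minimal sketch that uses only the two flags shown in this card. The function name `build_eval_command` and the example paths are placeholders, and any further arguments the evaluator may need (routes, ports, output files) are deliberately left out because they are not specified here.

```python
import shlex
import sys
from pathlib import Path
from typing import List


def build_eval_command(simlingo_root: str, checkpoint: str) -> List[str]:
    """Assemble the evaluator invocation as an argument list.

    Only --agent and --agent-config are taken from this card; everything
    else about the evaluator's CLI is out of scope for this sketch.
    """
    root = Path(simlingo_root)
    evaluator = (
        root / "Bench2Drive" / "leaderboard" / "leaderboard" / "leaderboard_evaluator.py"
    )
    agent = root / "team_code" / "agent_simlingo.py"
    return [
        sys.executable,
        str(evaluator),
        "--agent", str(agent),
        "--agent-config", str(checkpoint),
    ]


if __name__ == "__main__":
    cmd = build_eval_command("/path/to/simlingo", "/path/to/checkpoints/epoch=013.pt")
    print(shlex.join(cmd))
    # To launch it: subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting problems with the `=` in the checkpoint file name.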
Training Context
This checkpoint comes from the training run:
- Run name: 2026_04_18_18_01_31_simlingo_qwen3
- Max epochs: 15
- Batch size: 96
- Learning rate: 3e-5
- Precision: bf16-mixed
- Training seed: 9876
The associated Hydra config uses:
- vision backbone variant: Qwen3-VL-2B-Instruct
- language backbone variant: Qwen3-VL-2B-Instruct
Intended Use
This model is intended for:
- research on closed-loop driving in CARLA
- benchmarking on Bench2Drive
- studying language-conditioned driving behavior in simulation
This model is not intended for:
- real-world driving deployment
- safety-critical use outside simulation
Limitations
- Results are from simulation and do not imply real-world safety.
- Performance depends on the exact SimLingo code, CARLA version, and benchmark setup.
- Closed-loop metrics can vary if evaluation infrastructure, GPU mapping, or simulator stability differ across machines.
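Because closed-loop metrics can shift between runs, it may help to report a spread alongside a single score when reproducing results. The sketch below summarizes repeated evaluation runs with the standard library. The function `summarize_runs` is hypothetical, and it assumes you have collected one driving score per complete benchmark run yourself; no such repeated-run numbers are provided with this release.

```python
import statistics
from typing import Sequence


def summarize_runs(scores: Sequence[float]) -> dict:
    """Summarize one metric (e.g. driving score) across repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return {
        "n_runs": len(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }
```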
Citation
If you use this checkpoint, please cite SimLingo and Bench2Drive.
@InProceedings{Renz2025cvpr,
title={SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment},
author={Renz, Katrin and Chen, Long and Arani, Elahe and Sinavski, Oleg},
booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}
@inproceedings{Jia2024NeurIPS,
title={Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving},
author={Xiaosong Jia and Zhenjie Yang and Qifeng Li and Zhiyuan Zhang and Junchi Yan},
booktitle={NeurIPS 2024 Datasets and Benchmarks Track},
year={2024}
}