DriveVLM-RL: pretrained policy checkpoints
Final SAC policy checkpoints for DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving.
Policies are trained in CARLA Town02 with three independent seeds. The VLM reward components (CLIP, YOLOv8s, Qwen3-VL) are used only during training; these checkpoints are pure Stable-Baselines3 SAC policies and need no VLM at inference.
- Code: https://github.com/zilin-huang/DriveVLM-RL
- Paper: https://arxiv.org/abs/2603.18315
- Project page: https://zilin-huang.github.io/DriveVLM-RL-website/
Checkpoints
Normal mode
Standard DriveVLM-RL policies (full dual-pathway reward).
| File | Seed (run id) | Training steps |
|---|---|---|
| drivevlm_rl_normal_seed1_20251029_115740.zip | 20251029_115740 | 980k |
| drivevlm_rl_normal_seed2_20260423_120202.zip | 20260423_120202 | 1.16M |
| drivevlm_rl_normal_seed3_20260423_120553.zip | 20260423_120553 | 1.07M |
Extreme mode
Policies trained under the extreme reward setting with the safety penalty disabled (R_penalty = 0), used for the robustness study in the paper.
| File | Seed (run id) | Training steps |
|---|---|---|
| drivevlm_rl_extreme_seed1_20251104_141251.zip | 20251104_141251 | 1.02M |
| drivevlm_rl_extreme_seed2_20260425_023939.zip | 20260425_023939 | 1.06M |
| drivevlm_rl_extreme_seed3_20260425_023946.zip | 20260425_023946 | 1.03M |
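The checkpoint filenames listed above share one visible pattern: `drivevlm_rl_<mode>_seed<k>_<run id>.zip`. The following is a minimal sketch for splitting a filename into those parts. The regular expression and the `parse_checkpoint_name` helper are illustrative assumptions inferred from the six names in the tables; they are not part of the DriveVLM-RL repository.

```python
import re
from typing import NamedTuple


class CheckpointInfo(NamedTuple):
    mode: str    # "normal" or "extreme"
    seed: int    # seed index as written in the filename
    run_id: str  # timestamp-style run id, e.g. "20251029_115740"


# Pattern inferred from the filenames in the checkpoint tables (an assumption).
_PATTERN = re.compile(
    r"^drivevlm_rl_(normal|extreme)_seed(\d+)_(\d{8}_\d{6})\.zip$"
)


def parse_checkpoint_name(filename: str) -> CheckpointInfo:
    """Split a checkpoint filename into mode, seed index, and run id."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {filename!r}")
    mode, seed, run_id = match.groups()
    return CheckpointInfo(mode, int(seed), run_id)


info = parse_checkpoint_name("drivevlm_rl_extreme_seed2_20260425_023939.zip")
print(info.mode, info.seed, info.run_id)
```

A helper like this can be handy when looping over all six files, for example to group evaluation outputs by mode and seed.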
Results
All numbers are mean ± std over 3 training seeds. Metrics: AS = average speed,
TD = travel distance (m), RC = route completion, SR = success rate,
CS = collision speed (collision severity; lower is safer). ↑ higher is better,
↓ lower is better. Bold marks the best driving method per column. Raw per-seed CSVs
are in results/.
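As a rough illustration of how a "mean ± std over 3 seeds" entry can be produced from per-seed values, here is a minimal sketch. It assumes the sample standard deviation (n - 1 denominator); this card does not state which estimator was used. The per-seed numbers are hypothetical and are not read from results/.

```python
from statistics import mean, stdev


def summarize(per_seed_values):
    """Format per-seed values as 'mean ± std' using the sample std (n - 1)."""
    return f"{mean(per_seed_values):.2f} ± {stdev(per_seed_values):.2f}"


# Hypothetical per-seed success rates for three seeds (NOT taken from results/).
hypothetical_sr = [0.4, 0.6, 0.7]
summary = summarize(hypothetical_sr)
print(summary)  # 0.57 ± 0.15
```

With only three seeds the standard deviation is a coarse estimate, which is worth keeping in mind when comparing entries whose ranges overlap.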
Main comparison (CARLA Town02, in-distribution)
| Method | Venue | AS ↑ | TD ↑ | RC ↑ | SR ↑ | CS ↓ |
|---|---|---|---|---|---|---|
| TIRL-SAC | TR-C'22 | 0.45 ± 0.77 | 1.49 ± 2.32 | 0.01 ± 0.01 | 0.00 ± 0.00 | 0.29 ± 0.50 |
| Chen-SAC | T-ITS'22 | 24.32 ± 0.46 | 162.01 ± 17.67 | 0.49 ± 0.08 | 0.50 ± 0.10 | 16.04 ± 2.51 |
| ASAP | RSS'23 | 11.53 ± 10.22 | 25.00 ± 24.92 | 0.12 ± 0.11 | 0.00 ± 0.00 | 7.07 ± 5.96 |
| ChatScene-PPO | CVPR'24 | 14.78 ± 0.30 | 127.85 ± 10.39 | 0.44 ± 0.14 | 0.40 ± 0.10 | 6.05 ± 1.28 |
| Revolve | ICLR'25 | 17.42 ± 0.80 | 134.37 ± 15.26 | 0.40 ± 0.12 | 0.40 ± 0.20 | 10.33 ± 2.25 |
| Revolve-auto | ICLR'25 | 14.12 ± 3.07 | 129.14 ± 33.22 | 0.33 ± 0.12 | 0.40 ± 0.20 | 7.80 ± 1.06 |
| VLM-SR | NeurIPS'23 | 0.06 ± 0.05 | 2.26 ± 1.26 | 0.01 ± 0.00 | 0.00 ± 0.00 | 0.66 ± 1.14 |
| RoboCLIP | NeurIPS'23 | 0.13 ± 0.09 | 3.46 ± 2.32 | 0.02 ± 0.01 | 0.00 ± 0.00 | 0.01 ± 0.02 |
| VLM-RM | ICLR'24 | 0.08 ± 0.01 | 3.60 ± 0.38 | 0.02 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| LORD | WACV'25 | 0.36 ± 0.59 | 4.10 ± 6.11 | 0.03 ± 0.03 | 0.00 ± 0.00 | 1.52 ± 2.63 |
| VLM-RL | TR-C'25 | 14.38 ± 1.53 | 138.08 ± 16.68 | 0.51 ± 0.08 | 0.40 ± 0.00 | 10.09 ± 11.93 |
| DriveVLM-RL (ours) | This work | 14.54 ± 1.81 | 186.59 ± 14.00 | 0.57 ± 0.03 | 0.57 ± 0.15 | 1.75 ± 3.02 |
Several reward-only baselines (TIRL-SAC, VLM-SR, RoboCLIP, VLM-RM, LORD) collapse to a near-stationary policy (AS < 0.5, RC ≤ 0.03) and therefore avoid collisions trivially; their low CS is degenerate rather than a sign of safe driving. Among methods that complete routes, DriveVLM-RL travels the farthest (TD), completes the largest share of the route (RC), succeeds most often (SR), and has the lowest collision severity (CS).
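The effect of excluding near-stationary policies before comparing CS can be sketched as follows. The means are copied from the main comparison table; reading "completes routes" as a nonzero mean success rate is an assumption made for this example, not a criterion stated by the authors.

```python
# (method, AS, RC, SR, CS) means copied from the main comparison table.
rows = [
    ("TIRL-SAC", 0.45, 0.01, 0.00, 0.29),
    ("Chen-SAC", 24.32, 0.49, 0.50, 16.04),
    ("ASAP", 11.53, 0.12, 0.00, 7.07),
    ("ChatScene-PPO", 14.78, 0.44, 0.40, 6.05),
    ("Revolve", 17.42, 0.40, 0.40, 10.33),
    ("Revolve-auto", 14.12, 0.33, 0.40, 7.80),
    ("VLM-SR", 0.06, 0.01, 0.00, 0.66),
    ("RoboCLIP", 0.13, 0.02, 0.00, 0.01),
    ("VLM-RM", 0.08, 0.02, 0.00, 0.00),
    ("LORD", 0.36, 0.03, 0.00, 1.52),
    ("VLM-RL", 14.38, 0.51, 0.40, 10.09),
    ("DriveVLM-RL", 14.54, 0.57, 0.57, 1.75),
]

# Assumption: a method "completes routes" if its mean success rate is nonzero.
completing = [row for row in rows if row[3] > 0.0]

lowest_cs_all = min(rows, key=lambda row: row[4])[0]
lowest_cs_completing = min(completing, key=lambda row: row[4])[0]

print(lowest_cs_all)         # VLM-RM (a near-stationary policy)
print(lowest_cs_completing)  # DriveVLM-RL
```

Without the filter, the lowest CS belongs to a policy that barely moves, which is why CS is best read together with RC and SR.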
Cross-town generalization (zero-shot, trained on Town02)
| Town | Method | RC ↑ | SR ↑ | CS ↓ |
|---|---|---|---|---|
| Town01 | ChatScene-PPO | 0.33 ± 0.04 | 0.30 ± 0.10 | 4.87 ± 0.48 |
| Town01 | VLM-RL | 0.27 ± 0.08 | 0.03 ± 0.06 | 10.64 ± 3.24 |
| Town01 | DriveVLM-RL | 0.21 ± 0.03 | 0.03 ± 0.06 | 1.59 ± 0.80 |
| Town03 | ChatScene-PPO | 0.34 ± 0.06 | 0.10 ± 0.00 | 15.27 ± 4.49 |
| Town03 | VLM-RL | 0.28 ± 0.04 | 0.07 ± 0.06 | 18.20 ± 8.46 |
| Town03 | DriveVLM-RL | 0.38 ± 0.04 | 0.10 ± 0.00 | 10.97 ± 3.05 |
| Town04 | ChatScene-PPO | 0.27 ± 0.02 | 0.10 ± 0.00 | 15.20 ± 3.16 |
| Town04 | VLM-RL | 0.18 ± 0.02 | 0.17 ± 0.06 | 8.54 ± 4.16 |
| Town04 | DriveVLM-RL | 0.18 ± 0.07 | 0.07 ± 0.12 | 3.57 ± 2.45 |
| Town05 | ChatScene-PPO | 0.29 ± 0.09 | 0.07 ± 0.06 | 8.02 ± 7.06 |
| Town05 | VLM-RL | 0.22 ± 0.06 | 0.00 ± 0.00 | 6.77 ± 11.73 |
| Town05 | DriveVLM-RL | 0.30 ± 0.02 | 0.03 ± 0.06 | 3.57 ± 1.50 |
DriveVLM-RL attains the lowest collision severity (CS) in every unseen town.
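The per-town claim can be checked directly against the table with a small group-by. This sketch uses the CS means copied from the cross-town table; the variable names are illustrative.

```python
from collections import defaultdict

# (town, method, CS mean) copied from the cross-town generalization table.
cross_town_cs = [
    ("Town01", "ChatScene-PPO", 4.87),
    ("Town01", "VLM-RL", 10.64),
    ("Town01", "DriveVLM-RL", 1.59),
    ("Town03", "ChatScene-PPO", 15.27),
    ("Town03", "VLM-RL", 18.20),
    ("Town03", "DriveVLM-RL", 10.97),
    ("Town04", "ChatScene-PPO", 15.20),
    ("Town04", "VLM-RL", 8.54),
    ("Town04", "DriveVLM-RL", 3.57),
    ("Town05", "ChatScene-PPO", 8.02),
    ("Town05", "VLM-RL", 6.77),
    ("Town05", "DriveVLM-RL", 3.57),
]

# Group CS means by town, then pick the method with the smallest mean per town.
cs_by_town = defaultdict(dict)
for town, method, cs in cross_town_cs:
    cs_by_town[town][method] = cs

lowest_cs_per_town = {
    town: min(scores, key=scores.get) for town, scores in cs_by_town.items()
}
print(lowest_cs_per_town)
```

This compares means only and ignores the ± std columns.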
Traffic density (CARLA Town02; Regular is the training density)
| Density | Method | RC ↑ | SR ↑ | CS ↓ |
|---|---|---|---|---|
| Empty | ChatScene-PPO | 0.57 ± 0.00 | 0.90 ± 0.00 | 8.92 ± 4.04 |
| Empty | VLM-RL | 0.53 ± 0.12 | 0.77 ± 0.32 | 0.00 ± 0.00 |
| Empty | DriveVLM-RL | 0.57 ± 0.05 | 0.70 ± 0.10 | 0.00 ± 0.00 |
| Regular | ChatScene-PPO | 0.44 ± 0.14 | 0.40 ± 0.10 | 6.05 ± 1.28 |
| Regular | VLM-RL | 0.51 ± 0.08 | 0.40 ± 0.00 | 10.09 ± 5.93 |
| Regular | DriveVLM-RL | 0.57 ± 0.03 | 0.57 ± 0.15 | 1.75 ± 3.02 |
| Dense | ChatScene-PPO | 0.41 ± 0.09 | 0.20 ± 0.17 | 4.77 ± 1.10 |
| Dense | VLM-RL | 0.37 ± 0.07 | 0.27 ± 0.06 | 6.93 ± 1.63 |
| Dense | DriveVLM-RL | 0.46 ± 0.08 | 0.33 ± 0.15 | 2.28 ± 1.83 |
Extreme mode (safety penalty disabled, R_penalty = 0)
| Method | AS ↑ | TD ↑ | RC ↑ | SR ↑ | CS ↓ |
|---|---|---|---|---|---|
| ChatScene-PPO | 15.20 ± 0.39 | 137.51 ± 12.42 | 0.38 ± 0.05 | 0.47 ± 0.12 | 3.91 ± 1.75 |
| VLM-RL | 14.52 ± 0.56 | 136.33 ± 31.86 | 0.49 ± 0.09 | 0.40 ± 0.10 | 4.59 ± 4.09 |
| DriveVLM-RL | 15.17 ± 1.85 | 149.69 ± 34.85 | 0.44 ± 0.03 | 0.50 ± 0.10 | 0.69 ± 1.09 |
Even with the explicit safety penalty removed, DriveVLM-RL keeps the lowest collision severity and highest success rate, showing the dual-pathway semantic reward instills safe behavior rather than relying on a hand-tuned penalty.
Usage
```python
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(
    repo_id="zilinhuang/DriveVLM-RL",
    filename="drivevlm_rl_normal_seed1_20251029_115740.zip",
)
# Then, in the DriveVLM-RL repo (vlm-rl env):
#   python eval/eval.py --model <ckpt> --config drivevlm_rl --town Town02 --density regular
```
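Because the checkpoints are described above as pure Stable-Baselines3 SAC policies, they can in principle also be loaded directly with `SAC.load`. The sketch below rests on several assumptions: the helper names `load_policy` and `act` are hypothetical, `device="cpu"` is an arbitrary choice, and the checkpoint may reference custom classes from the DriveVLM-RL code, in which case loading has to happen inside that repo's environment. The observation format is defined by the repo's CARLA environment and is not described in this card.

```python
def load_policy(filename="drivevlm_rl_normal_seed1_20251029_115740.zip"):
    """Download one checkpoint and load it as a Stable-Baselines3 SAC model."""
    # Imported lazily so the helpers can be defined without the packages installed.
    from huggingface_hub import hf_hub_download
    from stable_baselines3 import SAC

    path = hf_hub_download(repo_id="zilinhuang/DriveVLM-RL", filename=filename)
    # No VLM is needed here; device="cpu" is an assumption for this sketch.
    return SAC.load(path, device="cpu")


def act(model, observation):
    """Return a deterministic action for a single observation."""
    action, _state = model.predict(observation, deterministic=True)
    return action
```

In use, `observation` would come from the environment's `reset()` or `step()` call, and `act(model, observation)` would be called once per control step.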