arxiv:2606.02956

The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

Published on Jun 1

Submitted by Fabian Immel on Jun 5

Karlsruhe Institute of Technology


Authors: Richard Schwarzkopf, Fabian Immel, Alexander Blumberg, Kaiwen Wang, Fabian Konstantinidis, Julian Truetsch, Carlos Fernandez, Annika Bätz, Kevin Rösch, Marlon Steiner, Willi Poh, Yinzhe Shen, Royden Wagner, Felix Hauser, Dominik Strutz, Jaime Villa, Gleb Stepanov, Holger Caesar, Ömer Şahin Taş, Frank Bieder

Abstract

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): The KITScenes Multimodal dataset provides high-fidelity European driving data with comprehensive 3D maps and diverse urban environments for embodied AI research.

Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar beyond 400 m, 4D imaging radar, and redundant GNSS/INS localization. Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing datasets by broadening the available geographic diversity. We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/
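The abstract states that traffic elements are mapped in 3D to a reprojection-accurate level. One common way to check such a claim is to project each mapped 3D point into a calibrated camera and measure its pixel distance to where the element actually appears in the image. The sketch below is a minimal illustration under stated assumptions: an undistorted pinhole camera, a world-to-camera rotation `R` and translation `t`, and made-up intrinsics and coordinates. It is not the authors' validation procedure, and all function and variable names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def world_to_camera(p_world: Vec3, R: Sequence[Sequence[float]], t: Vec3) -> Vec3:
    """Rigid transform p_cam = R * p_world + t, where (R, t) map world -> camera."""
    return tuple(sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3))


def project_pinhole(p_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a camera-frame point (x right, y down, z forward) to pixels; no lens distortion."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(p_world, R, t, intrinsics, observed_px) -> float:
    """Pixel distance between the projected map point and its observed image location."""
    u, v = project_pinhole(world_to_camera(p_world, R, t), *intrinsics)
    return math.hypot(u - observed_px[0], v - observed_px[1])


# Hypothetical example: a mapped traffic light 50 m ahead, identity extrinsics,
# made-up intrinsics (fx, fy, cx, cy) and a made-up annotated pixel location.
R_identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
traffic_light_world = (2.0, -3.0, 50.0)
intrinsics = (2000.0, 2000.0, 1920.0, 1080.0)
annotated_px = (2003.0, 956.0)

err = reprojection_error(traffic_light_world, R_identity, (0.0, 0.0, 0.0), intrinsics, annotated_px)
print(f"reprojection error: {err:.1f} px")  # reprojection error: 5.0 px
```

With identity extrinsics the mapped point projects to pixel (2000, 960), which lies 5 px from the assumed annotation. A real check would also apply the camera's distortion model and the vehicle pose at the image timestamp.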


Community

immel-f (paper author and submitter):

We present KITScenes Multimodal, a European dataset built around a high-fidelity sensor suite together with the most complete HD maps for autonomous driving ever released.

Highlights

European urban focus — recordings from Karlsruhe, Frankfurt, and Sindelfingen
High-fidelity sensor suite — 72.5 MPix global-shutter surround cameras, 7 LiDARs (~906k points/frame on average), 3 Continental ARS548 4D imaging radars, and redundant GNSS/INS
Long-range sensing — effective LiDAR range beyond 400 m with substantially higher return density than common public driving datasets
Production-grade HD maps — Lanelet2 maps with lane topology, regulatory elements, 29 road-feature classes, 120 traffic-sign classes (GTSIGN-220 taxonomy), and 3D-localized signs, traffic lights, and poles
Research benchmarks — relational HD map perception, long-range monocular depth estimation, novel view synthesis, and end-to-end / world-model research
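For the long-range depth estimation benchmark, one natural way to check whether a model holds up toward the 400 m LiDAR range is to report errors per distance bin instead of a single average. The sketch below assumes two standard monocular depth metrics (AbsRel and the δ < 1.25 accuracy), hypothetical bin edges, and sparse LiDAR returns as ground truth. The actual KITScenes evaluation protocol is not described on this page and may differ; `binned_depth_metrics` and `BINS` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical distance bins in metres; the real benchmark protocol may differ.
BINS: List[Tuple[float, float]] = [(0.0, 50.0), (50.0, 100.0), (100.0, 200.0), (200.0, float("inf"))]


def binned_depth_metrics(
    pairs: Iterable[Tuple[float, float]], bins: List[Tuple[float, float]] = BINS
) -> Dict[Tuple[float, float], Dict[str, float]]:
    """pairs: (predicted_depth, gt_depth) in metres, e.g. gt from sparse LiDAR returns."""
    groups = defaultdict(list)
    for pred, gt in pairs:
        if gt <= 0 or pred <= 0:
            continue  # skip invalid or missing depths
        for lo, hi in bins:
            if lo <= gt < hi:  # assign by ground-truth range
                groups[(lo, hi)].append((pred, gt))
                break
    results = {}
    for b, items in groups.items():
        n = len(items)
        abs_rel = sum(abs(p - g) / g for p, g in items) / n  # mean absolute relative error
        delta1 = sum(max(p / g, g / p) < 1.25 for p, g in items) / n  # share within 25 %
        results[b] = {"abs_rel": abs_rel, "delta1": delta1, "count": n}
    return results


# Made-up samples: two near-field points and two points beyond 200 m.
samples = [(9.0, 10.0), (11.0, 10.0), (300.0, 400.0), (420.0, 400.0)]
for (lo, hi), m in sorted(binned_depth_metrics(samples).items()):
    print(f"{lo:>5.0f}-{hi:<5.0f} m  AbsRel={m['abs_rel']:.3f}  delta1={m['delta1']:.2f}  n={m['count']}")
```

Binning by ground-truth range keeps the many near-field points from dominating the average, which would otherwise hide degradation at long range.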
