
PHYSICAL AI AUTONOMOUS VEHICLES


The PhysicalAI-Autonomous-Vehicles dataset provides one of the largest geographically diverse collections of multi-sensor data, empowering AV researchers to build the next generation of Physical AI-based end-to-end driving systems. This dataset is ready for commercial and non-commercial AV use per the license agreement.

  • Data Collection Method

    • Automatic/Sensor
  • Labeling Method

    • Automatic/Sensor

This dataset has a total of 1700 hours of driving recorded from planned data-collection drives in 25 countries and 2500+ cities. The data captures diverse traffic, weather conditions, obstacles, and pedestrians in the environment. It consists of 306,152 clips that are each 20 seconds long. The sensor data includes multi-camera coverage for all 306,152 clips, LiDAR coverage for 298,326 clips, and radar coverage for 160,761 clips.

A subset (~41K) of this dataset is immediately explorable through a CDS Preview Experience (requires registration).

Intended Usage

This dataset may be used for autonomous-vehicle-related use cases only, whether commercial or non-commercial, as long as the license terms are followed. The size and diversity of this multi-sensor dataset make it well suited to research on end-to-end driving, neural reconstruction, synthetic data generation, scenario mining, and many other autonomous vehicle applications.

Developer Tooling

A Python developer kit that supports application workflows, along with additional data format documentation, is available at https://github.com/NVlabs/physical_ai_av. On systems using Python >= 3.11, the package can be installed directly using:

pip install physical_ai_av

This package provides direct download capabilities from HuggingFace. To authenticate the system for direct download access,

For data mining and curation, NVIDIA also provides tools like Cosmos Dataset Search (CDS) for multimodal semantic search with text and video queries. A subset of this dataset is explorable through a CDS Preview Experience.

Applications

Developing a model end to end for safe use in the real world requires a combination of massive domain-specific data operations as well as training and testing activities. This dataset enables a number of workflows to bootstrap or expand such development, including but not limited to:

  • Scenario mining and analysis
  • Neural reconstruction
  • Synthetic data generation
  • Supervised fine tuning
  • Reinforcement learning

NVIDIA is providing a set of workflows for these activities built upon NVIDIA’s Physical AI AV Dataset for NuRec-based reconstruction, supervised fine tuning, and reinforcement learning. They can be found here:

Note: We currently support NuRec-based reconstruction only for the Hyperion 8.1 sensor rig. Clips recorded with this rig can be identified by filtering the data collection parquet (metadata/data_collection.parquet) for rows whose platform_class field has the value hyperion_8.1.

{
    # Other Metadata Entries
    # Vehicle Platform
    'platform_class': str, # platform (hyperion_8/8.1)
}
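The filter described in the note can be sketched as follows. This is a minimal illustration, not the developer kit's API: it works on plain row dictionaries, and the `clip_id` key is a hypothetical name chosen for the example (only `platform_class` and the value `hyperion_8.1` come from the text above). In practice the rows might come from something like `pandas.read_parquet(...).to_dict("records")`.

```python
from typing import Iterable


def hyperion_81_clips(rows: Iterable[dict]) -> list[str]:
    """Return the ids of clips recorded on the Hyperion 8.1 rig.

    `clip_id` is a hypothetical key used only for this sketch.
    """
    return [
        row["clip_id"]
        for row in rows
        if row.get("platform_class") == "hyperion_8.1"
    ]


# Made-up rows standing in for the data collection metadata.
rows = [
    {"clip_id": "clip-a", "platform_class": "hyperion_8"},
    {"clip_id": "clip-b", "platform_class": "hyperion_8.1"},
    {"clip_id": "clip-c", "platform_class": "hyperion_8.1"},
]

nurec_candidates = hyperion_81_clips(rows)
print(nurec_candidates)
```

An exact string comparison is used on purpose, so that `hyperion_8` rows are not matched by accident as they would be with a prefix check.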

Additionally, we do not include open maps data. For now, scenes are not compatible with CARLA unless users generate their own XODR data. However, we are looking to add XODR in the future to enable simulation in CARLA, AlpaSim, and others.

Version History

Version Notes
26.03 Additional signal processing to enable the NCore-based NuRec reconstruction workflow and the SFT and RL workflows for the majority (97%) of clips. Specific changes:
- Renamed/updated metadata/sensor_presence.parquet -> metadata/feature_presence.parquet to match features.csv; radar_config column now moved to metadata/data_collection.parquet.
- Added offline-optimized features egomotion.offline, obstacle.offline, camera_intrinsics.offline, lidar_intrinsics.offline, sensor_extrinsics.offline for 97% of clips.
- Updated lidar_top_360fov for 97% of clips; per-clip lidar presence is now recorded in metadata/feature_presence.parquet.
25.10 Initial release of dataset

Previous versions are tagged and associated data remains accessible through the HuggingFace repo history.

License/Terms of Use

NVIDIA Autonomous Vehicle Dataset License Agreement

Dataset Owner(s)

NVIDIA Corporation

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.

DATASET


Dataset Quantification

This dataset has a total of 1700 hours of driving recorded from planned data-collection drives in 25 countries and 2500+ cities. The data captures diverse traffic, weather conditions, obstacles, and pedestrians in the environment. It consists of 306,152 clips that are each 20 seconds long. The sensor data includes multi-camera (7) coverage for all clips, LiDAR (1) coverage for 298,326 clips, and radar (up to 10) coverage for 160,761 clips. Ego motion, calibration, and machine labels are also part of the set.
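As a quick sanity check, the headline figures above are consistent with each other. The sketch below uses only the numbers quoted in this section and derives the total duration and the share of clips with LiDAR and radar coverage.

```python
TOTAL_CLIPS = 306_152
CLIP_SECONDS = 20
LIDAR_CLIPS = 298_326
RADAR_CLIPS = 160_761

# 306,152 clips x 20 s each, converted to hours.
total_hours = TOTAL_CLIPS * CLIP_SECONDS / 3600

# Fraction of clips that have each optional sensor.
lidar_share = LIDAR_CLIPS / TOTAL_CLIPS
radar_share = RADAR_CLIPS / TOTAL_CLIPS

print(f"{total_hours:.1f} hours")            # 1700.8 hours
print(f"LiDAR coverage: {lidar_share:.1%}")  # LiDAR coverage: 97.4%
print(f"Radar coverage: {radar_share:.1%}")  # Radar coverage: 52.5%
```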

The total size of the dataset is 133 TB.

Dataset Diversity: Regional


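| Country | Count | Country | Count | Country | Count |
| --- | --- | --- | --- | --- | --- |
| United States | 155360 | Croatia | 4961 | Romania | 2719 |
| Germany | 43900 | Netherlands | 4932 | Luxembourg | 2620 |
| France | 10364 | Denmark | 4581 | Latvia | 2173 |
| Italy | 8658 | Slovenia | 4301 | Hungary | 1960 |
| Sweden | 7330 | Estonia | 4128 | Bulgaria | 932 |
| Spain | 6459 | Slovakia | 4122 | | |
| Portugal | 6101 | Belgium | 3753 | | |
| Greece | 5885 | Czechia | 3662 | | |
| Austria | 5451 | Lithuania | 3392 | | |
| Finland | 5176 | Poland | 3232 | | |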

Dataset Diversity: Environmental and Traffic

  • Traffic density patterns: no traffic, light traffic, medium traffic, and heavy traffic
  • Road types: highways, urban, residential, and rural roads
  • Weather: clear, rain, snow, fog
  • Surface conditions: dry, wet, snow/ice
  • Time-of-day: daytime, nighttime
  • Infrastructure elements such as tunnels, bridges, roundabouts, railway crossings, toll booths, inclines, and more

Sensor Data Structure

Data Format

We store the data separately for each sensor (camera, LiDAR and radar). Besides these sensors we also provide ego motion, calibration data, autogenerated (non-GT) machine labels, and other metadata.

Because of the significant size of this dataset, we provide all features (sensor data and autolabels) in chunks of up to 100 clips each. The exception to this chunking is clip-level metadata which we intend for researchers to use to identify which subset of chunks they are interested in downloading according to their target application. Significant storage space and bandwidth savings may be achieved by downloading only chunks corresponding to a subset of sensors, country of collection, dataset split, etc.
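The selection workflow described above can be sketched as follows: filter the clip-level metadata first, then collect the distinct chunks that contain the matching clips. This is an illustration under assumptions, not the developer kit's API. The `clip_id`, `country`, and `chunk` keys are hypothetical names for this example; the actual column names are documented with the developer kit.

```python
from typing import Iterable


def chunks_to_download(rows: Iterable[dict], country: str) -> list[int]:
    """Return the sorted, de-duplicated chunk ids holding clips from `country`.

    The `country` and `chunk` keys are hypothetical column names.
    """
    return sorted({row["chunk"] for row in rows if row["country"] == country})


# Made-up clip-level metadata rows.
clip_metadata = [
    {"clip_id": "clip-a", "country": "Germany", "chunk": 0},
    {"clip_id": "clip-b", "country": "France", "chunk": 0},
    {"clip_id": "clip-c", "country": "Germany", "chunk": 7},
    {"clip_id": "clip-d", "country": "Germany", "chunk": 7},
]

wanted_chunks = chunks_to_download(clip_metadata, "Germany")
print(wanted_chunks)  # [0, 7]
```

Because each chunk holds up to 100 clips, a chunk is downloaded in full even when only some of its clips match, so the clips still need to be filtered again after extraction.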

A Python developer kit that supports such workflows, along with additional data format documentation, is available at https://github.com/NVlabs/physical_ai_av.

Sensor Data Organization

We detail the general data file organization. Specific schema descriptions within the parquet files are maintained within the Physical AI python library wiki here:

Camera Data

This sensor captures visual RGB data (i.e., videos) from multiple viewpoints around the vehicle. In our dataset the following seven cameras are included:

  • Cross left 120 fov
  • Cross right 120 fov
  • Front wide 120 fov
  • Front tele 30 fov
  • Rear left 70 fov
  • Rear right 70 fov
  • Rear tele 30 fov

Directory structure

camera/
├─ camera_front_wide_120fov/
│  ├─ camera_front_wide_120fov.chunk_0000.zip
│  └─ ...
├─ camera_cross_left_120fov/
└─ ...

Each chunk_xxxx.zip contains approximately 100 1080p mp4 files recorded at 30 fps. Each mp4 is named <clip_uuid>.camera_<field_of_view>.mp4. Users can use this UUID to match a clip across the corresponding views and sensors (provided there is coverage) under the designated sensor directories. Each chunk also contains frame-timestamp parquet files that correspond to the camera mp4 files and carry the same UUID in their names.
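The naming scheme above lends itself to two small helpers, sketched here under assumptions: the four-digit zero padding is inferred from the `chunk_0000` example, and the UUID in the usage lines is made up. Neither helper is part of the developer kit.

```python
def camera_chunk_path(camera: str, chunk: int) -> str:
    """Build the repo-relative path of one camera chunk archive.

    Assumes chunk indices are zero-padded to four digits, as in `chunk_0000`.
    """
    return f"camera/{camera}/{camera}.chunk_{chunk:04d}.zip"


def split_clip_file(name: str) -> tuple[str, str]:
    """Split '<clip_uuid>.<sensor>.<ext>' into (clip_uuid, sensor)."""
    clip_uuid, sensor, _ext = name.split(".")
    return clip_uuid, sensor


path = camera_chunk_path("camera_front_wide_120fov", 12)
print(path)

# Made-up UUID, used only to show the file name layout.
clip_uuid, sensor = split_clip_file(
    "123e4567-e89b-12d3-a456-426614174000.camera_front_wide_120fov.mp4"
)
print(clip_uuid, sensor)
```

Grouping extracted files by the returned `clip_uuid` is one way to line up the different camera views of the same clip.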

LiDAR Data

This directory contains 3D point cloud data recorded using a top 360 degree rotating LiDAR.

lidar/
└─ lidar_top_360fov/
   ├─ lidar_top_360fov.chunk_0000.zip
   ├─ ...
   └─ lidar_top_360fov.chunk_XXXX.zip

Inside lidar_top_360fov.chunk_0000.zip, there are approximately 100 lidar parquet files. Each parquet is named <clip_uuid>.lidar_top_360fov.parquet and contains approximately 200 lidar spins (i.e., a 10 Hz capture rate over a 20-second clip).

The point cloud can be decoded, e.g., by using the DracoPy library.

Radar Data

This folder contains 3D radar point cloud data recorded using (up to) 10 radars located in the front bumper center, front left corner, front right corner, left side, right side, rear left corner, rear right corner, rear left, and rear right.

 radar/
   ├─ radar_corner_front_left_srr_0/
   │  ├─ radar_corner_front_left_srr_0.chunk_0000.zip  
   │  ├─ ...
   │  └─ radar_corner_front_left_srr_0.chunk_xxxx.zip
   ├─ radar_corner_front_right_srr_0/   
   └─ ...

Inside chunk_XXXX.zip, there are approximately 100 radar parquet files. Each parquet will be named <clip_uuid>.radar_<field_of_view>_<configuration>.parquet. The letters srr stand for short range radar, mrr for medium range radar, and lrr for long range radar.

Unlike other sensors, the radar model used for a given field of view can vary from clip to clip among clips with radar coverage. Therefore, the directory and zip file names end with a numerical suffix, such as srr_0 or srr_3, that denotes the radar model reference.
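The radar naming convention above can be unpacked with a small parser. This is a sketch rather than part of the developer kit: the regular expression and the `position`, `range_class`, and `model_ref` field names are assumptions based on the `radar_corner_front_left_srr_0` example.

```python
import re

# radar_<position>_<srr|mrr|lrr>_<model reference>
RADAR_NAME = re.compile(
    r"^radar_(?P<position>.+)_(?P<range_class>srr|mrr|lrr)_(?P<model_ref>\d+)$"
)

RANGE_CLASSES = {
    "srr": "short range radar",
    "mrr": "medium range radar",
    "lrr": "long range radar",
}


def parse_radar_name(name: str) -> dict:
    """Split a radar sensor name into position, range class, and model reference."""
    match = RADAR_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised radar sensor name: {name!r}")
    return {
        "position": match["position"],
        "range_class": RANGE_CLASSES[match["range_class"]],
        "model_ref": int(match["model_ref"]),
    }


info = parse_radar_name("radar_corner_front_left_srr_0")
print(info)
```

Keeping the model reference as a separate field makes it straightforward to group clips by radar model before mixing their point clouds.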

Calibration Data

Contains [sensor]_intrinsics, sensor_extrinsics, and vehicle_dimensions directories. In addition, offline-optimized versions of this data are marked with “.offline”. The directories contain chunked parquets.

calibration/
├─ camera_intrinsics.offline/
│  ├─ camera_intrinsics.offline.chunk_0000.parquet
│  ├─ ...
│  └─ camera_intrinsics.offline.chunk_xxxx.parquet
├─ camera_intrinsics/
│  ├─ camera_intrinsics.chunk_0000.parquet
│  ├─ ...
│  └─ camera_intrinsics.chunk_xxxx.parquet
├─ lidar_intrinsics.offline/
│  ├─ lidar_intrinsics.offline.chunk_0000.parquet
│  ├─ ...
│  └─ lidar_intrinsics.offline.chunk_xxxx.parquet
├─ sensor_extrinsics.offline/
│  ├─ sensor_extrinsics.offline.chunk_0000.parquet
│  ├─ ...
│  └─ sensor_extrinsics.offline.chunk_xxxx.parquet
├─ sensor_extrinsics/
│  ├─ sensor_extrinsics.chunk_0000.parquet
│  ├─ ...
│  └─ sensor_extrinsics.chunk_xxxx.parquet
└─ vehicle_dimensions/
   ├─ vehicle_dimensions.chunk_0000.parquet
   ├─ ...
   └─ vehicle_dimensions.chunk_xxxx.parquet

Labels

Contains egomotion, egomotion.offline, and obstacle.offline directories. This data is expressed in a local coordinate frame that is consistent across all timestamps of a clip. The origin is the ego vehicle's position at timestamp 0, and the frame is oriented so that yaw is 0 at timestamp 0, while attitude (pitch and roll) is estimated with respect to gravity.

labels/
├─ egomotion.offline/
│  ├─ egomotion.offline.chunk_0000.parquet
│  ├─ ...
│  └─ egomotion.offline.chunk_xxxx.parquet
├─ egomotion/
│  ├─ egomotion.chunk_0000.parquet
│  ├─ ...
│  └─ egomotion.chunk_xxxx.parquet
└─ obstacle.offline/
   ├─ obstacle.offline.chunk_0000.parquet
   ├─ ...
   └─ obstacle.offline.chunk_xxxx.parquet
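To make the local-frame convention described above concrete, the sketch below re-expresses a made-up planar trajectory relative to its first pose, so that the first pose becomes the origin with zero yaw. It is a 2D simplification for illustration only: it ignores height, pitch, and roll, assumes yaw is measured counter-clockwise from the x axis, and is not how the dataset itself was produced (the released labels are already in the local frame).

```python
import math


def to_local_frame(poses):
    """Express (x, y, yaw) poses relative to the first pose.

    The first pose maps to (0, 0, 0); later poses are translated by the
    first position and rotated by minus the first yaw.
    """
    x0, y0, yaw0 = poses[0]
    cos0, sin0 = math.cos(yaw0), math.sin(yaw0)
    local = []
    for x, y, yaw in poses:
        dx, dy = x - x0, y - y0
        # Rotate the offset by -yaw0.
        lx = cos0 * dx + sin0 * dy
        ly = -sin0 * dx + cos0 * dy
        # Wrap the relative yaw into [-pi, pi).
        lyaw = (yaw - yaw0 + math.pi) % (2 * math.pi) - math.pi
        local.append((lx, ly, lyaw))
    return local


# Made-up world poses: heading along +y and moving 2 m in that direction.
world = [(10.0, 5.0, math.pi / 2), (10.0, 7.0, math.pi / 2)]
local = to_local_frame(world)
print(local)
```

In the local frame the same motion appears as roughly 2 m along the x axis with zero yaw, which is what "0 yaw at timestamp 0" implies for a vehicle driving straight ahead.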

Metadata

Contains feature_presence.parquet and data_collection.parquet files.

feature_presence.parquet: captures the sensor availability per clip.

data_collection.parquet: contains fields to filter clips by, e.g. country where clip was recorded, the month of the year and time of day.

Reasoning Labels

This directory contains human-verified reasoning labels for a curated subset of Out-of-Distribution (OOD) driving scenarios. This release focuses on a highly specific, high-fidelity subset purely based on OOD driving scenarios. These annotations complement the physical trajectory data described in the Labels section by providing action-grounded, temporally aligned Chain of Causation (CoC) data intended to support Vision-Language-Action (VLA) model evaluation. The labels were processed through a rigorous human-in-the-loop curation and quality-control pipeline to verify OOD validity, temporal alignment, and reasoning quality. More details on this process are documented in the dataset wiki.

The CoC reasoning labels are provided as a single Parquet file for efficient processing.

├─ reasoning/
     └─ ood_reasoning.parquet

Dataset Statistics

| Split | Clips | Reasoning Labels | Status | Included Annotations |
| --- | --- | --- | --- | --- |
| Train | 1450 | 1728 | Released | Clip UUID, Event Cluster, Human-Refined CoC, Keyframes |
| Val | 290 | 349 | Released | Clip UUID, Event Cluster, Human-Refined CoC, Keyframes |
| Test | - | - | Held Out | Reserved for OOD Benchmark Challenge |