HyperVision: Channel-Adaptive Ground-Based Hyperspectral Vision Foundation Models

This is the official repository for the paper "HyperVision: Channel-Adaptive Ground-Based Hyperspectral Vision Foundation Models" (arXiv: https://arxiv.org/abs/2605.17286).


🚀 News:


📖 Abstract

While hyperspectral imaging provides rich spatial-spectral information across hundreds of narrow wavelength bands for precise material identification, ground-based hyperspectral pre-trained backbones remain absent, constrained by varying spectral configurations across sensors, the scarcity and inconsistency of labels, and the limited scale and scene diversity of existing datasets. To address these challenges and enable universal perception, we propose HyperVision, the first ground-based hyperspectral pre-trained backbone. First, to handle varying spectral configurations, HyperVision adopts a channel-adaptive dynamic embedding mechanism to map heterogeneous inputs into a unified token space. Second, to address the scarcity and inconsistency of labels, we introduce a multi-source pseudo-labeling method that fuses semantic representations from both spatial structures generated by SAM2 and fine-grained spectral material information extracted by HyperFree. Third, to compensate for limited dataset scale and enrich scene diversity, a cross-modal knowledge distillation mechanism is utilized to transfer rich semantic representations from a pre-trained RGB vision model to our hyperspectral backbone. Pre-trained on a collection of 15k images from 26 diverse ground-based datasets, HyperVision demonstrates exceptional generalization. Requiring only efficient head-only adaptation without adjusting backbone parameters, it achieves state-of-the-art performance compared to task-specific methods across three downstream tasks under varying sensor configurations, yielding up to a 16.3% relative improvement in hyperspectral semantic segmentation $\text{Acc}_{\text{M}}$, a 2.1% relative gain in object tracking AUC, and a 35.5% reduction in salient object detection MAE. The source code and pre-trained model will be publicly available online.
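To make the idea of a channel-adaptive embedding more concrete, here is a minimal NumPy sketch. It is not the paper's implementation. It assumes that each band's projection weights are generated from that band's center wavelength: a hypothetical sinusoidal `wavelength_encoding` feeds a fixed random matrix that stands in for learned parameters. Because the weights come from the wavelengths rather than from a fixed input layer, cubes with different band counts all produce tokens of the same width `embed_dim`.

```python
import numpy as np

def wavelength_encoding(wavelengths_nm, dim=16):
    """Sinusoidal encoding of each band's center wavelength (hypothetical choice)."""
    wl = np.asarray(wavelengths_nm, dtype=np.float64)[:, None]       # (C, 1)
    freqs = 1.0 / (1000.0 ** (np.arange(dim // 2) / (dim // 2)))     # (dim/2,)
    angles = wl * freqs                                              # (C, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)  # (C, dim)

def channel_adaptive_embed(cube, wavelengths_nm, patch=4, embed_dim=32, seed=0):
    """Map an (H, W, C) cube with any C to (num_patches, embed_dim) tokens."""
    H, W, C = cube.shape
    assert len(wavelengths_nm) == C, "one wavelength per band"
    enc_dim = 16
    rng = np.random.default_rng(seed)
    # Stand-in for learned weights: maps a band's encoding to that band's
    # (patch*patch, embed_dim) projection. Its shape does not depend on C.
    hyper = rng.standard_normal((enc_dim, patch * patch * embed_dim)) / np.sqrt(enc_dim)
    band_weights = (wavelength_encoding(wavelengths_nm, enc_dim) @ hyper).reshape(
        C, patch * patch, embed_dim)
    # Cut the cube into non-overlapping patches: (num_patches, patch*patch, C)
    gh, gw = H // patch, W // patch
    patches = cube[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch, C)
    # Project each band with its own weights, then average over bands,
    # so the token width is independent of the number of bands.
    return np.einsum("npc,cpd->nd", patches, band_weights) / C

rng = np.random.default_rng(1)
a = channel_adaptive_embed(rng.random((16, 16, 31)), np.linspace(400, 700, 31))
b = channel_adaptive_embed(rng.random((16, 16, 204)), np.linspace(400, 1000, 204))
print(a.shape, b.shape)  # (16, 32) (16, 32)
```

A 31-band cube and a 204-band cube end up in the same token space. The averaging step is one simple way to remove the dependence on band count; attention-based pooling over bands would be another.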

πŸ› οΈ Main Code (HyperVision)

The core architecture and model initialization utilities are located under the HyperVision directory.

To initialize the model, you can use the model builder functions from HyperVision/build_HyperVision.py. The pre-trained checkpoints can be downloaded from Hugging Face Checkpoints.

Three model configurations are supported:

  • HyperVision-B: build_HyperVision_b
  • HyperVision-L: build_HyperVision_l
  • HyperVision-H (Default): build_HyperVision_h

Python Example:

import torch
from HyperVision import build_HyperVision_h, HyperVision_Predictor
from hyperspectral_image_reader.read_dataset_image import load_hypervision_matrix
from hyperspectral_image_reader.hyperspectral_pipelines import LoadHyperspectralImage

device = "cuda" if torch.cuda.is_available() else "cpu"
image_size = 512

# 1. Build the HyperVision model (HyperVision-H configuration shown below)
model = build_HyperVision_h(
    checkpoint="path/to/hypervision_distilled_h.pth",             # Optional checkpoint path (.pth weights)
    image_size=image_size,                                  # Input resolution
    vit_patch_size=16,                                      # Patch size of the backbone
    encoder_global_attn_indexes=[15, 23, 31],               # Layers using global attention (can be adjusted for debugging/tuning)
    merge_indexs=[8, 32],                                   # Layers doing patch merging (can be adjusted for debugging/tuning)
    class_number=-1                                         # Number of classes for mask decoder
)
model = model.to(device)
model.eval()

# 2. Use HyperVision_Predictor to manage preprocessing and feature extraction
predictor = HyperVision_Predictor(model)

# 3. Load the HSI file and preprocess it into a NumPy array with load_hypervision_matrix
# Resulting shape: (H_ori, W_ori, C_hsi), with values scaled to [0, 255]
img_matrix = load_hypervision_matrix("path/to/image.mat", dataset_name="harvard")

# 4. Retrieve the band wavelengths for this dataset from its pipeline config
loader = LoadHyperspectralImage(dataset_type="harvard")
wavelengths = loader.wavelengths

# 5. Preprocess the image and compute its embeddings with predictor.set_image
# GSD is the ground sampling distance (default: 0.01)
with torch.no_grad():
    predictor.set_image(img_matrix, test_mode=False, spectral_lengths=wavelengths, GSD=0.01)

# 6. Extract multi-scale features from the predictor
multi_stage_features = predictor.multi_scale_features
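Because set_image receives the wavelength list separately from the cube, it can help to confirm that the two agree before calling it. The sketch below uses hypothetical helpers that are not part of the repository. `evenly_spaced_wavelengths` assumes uniformly spaced bands, which is only an approximation for some sensors, and `check_spectral_inputs` assumes wavelengths are listed in ascending order. For Harvard (31 bands over 420–720 nm, per the table below), uniform spacing gives 10 nm steps.

```python
def evenly_spaced_wavelengths(start_nm, end_nm, num_bands):
    """Hypothetical fallback: band centers assuming uniform spacing."""
    if num_bands < 2:
        raise ValueError("need at least two bands")
    step = (end_nm - start_nm) / (num_bands - 1)
    return [start_nm + i * step for i in range(num_bands)]

def check_spectral_inputs(img_shape, wavelengths):
    """Verify that an (H, W, C) cube shape and its wavelength list are consistent."""
    if len(img_shape) != 3:
        raise ValueError(f"expected (H, W, C), got shape {img_shape}")
    num_channels = img_shape[2]
    if len(wavelengths) != num_channels:
        raise ValueError(f"{num_channels} channels but {len(wavelengths)} wavelengths")
    # Assumption: bands are ordered from shortest to longest wavelength.
    if any(b <= a for a, b in zip(wavelengths, wavelengths[1:])):
        raise ValueError("wavelengths must be strictly increasing")

harvard_wl = evenly_spaced_wavelengths(420, 720, 31)
print(harvard_wl[:3], harvard_wl[-1])  # [420.0, 430.0, 440.0] 720.0
check_spectral_inputs((512, 512, 31), harvard_wl)  # passes silently
```

In practice, prefer the list returned by `LoadHyperspectralImage(...).wavelengths`. The uniform-spacing fallback is only for quick experiments.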

📊 Dataset Reading & Processing

We provide read_dataset_image.py to read and preprocess hyperspectral images from the supported datasets. The script uses the custom pipeline class LoadHyperspectralImage, defined in configs/hypervision/hyperspectral_pipelines.py, to normalize each image to [0, 255] and reshape it into the (H_ori, W_ori, C_hsi) layout expected by HyperVision.
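The exact normalization lives in LoadHyperspectralImage and may differ from dataset to dataset. As a rough sketch only, assuming a single per-image min-max scaling and a raw cube stored band-first, the conversion to a [0, 255] (H_ori, W_ori, C_hsi) array could look like this. `to_hypervision_layout` is a hypothetical name.

```python
import numpy as np

def to_hypervision_layout(cube, band_axis=0):
    """Min-max scale a raw cube to [0, 255] and move the band axis last.

    Assumption: global per-image min-max scaling. The repository's pipeline
    may instead scale per band or by the sensor's bit depth.
    """
    cube = np.asarray(cube, dtype=np.float32)
    cube = np.moveaxis(cube, band_axis, -1)  # -> (H, W, C)
    lo, hi = float(cube.min()), float(cube.max())
    if hi > lo:
        cube = (cube - lo) / (hi - lo) * 255.0
    else:
        cube = np.zeros_like(cube)  # constant image: map everything to 0
    return cube

raw = np.random.default_rng(0).integers(0, 4096, size=(31, 64, 48))  # (C, H, W), 12-bit values
img = to_hypervision_layout(raw, band_axis=0)
print(img.shape, img.min(), img.max())  # (64, 48, 31) 0.0 255.0
```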

Python API Usage:

from hyperspectral_image_reader.read_dataset_image import load_hypervision_matrix

# Load and preprocess HSI matrix
# dataset_name must be one of the supported dataset identifiers (e.g., 'harvard', 'arad_1k_31')
img_matrix = load_hypervision_matrix("path/to/image.mat", dataset_name="harvard")
print("Processed shape (H, W, C):", img_matrix.shape)
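To preprocess a whole folder in one pass, the loader can be called in a loop. This is a hypothetical convenience sketch, not part of the repository. `convert_folder` takes the loader as an argument (pass load_hypervision_matrix in practice) and writes one .npy file per input, similar to running the CLI with --output on each file. A stand-in loader is used here so the example runs without the dataset files.

```python
import tempfile
from pathlib import Path

import numpy as np

def convert_folder(src_dir, dst_dir, dataset_name, loader, pattern="*.mat"):
    """Load every file matching `pattern` with `loader` and save it as <stem>.npy.

    `loader(path, dataset_name=...)` must return an (H, W, C) array, as
    load_hypervision_matrix does. Returns {file name: array shape}.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    shapes = {}
    for path in sorted(Path(src_dir).glob(pattern)):
        matrix = loader(str(path), dataset_name=dataset_name)
        np.save(dst / f"{path.stem}.npy", matrix)
        shapes[path.name] = tuple(matrix.shape)
    return shapes

# Stand-in loader for the demo; replace with load_hypervision_matrix.
def fake_loader(path, dataset_name):
    return np.zeros((4, 5, 31), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.mat", "b.mat"):
        (Path(tmp) / name).touch()
    demo = convert_folder(tmp, Path(tmp) / "out", "harvard", fake_loader)
print(demo)  # {'a.mat': (4, 5, 31), 'b.mat': (4, 5, 31)}
```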

Command Line Interface:

You can also run the script from the command line to load an HSI image and optionally save the processed matrix as a .npy file:

python hyperspectral_image_reader/read_dataset_image.py --path /path/to/image.mat --dataset harvard --output output.npy

πŸ—ƒοΈ Supported Datasets

The ground-based hyperspectral datasets supported by our pipeline are summarized below, together with the key to pass to the --dataset parameter:

| Dataset Name | --dataset Key | # Bands | Wavelength Range | # Images | Expected Extension | File Loader Detail / Required Accompanying Files |
|---|---|---|---|---|---|---|
| 50 Outdoor | fiftyoutdoor | 33 | 400–720 nm | 50 | .mat | Custom Mat loader |
| Agricultural Plant | aphid | 237 | 436–965 nm | 361 | .npy | NumPy array loader |
| ARAD1K16 | arad_1k_16 | 16 | 400–1000 nm | 950 | .mat | HDF5 H5PY loader |
| ARAD1K31 | arad_1k_31 | 31 | 400–700 nm | 949 | .mat | HDF5 H5PY loader |
| CAVE | cave | 31 | 400–700 nm | 32 | .mat | Mat loader |
| DeepHS-NIR | deephsnir | 252 | 950–1700 nm | 718 | .bin | ENVI binary (requires corresponding .hdr file in the same directory) |
| DeepHS-VIS | deephsvis | 224 | 400–1000 nm | 3405 | .bin | ENVI binary (requires corresponding .hdr file in the same directory) |
| DeepHS-VISCOR | deephsviscor | 249 | 400–1000 nm | 1566 | .bin | ENVI binary (requires corresponding .hdr file in the same directory) |
| Harvard | harvard | 31 | 420–720 nm | 77 | .mat | Mat loader |
| HOT-2024-NIR | hotnir | 25 | 665–960 nm | 477 | .png | PNG frame loader (requires corresponding false-color .jpg file) |
| HOT-2024-RedNIR | hotrednir | 15 | 600–850 nm | 348 | .png | PNG frame loader (requires corresponding false-color .jpg file) |
| HOT-2024-VIS | hotvis | 16 | 470–600 nm | 1070 | .png | PNG frame loader (requires corresponding false-color .jpg file) |
| HSI Drive v2.0 | hsidrive20 | 25 | 600–975 nm | 752 | .npy | NumPy loader (requires corresponding pseudocolor .png file) |
| HSI Road | hsiroad | 25 | 600–960 nm | 380 | .tif | TIF image loader |
| HSODBIT v2 | hsodbitv2 | 200 | 400–1000 nm | 500 | .mat | HDF5 H5PY loader (requires corresponding color .jpg file) |
| HSSOD | hs_sod | 81 | 380–720 nm | 60 | .h5 | HDF5 H5PY loader (requires corresponding color .jpg file) |
| HyKo v2-NIR | hykov2nir | 25 | 600–975 nm | 78 | .mat | Mat loader |
| HyKo v2-VIS | hykov2vis | 16 | 470–630 nm | 163 | .mat | Mat loader |
| HyperBlood | hyperblood | 128 | 377–1046 nm | 14 | .mat | Custom Mat loader |
| HyperDrive-VNIR | hyperdrivevnir | 24 | 660–900 nm | 504 | .npz | NumPy archive (must contain cube.npy file inside) |
| HyperspectralCity v2 | hyperspectralcityv2 | 128 | 450–950 nm | 1330 | .hsd | HSD raw data loader |
| ICVL | icvl | 31 | 400–700 nm | 187 | .h5 | HDF5 H5PY loader |
| LIB-HSI | libhsi | 204 | 400–1000 nm | 393 | .hdr | ENVI header (requires corresponding raw binary data .raw/.dat file) |
| UM-EMM | umemm | 33 | 400–720 nm | 3 | .mat | Mat loader |
| UM-LD 2015 | umld2015 | 33 | 400–720 nm | 20 | .mat | Mat loader |
| UM-NS 2002 | umns2002 | 31 | 410–710 nm | 8 | .mat | Mat loader |
| UM-NS 2004 | umns2004 | 33 | 400–720 nm | 10 | .mat | Mat loader |
| UM-OS | umos | 33 | 400–720 nm | 50 | .mat | Mat loader |
| UM-RI 2015 | umri2015 | 33 | 400–720 nm | 33 | .mat | Mat loader |
| Virginia Tech Tree | virginia_tech_tree | 420 | 400–1000 nm | 51 | .hdr | ENVI header (requires corresponding raw binary data file) |
| Apple Fire Blight | vnihdhiatlimafb | 204 | 400–1000 nm | 420 | .hdr | ENVI header (requires corresponding raw binary data file) |
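Several loaders in the table above expect a companion file next to the data file. Here is a hypothetical pre-flight check that can run before a long preprocessing job. `COMPANIONS` and `missing_companion` are illustrative names, and the mapping covers only the keys whose companion extension the table states explicitly. The assumption that the companion file shares the data file's stem is ours and is not confirmed by the repository.

```python
import tempfile
from pathlib import Path

# Transcribed from the table: dataset key -> acceptable companion suffixes
# (any one of them must exist). Assumption: companion shares the data file's stem.
COMPANIONS = {
    "deephsnir": (".hdr",),
    "deephsvis": (".hdr",),
    "deephsviscor": (".hdr",),
    "hotnir": (".jpg",),
    "hotrednir": (".jpg",),
    "hotvis": (".jpg",),
    "hsidrive20": (".png",),
    "hsodbitv2": (".jpg",),
    "hs_sod": (".jpg",),
    "libhsi": (".raw", ".dat"),
}

def missing_companion(data_path, dataset_key):
    """Return None if no companion is needed or one is present, else an error message."""
    suffixes = COMPANIONS.get(dataset_key)
    if not suffixes:
        return None  # key has no listed companion (or was not transcribed here)
    path = Path(data_path)
    if any(path.with_suffix(s).exists() for s in suffixes):
        return None
    options = " or ".join(path.with_suffix(s).name for s in suffixes)
    return f"{path.name}: expected companion {options} in {path.parent}"

with tempfile.TemporaryDirectory() as tmp:
    for name in ("scene1.bin", "scene2.bin", "scene2.hdr"):
        (Path(tmp) / name).touch()
    r1 = missing_companion(Path(tmp) / "scene1.bin", "deephsvis")
    r2 = missing_companion(Path(tmp) / "scene2.bin", "deephsvis")
print(r1 is not None, r2)  # True None
```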

πŸ“ Citation

If you find our work or this project helpful, please consider citing our paper:

@misc{fu2026hypervision,
      title={HyperVision: A Channel-Adaptive Ground-Based Hyperspectral Vision Pre-trained Backbone}, 
      author={Guanyiman Fu and Jingtao Li and Zihang Cheng and Zhuanfeng Li and Diqi Chen and Yan Xu and Xiangyu Liu and Fengchao Xiong and Jianfeng Lu and Chengrong Chen and Jun Zhou},
      year={2026},
      eprint={2605.17286},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.17286}, 
}

(Note: Citation details will be updated upon official publication)
