---
license: apache-2.0
tags:
- vision
- hyperspectral
- foundation-model
- image-segmentation
pipeline_tag: image-feature-extraction
---
HyperVision: Channel-Adaptive Ground-Based Hyperspectral Vision Foundation Models
This is the official repository for the paper "HyperVision: Channel-Adaptive Ground-Based Hyperspectral Vision Foundation Models" (https://arxiv.org/abs/2605.17286).
News:
- [2026-07-02] Pre-trained checkpoints are now available! You can download them from Hugging Face Checkpoints.
Abstract
While hyperspectral imaging provides rich spatial-spectral information across hundreds of narrow wavelength bands for precise material identification, ground-based hyperspectral pre-trained backbones remain absent, constrained by varying spectral configurations across sensors, the scarcity and inconsistency of labels, and the limited scale and scene diversity of existing datasets. To address these challenges and enable universal perception, we propose HyperVision, the first ground-based hyperspectral pre-trained backbone. First, to handle varying spectral configurations, HyperVision adopts a channel-adaptive dynamic embedding mechanism to map heterogeneous inputs into a unified token space. Second, to address the scarcity and inconsistency of labels, we introduce a multi-source pseudo-labeling method that fuses semantic representations from both spatial structures generated by SAM2 and fine-grained spectral material information extracted by HyperFree. Third, to compensate for limited dataset scale and enrich scene diversity, a cross-modal knowledge distillation mechanism is utilized to transfer rich semantic representations from a pre-trained RGB vision model to our hyperspectral backbone. Pre-trained on a collection of 15k images from 26 diverse ground-based datasets, HyperVision demonstrates exceptional generalization. Requiring only efficient head-only adaptation without adjusting backbone parameters, it achieves state-of-the-art performance compared to task-specific methods across three downstream tasks under varying sensor configurations, yielding up to a 16.3% relative improvement in hyperspectral semantic segmentation $\text{Acc}_{\text{M}}$, a 2.1% relative gain in object tracking AUC, and a 35.5% reduction in salient object detection MAE. The source code and pre-trained model will be publicly available online.
Main Code (HyperVision)
The core architecture and model initialization utilities are located under the HyperVision directory.
To initialize the model, you can use the model builder functions from HyperVision/build_HyperVision.py. The pre-trained checkpoints can be downloaded from Hugging Face Checkpoints.
We support different model configurations:
- HyperVision-B: build_HyperVision_b
- HyperVision-L: build_HyperVision_l
- HyperVision-H (Default): build_HyperVision_h
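When the configuration comes from a config file or command-line flag, a small lookup can map a variant letter to the builder names listed above. This is only a sketch: `BUILDER_NAMES`, `resolve_builder_name`, and `get_builder` are hypothetical helpers, not part of the repository, and it assumes all three builders are importable from the `HyperVision` package the same way `build_HyperVision_h` is.

```python
import importlib

# Variant letters mapped to the builder names listed above.
BUILDER_NAMES = {
    "b": "build_HyperVision_b",
    "l": "build_HyperVision_l",
    "h": "build_HyperVision_h",  # default configuration
}


def resolve_builder_name(variant: str = "h") -> str:
    """Return the builder function name for a variant letter (case-insensitive)."""
    key = variant.strip().lower()
    if key not in BUILDER_NAMES:
        raise ValueError(
            f"Unknown variant {variant!r}; expected one of {sorted(BUILDER_NAMES)}"
        )
    return BUILDER_NAMES[key]


def get_builder(variant: str = "h", module_name: str = "HyperVision"):
    """Import the package lazily and return the builder callable.

    Assumption: the package exposes every builder at its top level.
    """
    module = importlib.import_module(module_name)
    return getattr(module, resolve_builder_name(variant))
```

With the repository on the Python path, `get_builder("l")(checkpoint=...)` would then stand in for calling `build_HyperVision_l` directly.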
Python Example:
import torch
from HyperVision import build_HyperVision_h, HyperVision_Predictor
from hyperspectral_image_reader.read_dataset_image import load_hypervision_matrix
from hyperspectral_image_reader.hyperspectral_pipelines import LoadHyperspectralImage
device = "cuda" if torch.cuda.is_available() else "cpu"
image_size = 512
# 1. Build the HyperVision model (HyperVision-H configuration shown below)
model = build_HyperVision_h(
    checkpoint="path/to/hypervision_distilled_h.pth",  # Optional checkpoint path (.pth weights)
    image_size=image_size,  # Input resolution
    vit_patch_size=16,  # Patch size of the backbone
    encoder_global_attn_indexes=[15, 23, 31],  # Layers using global attention (can be adjusted for debugging/tuning)
    merge_indexs=[8, 32],  # Layers doing patch merging (can be adjusted for debugging/tuning)
    class_number=-1,  # Number of classes for mask decoder
)
model = model.to(device)
model.eval()
# 2. Use HyperVision_Predictor to manage preprocessing and feature extraction
predictor = HyperVision_Predictor(model)
# 3. Load the preprocessed HSI image as a numpy array using load_hypervision_matrix
# Preprocessed shape: (H_ori, W_ori, C_hsi) with values in range [0, 255]
img_matrix = load_hypervision_matrix("path/to/image.mat", dataset_name="harvard")
# 4. Automatically retrieve the wavelengths list from the dataset pipeline config
loader = LoadHyperspectralImage(dataset_type="harvard")
wavelengths = loader.wavelengths
# 5. Call predictor.set_image to preprocess the image and extract features (calculating embeddings)
# GSD represents Ground Sampling Distance (defaults to 0.01)
with torch.no_grad():
    predictor.set_image(img_matrix, test_mode=False, spectral_lengths=wavelengths, GSD=0.01)
# 6. Extract multi-scale features from the predictor
multi_stage_features = predictor.multi_scale_features
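The container type of `predictor.multi_scale_features` is not documented here, so it can help to print its structure before wiring it into a downstream head. The helper below is a minimal sketch that assumes the features are tensors or arrays, possibly nested in lists, tuples, or dicts. `describe_features` is a hypothetical name, and the shapes in the demo are made up.

```python
from types import SimpleNamespace


def describe_features(features, prefix="features"):
    """Recursively collect (name, shape) pairs from nested containers.

    Anything exposing a ``.shape`` attribute (torch tensors, NumPy arrays)
    is treated as a leaf; other leaves are reported by type name.
    """
    if hasattr(features, "shape"):
        return [(prefix, tuple(features.shape))]
    if isinstance(features, dict):
        items = [(f"{prefix}[{k!r}]", v) for k, v in features.items()]
    elif isinstance(features, (list, tuple)):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(features)]
    else:
        return [(prefix, type(features).__name__)]
    out = []
    for name, value in items:
        out.extend(describe_features(value, name))
    return out


# Stand-in objects with made-up shapes, only to demonstrate the helper.
fake = [
    SimpleNamespace(shape=(1, 256, 64, 64)),
    SimpleNamespace(shape=(1, 512, 32, 32)),
]
summary = describe_features(fake)
for name, shape in summary:
    print(name, shape)
```

In practice, `describe_features(multi_stage_features)` would replace the stand-in list.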
Dataset Reading & Processing
We provide read_dataset_image.py to read and preprocess hyperspectral images from various datasets. The script utilizes the custom data pipeline class LoadHyperspectralImage from configs/hypervision/hyperspectral_pipelines.py to normalize the images (scaled to [0, 255]) and reshape them into the format expected by HyperVision: (H_ori, W_ori, C_hsi).
Python API Usage:
from hyperspectral_image_reader.read_dataset_image import load_hypervision_matrix
# Load and preprocess HSI matrix
# dataset_name must be one of the supported dataset identifiers (e.g., 'harvard', 'arad_1k_31')
img_matrix = load_hypervision_matrix("path/to/image.mat", dataset_name="harvard")
print("Processed shape (H, W, C):", img_matrix.shape)
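For data outside the supported datasets, the target format described above (channels last, values in [0, 255]) can be approximated by hand. The sketch below assumes global min-max scaling over the whole cube. The normalization actually used by `LoadHyperspectralImage` may differ (for example, per-band or percentile-based scaling), and `to_hypervision_layout` is a hypothetical helper, not a repository function.

```python
import numpy as np


def to_hypervision_layout(cube, channel_axis=0):
    """Move the spectral axis last and min-max scale the cube to [0, 255]."""
    cube = np.asarray(cube, dtype=np.float32)
    if cube.ndim != 3:
        raise ValueError(f"expected a 3-D cube, got shape {cube.shape}")
    cube = np.moveaxis(cube, channel_axis, -1)  # -> (H, W, C)
    lo, hi = float(cube.min()), float(cube.max())
    if hi == lo:
        return np.zeros_like(cube)  # constant cube: nothing to scale
    return (cube - lo) / (hi - lo) * 255.0


# Synthetic (C, H, W) cube standing in for a real 31-band image.
raw = np.random.default_rng(0).random((31, 8, 10))
out = to_hypervision_layout(raw, channel_axis=0)
print(out.shape)
```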
Command Line Interface:
You can also run the script from the command line to load an HSI image and optionally save the processed matrix as a .npy file:
python hyperspectral_image_reader/read_dataset_image.py --path /path/to/image.mat --dataset harvard --output output.npy
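After saving with `--output`, a quick sanity check on the `.npy` file can catch layout or range problems before the matrix reaches the model. This is a sketch that relies only on the format stated above, (H, W, C) with values in [0, 255]. `check_processed_matrix` is a hypothetical helper, and the demo writes a synthetic file rather than reading a real `output.npy`.

```python
import os
import tempfile

import numpy as np


def check_processed_matrix(path, expected_bands=None):
    """Load a saved .npy matrix and verify the (H, W, C) layout and [0, 255] range."""
    arr = np.load(path)
    if arr.ndim != 3:
        raise ValueError(f"expected (H, W, C), got shape {arr.shape}")
    if expected_bands is not None and arr.shape[-1] != expected_bands:
        raise ValueError(f"expected {expected_bands} bands, got {arr.shape[-1]}")
    if arr.min() < 0 or arr.max() > 255:
        raise ValueError("values fall outside [0, 255]")
    return arr.shape


# Demonstration on a synthetic file; point the path at the real output.npy instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "output.npy")
    np.save(demo_path, np.zeros((4, 5, 31), dtype=np.float32))
    shape = check_processed_matrix(demo_path, expected_bands=31)
print(shape)
```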
Supported Datasets
Below is a summary of the ground-based hyperspectral datasets supported by our pipeline, along with their respective keys to be used for the --dataset parameter:
| Dataset Name | --dataset Key | # Bands | Wavelengths | # Images | Expected Extension | File Loader Detail / Required Accompanying Files |
|---|---|---|---|---|---|---|
| 50 Outdoor | fiftyoutdoor | 33 | 400–720 nm | 50 | .mat | Custom Mat loader |
| Agricultural Plant | aphid | 237 | 436–965 nm | 361 | .npy | NumPy array loader |
| ARAD1K16 | arad_1k_16 | 16 | 400–1000 nm | 950 | .mat | HDF5 H5PY loader |
| ARAD1K31 | arad_1k_31 | 31 | 400–700 nm | 949 | .mat | HDF5 H5PY loader |
| CAVE | cave | 31 | 400–700 nm | 32 | .mat | Mat loader |
| DeepHS-NIR | deephsnir | 252 | 950–1700 nm | 718 | .bin | ENVI binary (requires corresponding .hdr file in the same directory) |
| DeepHS-VIS | deephsvis | 224 | 400–1000 nm | 3405 | .bin | ENVI binary (requires corresponding .hdr file in the same directory) |
| DeepHS-VISCOR | deephsviscor | 249 | 400–1000 nm | 1566 | .bin | ENVI binary (requires corresponding .hdr file in the same directory) |
| Harvard | harvard | 31 | 420–720 nm | 77 | .mat | Mat loader |
| HOT-2024-NIR | hotnir | 25 | 665–960 nm | 477 | .png | PNG frame loader (requires corresponding false-color .jpg file) |
| HOT-2024-RedNIR | hotrednir | 15 | 600–850 nm | 348 | .png | PNG frame loader (requires corresponding false-color .jpg file) |
| HOT-2024-VIS | hotvis | 16 | 470–600 nm | 1070 | .png | PNG frame loader (requires corresponding false-color .jpg file) |
| HSI Drive v2.0 | hsidrive20 | 25 | 600–975 nm | 752 | .npy | NumPy loader (requires corresponding pseudocolor .png file) |
| HSI Road | hsiroad | 25 | 600–960 nm | 380 | .tif | TIF image loader |
| HSODBIT v2 | hsodbitv2 | 200 | 400–1000 nm | 500 | .mat | HDF5 H5PY loader (requires corresponding color .jpg file) |
| HSSOD | hs_sod | 81 | 380–720 nm | 60 | .h5 | HDF5 H5PY loader (requires corresponding color .jpg file) |
| HyKo v2-NIR | hykov2nir | 25 | 600–975 nm | 78 | .mat | Mat loader |
| HyKo v2-VIS | hykov2vis | 16 | 470–630 nm | 163 | .mat | Mat loader |
| HyperBlood | hyperblood | 128 | 377–1046 nm | 14 | .mat | Custom Mat loader |
| HyperDrive-VNIR | hyperdrivevnir | 24 | 660–900 nm | 504 | .npz | NumPy archive (must contain cube.npy file inside) |
| HyperspectralCity v2 | hyperspectralcityv2 | 128 | 450–950 nm | 1330 | .hsd | HSD raw data loader |
| ICVL | icvl | 31 | 400–700 nm | 187 | .h5 | HDF5 H5PY loader |
| LIB-HSI | libhsi | 204 | 400–1000 nm | 393 | .hdr | ENVI header (requires corresponding raw binary data .raw/.dat file) |
| UM-EMM | umemm | 33 | 400–720 nm | 3 | .mat | Mat loader |
| UM-LD 2015 | umld2015 | 33 | 400–720 nm | 20 | .mat | Mat loader |
| UM-NS 2002 | umns2002 | 31 | 410–710 nm | 8 | .mat | Mat loader |
| UM-NS 2004 | umns2004 | 33 | 400–720 nm | 10 | .mat | Mat loader |
| UM-OS | umos | 33 | 400–720 nm | 50 | .mat | Mat loader |
| UM-RI 2015 | umri2015 | 33 | 400–720 nm | 33 | .mat | Mat loader |
| Virginia Tech Tree | virginia_tech_tree | 420 | 400–1000 nm | 51 | .hdr | ENVI header (requires corresponding raw binary data file) |
| Apple Fire Blight | vnihdhiatlimafb | 204 | 400–1000 nm | 420 | .hdr | ENVI header (requires corresponding raw binary data file) |
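Because several loaders need an accompanying file, a pre-flight check before calling the reader can give clearer errors than a failed load. The sketch below copies the extension and companion requirements for a handful of keys from the table. `DATASET_FILES` and `preflight` are hypothetical, and the sketch assumes the companion file shares the image's directory and file stem, which the table only states explicitly for the ENVI `.hdr` case.

```python
from pathlib import Path

# (expected extension, companion extensions) for a few dataset keys, taken
# from the table above. Matching companions by file stem is an assumption.
DATASET_FILES = {
    "harvard": (".mat", []),
    "deephsvis": (".bin", [".hdr"]),
    "hotnir": (".png", [".jpg"]),
    "hsidrive20": (".npy", [".png"]),
    "hs_sod": (".h5", [".jpg"]),
}


def preflight(path, dataset):
    """Return a list of problems found for `path`; an empty list means OK."""
    if dataset not in DATASET_FILES:
        return [f"unknown dataset key: {dataset}"]
    ext, companions = DATASET_FILES[dataset]
    path = Path(path)
    problems = []
    if path.suffix.lower() != ext:
        problems.append(f"expected a {ext} file, got {path.suffix or 'no extension'}")
    for comp in companions:
        candidate = path.with_suffix(comp)
        if not candidate.exists():
            problems.append(f"missing companion file {candidate.name}")
    return problems
```

For example, `preflight("scans/apple.bin", "deephsvis")` would report a missing `apple.hdr` if the header is not stored next to the binary.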
Citation
If you find our work or this project helpful, please consider citing our paper:
@misc{fu2026hypervision,
  title={HyperVision: A Channel-Adaptive Ground-Based Hyperspectral Vision Pre-trained Backbone},
  author={Guanyiman Fu and Jingtao Li and Zihang Cheng and Zhuanfeng Li and Diqi Chen and Yan Xu and Xiangyu Liu and Fengchao Xiong and Jianfeng Lu and Chengrong Chen and Jun Zhou},
  year={2026},
  eprint={2605.17286},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.17286},
}
(Note: Citation details will be updated upon official publication)