---
license: apache-2.0
tags:
- vision
- hyperspectral
- foundation-model
- image-segmentation
pipeline_tag: image-feature-extraction
---

# HyperVision: Channel-Adaptive Ground-Based Hyperspectral Vision Foundation Models

*This is the official repository for the paper "HyperVision: Channel-Adaptive Ground-Based Hyperspectral Vision Foundation Models" ([arXiv:2605.17286](https://arxiv.org/abs/2605.17286)).*

---

🚀 **News:**
- **[2026-07-02]** Pre-trained checkpoints are now available! You can download them from [Hugging Face Checkpoints](https://huggingface.co/IronKitty/HyperVision/tree/main/checkpoints).

---

## 📖 Abstract

While hyperspectral imaging provides rich spatial-spectral information across hundreds of narrow wavelength bands for precise material identification, ground-based hyperspectral pre-trained backbones remain absent, constrained by varying spectral configurations across sensors, the scarcity and inconsistency of labels, and the limited scale and scene diversity of existing datasets. To address these challenges and enable universal perception, we propose HyperVision, the first ground-based hyperspectral pre-trained backbone. First, to handle varying spectral configurations, HyperVision adopts a channel-adaptive dynamic embedding mechanism to map heterogeneous inputs into a unified token space. Second, to address the scarcity and inconsistency of labels, we introduce a multi-source pseudo-labeling method that fuses semantic representations from both spatial structures generated by SAM2 and fine-grained spectral material information extracted by HyperFree. Third, to compensate for limited dataset scale and enrich scene diversity, a cross-modal knowledge distillation mechanism is utilized to transfer rich semantic representations from a pre-trained RGB vision model to our hyperspectral backbone. Pre-trained on a collection of 15k images from 26 diverse ground-based datasets, HyperVision demonstrates exceptional generalization. Requiring only efficient head-only adaptation without adjusting backbone parameters, it achieves state-of-the-art performance compared to task-specific methods across three downstream tasks under varying sensor configurations, yielding up to a 16.3% relative improvement in hyperspectral semantic segmentation $\text{Acc}_{\text{M}}$, a 2.1% relative gain in object tracking AUC, and a 35.5% reduction in salient object detection MAE. The source code and pre-trained model will be publicly available online.

## 🛠️ Main Code (HyperVision)

The core architecture and model initialization utilities are located under the [HyperVision](./HyperVision) directory. 

To initialize the model, you can use the model builder functions from [HyperVision/build_HyperVision.py](./HyperVision/build_HyperVision.py). The pre-trained checkpoints can be downloaded from [Hugging Face Checkpoints](https://huggingface.co/IronKitty/HyperVision/tree/main/checkpoints).

We support different model configurations:
- **HyperVision-B**: `build_HyperVision_b`
- **HyperVision-L**: `build_HyperVision_l`
- **HyperVision-H** (Default): `build_HyperVision_h`
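
One way to pick a configuration at runtime is a small name-to-builder lookup. The sketch below is illustrative only: `resolve_builder_name` and `load_builder` are hypothetical helpers, and it assumes the B and L builders are importable from the `HyperVision` package in the same way as `build_HyperVision_h`.

```python
import importlib

# Variant letters mapped to the builder names listed above.
BUILDER_NAMES = {
    "b": "build_HyperVision_b",
    "l": "build_HyperVision_l",
    "h": "build_HyperVision_h",  # default configuration
}


def resolve_builder_name(variant: str = "h") -> str:
    """Return the builder function name for a variant letter (hypothetical helper)."""
    key = variant.strip().lower()
    if key not in BUILDER_NAMES:
        raise ValueError(
            f"Unknown variant {variant!r}; expected one of {sorted(BUILDER_NAMES)}"
        )
    return BUILDER_NAMES[key]


def load_builder(variant: str = "h", package: str = "HyperVision"):
    """Import the package lazily and return the builder callable.

    Assumption: every builder is exposed at the top level of `package`.
    """
    module = importlib.import_module(package)
    return getattr(module, resolve_builder_name(variant))
```

With the repository on the Python path, `load_builder("l")` would return the HyperVision-L builder, which can then be called with the same keyword arguments as in the example that follows.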

### Python Example:
```python
import torch
from HyperVision import build_HyperVision_h, HyperVision_Predictor
from hyperspectral_image_reader.read_dataset_image import load_hypervision_matrix
from hyperspectral_image_reader.hyperspectral_pipelines import LoadHyperspectralImage

device = "cuda" if torch.cuda.is_available() else "cpu"
image_size = 512

# 1. Build the HyperVision model (HyperVision-H configuration shown below)
model = build_HyperVision_h(
    checkpoint="path/to/hypervision_distilled_h.pth",             # Optional checkpoint path (.pth weights)
    image_size=image_size,                                  # Input resolution (defined above)
    vit_patch_size=16,                                      # Patch size of the backbone
    encoder_global_attn_indexes=[15, 23, 31],               # Layers using global attention (can be adjusted for debugging/tuning)
    merge_indexs=[8, 32],                                   # Layers doing patch merging (can be adjusted for debugging/tuning)
    class_number=-1                                         # Number of classes for mask decoder
)
model = model.to(device)
model.eval()

# 2. Use HyperVision_Predictor to manage preprocessing and feature extraction
predictor = HyperVision_Predictor(model)

# 3. Load the preprocessed HSI image as a numpy array using load_hypervision_matrix
# Preprocessed shape: (H_ori, W_ori, C_hsi) with values in range [0, 255]
img_matrix = load_hypervision_matrix("path/to/image.mat", dataset_name="harvard")

# 4. Automatically retrieve the wavelengths list from the dataset pipeline config
loader = LoadHyperspectralImage(dataset_type="harvard")
wavelengths = loader.wavelengths

# 5. Call predictor.set_image to preprocess the image and extract features (calculating embeddings)
# GSD represents Ground Sampling Distance (defaults to 0.01)
with torch.no_grad():
    predictor.set_image(img_matrix, test_mode=False, spectral_lengths=wavelengths, GSD=0.01)

# 6. Extract multi-scale features from the predictor
multi_stage_features = predictor.multi_scale_features
```
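
Step 4 reads the band centers from the dataset pipeline config. For a sensor without such a config, a wavelength list could be approximated from its spectral range and band count. This is a rough sketch under two assumptions: that the band centers are evenly spaced, which may not match a real sensor's calibration, and that `spectral_lengths` expects values in nanometers. `evenly_spaced_wavelengths` is a hypothetical helper, not part of the repository.

```python
def evenly_spaced_wavelengths(start_nm: float, end_nm: float, num_bands: int) -> list[float]:
    """Approximate band centers by linear spacing between two endpoints (inclusive)."""
    if num_bands < 1:
        raise ValueError("num_bands must be at least 1")
    if num_bands == 1:
        return [float(start_nm)]
    step = (end_nm - start_nm) / (num_bands - 1)
    return [start_nm + i * step for i in range(num_bands)]


# Example: a 31-band sensor covering 420-720 nm gives 10 nm spacing.
approx_wavelengths = evenly_spaced_wavelengths(420, 720, 31)
```

Prefer the wavelengths reported by `LoadHyperspectralImage` whenever the dataset is supported, since those come from the pipeline config rather than from an approximation.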

---

## 📊 Dataset Reading & Processing

We provide [read_dataset_image.py](./hyperspectral_image_reader/read_dataset_image.py) to read and preprocess hyperspectral images from the supported datasets. The script uses the `LoadHyperspectralImage` pipeline class from [configs/hypervision/hyperspectral_pipelines.py](./configs/hypervision/hyperspectral_pipelines.py) to normalize each image to the range `[0, 255]` and reshape it into the layout HyperVision expects: `(H_ori, W_ori, C_hsi)`.

### Python API Usage:
```python
from hyperspectral_image_reader.read_dataset_image import load_hypervision_matrix

# Load and preprocess HSI matrix
# dataset_name must be one of the supported dataset identifiers (e.g., 'harvard', 'arad_1k_31')
img_matrix = load_hypervision_matrix("path/to/image.mat", dataset_name="harvard")
print("Processed shape (H, W, C):", img_matrix.shape)
```
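
A quick sanity check on the returned shape can catch a mismatched `dataset_name` early. The sketch below compares the channel dimension with the band counts listed in the Supported Datasets table, assuming preprocessing keeps every band. `EXPECTED_BANDS` copies only a few rows of that table, and `check_hsi_shape` is a hypothetical helper that takes a plain shape tuple such as `img_matrix.shape`.

```python
# Band counts copied from a few rows of the Supported Datasets table.
EXPECTED_BANDS = {
    "harvard": 31,
    "arad_1k_31": 31,
    "cave": 31,
    "hsidrive20": 25,
    "deephsvis": 224,
}


def check_hsi_shape(shape, dataset_name):
    """Validate an (H, W, C) shape against the expected band count."""
    if len(shape) != 3:
        raise ValueError(f"Expected a 3-D (H, W, C) shape, got {tuple(shape)}")
    if dataset_name not in EXPECTED_BANDS:
        raise KeyError(f"No band count recorded for dataset {dataset_name!r}")
    height, width, channels = shape
    expected = EXPECTED_BANDS[dataset_name]
    if channels != expected:
        raise ValueError(
            f"{dataset_name!r} should have {expected} bands in the last axis, got {channels}"
        )
    return height, width, channels
```

For example, `check_hsi_shape(img_matrix.shape, "harvard")` raises a `ValueError` if the array is channel-first instead of channel-last.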

### Command Line Interface:
You can also run the script from the command line to load an HSI image and optionally save the processed matrix as a `.npy` file:
```bash
python hyperspectral_image_reader/read_dataset_image.py --path /path/to/image.mat --dataset harvard --output output.npy
```
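
To convert a whole folder, the command above can be wrapped in a short Python loop. This sketch only builds, and optionally runs, the same command with the three flags shown. `build_command` and `convert_folder` are hypothetical helpers, and the script path assumes you run from the repository root.

```python
import subprocess
import sys
from pathlib import Path

# Relative path to the reader script, as used in the command above.
SCRIPT = "hyperspectral_image_reader/read_dataset_image.py"


def build_command(image_path, dataset, output_dir):
    """Build the CLI call for one image, writing <stem>.npy into output_dir."""
    output = Path(output_dir) / (Path(image_path).stem + ".npy")
    return [
        sys.executable, SCRIPT,
        "--path", str(image_path),
        "--dataset", dataset,
        "--output", str(output),
    ]


def convert_folder(input_dir, dataset, output_dir, pattern="*.mat", dry_run=True):
    """Build one command per matching file; run them only if dry_run is False."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    commands = [
        build_command(path, dataset, output_dir)
        for path in sorted(Path(input_dir).glob(pattern))
    ]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop on the first failure
    return commands
```

Keeping `dry_run=True` by default lets you inspect the generated commands before anything is executed.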

---

## 🗃️ Supported Datasets

The table below summarizes the ground-based hyperspectral datasets supported by our pipeline, along with the key to pass as `--dataset` on the command line (or as `dataset_name` in the Python API):

| Dataset Name | `--dataset` Key | # Bands | Wavelengths | # Images | Expected Extension | File Loader Detail / Required Accompanying Files |
| :--- | :--- | :---: | :---: | :---: | :---: | :--- |
| **50 Outdoor** | `fiftyoutdoor` | 33 | 400–720 nm | 50 | `.mat` | Custom Mat loader |
| **Agricultural Plant** | `aphid` | 237 | 436–965 nm | 361 | `.npy` | NumPy array loader |
| **ARAD1K16** | `arad_1k_16` | 16 | 400–1000 nm | 950 | `.mat` | HDF5 H5PY loader |
| **ARAD1K31** | `arad_1k_31` | 31 | 400–700 nm | 949 | `.mat` | HDF5 H5PY loader |
| **CAVE** | `cave` | 31 | 400–700 nm | 32 | `.mat` | Mat loader |
| **DeepHS-NIR** | `deephsnir` | 252 | 950–1700 nm | 718 | `.bin` | ENVI binary (requires corresponding `.hdr` file in the same directory) |
| **DeepHS-VIS** | `deephsvis` | 224 | 400–1000 nm | 3405 | `.bin` | ENVI binary (requires corresponding `.hdr` file in the same directory) |
| **DeepHS-VISCOR** | `deephsviscor` | 249 | 400–1000 nm | 1566 | `.bin` | ENVI binary (requires corresponding `.hdr` file in the same directory) |
| **Harvard** | `harvard` | 31 | 420–720 nm | 77 | `.mat` | Mat loader |
| **HOT-2024-NIR** | `hotnir` | 25 | 665–960 nm | 477 | `.png` | PNG frame loader (requires corresponding false-color `.jpg` file) |
| **HOT-2024-RedNIR** | `hotrednir` | 15 | 600–850 nm | 348 | `.png` | PNG frame loader (requires corresponding false-color `.jpg` file) |
| **HOT-2024-VIS** | `hotvis` | 16 | 470–600 nm | 1070 | `.png` | PNG frame loader (requires corresponding false-color `.jpg` file) |
| **HSI Drive v2.0** | `hsidrive20` | 25 | 600–975 nm | 752 | `.npy` | NumPy loader (requires corresponding pseudocolor `.png` file) |
| **HSI Road** | `hsiroad` | 25 | 600–960 nm | 380 | `.tif` | TIF image loader |
| **HSODBIT v2** | `hsodbitv2` | 200 | 400–1000 nm | 500 | `.mat` | HDF5 H5PY loader (requires corresponding color `.jpg` file) |
| **HSSOD** | `hs_sod` | 81 | 380–720 nm | 60 | `.h5` | HDF5 H5PY loader (requires corresponding color `.jpg` file) |
| **HyKo v2-NIR** | `hykov2nir` | 25 | 600–975 nm | 78 | `.mat` | Mat loader |
| **HyKo v2-VIS** | `hykov2vis` | 16 | 470–630 nm | 163 | `.mat` | Mat loader |
| **HyperBlood** | `hyperblood` | 128 | 377–1046 nm | 14 | `.mat` | Custom Mat loader |
| **HyperDrive-VNIR** | `hyperdrivevnir` | 24 | 660–900 nm | 504 | `.npz` | NumPy archive (must contain `cube.npy` file inside) |
| **HyperspectralCity v2** | `hyperspectralcityv2` | 128 | 450–950 nm | 1330 | `.hsd` | HSD raw data loader |
| **ICVL** | `icvl` | 31 | 400–700 nm | 187 | `.h5` | HDF5 H5PY loader |
| **LIB-HSI** | `libhsi` | 204 | 400–1000 nm | 393 | `.hdr` | ENVI header (requires corresponding raw binary data `.raw`/`.dat` file) |
| **UM-EMM** | `umemm` | 33 | 400–720 nm | 3 | `.mat` | Mat loader |
| **UM-LD 2015** | `umld2015` | 33 | 400–720 nm | 20 | `.mat` | Mat loader |
| **UM-NS 2002** | `umns2002` | 31 | 410–710 nm | 8 | `.mat` | Mat loader |
| **UM-NS 2004** | `umns2004` | 33 | 400–720 nm | 10 | `.mat` | Mat loader |
| **UM-OS** | `umos` | 33 | 400–720 nm | 50 | `.mat` | Mat loader |
| **UM-RI 2015** | `umri2015` | 33 | 400–720 nm | 33 | `.mat` | Mat loader |
| **Virginia Tech Tree** | `virginia_tech_tree` | 420 | 400–1000 nm | 51 | `.hdr` | ENVI header (requires corresponding raw binary data file) |
| **Apple Fire Blight** | `vnihdhiatlimafb` | 204 | 400–1000 nm | 420 | `.hdr` | ENVI header (requires corresponding raw binary data file) |
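
The extension and companion-file requirements in the table can be checked before loading. The sketch below encodes a handful of rows. `DATASET_FILES` and `missing_companions` are hypothetical names, and the rule that a companion file shares the image's base name is an assumption, not something the table states.

```python
from pathlib import Path

# dataset key -> (expected extension, companion extensions), from a few table rows.
DATASET_FILES = {
    "harvard": (".mat", ()),
    "deephsvis": (".bin", (".hdr",)),
    "hsidrive20": (".npy", (".png",)),
    "hs_sod": (".h5", (".jpg",)),
}


def missing_companions(image_path, dataset):
    """Return companion files that are absent next to image_path.

    Raises ValueError if the file extension does not match the dataset key.
    Assumption: companions use the same base name as the image file.
    """
    path = Path(image_path)
    extension, companions = DATASET_FILES[dataset]
    if path.suffix.lower() != extension:
        raise ValueError(
            f"{dataset!r} expects {extension} files, got {path.suffix or 'no extension'}"
        )
    candidates = [path.with_suffix(suffix) for suffix in companions]
    return [candidate for candidate in candidates if not candidate.exists()]
```

An empty list means every companion file assumed by this sketch was found; the loader itself remains the authority on what it actually requires.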

---

## 📝 Citation

If you find our work or this project helpful, please consider citing our paper:

```bibtex
@misc{fu2026hypervision,
      title={HyperVision: A Channel-Adaptive Ground-Based Hyperspectral Vision Pre-trained Backbone}, 
      author={Guanyiman Fu and Jingtao Li and Zihang Cheng and Zhuanfeng Li and Diqi Chen and Yan Xu and Xiangyu Liu and Fengchao Xiong and Jianfeng Lu and Chengrong Chen and Jun Zhou},
      year={2026},
      eprint={2605.17286},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.17286}, 
}
```
*(Note: Citation details will be updated upon official publication)*