---
license: apache-2.0
tags:
- vision
- hyperspectral
- foundation-model
- image-segmentation
pipeline_tag: image-feature-extraction
---

# HyperVision: Channel-Adaptive Ground-Based Hyperspectral Vision Foundation Models

*This is the official repository for the paper "HyperVision: Channel-Adaptive Ground-Based Hyperspectral Vision Foundation Models".* 
https://arxiv.org/abs/2605.17286

---

🚀 **News:**
- **[2026-07-02]** Pre-trained checkpoints are now available! You can download them from [Hugging Face Checkpoints](https://huggingface.co/IronKitty/HyperVision/tree/main/checkpoints).

---

## 📖 Abstract

While hyperspectral imaging provides rich spatial-spectral information across hundreds of narrow wavelength bands for precise material identification, ground-based hyperspectral pre-trained backbones remain absent, constrained by varying spectral configurations across sensors, the scarcity and inconsistency of labels, and the limited scale and scene diversity of existing datasets. To address these challenges and enable universal perception, we propose HyperVision, the first ground-based hyperspectral pre-trained backbone. First, to handle varying spectral configurations, HyperVision adopts a channel-adaptive dynamic embedding mechanism to map heterogeneous inputs into a unified token space. Second, to address the scarcity and inconsistency of labels, we introduce a multi-source pseudo-labeling method that fuses semantic representations from both spatial structures generated by SAM2 and fine-grained spectral material information extracted by HyperFree. Third, to compensate for limited dataset scale and enrich scene diversity, a cross-modal knowledge distillation mechanism is utilized to transfer rich semantic representations from a pre-trained RGB vision model to our hyperspectral backbone. Pre-trained on a collection of 15k images from 26 diverse ground-based datasets, HyperVision demonstrates exceptional generalization. Requiring only efficient head-only adaptation without adjusting backbone parameters, it achieves state-of-the-art performance compared to task-specific methods across three downstream tasks under varying sensor configurations, yielding up to a 16.3% relative improvement in hyperspectral semantic segmentation $\text{Acc}_{\text{M}}$, a 2.1% relative gain in object tracking AUC, and a 35.5% reduction in salient object detection MAE. The source code and pre-trained model will be publicly available online.

## 🛠️ Main Code (HyperVision)

The core architecture and model initialization utilities are located under the [HyperVision](./HyperVision) directory. 

To initialize the model, you can use the model builder functions from [HyperVision/build_HyperVision.py](./HyperVision/build_HyperVision.py). The pre-trained checkpoints can be downloaded from [Hugging Face Checkpoints](https://huggingface.co/IronKitty/HyperVision/tree/main/checkpoints).

We support different model configurations:
- **HyperVision-B**: `build_HyperVision_b`
- **HyperVision-L**: `build_HyperVision_l`
- **HyperVision-H** (Default): `build_HyperVision_h`

### Python Example:
```python
import torch
from HyperVision import build_HyperVision_h, HyperVision_Predictor
from hyperspectral_image_reader.read_dataset_image import load_hypervision_matrix
from hyperspectral_image_reader.hyperspectral_pipelines import LoadHyperspectralImage

device = "cuda" if torch.cuda.is_available() else "cpu"
image_size = 512

# 1. Build the HyperVision model (HyperVision-H configuration shown below)
model = build_HyperVision_h(
    checkpoint="path/to/hypervision_distilled_h.pth",  # Optional checkpoint path (.pth weights)
    image_size=image_size,  # Input resolution (defined above)
    vit_patch_size=16,  # Patch size of the backbone
    encoder_global_attn_indexes=[15, 23, 31],  # Layers using global attention (can be adjusted for debugging/tuning)
    merge_indexs=[8, 32],  # Layers that perform patch merging (can be adjusted for debugging/tuning)
    class_number=-1,  # Number of classes for mask decoder
)
model = model.to(device)
model.eval()

# 2. Use HyperVision_Predictor to manage preprocessing and feature extraction
predictor = HyperVision_Predictor(model)

# 3. Load the preprocessed HSI image as a numpy array using load_hypervision_matrix
# Preprocessed shape: (H_ori, W_ori, C_hsi) with values in range [0, 255]
img_matrix = load_hypervision_matrix("path/to/image.mat", dataset_name="harvard")

# 4. Automatically retrieve the wavelengths list from the dataset pipeline config
loader = LoadHyperspectralImage(dataset_type="harvard")
wavelengths = loader.wavelengths

# 5. Call predictor.set_image to preprocess the image and extract features (calculating embeddings)
# GSD represents Ground Sampling Distance (defaults to 0.01)
with torch.no_grad():
    predictor.set_image(img_matrix, test_mode=False, spectral_lengths=wavelengths, GSD=0.01)

# 6. Extract multi-scale features from the predictor
multi_stage_features = predictor.multi_scale_features
```
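
Before calling `predictor.set_image`, it can help to confirm that the cube and the wavelength list agree. The following is a minimal sketch, not part of the HyperVision API: `check_cube` is a hypothetical helper, and it assumes that `spectral_lengths` needs one entry per band and that the cube follows the `(H_ori, W_ori, C_hsi)` layout with values in `[0, 255]` described above. The evenly spaced wavelengths are a synthetic stand-in; in practice the list should come from `loader.wavelengths`.

```python
import numpy as np


def check_cube(cube: np.ndarray, wavelengths) -> None:
    """Hypothetical helper: raise ValueError if cube and wavelengths disagree."""
    if cube.ndim != 3:
        raise ValueError(f"expected an (H, W, C) cube, got shape {cube.shape}")
    if cube.shape[-1] != len(wavelengths):
        raise ValueError(
            f"{cube.shape[-1]} bands but {len(wavelengths)} wavelengths"
        )
    if cube.min() < 0 or cube.max() > 255:
        raise ValueError("values fall outside the expected [0, 255] range")


# Synthetic stand-in for a Harvard-style cube: 31 bands over 420-720 nm.
# Evenly spaced band centres are an assumption made for illustration only.
wavelengths = np.linspace(420, 720, 31).tolist()
rng = np.random.default_rng(0)
cube = rng.uniform(0, 255, size=(64, 48, 31)).astype(np.float32)

check_cube(cube, wavelengths)  # passes silently when everything is consistent
```

A check like this fails early with a readable message instead of surfacing later as a shape error inside the backbone.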

---

## 📊 Dataset Reading & Processing

We provide [read_dataset_image.py](./hyperspectral_image_reader/read_dataset_image.py) to read and preprocess hyperspectral images from various datasets. The script utilizes the custom data pipeline class `LoadHyperspectralImage` from [configs/hypervision/hyperspectral_pipelines.py](./configs/hypervision/hyperspectral_pipelines.py) to normalize the images (scaled to `[0, 255]`) and reshape them into the format expected by HyperVision: `(H_ori, W_ori, C_hsi)`.

### Python API Usage:
```python
from hyperspectral_image_reader.read_dataset_image import load_hypervision_matrix

# Load and preprocess HSI matrix
# dataset_name must be one of the supported dataset identifiers (e.g., 'harvard', 'arad_1k_31')
img_matrix = load_hypervision_matrix("path/to/image.mat", dataset_name="harvard")
print("Processed shape (H, W, C):", img_matrix.shape)
```
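
To make the target format concrete, here is a rough sketch of what "scaled to `[0, 255]`" and "`(H_ori, W_ori, C_hsi)`" mean for a raw cube. It is an illustration only: `to_hwc_0_255` is a hypothetical function, the channel-first input layout is an assumption, and global min-max scaling is a stand-in for whatever normalization `LoadHyperspectralImage` actually applies (which may differ per dataset).

```python
import numpy as np


def to_hwc_0_255(cube_chw: np.ndarray) -> np.ndarray:
    """Min-max scale a (C, H, W) cube to [0, 255] and return it as (H, W, C)."""
    cube = cube_chw.astype(np.float32)
    lo, hi = float(cube.min()), float(cube.max())
    if hi > lo:
        cube = (cube - lo) / (hi - lo) * 255.0
    else:
        # A constant cube carries no contrast; map it to zeros.
        cube = np.zeros_like(cube)
    # Move the spectral axis from first to last position.
    return np.transpose(cube, (1, 2, 0))


# Tiny synthetic cube: 2 bands, 3 rows, 4 columns.
raw = np.arange(2 * 3 * 4).reshape(2, 3, 4)
hwc = to_hwc_0_255(raw)
print(hwc.shape)  # (3, 4, 2)
```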

### Command Line Interface:
You can also run the script from the command line to load an HSI image and optionally save the processed matrix as a `.npy` file:
```bash
python hyperspectral_image_reader/read_dataset_image.py --path /path/to/image.mat --dataset harvard --output output.npy
```
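
The saved matrix can then be read back with NumPy. The sketch below assumes the `--output` file is a standard `.npy` array in `(H, W, C)` layout, as described above; it writes a small synthetic array first so the example runs without any dataset files.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.npy")

    # Stand-in for the file produced by the command above.
    np.save(path, np.zeros((8, 6, 31), dtype=np.float32))

    # Read the processed matrix back and unpack its dimensions.
    img_matrix = np.load(path)
    height, width, bands = img_matrix.shape

print(f"H={height}, W={width}, C={bands}")
```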

---

## 🗃️ Supported Datasets

Below is a summary of the ground-based hyperspectral datasets supported by our pipeline, along with their respective keys to be used for the `--dataset` parameter:

| Dataset Name | `--dataset` Key | # Bands | Wavelengths | # Images | Expected Extension | File Loader Detail / Required Accompanying Files |
| :--- | :--- | :---: | :---: | :---: | :---: | :--- |
| **50 Outdoor** | `fiftyoutdoor` | 33 | 400–720 nm | 50 | `.mat` | Custom Mat loader |
| **Agricultural Plant** | `aphid` | 237 | 436–965 nm | 361 | `.npy` | NumPy array loader |
| **ARAD1K16** | `arad_1k_16` | 16 | 400–1000 nm | 950 | `.mat` | HDF5 H5PY loader |
| **ARAD1K31** | `arad_1k_31` | 31 | 400–700 nm | 949 | `.mat` | HDF5 H5PY loader |
| **CAVE** | `cave` | 31 | 400–700 nm | 32 | `.mat` | Mat loader |
| **DeepHS-NIR** | `deephsnir` | 252 | 950–1700 nm | 718 | `.bin` | ENVI binary (requires corresponding `.hdr` file in the same directory) |
| **DeepHS-VIS** | `deephsvis` | 224 | 400–1000 nm | 3405 | `.bin` | ENVI binary (requires corresponding `.hdr` file in the same directory) |
| **DeepHS-VISCOR** | `deephsviscor` | 249 | 400–1000 nm | 1566 | `.bin` | ENVI binary (requires corresponding `.hdr` file in the same directory) |
| **Harvard** | `harvard` | 31 | 420–720 nm | 77 | `.mat` | Mat loader |
| **HOT-2024-NIR** | `hotnir` | 25 | 665–960 nm | 477 | `.png` | PNG frame loader (requires corresponding false-color `.jpg` file) |
| **HOT-2024-RedNIR** | `hotrednir` | 15 | 600–850 nm | 348 | `.png` | PNG frame loader (requires corresponding false-color `.jpg` file) |
| **HOT-2024-VIS** | `hotvis` | 16 | 470–600 nm | 1070 | `.png` | PNG frame loader (requires corresponding false-color `.jpg` file) |
| **HSI Drive v2.0** | `hsidrive20` | 25 | 600–975 nm | 752 | `.npy` | NumPy loader (requires corresponding pseudocolor `.png` file) |
| **HSI Road** | `hsiroad` | 25 | 600–960 nm | 380 | `.tif` | TIF image loader |
| **HSODBIT v2** | `hsodbitv2` | 200 | 400–1000 nm | 500 | `.mat` | HDF5 H5PY loader (requires corresponding color `.jpg` file) |
| **HSSOD** | `hs_sod` | 81 | 380–720 nm | 60 | `.h5` | HDF5 H5PY loader (requires corresponding color `.jpg` file) |
| **HyKo v2-NIR** | `hykov2nir` | 25 | 600–975 nm | 78 | `.mat` | Mat loader |
| **HyKo v2-VIS** | `hykov2vis` | 16 | 470–630 nm | 163 | `.mat` | Mat loader |
| **HyperBlood** | `hyperblood` | 128 | 377–1046 nm | 14 | `.mat` | Custom Mat loader |
| **HyperDrive-VNIR** | `hyperdrivevnir` | 24 | 660–900 nm | 504 | `.npz` | NumPy archive (must contain `cube.npy` file inside) |
| **HyperspectralCity v2** | `hyperspectralcityv2` | 128 | 450–950 nm | 1330 | `.hsd` | HSD raw data loader |
| **ICVL** | `icvl` | 31 | 400–700 nm | 187 | `.h5` | HDF5 H5PY loader |
| **LIB-HSI** | `libhsi` | 204 | 400–1000 nm | 393 | `.hdr` | ENVI header (requires corresponding raw binary data `.raw`/`.dat` file) |
| **UM-EMM** | `umemm` | 33 | 400–720 nm | 3 | `.mat` | Mat loader |
| **UM-LD 2015** | `umld2015` | 33 | 400–720 nm | 20 | `.mat` | Mat loader |
| **UM-NS 2002** | `umns2002` | 31 | 410–710 nm | 8 | `.mat` | Mat loader |
| **UM-NS 2004** | `umns2004` | 33 | 400–720 nm | 10 | `.mat` | Mat loader |
| **UM-OS** | `umos` | 33 | 400–720 nm | 50 | `.mat` | Mat loader |
| **UM-RI 2015** | `umri2015` | 33 | 400–720 nm | 33 | `.mat` | Mat loader |
| **Virginia Tech Tree** | `virginia_tech_tree` | 420 | 400–1000 nm | 51 | `.hdr` | ENVI header (requires corresponding raw binary data file) |
| **Apple Fire Blight** | `vnihdhiatlimafb` | 204 | 400–1000 nm | 420 | `.hdr` | ENVI header (requires corresponding raw binary data file) |
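
Because each key expects a specific file extension, a quick pre-check can catch a mismatched key before the loader is invoked. The snippet below is a sketch under stated assumptions: `DATASETS` and `check_dataset_file` are hypothetical names that are not part of this repository, and the dictionary copies only a handful of rows from the table for brevity.

```python
from pathlib import Path

# (number of bands, expected extension) copied from a few rows of the table.
DATASETS = {
    "harvard": (31, ".mat"),
    "cave": (31, ".mat"),
    "icvl": (31, ".h5"),
    "aphid": (237, ".npy"),
    "hsiroad": (25, ".tif"),
    "libhsi": (204, ".hdr"),
}


def check_dataset_file(path: str, dataset: str) -> int:
    """Return the expected band count, or raise if the key or extension is wrong."""
    if dataset not in DATASETS:
        raise KeyError(f"unknown dataset key: {dataset!r}")
    bands, ext = DATASETS[dataset]
    if Path(path).suffix.lower() != ext:
        raise ValueError(f"{dataset} expects {ext} files, got {path!r}")
    return bands


print(check_dataset_file("scene.mat", "harvard"))  # 31
```

The returned band count can also be compared against the last dimension of the loaded matrix as a second sanity check.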

---

## 📝 Citation

If you find our work or this project helpful, please consider citing our paper:

```bibtex
@misc{fu2026hypervision,
      title={HyperVision: A Channel-Adaptive Ground-Based Hyperspectral Vision Pre-trained Backbone}, 
      author={Guanyiman Fu and Jingtao Li and Zihang Cheng and Zhuanfeng Li and Diqi Chen and Yan Xu and Xiangyu Liu and Fengchao Xiong and Jianfeng Lu and Chengrong Chen and Jun Zhou},
      year={2026},
      eprint={2605.17286},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.17286}, 
}
```
*(Note: Citation details will be updated upon official publication)*