UniSHARP: Universal Sharp Monocular View Synthesis
Meixi Song1 ·
Dizhe Zhang1,* ·
Hao Ren1 ·
Ruiyang Zhang1 ·
Bo Du2 ·
Ming-Hsuan Yang3 ·
Lu Qi1,2,*
1Insta360 Research · 2Wuhan University · 3University of California, Merced
UniSHARP extends SHARP-style photorealistic monocular view synthesis to universal camera systems. Given a single image from a perspective, wide-FoV, fisheye, or panoramic camera, UniSHARP predicts a 3D Gaussian representation and renders high-quality novel views.
Installation
Clone this repository and enter the project directory:
git clone https://github.com/Insta360-Research-Team/UniSHARP.git
cd UniSHARP
Create a fresh conda environment:
conda create -n unisharp python=3.12 -y
conda activate unisharp
Install PyTorch for your CUDA version. The code was smoke-tested with PyTorch 2.8 and torchvision 0.23:
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0
Install the remaining Python dependencies:
pip install -r requirements.txt
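After installation, a quick version check can confirm that the environment is close to the smoke-tested one. The sketch below uses only the Python standard library; the helper name `installed_versions` is illustrative and not part of UniSHARP. Installed builds may carry a local suffix (for example a CUDA tag), so the script only reports versions rather than enforcing an exact match.

```python
from importlib import metadata

# Versions reported as smoke-tested in the installation notes above.
TESTED = {"torch": "2.8.0", "torchvision": "0.23.0"}


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(TESTED).items():
        status = "missing" if version is None else version
        print(f"{name}: {status} (smoke-tested with {TESTED[name]})")
```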
External Dependencies
UniK3D
UniSHARP uses UniK3D for universal camera ray and feature prediction. From the project directory, clone the official repository into UniK3D:
git clone https://github.com/lpiccinelli-eth/UniK3D.git UniK3D
3DGEER
Fisheye rendering depends on the GEER CUDA rasterizer from 3DGEER. From the project directory, clone the repository into 3dgeer:
git clone https://github.com/boschresearch/3dgeer.git 3dgeer
The GEER rasterizer is required for fisheye rendering paths. If you only run perspective or panoramic inference, it may not be needed.
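A small check like the following can report which external checkouts are missing before a run. It is a sketch under two assumptions: the directory names match the clone commands above, and the GEER rasterizer is treated as required only when the camera type is fisheye. The function name `missing_dependencies` is hypothetical.

```python
from pathlib import Path

# Directory names taken from the clone commands above.
UNIK3D_DIR = "UniK3D"
GEER_DIR = "3dgeer"


def missing_dependencies(project_root, camera="perspective"):
    """Return the external checkouts that are absent for a given camera type."""
    root = Path(project_root)
    required = [UNIK3D_DIR]
    # Assumption: only fisheye rendering needs the GEER rasterizer.
    if camera == "fisheye":
        required.append(GEER_DIR)
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    print(missing_dependencies(".", camera="fisheye"))
```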
Dataset
The released dataset is hosted on Hugging Face:
- Dataset: Insta360-Research/OmniRooms
- Training manifests: Insta360-Research/OmniRooms/manifests/train
- Validation manifests: Insta360-Research/OmniRooms/manifests/validation
OmniRooms is a panoramic simulation dataset designed for 3D reconstruction, in particular 3D Gaussian Splatting (3DGS). It consists of 16 large indoor scenes, each containing multiple rooms, and 300k RGB images with corresponding depth, covering both small and large pose movements. The panoramas are collected with AirSim; OmniRooms-Wide is derived by projecting them into 130-degree equidistant fisheye views. For each anchor point on a 0.5 m voxel grid, we render one central (source) camera and 29 cameras randomly sampled within an axis-aligned 30 cm cube centered on it. To isolate translation-induced synthesis, all cameras share a fixed orientation. Each frame is rendered as a 1024 x 2048 equirectangular (ERP) image.
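The camera sampling scheme described above can be sketched as follows. This is an illustration, not the dataset's generation code: it assumes that "30 cm cube" refers to the side length (offsets of at most 15 cm per axis) and that positions are drawn uniformly. The function name `sample_camera_positions` is hypothetical.

```python
import random

CUBE_SIDE_M = 0.30   # assumed: "30 cm cube" read as the side length
NUM_SAMPLED = 29     # cameras sampled around each central camera


def sample_camera_positions(anchor, rng=None):
    """Return the central position followed by 29 positions inside the cube."""
    if rng is None:
        rng = random.Random()
    half = CUBE_SIDE_M / 2.0
    positions = [tuple(anchor)]
    for _ in range(NUM_SAMPLED):
        # Axis-aligned cube: each coordinate is offset independently.
        positions.append(tuple(c + rng.uniform(-half, half) for c in anchor))
    return positions


if __name__ == "__main__":
    cameras = sample_camera_positions((1.0, 2.0, 0.5), random.Random(0))
    print(len(cameras), cameras[0])
```

Because all cameras share a fixed orientation, a position alone defines each view in this sketch.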
The code supports the following data sources and manifest aliases:
- RealEstate10K
- HM3D
- OmniRooms
- OmniRooms-Wide
- WildRGB-D
- DL3DV
- ScanNet++ Fisheye
- Replica and Tanks and Temples for validation-only protocols
Training manifests use the names released under manifests/train:
dataset_manifests/
├── re10k_train_chunks.txt
├── hm3d_train_scenes.txt
├── omnirooms.txt
├── wildrgbd_train_scenes.txt
├── dl3dv_train_scenes.txt
└── scanetpp_fisheye_train_scenes.txt
Validation manifests use the names released under manifests/validation:
validation_manifests/
├── re10k.txt
├── dl3dv.txt
├── hm3d.txt
├── omnirooms.txt
├── omnirooms_wide.txt
├── wildrgbd.txt
├── scanetpp_fisheye.txt
├── replica.txt
└── tat.txt
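Before launching validation, it can help to confirm that the expected manifest files are in place. The sketch below checks for the file names listed above. The `read_manifest` helper assumes a plain-text layout with one entry per line, which is not confirmed by this README. Both function names are hypothetical.

```python
from pathlib import Path

# File names listed in the validation tree above.
VALIDATION_MANIFESTS = [
    "re10k.txt", "dl3dv.txt", "hm3d.txt", "omnirooms.txt",
    "omnirooms_wide.txt", "wildrgbd.txt", "scanetpp_fisheye.txt",
    "replica.txt", "tat.txt",
]


def missing_manifests(manifest_dir, expected=VALIDATION_MANIFESTS):
    """Return expected manifest file names that are not present on disk."""
    root = Path(manifest_dir)
    return [name for name in expected if not (root / name).is_file()]


def read_manifest(path):
    """Read entries, assuming one per line; blank lines are skipped."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    print(missing_manifests("validation_manifests"))
```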
Checkpoints
Training initializes the UniSHARP heads from scratch and loads the original pretrained UniK3D weights through the UniK3D loader. By default, the official launcher does not resume from a previous UniSHARP checkpoint.
Released UniSHARP checkpoints are available at Insta360-Research/Unisharp. Place a checkpoint anywhere on disk and pass the path to validation or inference:
CHECKPOINT=/path/to/pretrained_model.pt
Training
Use the official gt-override training launcher:
bash scripts/train.sh
Training outputs are saved under:
outputs/<run_name>/
├── config.json
├── losses.csv
├── step_XXXXXXX.pt
└── vis/
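To pick a checkpoint for validation or inference, the most recent `step_XXXXXXX.pt` file in a run directory can be located with a short helper. This sketch assumes that `XXXXXXX` is the zero-padded training step; the function name `latest_checkpoint` is hypothetical.

```python
import re
from pathlib import Path

# Assumption: the X placeholders are digits of the training step.
STEP_PATTERN = re.compile(r"^step_(\d+)\.pt$")


def latest_checkpoint(run_dir):
    """Return the step_*.pt file with the highest step number, or None."""
    best_step, best_path = -1, None
    for path in Path(run_dir).iterdir():
        match = STEP_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

The returned path can then be passed to the validation or inference scripts.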
Validation
Run validation with a checkpoint:
bash scripts/validate_unisharp.sh /path/to/step_XXXXXXX.pt
Inference
Run single-image inference:
python scripts/infer_unisharp.py \
--checkpoint /path/to/step_XXXXXXX.pt \
--image /path/to/image.jpg \
--out-dir outputs/inference
Run inference on a directory or list of images:
python scripts/infer_unisharp.py \
--checkpoint /path/to/step_XXXXXXX.pt \
--image-dir /path/to/images \
--out-dir outputs/inference
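When inference is driven from another Python program, the same command lines can be assembled programmatically. The wrapper below is a sketch that uses only flags shown in this README (`--checkpoint`, `--image`, `--image-dir`, `--out-dir`, `--camera-json`). It assumes that `--image` and `--image-dir` are mutually exclusive and that it is run from the repository root. The function names are hypothetical.

```python
import subprocess
import sys


def build_infer_command(checkpoint, image=None, image_dir=None,
                        out_dir=None, camera_json=None):
    """Assemble the argument list for scripts/infer_unisharp.py."""
    # Assumption: exactly one input source is given per invocation.
    if (image is None) == (image_dir is None):
        raise ValueError("pass exactly one of image or image_dir")
    cmd = [sys.executable, "scripts/infer_unisharp.py",
           "--checkpoint", str(checkpoint)]
    if image is not None:
        cmd += ["--image", str(image)]
    else:
        cmd += ["--image-dir", str(image_dir)]
    if out_dir is not None:
        cmd += ["--out-dir", str(out_dir)]
    if camera_json is not None:
        cmd += ["--camera-json", str(camera_json)]
    return cmd


def run_inference(**kwargs):
    """Run the script and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_infer_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.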
If calibrated camera parameters are available, pass them through a JSON file. Without this file, the script predicts rays with UniK3D and fits the camera parameters automatically.
Example perspective camera JSON:
{
  "camera": "perspective",
  "intrinsics": {
    "fx": 820.0,
    "fy": 820.0,
    "cx": 512.0,
    "cy": 384.0
  }
}
python scripts/infer_unisharp.py \
--checkpoint /path/to/step_XXXXXXX.pt \
--image /path/to/perspective.jpg \
--camera-json /path/to/perspective_camera.json
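If only the image size and an approximate horizontal field of view are known, the perspective entry can be derived with the pinhole relation fx = (W / 2) / tan(hfov / 2). The sketch below assumes square pixels (fy = fx) and a principal point at the image center; the function name `perspective_camera` is hypothetical.

```python
import json
import math


def perspective_camera(width, height, hfov_deg):
    """Build a perspective camera entry from image size and horizontal FoV."""
    # Pinhole model: half the image width subtends half the field of view.
    fx = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return {
        "camera": "perspective",
        "intrinsics": {
            "fx": fx,
            "fy": fx,               # assumption: square pixels
            "cx": width / 2.0,      # assumption: centered principal point
            "cy": height / 2.0,
        },
    }


if __name__ == "__main__":
    print(json.dumps(perspective_camera(1024, 768, 64.0), indent=2))
```

For a 1024 x 768 image, a horizontal field of view of about 64 degrees gives fx close to 820, in line with the example values above.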
Example Fisheye624 camera JSON:
{
  "camera": "fisheye",
  "camera_params": [820.0, 820.0, 512.0, 384.0, 0.01, -0.001, 0.0, 0.0]
}
python scripts/infer_unisharp.py \
--checkpoint /path/to/step_XXXXXXX.pt \
--image /path/to/fisheye.jpg \
--camera-json /path/to/fisheye_camera.json
For batched inference, the JSON can also contain per-image entries:
{
  "default": {
    "camera": "perspective",
    "intrinsics": [820.0, 820.0, 512.0, 384.0]
  },
  "images": {
    "panorama.jpg": {
      "camera": "panorama"
    },
    "fisheye.jpg": {
      "camera": "fisheye",
      "camera_params": [820.0, 820.0, 512.0, 384.0, 0.01, -0.001, 0.0, 0.0]
    }
  }
}
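The per-image layout above suggests a simple lookup rule, sketched here for illustration. It assumes that entries under `images` are keyed by file name and that `default` applies to every other image; the actual resolution logic in `infer_unisharp.py` may differ. The list form of `intrinsics` is assumed to be ordered `[fx, fy, cx, cy]`, which matches the values in the dictionary form shown earlier. Both function names are hypothetical.

```python
from pathlib import Path


def normalize_intrinsics(intrinsics):
    """Accept {"fx", "fy", "cx", "cy"} or a 4-element list; return a list."""
    if isinstance(intrinsics, dict):
        return [float(intrinsics[key]) for key in ("fx", "fy", "cx", "cy")]
    values = [float(v) for v in intrinsics]
    if len(values) != 4:
        raise ValueError("expected four values: fx, fy, cx, cy")
    return values


def camera_for_image(config, image_path):
    """Pick the per-image entry by file name, else fall back to "default"."""
    name = Path(image_path).name
    images = config.get("images", {})
    if name in images:
        return images[name]
    if "default" in config:
        return config["default"]
    raise KeyError(f"no camera entry or default for {name}")
```

With the example file, `panorama.jpg` resolves to the panorama entry, while any unlisted image falls back to the default perspective camera.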
Acknowledgement
This project builds on open-source work from:
- SHARP for monocular Gaussian view synthesis
- UniK3D for universal camera geometry and features
- 3DGEER for generic-camera Gaussian rasterization
- gsplat for Gaussian splatting utilities
Citation
@article{song2026unisharp,
  title={UniSHARP: Universal Sharp Monocular View Synthesis},
  author={Song, Meixi and Zhang, Dizhe and Ren, Hao and Zhang, Ruiyang and Du, Bo and Yang, Ming-Hsuan and Qi, Lu},
  journal={arXiv},
  year={2026}
}















