---
license: mit
tags:
- image-to-image
- reflection-removal
- highlight-removal
- computer-vision
- dinov3
- surgical-imaging
---
# UnReflectAnything
[Project Page](https://alberto-rota.github.io/UnReflectAnything/) | [PyPI](https://pypi.org/project/unreflectanything/) | [arXiv](https://arxiv.org/abs/2512.09583) | [Hugging Face Space](https://huggingface.co/spaces/AlbeRota/UnReflectAnything) | [Hugging Face Model](https://huggingface.co/AlbeRota/UnReflectAnything) | [Wiki](https://github.com/alberto-rota/UnReflectAnything/wiki) | [License: MIT](https://mit-license.org/)

UnReflectAnything takes any RGB image as input and removes specular highlights, returning a clean, diffuse-only output. We trained UnReflectAnything by synthesizing specularities and supervising in DINOv3 feature space. UnReflectAnything works on both natural indoor scenes and surgical/endoscopic data.

---
## Architecture

* **<font color="#a001e0">Encoder</font> ($\mathit{\textcolor{a001e0}{E}}$ )**: Processes the input image $\mathbf{I}$ to extract a rich latent representation $\mathbf{F}_\ell$. We use the off-the-shelf pretrained [DINOv3-large](https://huggingface.co/facebook/dinov3-vitl16-pretrain-lvd1689m) model.
* **<font color="#0167ff">Reflection Predictor</font> ($\mathit{\textcolor{0167ff}{H}}$ )**: Predicts a soft highlight mask **H** that localizes specular highlights.
* **Masking Operation ($\mathit{P}$ )**: A binary mask **P** is derived from the prediction and applied to the feature map: $(1-\mathbf{P}) \odot \mathbf{F}_\ell$. This removes features contaminated by reflections, leaving "holes" in the feature map.
* **<font color="#23ac2c">Token Inpainter</font> ($\mathit{\textcolor{23ac2c}{T}}$ )**: Acts as a neural inpainter. It processes the masked features and uses the surrounding clean context as a prior, together with a learned mask token, to synthesize the missing information in embedding space, producing the completed feature map $\mathbf{F}_{\text{comp}}$.
* **<font color="#ff7700">Decoder</font> ($\mathit{\textcolor{ff7700}{D}}$ )**: Projects the completed features back into pixel space to generate the final reflection-free image $\mathbf{I}_{\text{diff}}$.
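
The masking and mask-token steps above can be sketched on a token grid as follows. This is a minimal NumPy sketch, not the released implementation: the 28x28 grid (a 448x448 input at patch size 16), the 1024-dimensional tokens, the 0.5 threshold used to derive **P** from **H**, and the helper names `mask_features` and `fill_with_mask_token` are all assumptions for illustration, and the soft mask is assumed to already be at token resolution.

```python
import numpy as np

# ASSUMED sizes: a 448x448 image with 16x16 patches gives a 28x28 token grid;
# 1024 is the assumed ViT-L token width.
C, h, w = 1024, 28, 28
THRESHOLD = 0.5  # hypothetical cut-off used to binarize the soft mask H into P


def mask_features(features, soft_mask, threshold=THRESHOLD):
    """Compute P = [H > threshold] and (1 - P) * F_l.

    features:  (C, h, w) feature map F_l
    soft_mask: (h, w) soft highlight mask H in [0, 1], at token resolution (assumed)
    """
    P = (soft_mask > threshold).astype(features.dtype)  # binary mask P
    masked = (1.0 - P)[None] * features                  # broadcast over channels
    return masked, P


def fill_with_mask_token(masked, P, mask_token):
    """Write a mask token into every masked position (the "holes").

    In the real model the Token Inpainter would then predict the missing tokens
    from the clean context; this only shows how its input could be assembled.
    """
    return masked + P[None] * mask_token[:, None, None]


rng = np.random.default_rng(0)
F = rng.standard_normal((C, h, w)).astype(np.float32)  # stand-in for F_l
H_soft = rng.uniform(size=(h, w)).astype(np.float32)    # stand-in for H
mask_token = np.full(C, 0.5, dtype=np.float32)          # stand-in for a learned parameter

masked, P = mask_features(F, H_soft)
inpainter_input = fill_with_mask_token(masked, P, mask_token)
```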
---
## Training Strategy
We train UnReflectAnything with **Synthetic Specular Supervision**: we infer 3D geometry with [MoGe-2](https://wangrc.site/MoGe2Page/) and render highlights with a Blinn-Phong reflection model. The light source position is sampled randomly in 3D space at every training iteration to increase the heterogeneity of the synthetic highlights.
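
The rendering step can be illustrated with the standard Blinn-Phong specular term, $k_s \max(0, \mathbf{n}\cdot\mathbf{h})^{\alpha}$ with half vector $\mathbf{h} = \mathrm{normalize}(\mathbf{l} + \mathbf{v})$. The sketch below is a minimal NumPy version under stated assumptions, not the project's renderer: per-pixel 3D points and unit normals are assumed to be available in camera coordinates (for example, derived from a geometry estimator such as MoGe-2), the camera sits at the origin, and the light-sampling box, shininess `alpha`, intensity `k_s`, white light color, and additive compositing are hypothetical choices.

```python
import numpy as np


def normalize(v, eps=1e-8):
    """Normalize vectors along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)


def sample_light_position(rng, low=(-1.0, -1.0, -1.0), high=(1.0, 1.0, 0.5)):
    """Draw a point-light position uniformly from a (hypothetical) box in camera space."""
    return rng.uniform(low, high)


def blinn_phong_specular(points, normals, light_pos, alpha=64.0, k_s=1.0):
    """Per-pixel Blinn-Phong specular intensity.

    points:    (H, W, 3) 3D points in camera coordinates (camera at the origin)
    normals:   (H, W, 3) unit surface normals facing the camera
    light_pos: (3,) point-light position in the same frame
    """
    l = normalize(light_pos - points)  # surface -> light
    v = normalize(-points)             # surface -> camera
    h = normalize(l + v)               # half vector
    n_dot_h = np.clip(np.sum(normals * h, axis=-1), 0.0, None)
    n_dot_l = np.sum(normals * l, axis=-1)
    # No highlight where the light is behind the surface.
    return np.where(n_dot_l > 0.0, k_s * n_dot_h ** alpha, 0.0)


def add_highlights(image, specular, light_color=(1.0, 1.0, 1.0)):
    """Additively composite the specular term onto an RGB image in [0, 1]."""
    return np.clip(image + specular[..., None] * np.asarray(light_color), 0.0, 1.0)


# Toy scene: a fronto-parallel plane at depth 2 facing the camera.
rng = np.random.default_rng(0)
ys, xs = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5), indexing="ij")
points = np.stack([xs, ys, np.full_like(xs, 2.0)], axis=-1)
normals = np.broadcast_to(np.array([0.0, 0.0, -1.0]), points.shape)
diffuse = np.full(points.shape, 0.3)

light = sample_light_position(rng)  # a new light for every training sample
specular = blinn_phong_specular(points, normals, light)
with_highlights = add_highlights(diffuse, specular)
```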

We train the model in two stages:
1. **DPT Decoder Pre-Training**: The **<font color="#ff7700">Decoder</font>** is first pre-trained in an autoencoder configuration ($\min_{\theta} \mathcal{L}(M_{\theta}(\mathbf{I}), \mathbf{I})$) so that it learns to reconstruct realistic RGB textures from the DINOv3 latent space.
2. **End-to-End Refinement**: The full pipeline is then trained to predict reflection masks with $\mathit{\textcolor{0167ff}{H}}$ and fill them with the **<font color="#23ac2c">Token Inpainter</font>**, so that the final output is both visually consistent and physically accurate. The decoder is also fine-tuned at this stage.
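
The two-stage schedule above can be summarized as a per-stage list of trainable modules. This is a hedged sketch rather than the project's training code: the `Stage` record, the `STAGES` table, and `set_trainable` are hypothetical names, and keeping the DINOv3 encoder $E$ frozen in both stages is an assumption (the text only describes it as off-the-shelf and pretrained). `set_trainable` relies on `torch.nn.Module.requires_grad_`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    forward_path: tuple  # modules used in this stage's forward pass, in order
    trainable: tuple     # modules whose parameters receive gradient updates


# E = encoder, H = reflection predictor, T = token inpainter, D = decoder.
# ASSUMPTION: E stays frozen throughout.
STAGES = (
    # Stage 1: autoencoder configuration, assumed here to be D(E(I)) reconstructing I.
    Stage("decoder_pretraining", forward_path=("E", "D"), trainable=("D",)),
    # Stage 2: full pipeline; H and T are trained and D is fine-tuned.
    Stage("end_to_end_refinement", forward_path=("E", "H", "T", "D"), trainable=("H", "T", "D")),
)


def set_trainable(modules, stage):
    """Enable gradients only for the modules listed as trainable in `stage`.

    `modules` maps the names above to torch.nn.Module instances.
    """
    for name, module in modules.items():
        module.requires_grad_(name in stage.trainable)
```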
---
## Weights
Install the Python API and CLI in a **Python >= 3.11** environment with
```bash
pip install unreflectanything
```
then run
```bash
unreflectanything download --weights
```
to download the `.pth` weights into the package cache directory, which is usually `.cache/unreflectanything`.

---
### Basic Python Usage
```python
import unreflectanything
import torch
# Load the pretrained model (uses cached weights)
unreflect_model = unreflectanything.model()
# Run inference on a tensor [B, 3, H, W] in range [0, 1]
images = torch.rand(2, 3, 448, 448).cuda()
diffuse_output = unreflect_model(images)
# Simple file-based inference
unreflectanything.inference("input_with_highlights.png", output="diffuse_result.png")
```
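To run the tensor-based call above on real images, they first need to be converted to the `[B, 3, H, W]` layout in the `[0, 1]` range that the example's comment describes. Below is a minimal NumPy sketch; `to_batch` is a hypothetical helper, not part of the package, and padding each side to a multiple of 16 (the DINOv3 ViT-L/16 patch size) is an assumption about what the model accepts.

```python
import numpy as np

PATCH = 16  # ViT-L/16 patch size; padding to a multiple of it is an ASSUMPTION


def to_batch(images, patch=PATCH):
    """Convert HxWx3 uint8 RGB arrays into a float32 [B, 3, H, W] batch in [0, 1].

    Each image is edge-padded on the bottom and right to the next multiple of
    `patch`. All padded images must end up the same size so they can be stacked.
    Returns the batch and the original (height, width) of each image, so outputs
    can be cropped back.
    """
    batch, sizes = [], []
    for img in images:
        h, w, _ = img.shape
        pad_h, pad_w = (-h) % patch, (-w) % patch
        img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
        batch.append(img.transpose(2, 0, 1).astype(np.float32) / 255.0)
        sizes.append((h, w))
    return np.stack(batch), sizes


# Usage with the model from the example above (PIL assumed to be installed):
#   from PIL import Image
#   arrays = [np.asarray(Image.open(p).convert("RGB")) for p in ["input_with_highlights.png"]]
#   batch, sizes = to_batch(arrays)
#   diffuse = unreflect_model(torch.from_numpy(batch).cuda())
#   h, w = sizes[0]
#   diffuse = diffuse[:, :, :h, :w]  # assumes the output keeps the input resolution
```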
Refer to the [Wiki](https://github.com/alberto-rota/UnReflectAnything/wiki) for full details on the Python API.

---
### CLI Overview
The package provides a command-line interface available under three equivalent names: `ura`, `unreflect`, and `unreflectanything`.
* **Inference**: `ura inference --input /path/to/images --output /path/to/output`
* **Evaluation**: `ura evaluate --output /path/to/results --gt /path/to/groundtruth`
* **Verification**: `ura verify --dataset /path/to/dataset`

Refer to the [Wiki](https://github.com/alberto-rota/UnReflectAnything/wiki) for full details on the CLI commands.

---
## Citation
If you use UnReflectAnything in your research or pipeline, please cite our paper:
```bibtex
@misc{rota2025unreflectanythingrgbonlyhighlightremoval,
title={UnReflectAnything: RGB-Only Highlight Removal by Rendering Synthetic Specular Supervision},
author={Alberto Rota and Mert Kiray and Mert Asim Karaoglu and Patrick Ruhkamp and Elena De Momi and Nassir Navab and Benjamin Busam},
year={2025},
eprint={2512.09583},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.09583},
}
```
---