Add pipeline tag and library name, copy README content (#1)
opened by nielsr (HF Staff)

README.md — CHANGED
---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---

# RelationAdapter

> **RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers**
> <br>
> Yan Gong,
> Yiren Song,
> Yicheng Li,
> Chenglin Li,
> and
> Yin Zhang
> <br>
> Zhejiang University, National University of Singapore
> <br>

<a href="https://arxiv.org/abs/2506.02528"><img src="https://img.shields.io/badge/arXiv-2506.02528-A42C25.svg" alt="arXiv"></a>
<a href="https://huggingface.co/handsomeWilliam/RelationAdapter/tree/main"><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a>
<a href="https://huggingface.co/datasets/handsomeWilliam/Relation252K"><img src="https://img.shields.io/badge/🤗_HuggingFace-Dataset-ffbd45.svg" alt="HuggingFace"></a>

<br>

<img src='./assets/teaser.png' width='100%' />

## Quick Start

### 1. Configuration

#### 1. **Environment setup**
```bash
git clone git@github.com:gy8888/RelationAdapter.git
cd RelationAdapter

conda create -n RelationAdapter python=3.11.10
conda activate RelationAdapter
```

#### 2. **Requirements installation**
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install --upgrade -r requirements.txt
```

### 2. Inference

We provide an integration of our model with the `FluxPipeline` pipeline and have uploaded the model weights to Hugging Face, so the model is easy to use as in the example below.

Simply run the inference script:

```bash
python infer_single.py
```

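For orientation, the stock `diffusers` calls involved look roughly like the sketch below. This is an assumed outline, not the contents of `infer_single.py`: the base model name and the LoRA fuse step are our assumptions, and the RelationAdapter checkpoint itself is loaded by repo-specific code.

```python
# Rough sketch of the inference flow, NOT the actual infer_single.py.
# Assumptions: FLUX.1-dev as the base model and a standard LoRA fuse;
# the RelationAdapter module is loaded by repo-specific code.

def run_inference_sketch(instruction: str):
    # Heavy imports are deferred so the sketch reads without a GPU setup.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",  # assumed base model
        torch_dtype=torch.bfloat16,
    )
    # Fuse the LoRA weights published in the model repo.
    pipe.load_lora_weights(
        "handsomeWilliam/RelationAdapter",
        weight_name="pytorch_lora_weights.safetensors",
    )
    pipe.to("cuda")
    return pipe(prompt=instruction).images[0]
```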
### 3. Weights

You can download the trained RelationAdapter and LoRA checkpoints for inference. The available models are detailed below.

You need to load the `RelationAdapter` checkpoint in order to fuse the `LoRA` checkpoint.

|                          **Model**                           |                       **Description**                        |
| :----------------------------------------------------------: | :---------------------------------------------------------: |
| [RelationAdapter](https://huggingface.co/handsomeWilliam/RelationAdapter/blob/main/ip_adapter-100000.bin) | Additional parameters of the RelationAdapter module, trained on the `Relation252K` dataset |
| [LoRA](https://huggingface.co/handsomeWilliam/RelationAdapter/blob/main/pytorch_lora_weights.safetensors) | LoRA parameters trained on the `Relation252K` dataset |

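For direct downloads, the Hub serves raw repo files under `/resolve/<revision>/<filename>`. A small helper (ours, for illustration; the repo and file names come from the table above) can build the URLs:

```python
# Build direct download URLs for the published checkpoints.
REPO = "handsomeWilliam/RelationAdapter"
CHECKPOINTS = ["ip_adapter-100000.bin", "pytorch_lora_weights.safetensors"]

def resolve_url(repo: str, filename: str, revision: str = "main") -> str:
    # Hugging Face exposes raw repo files at /resolve/<revision>/<filename>.
    return f"https://huggingface.co/{repo}/resolve/{revision}/{filename}"

urls = [resolve_url(REPO, name) for name in CHECKPOINTS]
```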
### 4. Dataset
<span id="dataset_setting"></span>
#### 4.1 Paired Dataset Format
The paired dataset is stored in a `.jsonl` file, where each line is a JSON object containing image file paths and the corresponding text descriptions. Each entry includes a source caption, a target caption, and an edit instruction describing the transformation from the source image to the target image.

Example format (shown pretty-printed; in the `.jsonl` file each object occupies one line):

```json
{
  "left_image_description": "Description of the left image",
  "right_image_description": "Description of the right image",
  "edit_instruction": "Instructions for the desired modifications",
  "img_name": "path/to/image_pair.jpg"
},
{
  "left_image_description": "Description of the left image2",
  "right_image_description": "Description of the right image2",
  "edit_instruction": "Another instruction",
  "img_name": "path/to/image_pair2.jpg"
}
```
We have uploaded our datasets to [Hugging Face](https://huggingface.co/datasets/handsomeWilliam/Relation252K).

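A minimal loader for this format might look like the following sketch; the field names come from the example above, while the helper name is ours:

```python
import json

# Fields every paired-dataset entry is expected to carry (see example above).
REQUIRED_KEYS = {
    "left_image_description",
    "right_image_description",
    "edit_instruction",
    "img_name",
}

def load_paired_entries(lines):
    """Parse .jsonl lines into dicts, checking the expected fields."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry missing fields: {sorted(missing)}")
        entries.append(entry)
    return entries
```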
#### 4.2 Run-Ready Dataset Generation
To prepare the dataset for relational learning tasks such as analogy-based instruction scenarios, use the provided script:

```bash
python dataset-All-2000-turn-5test.py
```

This script takes the original paired image dataset and converts it into a structured format where each entry includes a prompt image, a reference image, a source image, a target image, and a text instruction. Example format:

```json
{
  "cond1": "path/to/prompt_image.jpg",
  "cond2": "path/to/reference_image.jpg",
  "source": "path/to/source_image.jpg",
  "target": "path/to/target_image.jpg",
  "text": "Instruction for the intended modifications"
},
{
  "cond1": "path/to/prompt_image2.jpg",
  "cond2": "path/to/reference_image2.jpg",
  "source": "path/to/source_image2.jpg",
  "target": "path/to/target_image2.jpg",
  "text": "Instruction for the second modification"
}
```

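The exact pairing logic lives in the script above; as a loose sketch only (the field mapping is assumed, not taken from the script), combining two paired records into one run-ready entry could look like:

```python
# Loose sketch only: the real grouping is done by
# dataset-All-2000-turn-5test.py. We assume one paired record supplies
# the prompt (demonstration) pair and another supplies the query pair;
# each record stores a single composite image, so img_name stands in
# for the individual crops the script would resolve.

def to_run_ready(prompt_entry: dict, query_entry: dict) -> dict:
    return {
        "cond1": prompt_entry["img_name"],   # prompt image (assumed)
        "cond2": prompt_entry["img_name"],   # reference image (assumed)
        "source": query_entry["img_name"],   # image to edit (assumed)
        "target": query_entry["img_name"],   # expected result (assumed)
        "text": query_entry["edit_instruction"],
    }
```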
### 5. Results

## Citation
```bibtex
@misc{gong2025relationadapterlearningtransferringvisual,
      title={RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers},
      author={Yan Gong and Yiren Song and Yicheng Li and Chenglin Li and Yin Zhang},
      year={2025},
      eprint={2506.02528},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.02528},
}
```