Update README.md

6c8c2d1 verified 6 months ago

5.52 kB

	---
	base_model:
	- stabilityai/stable-diffusion-2-1
	datasets:
	- manycore-research/SpatialGen-Testset
	license: creativeml-openrail-m
	pipeline_tag: image-to-3d
	---

	# SpatialGen: Layout-guided 3D Indoor Scene Generation

	<!-- markdownlint-disable first-line-h1 -->
	<!-- markdownlint-disable html -->
	<!-- markdownlint-disable no-duplicate-header -->

	<div align="center">
	<picture>
	<source srcset="https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/myrWYVNd4m-DuxV39VQZ0.png" media="(prefers-color-scheme: dark)">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/QQvDtmokH4ZjwH0wppqFC.png" width="60%" alt="SpatialLM""/>
	</picture>
	</div>
	<hr style="margin-top: 0; margin-bottom: 8px;">
	<div align="center" style="margin-top: 0; padding-top: 0; line-height: 1;">
	<a href="https://manycore-research.github.io/SpatialGen" target="_blank" style="margin: 2px;"><img alt="Project"
	src="https://img.shields.io/badge/🌐%20Project-SpatialGen-ffc107?color=42a5f5&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
	<a href="https://arxiv.org/abs/2509.14981" target="_blank" style="margin: 2px;"><img alt="arXiv"
	src="https://img.shields.io/badge/arXiv-SpatialGen-b31b1b?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
	<a href="https://github.com/manycore-research/SpatialGen" target="_blank" style="margin: 2px;"><img alt="GitHub"
	src="https://img.shields.io/badge/GitHub-SpatialGen-24292e?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
	<a href="https://huggingface.co/manycore-research/SpatialGen-1.0" target="_blank" style="margin: 2px;"><img alt="Hugging Face"
	src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialGen-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
	</div>

	<div align="center">

	\| Image-to-Scene Results \| Text-to-Scene Results \|
	\| :--------------------------------------: \| :----------------------------------------: \|
	\| ![Img2Scene](https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/ksN5t8QEu3Iv6KhpsYsk6.png) \| ![Text2Scene](https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/waCRa3kp01KAsKgmqS1bb.png) \|

	<p>TL;DR: Given a 3D semantic layout, SpatialGen can generate a 3D indoor scene conditioned on either a reference image (left) or a textual description (right) using a multi-view, multi-modal diffusion model.</p>
	</div>

	## ✨ News

	- [Sep, 2025] We released the paper of SpatialGen!
	- [Aug, 2025] Initial release of SpatialGen-1.0!

	## 📋 Release Plan

	- [x] Provide inference code of SpatialGen.
	- [ ] Provide training instruction for SpatialGen.
	- [ ] Release SpatialGen dataset.

	## SpatialGen Models

	<div align="center">

	\| Model \| Download \|
	\| :-----------------------: \| -------------------------------------------------------------------------------------\|
	\| SpatialGen-1.0 \| [🤗 HuggingFace](https://huggingface.co/manycore-research/SpatialGen-1.0) \|
	\| FLUX.1-Wireframe-dev-lora \| [🤗 HuggingFace](https://huggingface.co/manycore-research/FLUX.1-Wireframe-dev-lora) \|

	</div>

	## Usage

	### 🔧 Installation

	Tested with the following environment:
	* Python 3.10
	* PyTorch 2.3.1
	* CUDA Version 12.1

	```bash
	# clone the repository
	git clone https://github.com/manycore-research/SpatialGen.git
	cd SpatialGen

	python -m venv .venv
	source .venv/bin/activate

	pip install -r requirements.txt
	# Optional: fix the [flux inference bug](https://github.com/vllm-project/vllm/issues/4392)
	pip install nvidia-cublas-cu12==12.4.5.8
	```

	### 📊 Dataset

	We provide [SpatialGen-Testset](https://huggingface.co/datasets/manycore-research/SpatialGen-Testset) with 48 rooms, which labeled with 3D layout and 4.8K rendered images (48 x 100 views, including RGB, normal, depth maps and semantic maps) for MVD inference.

	### Inference

	```bash
	# Single image-to-3D Scene
	bash scripts/infer_spatialgen_i2s.sh

	# Text-to-image-to-3D Scene
	# in captions/spatialgen_testset_captions.jsonl, we provide text prompts of different styles for each room,
	# choose a pair of scene_id and prompt to run the text2scene experiment
	bash scripts/infer_spatialgen_t2s.sh
	```

	## License

	[SpatialGen-1.0](https://huggingface.co/manycore-research/SpatialGen-1.0) is derived from [Stable-Diffusion-v2.1](https://github.com/Stability-AI/stablediffusion), which is licensed under the [CreativeML Open RAIL++-M License](https://github.com/Stability-AI/stablediffusion/blob/main/LICENSE-MODEL).

	## Citation

	```bibtex
	@inproceedings{SpatialGen,
	title = {SpatialGen: Layout-guided 3D Indoor Scene Generation},
	author = {Fang, Chuan and Li, Heng and Liang, Yixu and Zheng, Jia and Mao, Yongsen and Liu, Yuan and Tang, Rui and Zhou, Zihan and Tan, Ping},
	booktitle = {International Conference on 3D Vision},
	year = {2026}
	}
	```

	## Acknowledgements

	We would like to thank the following projects that made this work possible:

	[DiffSplat](https://github.com/chenguolin/DiffSplat) \| [SD 2.1](https://github.com/Stability-AI/stablediffusion) \| [TAESD](https://github.com/madebyollin/taesd) \| [FLUX](https://github.com/black-forest-labs/flux/) \| [SpatialLM](https://github.com/manycore-research/SpatialLM)