Request access to SceneReVis-7B

Please fill out the form below. Access will be granted automatically after submission.

By requesting access to SceneReVis-7B, you agree to the following terms: 1. You will use this model only for academic research purposes. 2. You will not redistribute the model weights without permission. 3. You will cite our paper in any published work that uses this model.

Log in or Sign Up to review the conditions and access this model content.

SceneReVis-7B

SceneReVis-7B is a vision-language model fine-tuned for iterative 3D indoor scene generation and editing.

Model Details

  • Base Model: Qwen2.5-VL-7B-Instruct
  • Training: SFT on SceneChain-12K + GRPO reinforcement learning with voxel-based physics rewards
  • Architecture: Vision-Language Model with tool-calling capabilities

Usage

See the SceneReVis repository for inference instructions.

Citation

@article{zhao2026scenerevis,
  title={SceneReVis: A Self-Reflective Vision-Grounded Framework for 3D Indoor Scene Synthesis via Multi-turn RL},
  author={Yang Zhao and Shizhao Sun and Meisheng Zhang and Yingdong Shi and Xubo Yang and Jiang Bian},
  journal={arXiv preprint arXiv:2602.09432},
  year={2026}
}
Downloads last month
8
Safetensors
Model size
8B params
Tensor type
BF16
·
Video Preview
loading

Model tree for runder1/SceneReVis-7B

Finetuned
(988)
this model

Paper for runder1/SceneReVis-7B