arxiv:2606.23514

Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation

Published on Jun 22 · Submitted by Jan-Niklas Dihlmann on Jun 23

Abstract

Arbor enables explicit 3D spatial control in text-conditioned latent generation through constraint meshes that define occupancy, avoidance, and contact regions, maintaining object quality while improving constraint adherence.

Text- and image-conditioned 3D models now generate convincing assets, but they still offer little direct control over the space an object should occupy or avoid. In authoring, this spatial intent is often known before generation starts: a chair should fit a seating envelope, a prop should leave clearance for motion, or a part should expose a contact surface. Prompts and image views are poor carriers for such constraints, which calls for an explicit control interface. We present Arbor, a trainable attachment for text-conditioned latent 3D generation. Arbor introduces constraint meshes as a native 3D control interface. The interface uses hull regions where geometry should exist, avoidance regions that should remain empty, and touch regions the object should contact. Unlike completion or whole-object scaffold control, these meshes are not target evidence. They are local typed requirements and can include regions where no surface should appear. Arbor keeps this signal as geometry by converting constraint meshes into tokens and learning a routed attachment inside a frozen denoiser. Each latent region can therefore receive the part of the constraint that matters for its spatial location. We evaluate Arbor on automatic and artist-curated control benchmarks with hull, avoidance, and touch constraints, and compare the metric trends to a user preference study. Even without dedicated compliance losses, Arbor improves constraint obedience while preserving object quality and variation under fixed constraints.

Community

Paper submitter

Hey everyone, today we share our newest work:

Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation

Current 3D generation methods can create 3D objects from text prompts, but this often behaves like a slot machine. You ask for an object, but you do not know whether it will satisfy the spatial requirements needed for production. For movies, games, animation, or asset design, this is a problem: an object may need to fit a fixed envelope, leave space for motion, or touch a specific surface.

Arbor addresses this by adding explicit geometry constraints to text-to-3D generation. Users provide constraint meshes that mark:

  • HULL regions where geometry should exist
  • AVOID regions that should remain empty
  • TOUCH regions the object should contact
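To make the three region types above concrete, here is a minimal sketch of how adherence could be scored on a voxel grid. It is an illustration under stated assumptions, not the paper's evaluation protocol: the helper names `box_mask` and `constraint_scores`, the use of axis-aligned boxes in place of real constraint meshes, and the three score definitions are all hypothetical.

```python
import numpy as np

def box_mask(shape, lo, hi):
    """Boolean voxel mask that is True inside the axis-aligned box [lo, hi)."""
    mask = np.zeros(shape, dtype=bool)
    mask[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = True
    return mask

def constraint_scores(occupancy, hull, avoid, touch):
    """Toy per-type scores in [0, 1]; higher is better (assumed definitions)."""
    # HULL: fraction of the hull region that the object actually fills.
    hull_fill = (occupancy & hull).sum() / max(hull.sum(), 1)
    # AVOID: fraction of the avoidance region that stays empty.
    avoid_clear = 1.0 - (occupancy & avoid).sum() / max(avoid.sum(), 1)
    # TOUCH: 1.0 if any occupied voxel lies inside the touch region.
    touch_hit = float((occupancy & touch).any())
    return {"hull": float(hull_fill), "avoid": float(avoid_clear), "touch": touch_hit}

shape = (8, 8, 8)
occupancy = box_mask(shape, (2, 2, 0), (6, 6, 4))  # stand-in for a generated object
hull = box_mask(shape, (2, 2, 0), (6, 6, 6))       # geometry should exist here
avoid = box_mask(shape, (0, 0, 6), (8, 8, 8))      # should remain empty
touch = box_mask(shape, (0, 0, 0), (8, 8, 1))      # e.g. a floor slab to contact
scores = constraint_scores(occupancy, hull, avoid, touch)
```

In this toy setup the object fills two thirds of the hull, leaves the avoidance region empty, and rests on the touch slab. A real pipeline would voxelize the constraint meshes and the generated asset instead of using boxes.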

The method builds on the TRELLIS family. It keeps the text generator and geometry encoders frozen, turns the constraint meshes into compact geometry tokens, and routes local constraint evidence into the generator.
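The idea of routing local constraint evidence can be pictured with a small sketch in which each latent cell attends only to the constraint tokens nearest to it. This is a hypothetical illustration, not Arbor's architecture: the function `route_constraints`, the k-nearest-neighbour routing rule, and the residual update are all assumptions chosen to keep the example short.

```python
import numpy as np

def route_constraints(latent_pos, latent_feat, token_pos, token_feat, k=2):
    """Each latent attends only to its k nearest constraint tokens (assumed rule)."""
    # Pairwise squared distances between latent cells and tokens: shape (N, M).
    d2 = ((latent_pos[:, None, :] - token_pos[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    mask = np.zeros(d2.shape, dtype=bool)
    np.put_along_axis(mask, nearest, True, axis=1)
    # Scaled dot-product logits, with non-routed tokens masked out.
    logits = latent_feat @ token_feat.T / np.sqrt(latent_feat.shape[1])
    logits = np.where(mask, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Residual update: the original latent features stay in the output.
    return latent_feat + weights @ token_feat, weights

rng = np.random.default_rng(0)
latent_pos = rng.uniform(size=(5, 3))    # 5 latent cells in the unit cube
latent_feat = rng.normal(size=(5, 8))
token_pos = rng.uniform(size=(4, 3))     # 4 constraint tokens
token_feat = rng.normal(size=(4, 8))
updated, weights = route_constraints(latent_pos, latent_feat, token_pos, token_feat)
```

Because the update is residual, the base features pass through unchanged wherever the routed contribution is small, which loosely mirrors attaching a trainable module to a frozen generator.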


