PSI-0.5

A richly controllable physical world model

Prompt PSI with images, motion, depth, camera pose, or partial future states, and ask it to complete the missing pieces of a physical scene.

Paper · Release gallery · Usage guide

[Figure: A back-to-back reel of PSI-0.5 gallery examples.]

PSI treats visual prediction as a promptable modeling problem. A prompt can be as simple as `rgb0->rgb1`, or it can include explicit control handles such as optical flow, depth, camera motion, and partially specified future frames. The same predictor handles every one of these prompt forms.

What You Can Prompt

Prompt	What PSI Does
`rgb0->rgb1`	continue a scene one frame forward
`rgb0->f01,rgb1`	imagine motion and render the next frame
`rgb0,f01->f01,rgb1`	densify a sparse flow prompt, then render
`rgb0,d0,f01->f01,d1,rgb1`	use depth and motion to predict flow, depth, and RGB
`rgb0,c01->rgb1`	synthesize a new camera view
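
The notation in the table can be read mechanically: tokens to the left of `->` are supplied as conditioning, and tokens to the right are predicted. The sketch below illustrates that reading in plain Python. The helper names `parse_prompt` and `densified` are hypothetical and are not part of the PSI package; the interpretation of a token that appears on both sides (such as `f01`) as "given sparsely, returned densely" is inferred from the table rather than from the model's source.

```python
from typing import List, Tuple


def parse_prompt(prompt: str) -> Tuple[List[str], List[str]]:
    """Split a prompt such as 'rgb0,f01->f01,rgb1' into (inputs, outputs)."""
    if prompt.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {prompt!r}")
    left, right = prompt.split("->")
    inputs = [tok.strip() for tok in left.split(",") if tok.strip()]
    outputs = [tok.strip() for tok in right.split(",") if tok.strip()]
    if not inputs or not outputs:
        raise ValueError(f"prompt {prompt!r} needs tokens on both sides of '->'")
    return inputs, outputs


def densified(prompt: str) -> List[str]:
    """Tokens that appear on both sides: given as a prompt and predicted again."""
    inputs, outputs = parse_prompt(prompt)
    return [tok for tok in outputs if tok in inputs]
```

For example, `parse_prompt("rgb0,c01->rgb1")` separates the camera-conditioned prompt into the inputs `rgb0`, `c01` and the single output `rgb1`.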

Quick Start

from PIL import Image
from transformers import AutoModel

predictor = AutoModel.from_pretrained(
    "StanfordNeuroAILab/psi0_5",
    trust_remote_code=True,
    device="cuda:0",
)
rgb1 = predictor.generate("rgb0->rgb1", rgb0=Image.open("scene.png"))
rgb1.save("scene_next.png")
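
To look further than one frame ahead, the one-step call above can be chained so that each prediction becomes the next input. This is a minimal sketch that assumes `predictor.generate("rgb0->rgb1", rgb0=...)` returns a single frame, as in the Quick Start; the `rollout` helper is hypothetical and not part of the PSI package. Because every step conditions only on the previous output, errors can compound over long horizons.

```python
from typing import Any, List


def rollout(predictor: Any, first_frame: Any, num_steps: int) -> List[Any]:
    """Return [frame_0, ..., frame_num_steps] by chaining one-step predictions."""
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    frames = [first_frame]
    for _ in range(num_steps):
        # Each step conditions only on the most recent frame.
        next_frame = predictor.generate("rgb0->rgb1", rgb0=frames[-1])
        frames.append(next_frame)
    return frames
```

With the predictor from the Quick Start, `rollout(predictor, Image.open("scene.png"), 4)` would return the input frame followed by four predicted frames.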

A Sparse Motion Prompt

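# Load the conditioning frame (file name reused from Quick Start).
rgb0 = Image.open("scene.png")

# One sparse flow entry, given as a pair of points, sized to match rgb0.
f01 = predictor.sparse_flow_prompt([((70, 221), (168, 221))], rgb0.size)

# f01 appears on both sides of the prompt: the sparse input is densified,
# then the next frame is rendered.
dense_flow, rgb1 = predictor.generate(
    "rgb0,f01->f01,rgb1",
    rgb0=rgb0,
    f01=f01,
    num_seq_patches=256,
)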
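
Hard-coded pixel coordinates break when the input image is resized. The sketch below converts resolution-independent arrows into pixel pairs of the shape passed to `sparse_flow_prompt` above. It rests on an assumption the card does not state: that each entry is a `((x0, y0), (x1, y1))` pair of start and end points in pixel coordinates. The helper `arrows_from_normalized` is hypothetical and not part of the PSI package.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
Arrow = Tuple[Point, Point]


def arrows_from_normalized(
    arrows: Sequence[Tuple[Tuple[float, float], Tuple[float, float]]],
    size: Tuple[int, int],
) -> List[Arrow]:
    """Convert ((x0, y0), (x1, y1)) pairs in [0, 1] to integer pixel coordinates."""
    width, height = size  # PIL's Image.size is (width, height)
    pixel_arrows = []
    for (x0, y0), (x1, y1) in arrows:
        for value in (x0, y0, x1, y1):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"normalized coordinate {value} is outside [0, 1]")
        # Map 0.0 to the first pixel and 1.0 to the last pixel on each axis.
        start = (round(x0 * (width - 1)), round(y0 * (height - 1)))
        end = (round(x1 * (width - 1)), round(y1 * (height - 1)))
        pixel_arrows.append((start, end))
    return pixel_arrows
```

The result can then be passed on in place of the literal list, for example `predictor.sparse_flow_prompt(arrows_from_normalized([((0.2, 0.5), (0.6, 0.5))], rgb0.size), rgb0.size)`.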

Novel View Synthesis

camera = {
    "fov_x": 60.0,
    "fov_y": 60.0,
    "euler_angles": [0.0, -0.12, 0.0],
    "translation": [0.10, 0.0, 0.04],
}

rgb1 = predictor.generate(
    "rgb0,c01->rgb1",
    rgb0=Image.open("coffee_mug_000.png"),
    c01=camera,
)
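
A single camera prompt gives one new view; a short sweep of views can be produced by scaling the camera move in steps. The sketch below only manipulates the dictionary keys shown above. It assumes that `c01` describes the camera move relative to the first frame, so that zero rotation and translation mean "no move", and that linearly scaling Euler angles is an acceptable approximation for small rotations like this one. The helper `camera_sweep` is hypothetical and not part of the PSI package.

```python
from typing import Dict, List, Union

Camera = Dict[str, Union[float, List[float]]]


def camera_sweep(target: Camera, num_views: int) -> List[Camera]:
    """Scale the rotation and translation of `target` linearly into `num_views` steps."""
    if num_views < 1:
        raise ValueError("num_views must be at least 1")
    views = []
    for i in range(1, num_views + 1):
        t = i / num_views  # fraction of the full camera move; the last view has t == 1.0
        views.append({
            "fov_x": target["fov_x"],  # field of view is kept fixed
            "fov_y": target["fov_y"],
            "euler_angles": [t * a for a in target["euler_angles"]],
            "translation": [t * v for v in target["translation"]],
        })
    return views
```

Each dictionary in `camera_sweep(camera, 4)` can be passed as `c01` to the `rgb0,c01->rgb1` call above, always conditioning on the same `rgb0`.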

More Examples

The full usage guide includes sparse flow construction, depth/flow prompting, camera-conditioned NVS, visual statistics, and scriptable demos:

docs/usage.md

The release gallery shows many prompt patterns in action:

https://neuroailab.github.io/psi-website/blog/psi-generations.html

PSI-0.5 is a modestly sized model that has not yet undergone any post-training, and some of its rollouts diverge. We recommend unrestricted sampling for flow prediction and top_p=0.9, top_k=1000 for RGB rendering. Careful prompting can significantly improve generations, and simple harnesses such as those in the provided Gradio app can steer the model much more effectively. We believe this approach can scale to more comprehensive world models while keeping the same highly controllable API.
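
For readers unfamiliar with the sampling parameters mentioned above, the sketch below shows the standard meaning of top-k and top-p (nucleus) filtering on a plain probability list. It illustrates the general technique only: how PSI applies these settings internally is not described in this card, the function name `filter_top_k_top_p` is hypothetical, and this version measures cumulative mass on the original distribution, which may differ in detail from other implementations.

```python
from typing import List


def filter_top_k_top_p(probs: List[float], top_k: int, top_p: float) -> List[float]:
    """Zero out tokens outside the top-k / nucleus set and renormalize the rest."""
    if top_k < 1 or not 0.0 < top_p <= 1.0:
        raise ValueError("need top_k >= 1 and 0 < top_p <= 1")
    # Indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for idx in order[:top_k]:  # never keep more than top_k tokens
        kept.append(idx)
        cumulative += probs[idx]
        # Stop once the kept tokens cover at least top_p of the mass.
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered
```

In these terms, "unrestricted sampling" corresponds to `top_p=1.0` with `top_k` at least the vocabulary size, which leaves the distribution unchanged.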

Paper

World Modeling with Probabilistic Structure Integration (2509.09737, published Sep 10, 2025)