PSI-0.5

A richly controllable physical world model

Prompt PSI with images, motion, depth, camera pose, or partial future states, and ask it to complete the missing pieces of a physical scene.

Paper · Release gallery · Usage guide

A back-to-back reel of PSI-0.5 gallery examples.

PSI treats visual prediction as a promptable modeling problem. A prompt can be as simple as rgb0->rgb1, or it can include explicit control handles such as optical flow, depth, camera motion, and partially specified future frames. The same predictor handles all of these notations.

What You Can Prompt

| Prompt | What PSI Does |
| --- | --- |
| rgb0->rgb1 | continue a scene one frame forward |
| rgb0->f01,rgb1 | imagine motion and render the next frame |
| rgb0,f01->f01,rgb1 | densify a sparse flow prompt, then render |
| rgb0,d0,f01->f01,d1,rgb1 | use depth and motion to predict flow, depth, and RGB |
| rgb0,c01->rgb1 | synthesize a new camera view |
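
The prompts above share one shape: comma-separated inputs, then `->`, then comma-separated outputs. As a minimal sketch of that reading, the helper below splits a prompt string into its two sides. `parse_prompt` is a hypothetical illustration and not part of the PSI API; how PSI itself parses prompts is not shown in this README.

```python
from typing import List, Tuple


def parse_prompt(prompt: str) -> Tuple[List[str], List[str]]:
    """Split a prompt such as 'rgb0,f01->f01,rgb1' into (inputs, outputs).

    Hypothetical helper for illustration only; not part of the PSI package.
    """
    if prompt.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {prompt!r}")
    left, right = prompt.split("->")
    inputs = [tok.strip() for tok in left.split(",") if tok.strip()]
    outputs = [tok.strip() for tok in right.split(",") if tok.strip()]
    if not inputs or not outputs:
        raise ValueError(f"prompt {prompt!r} needs tokens on both sides of '->'")
    return inputs, outputs


inputs, outputs = parse_prompt("rgb0,f01->f01,rgb1")
print(inputs)   # ['rgb0', 'f01']
print(outputs)  # ['f01', 'rgb1']
```

Note that `f01` can appear on both sides: in the densify prompt it is a sparse flow on the input side and the dense flow PSI returns on the output side.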

Quick Start

from PIL import Image
from transformers import AutoModel

predictor = AutoModel.from_pretrained(
    "StanfordNeuroAILab/psi0_5",
    trust_remote_code=True,
    device="cuda:0",
)
rgb1 = predictor.generate("rgb0->rgb1", rgb0=Image.open("scene.png"))
rgb1.save("scene_next.png")
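
Since `rgb0->rgb1` advances a scene by one frame, one plausible way to produce a longer rollout is to feed each predicted frame back in as the next `rgb0`. The sketch below assumes only the `generate("rgb0->rgb1", rgb0=...)` call shown above. The `rollout` helper is hypothetical, and PSI may offer a more direct multi-frame interface.

```python
from typing import Any, Callable, List


def rollout(generate: Callable[..., Any], rgb0: Any, steps: int) -> List[Any]:
    """Return [rgb0, frame1, ..., frame_steps] by repeated one-step prediction.

    Hypothetical helper: `generate` is expected to behave like
    predictor.generate, i.e. accept a prompt string and an rgb0 keyword.
    """
    frames = [rgb0]
    for _ in range(steps):
        # Each prediction becomes the conditioning frame for the next step.
        frames.append(generate("rgb0->rgb1", rgb0=frames[-1]))
    return frames


# With the predictor from Quick Start, usage might look like:
# frames = rollout(predictor.generate, Image.open("scene.png"), steps=8)
```

Errors compound when predictions are fed back in, so longer rollouts are more likely to drift.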

A Sparse Motion Prompt

# Continues from Quick Start: reuses Image and predictor.
rgb0 = Image.open("scene.png")
f01 = predictor.sparse_flow_prompt([((70, 221), (168, 221))], rgb0.size)

dense_flow, rgb1 = predictor.generate(
    "rgb0,f01->f01,rgb1",
    rgb0=rgb0,
    f01=f01,
    num_seq_patches=256,
)
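
Before building a sparse flow prompt, it can help to check what motion the point pairs describe. The sketch below assumes each pair is (start, end) in (x, y) pixel coordinates and that the size is (width, height), as returned by PIL's `Image.size`. Both are assumptions about `sparse_flow_prompt`, and `arrow_displacements` is a hypothetical helper, not part of PSI.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def arrow_displacements(
    arrows: List[Tuple[Point, Point]], size: Tuple[int, int]
) -> List[Tuple[Point, Tuple[int, int]]]:
    """Convert (start, end) pairs into (start, (dx, dy)), checking image bounds.

    Assumes (x, y) pixel coordinates and size == (width, height).
    """
    width, height = size
    result = []
    for (x0, y0), (x1, y1) in arrows:
        for x, y in ((x0, y0), (x1, y1)):
            if not (0 <= x < width and 0 <= y < height):
                raise ValueError(f"point {(x, y)} lies outside a {width}x{height} image")
        result.append(((x0, y0), (x1 - x0, y1 - y0)))
    return result


print(arrow_displacements([((70, 221), (168, 221))], (512, 512)))
# [((70, 221), (98, 0))]
```

Under these assumptions, the arrow in the example above is a purely horizontal motion of 98 pixels.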

Novel View Synthesis

camera = {
    "fov_x": 60.0,
    "fov_y": 60.0,
    "euler_angles": [0.0, -0.12, 0.0],
    "translation": [0.10, 0.0, 0.04],
}

rgb1 = predictor.generate(
    "rgb0,c01->rgb1",
    rgb0=Image.open("coffee_mug_000.png"),
    c01=camera,
)
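
To relate the field-of-view entries to more familiar camera intrinsics, the sketch below converts a field of view into a focal length in pixels using the standard pinhole relation f = (extent / 2) / tan(fov / 2). It assumes `fov_x` and `fov_y` are in degrees, which this README does not state, and the 512-pixel image width is an arbitrary example value.

```python
import math


def focal_length_px(fov_deg: float, extent_px: int) -> float:
    """Pinhole focal length in pixels for a field of view spanning extent_px.

    Assumes fov_deg is in degrees; PSI's actual convention may differ.
    """
    return (extent_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


# A 60-degree horizontal field of view on a 512-pixel-wide image.
fx = focal_length_px(60.0, 512)
print(round(fx, 1))  # 443.4
```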

More Examples

The full usage guide includes sparse flow construction, depth/flow prompting, camera-conditioned NVS, visual statistics, and scriptable demos:

docs/usage.md

The release gallery shows many prompt patterns in action:

https://neuroailab.github.io/psi-website/blog/psi-generations.html

PSI-0.5 is a modestly sized model that has not yet undergone any post-training, and some of its rollouts diverge. We recommend unrestricted sampling for flow prediction and top_p=0.9, top_k=1000 for RGB rendering. Careful prompting can significantly improve generations, and simple harnesses such as those in the provided Gradio app can steer the model much more effectively. We believe this approach can scale to more comprehensive world models while keeping the same highly controllable API.
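
For readers unfamiliar with the sampling settings mentioned above, here is a minimal sketch of generic top-k and top-p (nucleus) filtering over a probability vector. It illustrates only the standard technique, not PSI's internal sampler or how PSI accepts these settings. `filter_probs` is a hypothetical name.

```python
from typing import List, Optional


def filter_probs(
    probs: List[float], top_k: Optional[int] = None, top_p: Optional[float] = None
) -> List[float]:
    """Keep the top_k most likely entries, then the smallest prefix of those
    whose renormalized mass reaches top_p; zero the rest and renormalize.

    With top_k=None and top_p=None nothing is removed ("unrestricted" sampling).
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        mass = sum(probs[i] for i in order)
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i] / mass
            if cumulative >= top_p:
                break
        order = kept
    total = sum(probs[i] for i in order)
    filtered = [0.0] * len(probs)
    for i in order:
        filtered[i] = probs[i] / total
    return filtered


probs = [0.5, 0.3, 0.15, 0.05]
nucleus = filter_probs(probs, top_p=0.9)  # drops only the least likely entry
top_two = filter_probs(probs, top_k=2)    # keeps the two most likely entries
```

Tighter settings trade diversity for stability by removing low-probability options before sampling.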
