PSI-0.5
A richly controllable physical world model
Prompt PSI with images, motion, depth, camera pose, or partial future states, and ask it to complete the missing pieces of a physical scene.
Paper · Release gallery · Usage guide
A back-to-back reel of PSI-0.5 gallery examples.
PSI treats visual prediction as a promptable modeling problem. A prompt can be
as simple as `rgb0->rgb1`, or it can include explicit control handles such as
optical flow, depth, camera motion, and partially specified future frames. The
same predictor handles all of these prompt forms.
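The prompt strings follow a visible pattern: comma-separated tokens the caller supplies, then `->`, then the tokens the model should produce. Below is a minimal sketch of splitting a prompt into those two lists, assuming that reading of the notation. `parse_prompt` is a hypothetical helper for illustration, not part of PSI's API.

```python
def parse_prompt(prompt: str) -> tuple[list[str], list[str]]:
    """Split a PSI-style prompt such as "rgb0,f01->f01,rgb1" into
    (given tokens, requested tokens). Hypothetical helper, not part of PSI."""
    if prompt.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {prompt!r}")
    lhs, rhs = prompt.split("->")
    # Tokens are comma-separated on each side; surrounding spaces are ignored.
    given = [tok.strip() for tok in lhs.split(",") if tok.strip()]
    requested = [tok.strip() for tok in rhs.split(",") if tok.strip()]
    if not given or not requested:
        raise ValueError(f"both sides of {prompt!r} must name at least one token")
    return given, requested


given, requested = parse_prompt("rgb0,f01->f01,rgb1")
print(given)      # ['rgb0', 'f01']
print(requested)  # ['f01', 'rgb1']
```

A token that appears on both sides, like `f01` here, is presumably supplied in partial form and requested back in full.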
What You Can Prompt
| Prompt | What PSI Does |
|---|---|
| `rgb0->rgb1` | continue a scene one frame forward |
| `rgb0->f01,rgb1` | imagine motion and render the next frame |
| `rgb0,f01->f01,rgb1` | densify a sparse flow prompt, then render |
| `rgb0,d0,f01->f01,d1,rgb1` | use depth and motion to predict flow, depth, and RGB |
| `rgb0,c01->rgb1` | synthesize a new camera view |
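The token names in the table appear to combine a modality prefix (`rgb`, `f` for optical flow, `d` for depth, `c` for camera motion) with frame indices, so `f01` would read as flow from frame 0 to frame 1. The sketch below decodes tokens under that reading, assuming single-digit frame indices; `decode_token`, `Token`, and `MODALITIES` are hypothetical names, not PSI's.

```python
import re
from dataclasses import dataclass

# Assumed meaning of each token prefix, inferred from the prompt table.
MODALITIES = {"rgb": "RGB frame", "f": "optical flow", "d": "depth", "c": "camera motion"}

_TOKEN_RE = re.compile(r"^(rgb|f|d|c)(\d+)$")


@dataclass(frozen=True)
class Token:
    modality: str
    frames: tuple[int, ...]


def decode_token(name: str) -> Token:
    """Decode a token like "f01" into its modality and frame indices (hypothetical)."""
    match = _TOKEN_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized token {name!r}")
    prefix, digits = match.groups()
    # Assumption: each digit is one frame index, so "f01" means frame 0 -> frame 1.
    return Token(MODALITIES[prefix], tuple(int(ch) for ch in digits))


for name in ["rgb0", "d1", "f01", "c01"]:
    print(name, decode_token(name))
```

Under this reading, single-index tokens (`rgb0`, `d1`) describe one frame, while two-index tokens (`f01`, `c01`) describe a change between two frames.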
Quick Start
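```python
from PIL import Image
from transformers import AutoModel

# trust_remote_code=True loads PSI's custom model code from the Hub repository.
predictor = AutoModel.from_pretrained(
    "StanfordNeuroAILab/psi0_5",
    trust_remote_code=True,
    device="cuda:0",
)

# Predict the next frame from a single input image.
rgb1 = predictor.generate("rgb0->rgb1", rgb0=Image.open("scene.png"))
rgb1.save("scene_next.png")
```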
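Because `rgb0->rgb1` maps one frame to the next, a multi-frame rollout can be sketched by feeding each prediction back in as the new `rgb0`. This loop is an assumption built on the Quick Start call (that `generate` returns a single frame for this prompt), not a documented PSI feature, and `rollout` is a hypothetical helper.

```python
from typing import Any, List


def rollout(predictor: Any, first_frame: Any, steps: int) -> List[Any]:
    """Autoregressively chain "rgb0->rgb1" predictions (hypothetical helper).

    Assumes predictor.generate("rgb0->rgb1", rgb0=frame) returns the next frame,
    as in the Quick Start. Returns [first_frame, frame_1, ..., frame_steps].
    """
    frames = [first_frame]
    for _ in range(steps):
        # The most recent prediction becomes the conditioning frame for the next step.
        frames.append(predictor.generate("rgb0->rgb1", rgb0=frames[-1]))
    return frames
```

With the Quick Start `predictor`, `rollout(predictor, Image.open("scene.png"), steps=4)` would return five frames. Errors can compound over a chain like this, so long rollouts may drift.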
A Sparse Motion Prompt
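```python
from PIL import Image

rgb0 = Image.open("scene.png")

# One point correspondence: the pixel at (70, 221) should move to (168, 221).
f01 = predictor.sparse_flow_prompt([((70, 221), (168, 221))], rgb0.size)

# Outputs are returned in the order they appear after "->".
dense_flow, rgb1 = predictor.generate(
    "rgb0,f01->f01,rgb1",
    rgb0=rgb0,
    f01=f01,
    num_seq_patches=256,
)
```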
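The card does not describe what `sparse_flow_prompt` returns internally. As an illustration only, one common way to represent a handful of point correspondences is a dense displacement array plus a mask of the pixels where motion is specified. The sketch below assumes points are `(x, y)` and that `size` is `(width, height)` as in PIL's `Image.size`; `rasterize_sparse_flow` and the 512×512 size are hypothetical.

```python
import numpy as np


def rasterize_sparse_flow(pairs, size):
    """Hypothetical dense representation of a sparse flow prompt.

    pairs: list of ((x0, y0), (x1, y1)) pixel correspondences, assumed (x, y).
    size:  (width, height), matching PIL's Image.size.
    Returns (flow, mask): flow[y, x] = (dx, dy) at specified pixels, and a
    boolean mask marking which pixels carry a displacement.
    """
    width, height = size
    flow = np.zeros((height, width, 2), dtype=np.float32)
    mask = np.zeros((height, width), dtype=bool)
    for (x0, y0), (x1, y1) in pairs:
        if not (0 <= x0 < width and 0 <= y0 < height):
            raise ValueError(f"start point {(x0, y0)} lies outside the image")
        # Store the displacement at the start pixel (arrays are indexed [row, col]).
        flow[y0, x0] = (x1 - x0, y1 - y0)
        mask[y0, x0] = True
    return flow, mask


flow, mask = rasterize_sparse_flow([((70, 221), (168, 221))], (512, 512))
print(int(mask.sum()), flow[221, 70])
```

Everything outside the mask is unspecified rather than zero motion, which is why a prompt like `rgb0,f01->f01,rgb1` asks the model to fill in the dense flow.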
Novel View Synthesis
```python
camera = {
    "fov_x": 60.0,
    "fov_y": 60.0,
    "euler_angles": [0.0, -0.12, 0.0],
    "translation": [0.10, 0.0, 0.04],
}
rgb1 = predictor.generate(
    "rgb0,c01->rgb1",
    rgb0=Image.open("coffee_mug_000.png"),
    c01=camera,
)
```
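The `camera` dictionary resembles a standard pinhole-camera description: a field of view plus a rigid rotation and translation. The card does not state units or rotation order, so the sketch below assumes the field of view is in degrees, the Euler angles are in radians, and the rotation is composed as `R = Rz @ Ry @ Rx`. These conventions and the helper names `intrinsics_from_fov` and `pose_from_euler` are assumptions for illustration, not PSI's definitions.

```python
import math

import numpy as np


def intrinsics_from_fov(fov_x_deg, fov_y_deg, width, height):
    """Pinhole intrinsics matrix K, assuming the field of view is in degrees."""
    fx = (width / 2) / math.tan(math.radians(fov_x_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(fov_y_deg) / 2)
    return np.array([[fx, 0.0, width / 2], [0.0, fy, height / 2], [0.0, 0.0, 1.0]])


def pose_from_euler(euler_angles, translation):
    """4x4 rigid transform, assuming radians and R = Rz @ Ry @ Rx."""
    rx, ry, rz = euler_angles
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    pose = np.eye(4)
    pose[:3, :3] = Rz @ Ry @ Rx
    pose[:3, 3] = translation
    return pose


camera = {
    "fov_x": 60.0,
    "fov_y": 60.0,
    "euler_angles": [0.0, -0.12, 0.0],
    "translation": [0.10, 0.0, 0.04],
}
K = intrinsics_from_fov(camera["fov_x"], camera["fov_y"], 512, 512)
T = pose_from_euler(camera["euler_angles"], camera["translation"])
```

Under these assumptions, the example camera yaws by about 0.12 rad (roughly 7 degrees) around the y axis and shifts slightly along x and z; check the usage guide for PSI's actual conventions before relying on them.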
More Examples
The full usage guide includes sparse flow construction, depth/flow prompting, camera-conditioned NVS, visual statistics, and scriptable demos:
docs/usage.md
The release gallery shows many prompt patterns in action:
https://neuroailab.github.io/psi-website/blog/psi-generations.html
PSI-0.5 is a modestly sized model that has not yet undergone any post-training,
and some of its rollouts diverge. We recommend unrestricted sampling (no top-p
or top-k truncation) for flow prediction and `top_p=0.9`, `top_k=1000` for RGB
rendering. Careful prompting can significantly improve generations, and simple
harnesses, such as those in the provided Gradio app, can steer the model much
more effectively. We believe this direction has great potential for scaling to
even more comprehensive models of the world while keeping this highly
controllable API.
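The card does not show how `predictor.generate` exposes these sampling settings, so the following is a generic NumPy illustration of top-k followed by top-p (nucleus) truncation over one token's logits, not PSI's sampler. The function name `sample_top_k_top_p` and its signature are hypothetical; passing `None` for both limits gives the unrestricted sampling recommended for flow.

```python
import numpy as np


def sample_top_k_top_p(logits, top_k=None, top_p=None, rng=None):
    """Sample one index after optional top-k, then top-p, truncation (hypothetical).

    top_k=None and top_p=None samples from the full softmax distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    keep = np.ones_like(probs, dtype=bool)
    order = np.argsort(probs)[::-1]  # indices, most probable first
    if top_k is not None and top_k < probs.size:
        # Keep only the k most probable indices.
        keep[order[top_k:]] = False
    if top_p is not None:
        # Keep the smallest most-probable prefix whose (renormalized) mass reaches top_p.
        cumulative = np.cumsum(probs[order] * keep[order])
        cutoff = np.searchsorted(cumulative, top_p * cumulative[-1]) + 1
        keep[order[cutoff:]] = False
    filtered = np.where(keep, probs, 0.0)
    filtered /= filtered.sum()
    return int(rng.choice(probs.size, p=filtered))
```

For the RGB setting above this would be called as `sample_top_k_top_p(logits, top_k=1000, top_p=0.9)`; applying top-k before top-p mirrors the order commonly used by Hugging Face's logits warpers.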