PSI-0.5 Usage Guide
PSI-0.5 is a promptable physical world model. It accepts notation strings such
as "rgb0->rgb1", "rgb0,f01->f01,rgb1", and "rgb0,c01->rgb1". Variables to the
left of -> are supplied as conditioning; the model fills in the variables to
the right and returns them in the order listed.
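The notation can be split mechanically, which is handy for checking that every conditioning variable has a matching keyword argument before calling the model. The sketch below is a hypothetical helper, not part of the PSI-0.5 API, and it assumes the reading suggested by the examples in this guide: names left of -> are inputs, names right of -> are outputs.

```python
def parse_prompt(prompt):
    """Split a notation string such as 'rgb0,f01->f01,rgb1' into inputs and outputs.

    Hypothetical helper for illustration; PSI-0.5 parses prompts internally.
    """
    if prompt.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {prompt!r}")
    left, right = prompt.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError(f"prompt needs at least one input and one output: {prompt!r}")
    return inputs, outputs


if __name__ == "__main__":
    for prompt in ("rgb0->rgb1", "rgb0,f01->f01,rgb1", "rgb0,c01->rgb1"):
        print(prompt, parse_prompt(prompt))
```

A variable such as f01 can appear on both sides: it is given as a sparse prompt and returned as a dense prediction.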
Install
conda create -n psi-demos python=3.10 -y
conda activate psi-demos
pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cu126
pip install transformers huggingface-hub einops h5py tiktoken numpy pillow opencv-python gradio matplotlib scipy
The PyTorch command above installs the CUDA 12.6 wheel used on the ccn2 A40 nodes. For other machines, install the PyTorch build recommended for your driver/platform first.
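To confirm the environment is complete before loading the model, a quick import check can help. This is a minimal sketch using only the standard library; the function name missing_packages is made up for this guide. Note that a few pip names differ from their import names (pillow is imported as PIL, opencv-python as cv2, huggingface-hub as huggingface_hub).

```python
import importlib.util

# pip name -> import name for the packages installed above.
PIP_TO_MODULE = {
    "torch": "torch",
    "transformers": "transformers",
    "huggingface-hub": "huggingface_hub",
    "einops": "einops",
    "h5py": "h5py",
    "tiktoken": "tiktoken",
    "numpy": "numpy",
    "pillow": "PIL",
    "opencv-python": "cv2",
    "gradio": "gradio",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
}


def missing_packages(pip_to_module=PIP_TO_MODULE):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in pip_to_module.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    print("Missing packages:", missing_packages() or "none")
```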
Load With Transformers
from PIL import Image
from transformers import AutoModel

predictor = AutoModel.from_pretrained(
    "StanfordNeuroAILab/psi0_5",
    trust_remote_code=True,
    device="cuda:0",
)

rgb1 = predictor.generate(
    "rgb0->rgb1",
    rgb0=Image.open("scene.png").convert("RGB"),
    seed=1110,
    temp=1.0,
    top_k=1000,
    top_p=1.0,
)
rgb1.save("scene_next.png")
Sparse Flow Prompt
from PIL import Image
from transformers import AutoModel

predictor = AutoModel.from_pretrained(
    "StanfordNeuroAILab/psi0_5",
    trust_remote_code=True,
    device="cuda:0",
)

rgb0 = Image.open("block_slide_rgb0.png").convert("RGB")
f01 = predictor.sparse_flow_prompt([((70, 221), (168, 221))], rgb0.size)

dense_flow, rgb1 = predictor.generate(
    "rgb0,f01->f01,rgb1",
    rgb0=rgb0,
    f01=f01,
    seed=1110,
    num_seq_patches=256,
)
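Each entry passed to sparse_flow_prompt is a pair of pixel coordinates. The sketch below assumes the pair is ((x0, y0), (x1, y1)), meaning a start point in rgb0 and where that point should end up, in the same (width, height) convention as PIL's Image.size. That reading is an assumption, and both helpers are hypothetical utilities for sanity-checking a prompt, not part of the PSI-0.5 API.

```python
def displacements(point_pairs):
    """Return the (dx, dy) motion for each ((x0, y0), (x1, y1)) pair."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in point_pairs]


def in_bounds(point_pairs, size):
    """Check that every start and end point lies inside a (width, height) image."""
    width, height = size
    return all(
        0 <= x < width and 0 <= y < height
        for pair in point_pairs
        for (x, y) in pair
    )


if __name__ == "__main__":
    pairs = [((70, 221), (168, 221))]
    print(displacements(pairs))         # motion requested by the prompt above
    print(in_bounds(pairs, (512, 512)))  # 512x512 is a made-up image size
```

Under this reading, the prompt above asks for a purely horizontal push of 98 pixels to the right.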
Depth, Flow, And RGB
import numpy as np
from PIL import Image

# Reuses `predictor` loaded in the examples above.
rgb0 = Image.open("billiards_rgb0.png").convert("RGB")
depth0 = np.load("billiards_d0_meters.npy").astype(np.float32)
f01 = predictor.sparse_flow_prompt([((392, 171), (238, 94))], rgb0.size)

dense_flow, depth1, rgb1 = predictor.generate(
    "rgb0,d0,f01->f01,d1,rgb1",
    rgb0=rgb0,
    d0=depth0,
    f01=f01,
    seed=1110,
    num_seq_patches=256,
)
Camera-Conditioned Novel View Synthesis
# Reuses `predictor` and the PIL `Image` import from the examples above.
camera = {
    "fov_x": 60.0,
    "fov_y": 60.0,
    "euler_angles": [0.0, -0.12, 0.0],
    "translation": [0.10, 0.0, 0.04],
}

rgb1 = predictor.generate(
    "rgb0,c01->rgb1",
    rgb0=Image.open("coffee_mug_000.png").convert("RGB"),
    c01=camera,
    seed=1110,
)
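To build intuition for the euler_angles field, it can help to turn the three angles into a rotation matrix. This guide does not state the angle units or axis order, so the sketch below is only an illustration under explicit assumptions: angles in radians, ordered (rx, ry, rz), composed as R = Rz * Ry * Rx. PSI-0.5 may use a different convention, and euler_to_matrix is a hypothetical helper.

```python
import math


def euler_to_matrix(angles):
    """Build R = Rz * Ry * Rx from (rx, ry, rz) in radians.

    ASSUMPTION: units and axis order are guesses, not PSI-0.5's documented convention.
    """
    rx, ry, rz = angles
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def rotate(matrix, vector):
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return [sum(row[i] * vector[i] for i in range(3)) for row in matrix]


if __name__ == "__main__":
    R = euler_to_matrix([0.0, -0.12, 0.0])
    # Direction of the original forward axis (0, 0, 1) after the rotation.
    print(rotate(R, [0.0, 0.0, 1.0]))
```

If the angles are radians, -0.12 corresponds to a turn of roughly 6.9 degrees about the second axis, a small viewpoint change.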
Advanced Paths
All runtime files needed by Transformers remote code live at the repository root. The release manifest lists the default checkpoint and tokenizer assets for reproducibility.
PSI-0.5 is a modestly sized model that has not yet undergone any post-training.
Some of its rollouts diverge. We recommend unrestricted sampling for flow
prediction and top_p=0.9, top_k=1000 for RGB rendering. Correct prompting
can significantly improve generations, and simple harnesses such as those in the
provided Gradio app can be used to steer the model much more effectively. We
believe this direction has great potential for scaling to create even more
comprehensive models of the world while maintaining this highly controllable
API.
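For readers unfamiliar with the sampling knobs, the sketch below shows what top_k and top_p filtering generally do to a next-token distribution. It is a generic, pure-Python illustration of the standard technique, not PSI-0.5's sampler; the function name and the toy probabilities are made up. "Unrestricted" sampling corresponds to top_k=None and top_p=1.0, which keeps every token.

```python
def filter_top_k_top_p(probs, top_k=None, top_p=1.0):
    """Return renormalized probabilities after top-k, then top-p (nucleus) filtering.

    Generic illustration only; the result maps kept token index -> probability.
    """
    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]

    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        # Stop once the kept tokens cover at least top_p of the probability mass.
        if cumulative >= top_p:
            break

    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


if __name__ == "__main__":
    toy = [0.5, 0.3, 0.15, 0.05]
    print(filter_top_k_top_p(toy))             # unrestricted: all tokens kept
    print(filter_top_k_top_p(toy, top_p=0.9))  # drops the low-probability tail
```

Trimming the tail with top_p=0.9 removes the least likely tokens, while unrestricted sampling preserves the full distribution.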