# PSI-0.5 Usage Guide

PSI-0.5 is a promptable physical world model. It accepts notation strings such as `rgb0->rgb1`, `rgb0,f01->f01,rgb1`, and `rgb0,c01->rgb1`, then fills in the requested missing visual variables.

## Install

```bash
conda create -n psi-demos python=3.10 -y
conda activate psi-demos
pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cu126
pip install transformers huggingface-hub einops h5py tiktoken numpy pillow opencv-python gradio matplotlib scipy
```

The PyTorch command above installs the CUDA 12.6 wheel used on the ccn2 A40 nodes. On other machines, first install the PyTorch build recommended for your driver and platform.

## Load With Transformers

```python
from PIL import Image
from transformers import AutoModel

# Load the model through the remote code shipped in the repository.
predictor = AutoModel.from_pretrained(
    "StanfordNeuroAILab/psi0_5",
    trust_remote_code=True,
    device="cuda:0",
)

# Predict the next RGB frame from a single input frame.
rgb1 = predictor.generate(
    "rgb0->rgb1",
    rgb0=Image.open("scene.png").convert("RGB"),
    seed=1110,
    temp=1.0,
    top_k=1000,
    top_p=1.0,
)
rgb1.save("scene_next.png")
```

## Sparse Flow Prompt

```python
from PIL import Image
from transformers import AutoModel

predictor = AutoModel.from_pretrained(
    "StanfordNeuroAILab/psi0_5",
    trust_remote_code=True,
    device="cuda:0",
)

rgb0 = Image.open("block_slide_rgb0.png").convert("RGB")

# Build a sparse flow prompt from one pair of pixel coordinates.
f01 = predictor.sparse_flow_prompt([((70, 221), (168, 221))], rgb0.size)

# Outputs are returned in the order listed after "->".
dense_flow, rgb1 = predictor.generate(
    "rgb0,f01->f01,rgb1",
    rgb0=rgb0,
    f01=f01,
    seed=1110,
    num_seq_patches=256,
)
```

## Depth, Flow, And RGB

```python
import numpy as np
from PIL import Image

# `predictor` is the model loaded in the sections above.
rgb0 = Image.open("billiards_rgb0.png").convert("RGB")
depth0 = np.load("billiards_d0_meters.npy").astype(np.float32)
f01 = predictor.sparse_flow_prompt([((392, 171), (238, 94))], rgb0.size)

dense_flow, depth1, rgb1 = predictor.generate(
    "rgb0,d0,f01->f01,d1,rgb1",
    rgb0=rgb0,
    d0=depth0,
    f01=f01,
    seed=1110,
    num_seq_patches=256,
)
```

## Camera-Conditioned Novel View Synthesis

```python
from PIL import Image

# `predictor` is the model loaded in the sections above.
# Camera prompt passed to the model as `c01`.
camera = {
    "fov_x": 60.0,
    "fov_y": 60.0,
    "euler_angles": [0.0, -0.12, 0.0],
    "translation": [0.10, 0.0, 0.04],
}

rgb1 = predictor.generate(
    "rgb0,c01->rgb1",
    rgb0=Image.open("coffee_mug_000.png").convert("RGB"),
    c01=camera,
    seed=1110,
)
```

## Advanced Paths

All runtime files needed by Transformers remote code live at the repository root. The release manifest lists the default checkpoint and tokenizer assets for reproducibility.

PSI-0.5 is a modestly sized model that has not yet undergone any post-training, and some of its rollouts diverge. We recommend unrestricted sampling for flow prediction and `top_p=0.9`, `top_k=1000` for RGB rendering.

Correct prompting can significantly improve generations, and simple harnesses such as those in the provided Gradio app can steer the model much more effectively. We believe this direction has great potential for scaling to more comprehensive models of the world while keeping this highly controllable API.
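
The notation strings and sampling recommendations in this guide can be wrapped in a small helper. The sketch below is not part of the PSI-0.5 API: `parse_prompt` and `recommended_sampling` are hypothetical names, and it assumes that "unrestricted sampling" means `temp=1.0` and `top_p=1.0` with `top_k` left at the model default, which the guide does not state.

```python
def parse_prompt(notation: str) -> tuple[list[str], list[str]]:
    """Split a prompt such as 'rgb0,f01->f01,rgb1' into (inputs, outputs)."""
    lhs, sep, rhs = notation.partition("->")
    if not sep:
        raise ValueError(f"missing '->' in prompt: {notation!r}")
    inputs = [tok.strip() for tok in lhs.split(",") if tok.strip()]
    outputs = [tok.strip() for tok in rhs.split(",") if tok.strip()]
    if not inputs or not outputs:
        raise ValueError(f"prompt needs inputs and outputs: {notation!r}")
    return inputs, outputs


def recommended_sampling(target: str) -> dict:
    """Return sampling kwargs for one output variable (hypothetical helper)."""
    if target.startswith("rgb"):
        # Values recommended in this guide for RGB rendering.
        return {"top_p": 0.9, "top_k": 1000}
    if target.startswith("f"):
        # Assumption: "unrestricted" is read as no temperature or nucleus
        # truncation; top_k is deliberately left unset.
        return {"temp": 1.0, "top_p": 1.0}
    # The guide gives no recommendation for other variables (e.g. depth).
    return {}


inputs, outputs = parse_prompt("rgb0,d0,f01->f01,d1,rgb1")
for name in outputs:
    print(name, recommended_sampling(name))
```

The returned dictionary could then be unpacked into a call such as `predictor.generate("rgb0->rgb1", rgb0=rgb0, **recommended_sampling("rgb1"))`. Whether a single call can use different settings per output is not covered by this guide, so the helper only handles one target at a time.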