	# PSI-0.5 Usage Guide

	PSI-0.5 is a promptable physical world model. It accepts notation strings such
	as `rgb0->rgb1`, `rgb0,f01->f01,rgb1`, and `rgb0,c01->rgb1`, then fills in the
	requested missing visual variables.
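
Judging from the examples in this guide, names to the left of `->` are passed to `generate` as keyword arguments and names to the right are returned. A name such as `f01` can appear on both sides, as when a sparse flow prompt goes in and a dense flow field comes out. The sketch below illustrates that reading only; `parse_prompt` is a hypothetical helper, not part of the PSI-0.5 API, and the model's own parser may accept more than this.

```python
def parse_prompt(prompt: str) -> tuple[list[str], list[str]]:
    """Split 'a,b->c,d' into (['a', 'b'], ['c', 'd']).

    Hypothetical helper for illustration; not part of PSI-0.5.
    """
    if prompt.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {prompt!r}")
    left, right = prompt.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    if not inputs or not outputs:
        raise ValueError(f"need at least one input and one output: {prompt!r}")
    return inputs, outputs


for prompt in ("rgb0->rgb1", "rgb0,f01->f01,rgb1", "rgb0,c01->rgb1"):
    inputs, outputs = parse_prompt(prompt)
    print(f"{prompt}: condition on {inputs}, generate {outputs}")
```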

	## Install

	```bash
	conda create -n psi-demos python=3.10 -y
	conda activate psi-demos
	pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cu126
	pip install transformers huggingface-hub einops h5py tiktoken numpy pillow opencv-python gradio matplotlib scipy
	```

	The PyTorch command above installs the CUDA 12.6 wheel used on the ccn2 A40
	nodes. On other machines, first install the PyTorch build recommended for your
	driver and platform, then install the remaining packages.
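
To confirm which PyTorch build ended up in the environment, a quick check along these lines may help. It uses only standard PyTorch attributes; `describe_torch_build` is a hypothetical helper name, and it reports `installed: False` instead of failing when PyTorch is absent.

```python
import importlib.util


def describe_torch_build() -> dict:
    """Report the installed PyTorch build, or note that torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_build())
```

If `cuda_available` is `False` on a GPU machine, the installed wheel probably does not match the driver.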

	## Load With Transformers

	```python
	from PIL import Image
	from transformers import AutoModel

	predictor = AutoModel.from_pretrained(
	    "StanfordNeuroAILab/psi0_5",
	    trust_remote_code=True,
	    device="cuda:0",
	)

	rgb1 = predictor.generate(
	    "rgb0->rgb1",
	    rgb0=Image.open("scene.png").convert("RGB"),
	    seed=1110,
	    temp=1.0,
	    top_k=1000,
	    top_p=1.0,
	)
	rgb1.save("scene_next.png")
	```

	## Sparse Flow Prompt

	```python
	from PIL import Image
	from transformers import AutoModel

	predictor = AutoModel.from_pretrained(
	    "StanfordNeuroAILab/psi0_5",
	    trust_remote_code=True,
	    device="cuda:0",
	)
	rgb0 = Image.open("block_slide_rgb0.png").convert("RGB")
	f01 = predictor.sparse_flow_prompt([((70, 221), (168, 221))], rgb0.size)

	dense_flow, rgb1 = predictor.generate(
	    "rgb0,f01->f01,rgb1",
	    rgb0=rgb0,
	    f01=f01,
	    seed=1110,
	    num_seq_patches=256,
	)
	```
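
Before passing strokes to `sparse_flow_prompt`, it can help to see what motion each one requests. The sketch below assumes each stroke is `((x0, y0), (x1, y1))` in pixel coordinates and that the size tuple is `(width, height)` as returned by PIL's `Image.size`. The guide does not state the coordinate convention, so treat that as a guess. `stroke_displacements` is a hypothetical helper and the 512x512 size is a placeholder.

```python
def stroke_displacements(strokes, image_size):
    """Return (start, (dx, dy)) per stroke; start points must lie in the image.

    Hypothetical helper; assumes (x, y) pixel coordinates.
    """
    width, height = image_size  # PIL's Image.size is (width, height)
    result = []
    for start, end in strokes:
        x, y = start
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"start {start} is outside a {width}x{height} image")
        result.append((start, (end[0] - x, end[1] - y)))
    return result


# The stroke from the example above, with a placeholder image size.
print(stroke_displacements([((70, 221), (168, 221))], (512, 512)))
# [((70, 221), (98, 0))]
```

Under that assumption, the example stroke moves the point at (70, 221) by 98 pixels along x with no vertical motion.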

	## Depth, Flow, And RGB

	```python
	import numpy as np
	from PIL import Image

	# Assumes `predictor` was loaded as in the sections above.
	rgb0 = Image.open("billiards_rgb0.png").convert("RGB")
	depth0 = np.load("billiards_d0_meters.npy").astype(np.float32)
	f01 = predictor.sparse_flow_prompt([((392, 171), (238, 94))], rgb0.size)

	dense_flow, depth1, rgb1 = predictor.generate(
	    "rgb0,d0,f01->f01,d1,rgb1",
	    rgb0=rgb0,
	    d0=depth0,
	    f01=f01,
	    seed=1110,
	    num_seq_patches=256,
	)
	```
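
In the examples above, the returned values line up with the names on the right-hand side of the prompt, and a single-output prompt returns the value directly. Assuming that holds in general, a small wrapper can label results by name so that a reordered prompt does not silently swap variables. `label_outputs` is a hypothetical helper, and the string values below are stand-ins for real images and arrays.

```python
def label_outputs(prompt: str, outputs) -> dict:
    """Pair right-hand-side names of a prompt with generate()'s return values.

    Hypothetical helper; assumes outputs follow the prompt's output order.
    """
    names = [name.strip() for name in prompt.split("->")[1].split(",")]
    if not isinstance(outputs, (tuple, list)):
        outputs = (outputs,)  # single-output prompts return the value directly
    if len(names) != len(outputs):
        raise ValueError(f"{len(names)} names but {len(outputs)} outputs")
    return dict(zip(names, outputs))


# Stand-in values; in practice pass the result of predictor.generate(...).
results = label_outputs("rgb0,d0,f01->f01,d1,rgb1", ("FLOW", "DEPTH", "IMAGE"))
print(results["d1"])
# DEPTH
```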

	## Camera-Conditioned Novel View Synthesis

	```python
	from PIL import Image

	# Assumes `predictor` was loaded as in the sections above.
	camera = {
	    "fov_x": 60.0,
	    "fov_y": 60.0,
	    "euler_angles": [0.0, -0.12, 0.0],
	    "translation": [0.10, 0.0, 0.04],
	}

	rgb1 = predictor.generate(
	    "rgb0,c01->rgb1",
	    rgb0=Image.open("coffee_mug_000.png").convert("RGB"),
	    c01=camera,
	    seed=1110,
	)
	```
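
When sweeping over several viewpoints, building the camera dictionary through one function keeps the keys consistent and catches malformed entries early. The sketch below reuses the four keys from the example above. `make_camera` is a hypothetical helper, and the field-of-view range check assumes degrees, which the guide does not state. Units and axis conventions for `euler_angles` and `translation` are likewise unspecified, so values are passed through unchanged.

```python
def make_camera(euler_angles, translation, fov_x=60.0, fov_y=60.0) -> dict:
    """Build a camera dict with the keys used in the example above.

    Hypothetical helper; assumes field of view is given in degrees.
    """
    if len(euler_angles) != 3 or len(translation) != 3:
        raise ValueError("euler_angles and translation need three entries each")
    if not (0.0 < fov_x < 180.0 and 0.0 < fov_y < 180.0):
        raise ValueError("field of view must lie strictly between 0 and 180")
    return {
        "fov_x": float(fov_x),
        "fov_y": float(fov_y),
        "euler_angles": [float(a) for a in euler_angles],
        "translation": [float(t) for t in translation],
    }


# A small sweep over the second Euler angle, other values as in the example.
cameras = [make_camera([0.0, angle, 0.0], [0.10, 0.0, 0.04]) for angle in (-0.12, 0.0, 0.12)]
print(len(cameras), cameras[0]["euler_angles"])
```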

	## Advanced Paths

	All runtime files needed by Transformers remote code live at the repository
	root. The release manifest lists the default checkpoint and tokenizer assets for
	reproducibility.

	PSI-0.5 is a modestly sized model that has not yet undergone any post-training,
	and some of its rollouts diverge. We recommend unrestricted sampling for flow
	prediction and `top_p=0.9`, `top_k=1000` for RGB rendering. Careful prompting
	can significantly improve generations, and simple harnesses such as those in the
	provided Gradio app steer the model much more effectively. We believe this
	approach can scale to more comprehensive world models while keeping the same
	highly controllable API.
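
To make the sampling recommendation concrete, here is a generic sketch of top-k followed by top-p (nucleus) filtering on a toy distribution. This is the standard technique, not PSI-0.5's sampler; how `generate` applies `temp`, `top_k`, and `top_p` internally is not documented here. `filter_probs` is a hypothetical helper. "Unrestricted" is read as no top-k limit and `top_p=1.0`, which leaves the distribution intact.

```python
def filter_probs(probs, top_k=None, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    renormalized cumulative probability reaches top_p; renormalize the rest."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


toy = [0.5, 0.3, 0.15, 0.05]
print(sorted(filter_probs(toy)))                         # unrestricted
# [0, 1, 2, 3]
print(sorted(filter_probs(toy, top_k=1000, top_p=0.9)))  # RGB-style settings
# [0, 1, 2]
```

With only four tokens, `top_k=1000` has no effect and `top_p=0.9` trims the least likely token. On a large vocabulary the same settings cut off the long low-probability tail, which is one plausible reason they help RGB rendering stay on track.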