How to use DarthZhu/VideoRLVR-Wan2.2-Base with Diffusers:
```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the checkpoint in bfloat16 to reduce memory use.
pipe = DiffusionPipeline.from_pretrained(
    "DarthZhu/VideoRLVR-Wan2.2-Base",
    torch_dtype=torch.bfloat16,
)
# Switch to "mps" for Apple devices.
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

# The pipeline returns a batch of videos; take the frames of the first one.
output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
```
VideoRLVR
VideoRLVR is a reinforcement learning (RL) recipe for training video reasoning models with verifiable rewards, introduced in the paper Video Models Can Reason with Verifiable Rewards.
This checkpoint is a supervised fine-tuned (SFT) version of Wan2.2-TI2V-5B, trained on procedurally generated reasoning tasks including Maze, FlowFree, and Sokoban.
- Paper: Video Models Can Reason with Verifiable Rewards
- Project Page: https://darthzhu.github.io/VideoRLVR-page/
- Repository: https://github.com/luka-group/VideoRLVR
Overview
VideoRLVR formulates video reasoning as the generation of verifiable visual trajectories. It combines an SDE-GRPO optimization backbone, dense decomposed rewards, and an Early-Step Focus strategy for efficient training. This lets video diffusion models satisfy explicit spatial, temporal, or logical constraints, moving beyond perceptual imitation toward reliable, rule-consistent visual reasoning.
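To make the GRPO part of that description concrete, the sketch below shows the group-relative advantage computation that GRPO-style methods generally rely on: several samples are drawn for the same prompt, and each sample's reward is normalized against the mean and standard deviation of its group. This is a generic illustration, not the repository's implementation; the function name, the `eps` value, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    Hypothetical helper for illustration only. `eps` guards against a zero
    standard deviation when every sample in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled videos for one prompt, each scored by a verifier.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Samples that score above the group mean get a positive advantage and are reinforced; samples below it get a negative one. No learned value function is needed, which is one reason this family of methods pairs naturally with rule-based verifiers.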
Across tasks like Maze, FlowFree, and Sokoban, VideoRLVR consistently improves over supervised fine-tuning baselines, demonstrating that verifiable RL can effectively optimize models for objective success criteria.
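As a rough illustration of what an objective success criterion with decomposed components could look like for a Maze-style task, the sketch below scores a path on a grid. It is not the paper's reward: the grid encoding (`#` for walls), the component names, and the assumption that a path of cells has already been extracted from the generated frames are all hypothetical.

```python
def maze_reward(grid, start, goal, path):
    """Score a decoded path on a grid maze (illustrative only).

    grid: list of equal-length strings; '#' is a wall, anything else is free.
    path: list of (row, col) cells, assumed to be extracted from video frames.
    Returns reward components, each in [0, 1].
    """
    def is_free(cell):
        r, c = cell
        return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#"

    if not path:
        return {"start": 0.0, "valid": 0.0, "goal": 0.0, "success": 0.0}

    # A move is legal if both cells are free and exactly one step apart.
    moves = list(zip(path, path[1:]))
    legal = sum(
        1
        for a, b in moves
        if is_free(a) and is_free(b)
        and abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
    )

    starts_ok = float(path[0] == start)
    valid = legal / len(moves) if moves else 1.0
    reaches = float(path[-1] == goal)
    success = float(starts_ok == 1.0 and valid == 1.0 and reaches == 1.0)
    return {"start": starts_ok, "valid": valid, "goal": reaches, "success": success}


grid = [
    "S.#",
    ".#.",
    "..G",
]
good = maze_reward(grid, (0, 0), (2, 2), [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)])
bad = maze_reward(grid, (0, 0), (2, 2), [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)])
print(good)
print(bad)
```

The second path reaches the goal but passes through a wall, so it earns partial credit on the `valid` component while `success` stays at zero. Keeping the components separate gives a denser training signal than a single pass/fail bit.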
Citation
```bibtex
@article{zhu2026video,
  title={Video Models Can Reason with Verifiable Rewards},
  author={Tinghui Zhu and Sheng Zhang and James Y. Huang and Selena Song and Xiaofei Wen and Yuankai Li and Hoifung Poon and Muhao Chen},
  journal={arXiv preprint arXiv:2605.15458},
  year={2026}
}
```