# LIVEditor

**Lightning Unified Video Editing via In-Context Sparse Attention**

Shitong Shao · Zikai Zhou · Haopeng Li · Yingwei Song · Wenliang Zhong · Lichen Bai · Zeke Xie
## Overview
LIVEditor is a unified video editing model built for fast, in-context video editing. At its core is In-Context Sparse Attention (ISA), a lightweight sparse attention mechanism that, for each query, retrieves only the relevant source-video context blocks instead of applying dense full attention over all source and generated video tokens. This design preserves the editing quality of in-context full-attention video editing while substantially reducing attention latency.
## Highlights
- Unified video editing: one editor for diverse text-guided video editing scenarios.
- In-Context Sparse Attention: retrieves only the most relevant source-video blocks for each query block.
- Training-free acceleration block: ISA can be plugged into the diffusion transformer attention backend (a minimal sketch follows this list).
- Efficient sparse kernels: supports both TileLang and Triton implementations.
- Strong speedup: the paper reports up to 2.8× faster attention than FlashAttention-2 at 65K tokens on an RTX 4090.
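The following is a minimal, purely illustrative sketch of what such a training-free attention swap typically looks like in a PyTorch codebase. The `patch_attention` helper, the `"Attention"` module name, and the `sparse_attn_forward` argument are all hypothetical names for exposition; the repository ships its own TileLang/Triton backends, and this is not its actual API.

```python
import torch.nn as nn

def patch_attention(model: nn.Module, sparse_attn_forward) -> None:
    """Swap each attention module's forward for a sparse variant.

    Weights are left untouched, which is what makes the swap training-free.
    Both this helper and the "Attention" class name are assumptions.
    """
    for module in model.modules():
        if module.__class__.__name__ == "Attention":  # assumed module name
            # Bind the replacement function as a method of this instance so
            # subsequent module(...) calls route through the sparse backend.
            module.forward = sparse_attn_forward.__get__(module, type(module))
```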
## Demo
| Source Video | LIVEditor Output (TileLang) | LIVEditor Output (Triton) |
|---|---|---|
| ![Source video](assets/input.gif) | ![TileLang output](assets/output_tilelang.gif) | ![Triton output](assets/output_triton.gif) |
MP4 downloads: [source](assets/input.mp4) · [TileLang output](assets/output_tilelang.mp4) · [Triton output](assets/output_triton.mp4)
More qualitative comparisons are available on the project page.
## Method
LIVEditor stores compressed key/value representations of the source video, computes block-wise relevance scores, retrieves the top-k most relevant source blocks, and applies sparse piecewise attention for efficient in-context editing. Query blocks with sharp attention patterns can take the full FlashAttention path, while diffuse blocks use the sparse top-k path.
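As a rough illustration of the retrieval step described above, here is a minimal PyTorch sketch of block-wise top-k sparse attention over cached source-video keys/values. The function name, the mean-pooled block summaries, the single-head layout, and the divisibility assumptions are all simplifications for exposition; the actual ISA kernels are fused TileLang/Triton implementations, and the real relevance scoring and full/sparse dispatch differ in detail.

```python
import torch
import torch.nn.functional as F

def isa_sketch(q, src_k, src_v, block_size=64, top_k=4):
    """Block-wise top-k retrieval over cached source-video keys/values.

    q:            (L_q, d) queries from the video being generated
    src_k, src_v: (L_s, d) cached keys/values of the source video
    Assumes L_q and L_s are multiples of block_size and a single head.
    """
    d = q.shape[-1]
    n_src = src_k.shape[0] // block_size
    k_blocks = src_k.view(n_src, block_size, d)
    v_blocks = src_v.view(n_src, block_size, d)
    # 1) Compressed per-block key summaries (mean pooling as a stand-in).
    k_summary = k_blocks.mean(dim=1)                      # (n_src, d)

    out = torch.empty_like(q)
    for i, qb in enumerate(q.view(-1, block_size, d)):
        # 2) Block-wise relevance between this query block and every source
        #    block; keep only the top-k most relevant source blocks.
        scores = qb.mean(dim=0) @ k_summary.T             # (n_src,)
        idx = scores.topk(min(top_k, n_src)).indices
        # 3) Dense attention restricted to the retrieved source blocks.
        k_sel = k_blocks[idx].reshape(-1, d)
        v_sel = v_blocks[idx].reshape(-1, d)
        attn = F.softmax(qb @ k_sel.T / d ** 0.5, dim=-1)
        out[i * block_size:(i + 1) * block_size] = attn @ v_sel
    return out
```

Because each query block attends to only `top_k * block_size` source tokens instead of all of them, the per-block cost over the source context drops from O(L_s) to O(top_k · block_size), which is the saving this style of sparse attention targets.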
## Quick Start
Clone the code repository and install dependencies:

```bash
git clone https://github.com/xie-lab-ml/Lightning-Unified-Video-Editor-via-In-Context-Sparse-Attention.git
cd Lightning-Unified-Video-Editor-via-In-Context-Sparse-Attention
pip install -r requirements.txt
```
Download the LIVEditor checkpoint:

```bash
pip install huggingface_hub
huggingface-cli download sst12345/liveditor liveditor_ckpt.bin --local-dir .
```
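The same file can also be fetched from Python via `huggingface_hub` (equivalent to the CLI call above):

```python
from huggingface_hub import hf_hub_download

# Download liveditor_ckpt.bin from the sst12345/liveditor repo into the
# current directory and return the local file path.
ckpt_path = hf_hub_download(
    repo_id="sst12345/liveditor",
    filename="liveditor_ckpt.bin",
    local_dir=".",
)
print(ckpt_path)
```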
Run inference:

```bash
python inference.py \
    --config inference.yaml \
    --checkpoint liveditor_ckpt.bin \
    --input assets/input.mp4 \
    --prompt "Add a small golden crown with delicate jewels on top of the girl's head..." \
    --output result.mp4
```
## Model Files
| File | Description |
|---|---|
| `liveditor_ckpt.bin` | LIVEditor fine-tuned checkpoint |
| `assets/live_visualization.jpg` | Teaser image for the model card |
| `assets/in_context_sparse_attention.png` | Method overview |
| `assets/input.mp4` | Example input video |
| `assets/output_tilelang.mp4` | Example output using the TileLang backend |
| `assets/output_triton.mp4` | Example output using the Triton backend |
| `assets/input.gif` | Browser-friendly source preview |
| `assets/output_tilelang.gif` | Browser-friendly TileLang preview |
| `assets/output_triton.gif` | Browser-friendly Triton preview |
## Citation
```bibtex
@article{shao2026liveditor,
  title={LIVEditor: Lightning Unified Video Editing via In-Context Sparse Attention},
  author={Shao, Shitong and Zhou, Zikai and Li, Haopeng and Song, Yingwei and Zhong, Wenliang and Bai, Lichen and Xie, Zeke},
  journal={arXiv preprint arXiv:2605.04569},
  year={2026}
}
```
## Base Model

This checkpoint is fine-tuned from [Wan-AI/Wan2.2-T2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B).