# LIVEditor

**Lightning Unified Video Editing via In-Context Sparse Attention**

Shitong Shao · Zikai Zhou · Haopeng Li · Yingwei Song · Wenliang Zhong · Lichen Bai · Zeke Xie
## Overview
LIVEditor is a unified video editing model built for fast, in-context video editing. At its core is In-Context Sparse Attention (ISA), a lightweight sparse attention mechanism that, for each query, retrieves only the relevant source-video context blocks instead of applying dense full attention over all source and generated video tokens. This design preserves the editing quality of in-context full-attention video editing while substantially reducing attention latency.
## Highlights
- Unified video editing: one editor for diverse text-guided video editing scenarios.
- In-Context Sparse Attention: retrieves only the most relevant source-video blocks for each query block.
- Training-free acceleration block: ISA can be plugged into the diffusion transformer attention backend (a minimal sketch follows this list).
- Efficient sparse kernels: supports both TileLang and Triton implementations.
- Strong speedup: the paper reports up to 2.8× faster attention than FlashAttention-2 at 65K tokens on an RTX 4090.
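The following is a minimal, purely illustrative sketch of what such a training-free attention swap typically looks like in a PyTorch codebase. The `patch_attention` helper, the `"Attention"` module name, and the `sparse_attn_forward` argument are all hypothetical names for exposition; the repository ships its own TileLang/Triton backends, and this is not its actual API.

```python
import torch.nn as nn

def patch_attention(model: nn.Module, sparse_attn_forward) -> None:
    """Swap each attention module's forward for a sparse variant.

    Weights are left untouched, which is what makes the swap training-free.
    Both this helper and the "Attention" class name are assumptions.
    """
    for module in model.modules():
        if module.__class__.__name__ == "Attention":  # assumed module name
            # Bind the replacement function as a method of this instance so
            # subsequent module(...) calls route through the sparse backend.
            module.forward = sparse_attn_forward.__get__(module, type(module))
```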
## Demo
| Source Video | LIVEditor Output (TileLang) | LIVEditor Output (Triton) |
|---|---|---|
| ![Source video](assets/input.gif) | ![TileLang output](assets/output_tilelang.gif) | ![Triton output](assets/output_triton.gif) |
MP4 downloads: [source](assets/input.mp4) · [TileLang output](assets/output_tilelang.mp4) · [Triton output](assets/output_triton.mp4)
More qualitative comparisons are available on the project page.
## Method
LIVEditor stores compressed key/value representations of the source video, computes block-wise relevance scores, retrieves the top-k most relevant source blocks, and applies sparse piecewise attention for efficient in-context editing. Query blocks with sharp attention patterns can take the full FlashAttention path, while diffuse blocks use the sparse top-k path.
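As a rough illustration of the retrieval step described above, here is a minimal PyTorch sketch of block-wise top-k sparse attention over cached source-video keys/values. The function name, the mean-pooled block summaries, the single-head layout, and the divisibility assumptions are all simplifications for exposition; the actual ISA kernels are fused TileLang/Triton implementations, and the real relevance scoring and full/sparse dispatch differ in detail.

```python
import torch
import torch.nn.functional as F

def isa_sketch(q, src_k, src_v, block_size=64, top_k=4):
    """Block-wise top-k retrieval over cached source-video keys/values.

    q:            (L_q, d) queries from the video being generated
    src_k, src_v: (L_s, d) cached keys/values of the source video
    Assumes L_q and L_s are multiples of block_size and a single head.
    """
    d = q.shape[-1]
    n_src = src_k.shape[0] // block_size
    k_blocks = src_k.view(n_src, block_size, d)
    v_blocks = src_v.view(n_src, block_size, d)
    # 1) Compressed per-block key summaries (mean pooling as a stand-in).
    k_summary = k_blocks.mean(dim=1)                      # (n_src, d)

    out = torch.empty_like(q)
    for i, qb in enumerate(q.view(-1, block_size, d)):
        # 2) Block-wise relevance between this query block and every source
        #    block; keep only the top-k most relevant source blocks.
        scores = qb.mean(dim=0) @ k_summary.T             # (n_src,)
        idx = scores.topk(min(top_k, n_src)).indices
        # 3) Dense attention restricted to the retrieved source blocks.
        k_sel = k_blocks[idx].reshape(-1, d)
        v_sel = v_blocks[idx].reshape(-1, d)
        attn = F.softmax(qb @ k_sel.T / d ** 0.5, dim=-1)
        out[i * block_size:(i + 1) * block_size] = attn @ v_sel
    return out
```

Because each query block attends to only `top_k * block_size` source tokens instead of all of them, the per-block cost over the source context drops from O(L_s) to O(top_k · block_size), which is the saving this style of sparse attention targets.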
## Quick Start
Clone the code repository and install dependencies:

```bash
git clone https://github.com/xie-lab-ml/Lightning-Unified-Video-Editor-via-In-Context-Sparse-Attention.git
cd Lightning-Unified-Video-Editor-via-In-Context-Sparse-Attention
pip install -r requirements.txt
```
Download the LIVEditor checkpoint:

```bash
pip install huggingface_hub
huggingface-cli download sst12345/liveditor liveditor_ckpt.bin --local-dir .
```
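The same file can also be fetched from Python via `huggingface_hub` (equivalent to the CLI call above):

```python
from huggingface_hub import hf_hub_download

# Download liveditor_ckpt.bin from the sst12345/liveditor repo into the
# current directory and return the local file path.
ckpt_path = hf_hub_download(
    repo_id="sst12345/liveditor",
    filename="liveditor_ckpt.bin",
    local_dir=".",
)
print(ckpt_path)
```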
Run inference:

```bash
python inference.py \
    --config inference.yaml \
    --checkpoint liveditor_ckpt.bin \
    --input assets/input.mp4 \
    --prompt "Add a small golden crown with delicate jewels on top of the girl's head..." \
    --output result.mp4
```
## Model Files
| File | Description |
|---|---|
| `liveditor_ckpt.bin` | LIVEditor fine-tuned checkpoint |
| `assets/live_visualization.jpg` | Teaser image for the model card |
| `assets/in_context_sparse_attention.png` | Method overview |
| `assets/input.mp4` | Example input video |
| `assets/output_tilelang.mp4` | Example output using the TileLang backend |
| `assets/output_triton.mp4` | Example output using the Triton backend |
| `assets/input.gif` | Browser-friendly source preview |
| `assets/output_tilelang.gif` | Browser-friendly TileLang preview |
| `assets/output_triton.gif` | Browser-friendly Triton preview |
## Citation
```bibtex
@article{shao2026liveditor,
  title={LIVEditor: Lightning Unified Video Editing via In-Context Sparse Attention},
  author={Shao, Shitong and Zhou, Zikai and Li, Haopeng and Song, Yingwei and Zhong, Wenliang and Bai, Lichen and Xie, Zeke},
  journal={arXiv preprint arXiv:2605.04569},
  year={2026}
}
```
## Base Model

This checkpoint is fine-tuned from [Wan-AI/Wan2.2-T2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B).