LIVEditor

Lightning Unified Video Editing via In-Context Sparse Attention

Shitong Shao · Zikai Zhou · Haopeng Li · Yingwei Song · Wenliang Zhong · Lichen Bai · Zeke Xie

Project Page · Paper · Code · Model

LIVEditor teaser

Overview

LIVEditor is a unified video editing model built for fast in-context video editing. It introduces In-Context Sparse Attention (ISA), a lightweight sparse attention mechanism that retrieves relevant source-video context blocks instead of applying dense full attention over all source and generated video tokens.

LIVEditor is designed to preserve the editing quality of full-attention in-context editing while substantially reducing attention latency.

Highlights

  • Unified video editing: one editor for diverse text-guided video editing scenarios.
  • In-Context Sparse Attention: retrieves only the most relevant source-video blocks for each query block.
  • Training-free acceleration block: ISA can be plugged into the diffusion transformer attention backend.
  • Efficient sparse kernels: supports both TileLang and Triton implementations.
  • Strong speedup: the paper reports up to 2.8× faster attention than FlashAttention-2 at 65K tokens on an RTX 4090.

Demo

Previews (GIF): source video · LIVEditor output (TileLang) · LIVEditor output (Triton)

MP4 downloads: source · TileLang output · Triton output

More qualitative comparisons are available on the project page.

Method

In-Context Sparse Attention

LIVEditor stores compressed key/value representations of the source video, computes block-wise relevance scores, retrieves top-k source blocks, and applies sparse piecewise attention for efficient in-context editing. Query blocks with sharper attention patterns can use full FlashAttention, while diffuse blocks use the sparse Top-K path.
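For intuition, here is a minimal PyTorch sketch of the retrieval idea, assuming toy block shapes and mean-pooled block summaries as a stand-in for the paper's compressed key/value representations; the function name and parameters are illustrative, not the repo's API, and the actual TileLang/Triton kernels fuse these steps.

import torch
import torch.nn.functional as F

def isa_sparse_attention(q, k_src, v_src, top_k=4):
    """Block-wise top-k sparse attention over source-video tokens (sketch).

    q:            (Bq, S, d) query blocks from the generated video
    k_src, v_src: (Bk, S, d) key/value blocks from the source video
    """
    # 1) Compress each block into a single summary vector (mean pooling
    #    stands in for the paper's compressed representations).
    q_summary = q.mean(dim=1)            # (Bq, d)
    k_summary = k_src.mean(dim=1)        # (Bk, d)

    # 2) Block-wise relevance scores between query and source blocks.
    scores = q_summary @ k_summary.T     # (Bq, Bk)

    # 3) Keep only the top-k most relevant source blocks per query block.
    topk = scores.topk(min(top_k, k_src.shape[0]), dim=-1).indices

    # 4) Dense attention restricted to the retrieved blocks. Per the method
    #    description, query blocks with sharp attention patterns would instead
    #    take a full FlashAttention path; that routing is omitted here.
    out = torch.empty_like(q)
    for i in range(q.shape[0]):
        k_sel = k_src[topk[i]].flatten(0, 1).unsqueeze(0)  # (1, k*S, d)
        v_sel = v_src[topk[i]].flatten(0, 1).unsqueeze(0)
        out[i] = F.scaled_dot_product_attention(
            q[i].unsqueeze(0), k_sel, v_sel
        ).squeeze(0)
    return out

# Toy usage: 32 query blocks each attend to 4 of 128 source blocks.
q = torch.randn(32, 64, 128)
k = torch.randn(128, 64, 128)
v = torch.randn(128, 64, 128)
print(isa_sparse_attention(q, k, v).shape)  # torch.Size([32, 64, 128])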

Quick Start

Clone the repository and install dependencies:

git clone https://github.com/xie-lab-ml/Lightning-Unified-Video-Editor-via-In-Context-Sparse-Attention.git
cd Lightning-Unified-Video-Editor-via-In-Context-Sparse-Attention
pip install -r requirements.txt

Download the LIVEditor checkpoint:

pip install huggingface_hub
huggingface-cli download sst12345/liveditor liveditor_ckpt.bin --local-dir .
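If you prefer the Python API over the CLI, huggingface_hub's hf_hub_download fetches the same file (repo ID and filename match the command above):

from huggingface_hub import hf_hub_download

# Downloads liveditor_ckpt.bin into the current directory and returns its path.
ckpt_path = hf_hub_download(
    repo_id="sst12345/liveditor",
    filename="liveditor_ckpt.bin",
    local_dir=".",
)
print(ckpt_path)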

Run inference:

python inference.py \
  --config inference.yaml \
  --checkpoint liveditor_ckpt.bin \
  --input assets/input.mp4 \
  --prompt "Add a small golden crown with delicate jewels on top of the girl's head..." \
  --output result.mp4

Model Files

File                                     Description
liveditor_ckpt.bin                       LIVEditor fine-tuned checkpoint
assets/live_visualization.jpg            Teaser image for the model card
assets/in_context_sparse_attention.png   Method overview
assets/input.mp4                         Example input video
assets/output_tilelang.mp4               Example output using the TileLang backend
assets/output_triton.mp4                 Example output using the Triton backend
assets/input.gif                         Browser-friendly source preview
assets/output_tilelang.gif               Browser-friendly TileLang preview
assets/output_triton.gif                 Browser-friendly Triton preview

Citation

@article{shao2026liveditor,
  title={LIVEditor: Lightning Unified Video Editing via In-Context Sparse Attention},
  author={Shao, Shitong and Zhou, Zikai and Li, Haopeng and Song, Yingwei and Zhong, Wenliang and Bai, Lichen and Xie, Zeke},
  journal={arXiv preprint arXiv:2605.04569},
  year={2026}
}