---
language:
- "en"
tags:
- video
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
---

<p align="center">
  <img src="assets/logo.jpg" height=30>
</p>

# FastMochi Model Card

## Model Details

<div align="center">
  <table style="margin-left: auto; margin-right: auto; border: none;">
    <tr>
      <td>
        <img src="assets/mochi-demo.gif" width="640" alt="Mochi Demo">
      </td>
    </tr>
    <tr>
      <td style="text-align:center;">
        Get an 8X diffusion speedup for Mochi with FastVideo
      </td>
    </tr>
  </table>
</div>

FastMochi is an accelerated version of [Mochi](https://huggingface.co/genmo/mochi-1-preview). It samples high-quality videos in 8 diffusion steps, compared to 64 steps for the original Mochi, giving a roughly 8X diffusion speedup.

- **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/)
- **License**: Apache-2.0
- **Distilled from**: [Mochi](https://huggingface.co/genmo/mochi-1-preview)
- **GitHub Repository**: https://github.com/hao-ai-lab/FastVideo

## Usage

- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in the README.
- You can also run FastMochi using the official [Mochi repository](https://github.com/genmoai/mochi) with the script below and these [compatible weights](https://huggingface.co/FastVideo/FastMochi).

<details>
<summary>Code</summary>

```python
import os

from genmo.lib.utils import save_video
from genmo.mochi_preview.pipelines import (
    DecoderModelFactory,
    DitModelFactory,
    MochiMultiGPUPipeline,
    T5ModelFactory,
    linear_quadratic_schedule,
)

# Read prompts line by line from prompt.txt.
with open("prompt.txt", "r") as f:
    prompts = [line.rstrip() for line in f]

# Build the multi-GPU pipeline from the distilled FastMochi weights.
pipeline = MochiMultiGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    world_size=4,
    dit_factory=DitModelFactory(
        model_path="weights/dit.safetensors", model_dtype="bf16"
    ),
    decoder_factory=DecoderModelFactory(
        model_path="weights/decoder.safetensors",
    ),
)

output_dir = "outputs"
os.makedirs(output_dir, exist_ok=True)
for i, prompt in enumerate(prompts):
    # 8-step distilled sampling with a linear-quadratic sigma schedule.
    video = pipeline(
        height=480,
        width=848,
        num_frames=163,
        num_inference_steps=8,
        sigma_schedule=linear_quadratic_schedule(8, 0.1, 6),
        cfg_schedule=[1.5] * 8,
        batch_cfg=False,
        prompt=prompt,
        negative_prompt="",
        seed=12345,
    )[0]
    save_video(video, f"{output_dir}/output_{i}.mp4")
```

</details>
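
Since the card lists `library_name: diffusers`, loading through diffusers may also be possible. The sketch below is a minimal, unofficial example that assumes a diffusers-format export of the FastMochi weights is available; the repository id is illustrative, and the FastVideo repository above remains the authoritative inference path.

```python
# Minimal sketch, not the official inference path. Assumes a
# diffusers-format export of the FastMochi weights exists; the
# repo id below is illustrative and may differ from the release.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained(
    "FastVideo/FastMochi-diffusers",  # illustrative repo id
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # lower peak VRAM on a single GPU
pipe.enable_vae_tiling()         # decode long videos in tiles

frames = pipe(
    prompt="A timelapse of clouds rolling over snow-capped mountains",
    num_inference_steps=8,  # distilled 8-step sampling
    guidance_scale=1.5,     # mirrors the cfg_schedule above
    num_frames=163,
    height=480,
    width=848,
    generator=torch.Generator().manual_seed(12345),
).frames[0]

export_to_video(frames, "output.mp4", fps=30)
```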

## Training Details

FastMochi is consistency-distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters:
- Batch size: 32
- Resolution: 480x848
- Number of frames: 169
- Training steps: 128
- GPUs: 16
- Learning rate: 1e-6
- Loss: Huber
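
As background on the objective, the toy sketch below shows the general shape of one consistency-distillation step with a Huber loss. It is an illustration under simplified assumptions, not the FastVideo training code; the model, schedule, and shapes are all stand-ins.

```python
# Toy illustration of one consistency-distillation step with a Huber
# loss. TinyVelocityNet, the schedule, and all shapes are stand-ins,
# not the actual FastVideo/Mochi training setup.
import torch
import torch.nn as nn

class TinyVelocityNet(nn.Module):
    """Stand-in for the video DiT: predicts the flow velocity at (x_t, t)."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        t_col = t.expand(x.shape[0], 1)  # broadcast timestep per sample
        return self.net(torch.cat([x, t_col], dim=-1))

student = TinyVelocityNet()
teacher = TinyVelocityNet()  # frozen stand-in for the 64-step teacher
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(student.parameters(), lr=1e-6)
huber = nn.HuberLoss()

x0 = torch.randn(32, 8)        # "clean" latents (toy batch of 32)
noise = torch.randn_like(x0)
t = torch.tensor([0.8])        # current noise level
t_prev = torch.tensor([0.7])   # adjacent, slightly cleaner level

# Flow-matching-style noisy sample, then one teacher Euler step back.
x_t = (1 - t) * x0 + t * noise
with torch.no_grad():
    x_prev = x_t + (t_prev - t) * teacher(x_t, t)

# Consistency objective: the student's prediction at (x_t, t) should
# match its own prediction at the teacher-produced (x_prev, t_prev).
# (Real recipes typically use an EMA copy for the target branch.)
opt.zero_grad()
loss = huber(student(x_t, t), student(x_prev, t_prev).detach())
loss.backward()
opt.step()
```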

## Evaluation

We provide qualitative comparisons between FastMochi with 8-step inference and the original Mochi with 8-step inference:

| FastMochi 8 steps | Mochi 8 steps |
| | --- | --- | |
| |  |  | |
| |  |  | |
| |  |  | |
| |  |  | |