OOM on 4090 even with 480p and 3s video

#14
by jclinton1 - opened

I am using this model via diffusers, and no matter what I do, the VAE decoding step tries to allocate around 32GB of VRAM, so it OOMs on my 4090.
I enabled VAE tiling, VAE slicing and CPU offload, but none of these seem to have made any difference. I think the transformer would need to be fully offloaded before the VAE is loaded onto the GPU.
I am using hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v_step_distilled

You support 4090 though, so would you be able to help me? Thank you.

I managed to get it to not OOM by:

  1. Splitting the DiT (denoising) step from the decode step and offloading everything but the decoder before decoding
  2. Changing the sample tiles to 128x128 with latent tiles of 8x8 (which requires disabling tiling at encode time, or you get an error)
  3. Wrapping the decode step in torch.no_grad()
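
For anyone landing here, the three steps above could look roughly like the sketch below. It is not the exact code used in this thread. The helper name `decode_latents_low_memory` is hypothetical. The component names (`transformer`, `text_encoder`, ...), the tile attribute names (`tile_sample_min_height`, `tile_latent_min_height`, ...) and the `scaling_factor` division are assumptions based on how diffusers video VAEs are usually laid out, so check them against your installed version. Manually moving modules may also conflict with offload hooks if CPU offload was enabled on the pipeline.

```python
import gc

import torch


def decode_latents_low_memory(pipe, latents, sample_tile=128, latent_tile=8, device="cuda"):
    """Decode latents with only the VAE resident on the GPU (hypothetical helper)."""
    # Step 1: move everything except the VAE off the GPU.
    # These component names are assumptions; missing ones are skipped.
    for name in ("transformer", "text_encoder", "text_encoder_2", "image_encoder"):
        module = getattr(pipe, name, None)
        if module is not None:
            module.to("cpu")
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    # Step 2: enable tiling only now (at decode time) and shrink the tiles.
    # Attribute names are assumptions; verify them on your VAE class.
    vae = pipe.vae
    vae.enable_tiling()
    vae.tile_sample_min_height = sample_tile
    vae.tile_sample_min_width = sample_tile
    vae.tile_latent_min_height = latent_tile
    vae.tile_latent_min_width = latent_tile
    vae.to(device)

    # Step 3: decode without building an autograd graph.
    with torch.no_grad():
        # Assumption: latents are un-scaled the same way the pipeline does it.
        latents = latents.to(device=device, dtype=vae.dtype) / vae.config.scaling_factor
        return vae.decode(latents, return_dict=False)[0]
```

To use something like this, the pipeline has to return latents instead of decoded frames (in diffusers this is normally done by passing `output_type="latent"` to the pipeline call), and VAE tiling should stay disabled until after that call, as noted in step 2.
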
jclinton1 changed discussion status to closed
