OOM on 4090 even with 480p and 3s video
#14, opened by jclinton1
I am using this model via diffusers, and no matter what I do, the VAE decoding step tries to allocate around 32 GB of VRAM, so it OOMs on my 4090.
I enabled VAE tiling, VAE slicing, and CPU offload, but none of them seems to have had any effect. I think the transformer would need to be fully offloaded before the VAE is loaded onto the GPU.
I am using hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v_step_distilled.
You support the 4090, though, so would you be able to help me? Thank you.
I managed to avoid the OOM by:
- Splitting the DiT step from the decode step and offloading everything except the decoder before decoding
- Changing the sample tiles to 128x128 with latent tiles of 8x8 (this requires disabling tiling at encode time, otherwise you get an error)
- Wrapping the decode step in torch.no_grad()
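
For anyone hitting the same thing, here is a rough sketch of the first and third steps. It is not the exact code I ran: `decode_after_offload` and its arguments are hypothetical names, and it assumes the VAE exposes a `decode()` method and that the latents are already scaled the way the VAE expects. The tile-size change is left out because it depends on the VAE class's own tiling attributes.

```python
import gc

import torch


def decode_after_offload(vae, latents, modules_to_offload, device="cuda"):
    """Hypothetical helper: free GPU memory, then decode without autograd."""
    # Move everything except the decoder off the GPU first, so the VAE
    # does not have to share VRAM with the transformer or text encoders.
    for module in modules_to_offload:
        module.to("cpu")
    gc.collect()
    if torch.cuda.is_available():
        # Hand cached, now-unused blocks back so the decode can claim them.
        torch.cuda.empty_cache()

    vae.to(device)
    # no_grad() keeps autograd from retaining intermediate activations,
    # which otherwise inflates peak memory during the decode.
    with torch.no_grad():
        return vae.decode(latents.to(device))
```

With a diffusers pipeline, this would likely mean asking the pipeline for latents instead of decoded frames, then passing `pipe.vae` plus the other components (for example `pipe.transformer`) to the helper. Both of those are assumptions about the pipeline, so check them against your diffusers version. If CPU offload hooks are already enabled on the pipeline, moving modules by hand may conflict with them.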
jclinton1 changed discussion status to closed