data-archetype/capacitor_decoder

Capacitor decoder: a faster, lighter FLUX.2-compatible latent decoder built on the SemDisDiffAE architecture.

Decode Speed

| Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
|---|---|---|---|---|---|---|
| 512x512 | 1.85x | 59.3% | 11.40 | 21.14 | 391.6 MiB | 961.9 MiB |
| 1024x1024 | 3.28x | 79.1% | 26.31 | 86.24 | 601.4 MiB | 2876.4 MiB |
| 2048x2048 | 4.70x | 86.4% | 86.29 | 405.84 | 1437.4 MiB | 10531.4 MiB |

These measurements are decode-only. Each image is first encoded once with the same FLUX.2 encoder, latents are cached in memory, and then both decoders are timed over the same cached latent set.
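The decode-only protocol above (encode once, cache latents, time each decoder on the same cached set) can be sketched with a minimal timing harness. `decode_fn` is a hypothetical stand-in for either decoder's decode call; the CUDA synchronization before reading the clock is what makes the GPU work actually count toward the measurement:

```python
import time

import torch


@torch.inference_mode()
def time_decode(decode_fn, cached_latents, warmup: int = 3, iters: int = 20) -> float:
    """Mean decode latency in ms over one cached latent batch."""
    # Warm up kernels/allocator so the timed loop measures steady-state decode.
    for _ in range(warmup):
        decode_fn(cached_latents)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        decode_fn(cached_latents)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1000.0
```

Peak VRAM can be read the same way with `torch.cuda.reset_peak_memory_stats()` before the loop and `torch.cuda.max_memory_allocated()` after it.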

2k PSNR Benchmark

| Model | Mean PSNR (dB) | Std (dB) | Median (dB) | Min (dB) | P5 (dB) | P95 (dB) | Max (dB) |
|---|---|---|---|---|---|---|---|
| FLUX.2 VAE | 36.28 | 4.53 | 36.07 | 22.73 | 28.89 | 43.63 | 47.38 |
| capacitor_decoder | 36.34 | 4.50 | 36.29 | 23.28 | 29.06 | 43.66 | 47.43 |

| Delta vs FLUX.2 | Mean (dB) | Std (dB) | Median (dB) | Min (dB) | P5 (dB) | P95 (dB) | Max (dB) |
|---|---|---|---|---|---|---|---|
| capacitor_decoder - FLUX.2 | 0.055 | 0.531 | 0.062 | -1.968 | -0.811 | 0.886 | 2.807 |

Evaluated on 2000 validation images: roughly 2/3 photographs and 1/3 book covers. Each image is encoded once with FLUX.2 and reused for both decoders.
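For reference, PSNR over images in `[-1, 1]` is computed from the per-image MSE with a data range of 2.0. A minimal sketch (the benchmark's exact clipping/quantization choices are not specified here, so this is an assumption of the simplest convention):

```python
import torch


def psnr(x: torch.Tensor, y: torch.Tensor, data_range: float = 2.0) -> float:
    """PSNR in dB between two images in [-1, 1] (data_range = max - min = 2)."""
    mse = torch.mean((x.float() - y.float()) ** 2)
    return float(10.0 * torch.log10(data_range ** 2 / mse))
```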


Usage

```python
import torch
from diffusers.models import AutoencoderKLFlux2

from capacitor_decoder import CapacitorDecoder, CapacitorDecoderInferenceConfig


def flux2_patchify_and_whiten(
    latents: torch.Tensor,
    vae: AutoencoderKLFlux2,
) -> torch.Tensor:
    b, c, h, w = latents.shape
    if h % 2 != 0 or w % 2 != 0:
        raise ValueError(f"Expected even FLUX.2 latent grid, got H={h}, W={w}")
    # Fold each 2x2 spatial patch into channels: [B, C, H, W] -> [B, 4C, H/2, W/2].
    z = latents.reshape(b, c, h // 2, 2, w // 2, 2)
    z = z.permute(0, 1, 3, 5, 2, 4).reshape(b, c * 4, h // 2, w // 2)
    # Whiten with the VAE's stored BatchNorm running statistics.
    mean = vae.bn.running_mean.view(1, -1, 1, 1).to(device=z.device, dtype=torch.float32)
    var = vae.bn.running_var.view(1, -1, 1, 1).to(device=z.device, dtype=torch.float32)
    std = torch.sqrt(var + float(vae.config.batch_norm_eps))
    return (z.to(torch.float32) - mean) / std


device = "cuda"
flux2 = AutoencoderKLFlux2.from_pretrained(
    "BiliSakura/VAEs",
    subfolder="FLUX2-VAE",
    torch_dtype=torch.bfloat16,
).to(device)
decoder = CapacitorDecoder.from_pretrained(
    "data-archetype/capacitor_decoder",
    device=device,
    dtype=torch.bfloat16,
)

image = ...  # [1, 3, H, W] in [-1, 1], with H and W divisible by 16

with torch.inference_mode():
    posterior = flux2.encode(image.to(device=device, dtype=torch.bfloat16))
    latent_mean = posterior.latent_dist.mean

    # Default path: match the usual FLUX.2 convention.
    # Whiten here, then let capacitor_decoder unwhiten internally before decode.
    latents = flux2_patchify_and_whiten(latent_mean, flux2)
    recon = decoder.decode(
        latents,
        height=int(image.shape[-2]),
        width=int(image.shape[-1]),
        inference_config=CapacitorDecoderInferenceConfig(num_steps=1),
    )
```

Whitening and dewhitening are optional, but they must stay consistent. The default above matches the usual FLUX.2 pipeline behavior. If your upstream path already gives you raw patchified decoder-space latents instead, skip whitening upstream and call decode(..., latents_are_flux2_whitened=False).
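The consistency requirement comes down to the fact that the decoder's internal unwhitening is the exact inverse of the upstream whitening, so applying whitening zero times or twice on one side corrupts the latents. A self-contained sketch of the round trip (standalone functions standing in for the BN-stats transform above):

```python
import torch


def whiten(z: torch.Tensor, mean: torch.Tensor, var: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize latents with fixed running stats, as done before decode."""
    return (z - mean) / torch.sqrt(var + eps)


def unwhiten(z_w: torch.Tensor, mean: torch.Tensor, var: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Exact inverse of whiten(); the decoder applies this internally by default."""
    return z_w * torch.sqrt(var + eps) + mean
```

If both sides agree on the stats and eps, `unwhiten(whiten(z)) == z` up to floating-point error; mismatched conventions (e.g. whitened input with `latents_are_flux2_whitened=False`) break this identity.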

Details

  • Default input contract: FLUX.2 patchified latents with FLUX.2 BN whitening still applied.
  • Default decoder behavior: unwhiten with saved FLUX.2 BN running stats, then decode.
  • Optional raw-latent mode: disable whitening upstream and call decode(..., latents_are_flux2_whitened=False).
  • Reused decoder architecture: SemDisDiffAE
  • Technical report
  • SemDisDiffAE technical report
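The patchified-latent contract in the bullets above can be sanity-checked with a self-contained round trip: `patchify` repeats the reshape/permute pattern from `flux2_patchify_and_whiten` (minus the BN whitening), and `unpatchify` is its exact inverse:

```python
import torch


def patchify(z: torch.Tensor) -> torch.Tensor:
    """[B, C, H, W] -> [B, 4C, H/2, W/2]: fold each 2x2 spatial patch into channels."""
    b, c, h, w = z.shape
    z = z.reshape(b, c, h // 2, 2, w // 2, 2)
    return z.permute(0, 1, 3, 5, 2, 4).reshape(b, c * 4, h // 2, w // 2)


def unpatchify(z: torch.Tensor) -> torch.Tensor:
    """[B, 4C, H/2, W/2] -> [B, C, H, W]: exact inverse of patchify()."""
    b, c4, h2, w2 = z.shape
    c = c4 // 4
    z = z.reshape(b, c, 2, 2, h2, w2)
    return z.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h2 * 2, w2 * 2)
```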

Citation

@misc{capacitor_decoder,
  title   = {Capacitor Decoder: A Faster, Lighter FLUX.2-Compatible Latent Decoder},
  author  = {data-archetype},
  email   = {data-archetype@proton.me},
  year    = {2026},
  month   = apr,
  url     = {https://huggingface.co/data-archetype/capacitor_decoder},
}