---
license: apache-2.0
tags:
- diffusion
- autoencoder
- image-reconstruction
- image-tokenizer
- pytorch
- fcdm
- semantic-alignment
library_name: fcdm_diffae
---

# data-archetype/semdisdiffae_p32
## Version History
| Date | Change |
|---|---|
| 2026-04-10 | Refresh standalone package: fix bf16 RMSNorm precision path in both encoder and decoder to match training code; local export tooling now preserves fp32 EMA weights for future re-exports |
| 2026-04-08 | Initial release |
Experimental patch-32 version of SemDisDiffAE.
This model extends the patch-16 SemDisDiffAE with a 2x2 bottleneck patchification after the encoder, producing 512-channel latents at H/32 x W/32 instead of the base model's 128-channel latents at H/16 x W/16. The decoder unpatchifies the latents back to 128 channels before reconstruction.
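The 2x2 bottleneck patchification is a space-to-depth rearrangement: each 2x2 spatial block of the 128-channel encoder output is folded into the channel dimension. A minimal sketch using `torch.nn.functional.pixel_unshuffle` (whether the model uses this exact op internally is an assumption; the shape arithmetic is what matters):

```python
import torch
import torch.nn.functional as F

# Encoder output: 128-channel features at H/16 x W/16 (here for a 256x256 input)
feat = torch.randn(1, 128, 16, 16)

# 2x2 space-to-depth: folds each 2x2 spatial block into channels,
# giving 128 * 4 = 512 channels at half the spatial resolution.
latents = F.pixel_unshuffle(feat, downscale_factor=2)
print(latents.shape)  # torch.Size([1, 512, 8, 8])

# The decoder's unpatchify is the exact inverse (depth-to-space).
restored = F.pixel_shuffle(latents, upscale_factor=2)
assert torch.equal(restored, feat)
```

Note the rearrangement is lossless: `pixel_shuffle` recovers the original feature map exactly, so the p32 bottleneck changes latent layout, not information content, before any learned compression.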
See the patch-16 SemDisDiffAE model card and its technical report for full architectural details. The p32 technical report covers only the differences.
## Architecture
| Property | p32 (this model) | p16 (base) |
|---|---|---|
| Latent channels | 512 | 128 |
| Effective patch | 32 | 16 |
| Latent grid | H/32 x W/32 | H/16 x W/16 |
| Encoder patch | 16 (same) | 16 |
| Bottleneck patchify | 2x2 | none |
| Parameters | 88.8M (same) | 88.8M |
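The latent shape for a given input resolution follows directly from the table. A small helper for sanity-checking shapes (hypothetical, not part of the `fcdm_diffae` package):

```python
def latent_shape(height: int, width: int, patch: int = 32, channels: int = 512):
    """Latent grid for an input of size (height, width); dims must divide evenly."""
    assert height % patch == 0 and width % patch == 0, \
        "input dimensions must be multiples of the effective patch size"
    return (channels, height // patch, width // patch)

# A 256x256 image maps to a 512 x 8 x 8 latent under p32,
# versus 128 x 16 x 16 under the p16 base model.
print(latent_shape(256, 256))           # (512, 8, 8)
print(latent_shape(256, 256, 16, 128))  # (128, 16, 16)
```

Both layouts hold the same number of latent elements (512·8·8 = 128·16·16 = 32768); the patchification trades spatial resolution for channels rather than changing total latent size.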
## Quick Start
```python
from fcdm_diffae import FCDMDiffAE

model = FCDMDiffAE.from_pretrained("data-archetype/semdisdiffae_p32", device="cuda")

# Encode — returns whitened 512ch latents at H/32 x W/32
latents = model.encode(images)  # [B,3,H,W] in [-1,1] -> [B,512,H/32,W/32]

# Decode
H, W = images.shape[-2:]
recon = model.decode(latents, height=H, width=W)

# Reconstruct (encode + decode in one call)
recon = model.reconstruct(images)
```
## Training
Same losses and hyperparameters as the base SemDisDiffAE (DINOv2 semantic alignment, VP posterior variance expansion, latent scale penalty). Trained for 275k steps. See the base model training section for details.
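The latent scale penalty mentioned above can be illustrated schematically. This sketch assumes a simple squared deviation of the batch latent standard deviation from 1, consistent with the whitened latents the encoder is documented to produce; the actual formulation is defined in the base model's technical report:

```python
import torch

def latent_scale_penalty(latents: torch.Tensor) -> torch.Tensor:
    # ASSUMPTION: penalize deviation of the batch latent std from 1,
    # encouraging the whitened (unit-scale) latents described in Quick Start.
    # The exact loss used in training is given in the base model's report.
    return (latents.std() - 1.0) ** 2

z = torch.randn(4, 512, 8, 8)  # roughly unit-scale latents: near-zero penalty
print(latent_scale_penalty(z))
print(latent_scale_penalty(2.0 * z))  # std ~ 2 -> penalty ~ 1
```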
## Dependencies
- PyTorch >= 2.0
- safetensors
## Citation

```bibtex
@misc{semdisdiffae,
  title  = {SemDisDiffAE: A Semantically Disentangled Diffusion Autoencoder},
  author = {data-archetype},
  email  = {data-archetype@proton.me},
  year   = {2026},
  month  = apr,
  url    = {https://huggingface.co/data-archetype/semdisdiffae},
}
```
## License
Apache 2.0