---
license: apache-2.0
tags:
  - diffusion
  - autoencoder
  - image-reconstruction
  - image-tokenizer
  - pytorch
  - fcdm
  - semantic-alignment
library_name: fcdm_diffae
---

# data-archetype/semdisdiffae_p32

### Version History

| Date | Change |
|------|--------|
| 2026-04-10 | Refresh standalone package: fix bf16 RMSNorm precision path in both encoder and decoder to match training code; local export tooling now preserves fp32 EMA weights for future re-exports |
| 2026-04-08 | Initial release |

**Experimental patch-32 version** of [SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae). This model extends the patch-16 SemDisDiffAE with a 2x2 bottleneck patchification after the encoder, producing **512-channel latents at H/32 x W/32** instead of the base model's 128-channel latents at H/16 x W/16. The decoder unpatchifies back to 128 channels before reconstruction.

See the [patch-16 SemDisDiffAE model card](https://huggingface.co/data-archetype/semdisdiffae) and its [technical report](https://huggingface.co/data-archetype/semdisdiffae/blob/main/technical_report_semantic.md) for full architectural details; the [p32 technical report](technical_report_p32.md) covers only the differences.
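The 2x2 bottleneck patchification is a standard space-to-depth reshape: each 2x2 spatial block of the 128-channel latent grid is folded into the channel dimension, yielding 512 channels at half the spatial resolution, and the decoder inverts it. A minimal sketch in plain PyTorch (the function names `patchify_2x2` / `unpatchify_2x2` are illustrative, not the package's actual API):

```python
import torch

def patchify_2x2(z: torch.Tensor) -> torch.Tensor:
    # Space-to-depth: [B, C, H, W] -> [B, 4C, H/2, W/2].
    # Each 2x2 spatial block contributes 4 channels.
    B, C, H, W = z.shape
    z = z.reshape(B, C, H // 2, 2, W // 2, 2)
    z = z.permute(0, 1, 3, 5, 2, 4)           # [B, C, 2, 2, H/2, W/2]
    return z.reshape(B, C * 4, H // 2, W // 2)

def unpatchify_2x2(z: torch.Tensor) -> torch.Tensor:
    # Depth-to-space inverse: [B, 4C, H/2, W/2] -> [B, C, H, W].
    B, C4, H2, W2 = z.shape
    C = C4 // 4
    z = z.reshape(B, C, 2, 2, H2, W2)
    z = z.permute(0, 1, 4, 2, 5, 3)           # [B, C, H/2, 2, W/2, 2]
    return z.reshape(B, C, H2 * 2, W2 * 2)
```

With a 512x512 input, the p16 encoder produces a `[B, 128, 32, 32]` latent; `patchify_2x2` maps it to `[B, 512, 16, 16]`, matching the H/32 x W/32 grid in the table below, and `unpatchify_2x2` recovers the original tensor exactly.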
## Architecture

| Property | p32 (this model) | p16 (base) |
|----------|------------------|------------|
| Latent channels | 512 | 128 |
| Effective patch | 32 | 16 |
| Latent grid | H/32 x W/32 | H/16 x W/16 |
| Encoder patch | 16 (same) | 16 |
| Bottleneck patchify | 2x2 | none |
| Parameters | 88.8M (same) | 88.8M |

## Quick Start

```python
from fcdm_diffae import FCDMDiffAE

model = FCDMDiffAE.from_pretrained("data-archetype/semdisdiffae_p32", device="cuda")

# Encode: returns whitened 512-channel latents at H/32 x W/32
latents = model.encode(images)  # [B,3,H,W] in [-1,1] -> [B,512,H/32,W/32]

# Decode
recon = model.decode(latents, height=H, width=W)

# Reconstruct in one call
recon = model.reconstruct(images)
```

## Training

Same losses and hyperparameters as the base SemDisDiffAE (DINOv2 semantic alignment, VP posterior variance expansion, latent scale penalty). Trained for 275k steps. See the [base model training section](https://huggingface.co/data-archetype/semdisdiffae/blob/main/technical_report_semantic.md#6-training) for details.

## Dependencies

- PyTorch >= 2.0
- safetensors

## Citation

```bibtex
@misc{semdisdiffae,
  title  = {SemDisDiffAE: A Semantically Disentangled Diffusion Autoencoder},
  author = {data-archetype},
  email  = {data-archetype@proton.me},
  year   = {2026},
  month  = apr,
  url    = {https://huggingface.co/data-archetype/semdisdiffae},
}
```

## License

Apache 2.0