Switch transformer & unconditional_transformer to split Q/K/V

#10
by multimodalart HF Staff - opened

Updates transformer/ and unconditional_transformer/ from fused attention.qkv/attention.o to split attention.to_q/to_k/to_v/to_out.0, matching the diffusers loader change in huggingface/diffusers#13859 (commit fbe4750).

Also deletes the per-folder diffusion_pytorch_model.safetensors.index.json files. Their weight_map still pointed at the old fused qkv/o keys (and only ever referenced the single diffusion_pytorch_model.safetensors), so keeping them would break index-based loading. This matches diffusers-internal-dev/ideogram-4-fp8-diffusers, which carries no index.json.
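For anyone who wants to confirm that an index file is stale before deleting it, here is a minimal sketch. It assumes the usual layout of a safetensors index (a top-level `weight_map` from tensor name to shard filename). The helper name `stale_index_keys` and the `layers.0.…` tensor names are hypothetical and only there for illustration; the real key prefixes in this repo may differ.

```python
import json


def stale_index_keys(index_json: str, checkpoint_keys) -> list[str]:
    """Return index entries that name tensors missing from the checkpoint."""
    weight_map = json.loads(index_json)["weight_map"]
    return sorted(set(weight_map) - set(checkpoint_keys))


# Hypothetical index still written against the old fused layout.
index_json = json.dumps(
    {
        "metadata": {},
        "weight_map": {
            "layers.0.attention.qkv.weight": "diffusion_pytorch_model.safetensors",
            "layers.0.attention.o.weight": "diffusion_pytorch_model.safetensors",
            "layers.0.norm.weight": "diffusion_pytorch_model.safetensors",
        },
    }
)

# Hypothetical tensor names as they would appear after the split.
checkpoint_keys = [
    "layers.0.attention.to_q.weight",
    "layers.0.attention.to_k.weight",
    "layers.0.attention.to_v.weight",
    "layers.0.attention.to_out.0.weight",
    "layers.0.norm.weight",
]

stale = stale_index_keys(index_json, checkpoint_keys)
print(stale)
```

Any non-empty result means the index references tensors the checkpoint no longer contains, which is the situation described above.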

The q/k/v weights are contiguous row-slices of the old fused weight, so the conversion is lossless. Configs are unchanged.
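A rough sketch of what such a conversion could look like, to show why it round-trips exactly. This is not the script used for this PR. It assumes the fused tensor stacks q, k, v in that order along axis 0 in three equal blocks, uses NumPy arrays in place of the real checkpoint tensors, and the function name `split_fused_attention` and the `layers.0.…` key prefix are hypothetical.

```python
import numpy as np


def split_fused_attention(state_dict):
    """Return a new dict with fused attention keys split and renamed.

    Assumption (not confirmed by the PR text): rows are ordered q, k, v
    and the three blocks are the same size.
    """
    out = {}
    for key, value in state_dict.items():
        if ".attention.qkv." in key:
            prefix, _, suffix = key.partition(".attention.qkv.")
            rows = value.shape[0]
            if rows % 3 != 0:
                raise ValueError(f"{key}: {rows} rows is not divisible by 3")
            step = rows // 3
            for i, name in enumerate(("to_q", "to_k", "to_v")):
                # Plain row slice: no arithmetic, so values are untouched.
                out[f"{prefix}.attention.{name}.{suffix}"] = value[i * step:(i + 1) * step]
        elif ".attention.o." in key:
            # The output projection is only renamed.
            out[key.replace(".attention.o.", ".attention.to_out.0.")] = value
        else:
            out[key] = value
    return out


dim = 4
fused = np.arange(3 * dim * dim, dtype=np.float32).reshape(3 * dim, dim)
old = {
    "layers.0.attention.qkv.weight": fused,
    "layers.0.attention.o.weight": np.eye(dim, dtype=np.float32),
}
new = split_fused_attention(old)

# Concatenating the slices back reproduces the fused weight bit for bit.
rebuilt = np.concatenate(
    [new[f"layers.0.attention.{n}.weight"] for n in ("to_q", "to_k", "to_v")],
    axis=0,
)
assert np.array_equal(rebuilt, fused)
```

Because the split is pure slicing, concatenating to_q, to_k and to_v in the same order recovers the original fused tensor, which is what "lossless" means here.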

multimodalart changed pull request title from Switch transformer & unconditional_transformer to split Q/K/V to Match diffusers keys
multimodalart changed pull request title from Match diffusers keys to Switch transformer & unconditional_transformer to split Q/K/V
Ready to merge
This branch is ready to get merged automatically.
