What is the text encoder?
Is it Qwen 3 Q8/Q6/Q4?
The AIO BF16 version uses the standard qwen-3-4b-BF16 model (398 tensors).
For the AIO FP8 version, it’s my downscaled FP8 variant of qwen-3-4b:
28.11.2025 18:27 4,022,515,040 bytes qwen_3_4b-fp8.safetensors
27.11.2025 19:48 8,044,982,048 bytes qwen_3_4b.safetensors
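If you want to verify the contents yourself, a quick inspection with the safetensors library works. A minimal sketch (the path refers to the file from the listing above; adjust it to your local copy):

```python
from safetensors import safe_open

# Open the text encoder file and report how many tensors it stores
# and which dtype they use (BF16 vs FP8).
with safe_open("qwen_3_4b.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())
    print(len(keys), "tensors")             # 398 for qwen-3-4b
    print(f.get_tensor(keys[0]).dtype)      # e.g. torch.bfloat16
```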
Oh, I see. I think in terms of Q4/6/8, iMatrix, and bpw in EXL2/3 when it comes to LLMs, so I forgot that this is a safetensors visual-AI AIO, meaning FP/BF 16/8 😂 Anyway, thanks for the response.
BTW, I don't know how the baking-in works, but would it be possible for you to make an FP16 CLIP + FP8 weights version? I usually go with the highest CLIP precision possible, so FP16/FP32 CLIP and FP16/FP8 model in separate loaders, and it gives much better results, like fixing anatomy problems etc. So I was thinking: I'll try an FP16 CLIP with FP8 weights later today to see if there's a visible gain, but I'm wondering whether it's even possible with these AIO/baked-in tunes to mix CLIP/weights precision?
I mean, I run FP16 myself, but an FP16/FP8 mix may be reasonable for 12 GB GPUs and for speed, e.g. base generation before detailers, or when you're using Z-image as a detailer, hmm.
Yes, it is technically possible to mix different precisions like FP32 / BF16 / FP16 / FP8 between the S3-DiT and the text encoder, but I wouldn’t recommend it.
I tested mixed-precision setups (FP16 encoder + FP8 weights, BF16 encoder + FP8 weights, etc.) and the results didn’t show any meaningful improvements. Instead, they tended to introduce small visual issues such as distorted limbs, extra noise, and other minor artifacts.
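For reference, mixing precisions at load time just means casting the two state dicts independently. A minimal PyTorch sketch (the file names below are illustrative placeholders, not the actual release files; FP8 dtypes require torch >= 2.1):

```python
import torch
from safetensors.torch import load_file

# Illustrative mixed-precision setup: text encoder in FP16, DiT weights in FP8.
encoder_sd = {k: v.to(torch.float16)
              for k, v in load_file("text_encoder.safetensors").items()}
dit_sd = {k: v.to(torch.float8_e4m3fn)
          for k, v in load_file("dit_model.safetensors").items()}
```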
If you can choose between FP16 and BF16, I recommend BF16.
BF16 keeps the same dynamic range as FP32 (both have 8 exponent bits), while reducing the mantissa to 7 bits, even fewer than FP16's 10. This makes BF16 much more stable in practice, especially for text encoders and DiT blocks. In real image generation you won't see any difference compared to FP32, but the model file is half the size.
FP16 can underflow more easily because of its reduced exponent range (only 5 exponent bits, so its smallest normal value is about 6e-5), while BF16 avoids that problem.
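You can see the underflow behavior directly in PyTorch (a quick check, assuming torch is installed):

```python
import torch

# Smallest positive normal value each format can represent.
print(torch.finfo(torch.float16).tiny)    # ~6.10e-05
print(torch.finfo(torch.bfloat16).tiny)   # ~1.18e-38, same exponent range as FP32

# A small activation that FP16 flushes to zero but BF16 keeps.
x = torch.tensor(1e-8)
print(x.to(torch.float16))                # tensor(0.) -- underflow
print(x.to(torch.bfloat16))               # tensor(1.0012e-08)
```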
So in short:
Mixing precisions is possible, but in my tests it didn’t bring benefits and only added artifacts. BF16 is generally the best choice if your GPU can run it.