
Improving the existing text encoder (Qwen3-0.6B-Base)

#183
by Iwaku-Real - opened

I just ran Heretic for the first time and got some decent results from creating an abliterated version of Qwen3-0.6B-Base at FP32: https://huggingface.co/Iwaku-Real/Qwen3-0.6B-Base-heretic-test. The changes in actual image generation are very subtle and I'm not sure they're an improvement, but given the significant drop in refusals, you decide.

However, I do warn you that you should NOT use any model without "Base" in its name as a text encoder, or you risk getting slightly degraded results. That's because those models are based on the post-trained Qwen3-0.6B, which does not produce the hidden states (encoded text) that Anima actually expects, as Anima was trained on the hidden states of Qwen3-0.6B-Base. I personally think Anima should have been trained on an abliterated Qwen3-0.6B (the non-Base model, with post-training) so people could train both the diffusion model and the TE in LoRA training.

That being said, I think improvements could be possible with the current text encoder (it would require VERY precise fine-tuning). Still, I doubt anyone is pleased that Anima uses a nearly year-old base model, a T5 adapter baked in just to be able to use that base, and of course Qwen3 instead of Qwen3.5 (which could "see" what it's generating through built-in VL).

There is already a very, very long and pointless discussion on using abliterated TEs on Anima in the discussions here. I say pointless because you won't get any better results from using an abliterated TE; the model was trained on the normal base Qwen 0.6B and expects embeddings from that model. A TE cannot output "refusals" because all it outputs are embeddings. The reason you don't get NSFW on some other models with LLM TEs is simply that they weren't trained on it or their dataset was censored.

Now, granted, the degradation from using an abliterated TE is probably pretty small, since the whole point of Heretic abliterations is to affect the model as little as possible. But no matter what, you're just losing out on quality for no gain at all. Just use the normal TE.
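
To illustrate why the drift is small but not zero, here is a toy sketch of directional ablation, the technique abliteration is commonly described as: one direction is projected out of a weight matrix. This is not Heretic's actual implementation. The matrix `W`, the direction `r`, and the input `x` are made-up values, and a real TE has many layers rather than one small matrix.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def matvec(W, x):
    return [dot(row, x) for row in W]

def ablate_direction(W, r):
    """Return (I - r_hat r_hat^T) W, so outputs have no component along r."""
    r_hat = normalize(r)
    rows, cols = len(W), len(W[0])
    # coeff[j] is the component of column j of W along r_hat.
    coeff = [sum(r_hat[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - r_hat[i] * coeff[j] for j in range(cols)]
            for i in range(rows)]

# Hypothetical toy values, chosen only for illustration.
W = [[0.9, 0.1, 0.0],
     [0.2, 0.8, 0.1],
     [0.0, 0.3, 0.7]]
r = [0.1, 0.0, 1.0]    # stand-in for a "refusal direction"
x = [1.0, 2.0, -1.0]   # stand-in for an input activation

r_hat = normalize(r)
out_original = matvec(W, x)
out_ablated = matvec(ablate_direction(W, r), x)

# The two outputs differ exactly by the removed component along r_hat.
drift = [a - b for a, b in zip(out_original, out_ablated)]
drift_norm = math.sqrt(dot(drift, drift))

print("original:", out_original)
print("ablated: ", out_ablated)
print("drift norm:", drift_norm)
```

The ablated output differs from the original only by its component along `r`, so the change is small whenever the input barely excites that direction. It is still a change, though, and the diffusion model was trained on the unmodified outputs.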

It's a bit unfortunate, but it doesn't matter much.
In the Anima model, what the TE handles is closer to translation than interpretation, and the difference isn't really due to censorship. It's more that a TE altered through partial pruning or tuning simply produces a slightly off-kilter translation compared to the original model, and that's what causes the image output to shift.
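
One way to put a number on that "off-kilter translation" is to compare the two encoders' hidden states token by token with cosine similarity. The sketch below is only a minimal illustration: `reference` and `altered` are made-up toy vectors standing in for the per-token hidden states that the original and the modified TE would produce for the same prompt.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def per_token_similarity(ref_states, alt_states):
    """Compare two lists of per-token hidden-state vectors."""
    if len(ref_states) != len(alt_states):
        raise ValueError("both encoders must see the same tokens")
    return [cosine(r, a) for r, a in zip(ref_states, alt_states)]

# Toy stand-ins: two tokens, three hidden dimensions each (hypothetical values).
reference = [[1.0, 0.0, 0.5], [0.2, 0.9, -0.1]]
altered = [[1.0, 0.0, 0.5], [0.25, 0.85, -0.05]]

sims = per_token_similarity(reference, altered)
for i, s in enumerate(sims):
    print(f"token {i}: cosine similarity = {s:.4f}")
```

A similarity of 1.0 means the token's embedding is unchanged; anything lower is drift that the diffusion model never saw during training, whatever its cause.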
