Thank You For The Amazing Work!

#1
by onixxexxd5555LOAF - opened

Your work has breathed fresh life into my GPU. I am benefiting from both the extension and the checkpoints you are making.
I made some quick comparisons for how the newest int8 variant works and I am pleased with the results.

[Image: q8 vs int8convrot vs nunchaku]

[Image: bf16 vs int8 convrot anima]

If you are open to suggestions, though, I would recommend making the dynamic lora behavior the default, with an option to disable it. In my experience the lora effect is just too weak without it, and the >10% speed hit seems to be a necessary evil.

[Image: int8 dynamic lora on off]

Finally, I would like to ask whether you have the full quantization pipeline script for making these checkpoints somewhere, and whether you would be willing to share it. I am interested in trying to make some myself.

Making your own is exceedingly simple:
https://github.com/BobJohnson24/ComfyUI-INT8-Fast/blob/main/example_workflows/int8_save_convrot_model.json

The issue with non-dynamic lora is somewhat known, though your example appears more affected than anything I've seen in my own testing: the SNR (dB) for the loras I tested was still in the ~25 range, which means the changes are mostly subtle. From a quick glance, your result looks like it would score around 8 at best. If the lora is public, I'd be interested in trying it out to see whether there are any additional bugs that need fixing.

Dynamic lora of course has its own fair share of issues, from only supporting standard loras to slowing down inference.

My preferred approach is to use the pre-lora node/input to merge the lora into a BF16 model before converting it with on-the-fly quantization/ConvRot enabled.
Of course, that means you have to keep a BF16 checkpoint around, but it supports more lora varieties and does not slow down inference beyond the initial conversion (~25 seconds with Chroma BF16, for example).
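
To illustrate the "merge first, quantize afterwards" idea, here is a minimal sketch in plain Python. It is not the extension's code: the function names are hypothetical, the quantizer is an assumed simple per-row absmax int8 scheme, and the ConvRot rotation is left out entirely. The point is only that the lora delta is added while the weights are still in high precision, so the quantizer sees the already-merged values.

```python
def matmul(a, b):
    """Plain-Python matrix product for small illustrative matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def merge_lora(weight, lora_up, lora_down, strength=1.0):
    """Add the low-rank update (up @ down) to the full-precision weight."""
    delta = matmul(lora_up, lora_down)
    return [[w + strength * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def quantize_rows_int8(weight):
    """Assumed scheme: symmetric per-row absmax quantization to [-127, 127]."""
    q_rows, scales = [], []
    for row in weight:
        absmax = max(abs(v) for v in row)
        scale = absmax / 127.0 if absmax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize_rows(q_rows, scales):
    """Map int8 codes back to floats using the per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

# Toy example: 2x2 weight with a rank-1 lora.
weight = [[1.0, -0.5], [0.25, 2.0]]
lora_up = [[0.1], [0.2]]      # shape (2, 1)
lora_down = [[0.5, -1.0]]     # shape (1, 2)

merged = merge_lora(weight, lora_up, lora_down)
q_rows, scales = quantize_rows_int8(merged)
restored = dequantize_rows(q_rows, scales)
```

Because the delta is merged before rounding, each restored value differs from the merged weight by at most half a quantization step for its row, no matter how small the lora delta is.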

Just as an example, I downloaded the first random chroma lora I could find, and it works just fine with non-dynamic lora.

Oh I didn't expect making a checkpoint to be so simple, thanks.
As for the dynamic lora, I grabbed an anima lora from civit, tried it with the dynamic lora option off, and it worked well indeed. (I was getting "aimdo: /project/src/model-vbar.c:74:WARNING:VBAR 0x7f063895a200: Page 273 pin_count=1" message at each step though, not sure if it is of relevance regardless.)
Then I tried another lora I made and it worked fine with the dynamic lora off.
This particular lora seems to be problematic for some reason. I wasn't intending to release it publicly, but in the hope that it helps solve the problem, here is a temporary link:
"https://litter.catbox.moe/12piwnf7qn2d4fjl.safetensors"
TW info is in the metadata. I don't think there is anything highly unusual about it. It was trained with sd-scripts (but so was my other lora, which works well with dynamic lora off).
I will also share the workflow that lora didn't work with:
"https://litter.catbox.moe/sln10yhliq0pi4jf.png"
Thanks for the response.


I've added a special dropdown for stochastic lora merging in this testing branch https://github.com/BobJohnson24/ComfyUI-INT8-Fast/tree/RAMExp (Edit: now merged to main), which does appear to fix the issue for anima loras. Whether "Stochastic" or "None" does better seems to depend on the specific lora. For example, the nicegirls lora for anima appears to do better without it.
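
For anyone wondering why stochastic rounding could help here, a minimal sketch follows. It assumes (this is not confirmed anywhere in the thread) that the "None" mode adds the lora delta to already-quantized int8 weights with round-to-nearest, and that "Stochastic" rounds up or down with a probability equal to the fractional part. All function names are hypothetical and not taken from ComfyUI-INT8-Fast.

```python
import math
import random

def round_nearest(x):
    """Deterministic round-half-up to the nearest integer."""
    return math.floor(x + 0.5)

def round_stochastic(x, rng):
    """Round up with probability equal to the fractional part of x."""
    lower = math.floor(x)
    return lower + (1 if rng.random() < x - lower else 0)

def merge_into_int8(q_weights, delta, scale, rounder):
    """Add a float delta to int8 codes (assumed shared scale) and re-round."""
    merged = []
    for q, d in zip(q_weights, delta):
        value = rounder(q + d / scale)
        merged.append(max(-127, min(127, value)))
    return merged

rng = random.Random(0)
scale = 0.05                       # assumed size of one int8 step
q_weights = [10] * 10000           # every weight sits at code 10
delta = [0.2 * scale] * 10000      # lora delta is 0.2 of a step

nearest = merge_into_int8(q_weights, delta, scale, round_nearest)
stochastic = merge_into_int8(
    q_weights, delta, scale, lambda x: round_stochastic(x, rng))

mean_nearest = sum(nearest) / len(nearest)
mean_stochastic = sum(stochastic) / len(stochastic)
```

Under these assumptions, round-to-nearest drops a delta smaller than half a step entirely, while stochastic rounding keeps it on average at the cost of added per-weight noise. That trade-off would be consistent with some loras doing better with "None" and others with "Stochastic".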

Not sure what went wrong to cause all the aimdo messages; it could be that we need to properly register the weights so they don't try to fight each other in VRAM.

I have also run a quick 16-sample test against BF16. As usual, dynamic wins; pre-lora is in 2nd place but always close to dynamic; and yeah, our previous "None" approach was pretty screwed up:

| Metric | I8Dynamic | I8None | I8Pre | I8Stoch |
|---|---|---|---|---|
| Rel-RMSE ↓ | 0.11532 ±0.01719 | 0.79263 ±0.03690 | 0.12302 ±0.01520 | 0.13954 ±0.01769 |
| SNR dB ↑ | 23.07 ±1.13 | 2.63 ±0.40 | 22.50 ±1.05 | 21.04 ±0.92 |
| Cos-sim ↑ | 0.983092 ±0.004472 | 0.594366 ±0.035211 | 0.982433 ±0.003686 | 0.977338 ±0.005859 |

★ = best value for that metric  |  ± = avg of per-timestep SE (std/√n_seeds) [--stratify-std]
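
For readers unfamiliar with these metrics, here is a small sketch of how they are commonly defined. These are the textbook formulas, assumed here for illustration; the actual benchmark script is not shown in this thread and may compute or aggregate them differently. `ref` stands for the BF16 output and `test` for the int8 output, both flattened to lists of floats.

```python
import math

def rel_rmse(ref, test):
    """Relative RMSE: error energy divided by reference energy, square-rooted."""
    err = sum((r - t) ** 2 for r, t in zip(ref, test))
    sig = sum(r * r for r in ref)
    return math.sqrt(err / sig)

def snr_db(ref, test):
    """Signal-to-noise ratio in decibels (higher is better)."""
    err = sum((r - t) ** 2 for r, t in zip(ref, test))
    sig = sum(r * r for r in ref)
    return 10.0 * math.log10(sig / err)

def cos_sim(ref, test):
    """Cosine similarity between the two flattened outputs."""
    dot = sum(r * t for r, t in zip(ref, test))
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_test = math.sqrt(sum(t * t for t in test))
    return dot / (norm_ref * norm_test)

def standard_error(values):
    """Sample standard deviation divided by sqrt(n), as in the table legend."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance) / math.sqrt(n)

# Toy data standing in for a BF16 output and an int8 output.
ref = [1.0, 2.0, 3.0, 4.0]
test = [1.1, 1.9, 3.2, 3.8]

metrics = {
    "rel_rmse": rel_rmse(ref, test),
    "snr_db": snr_db(ref, test),
    "cos_sim": cos_sim(ref, test),
}
```

With these definitions, a single sample satisfies SNR dB = -20·log10(Rel-RMSE), but averages taken over many seeds and timesteps need not, so the table rows are not expected to convert into each other exactly.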

Thanks for the speedy fix.
Since some Anima loras worked well enough with the previous approach, I have to ask: did you notice any Anima loras getting worse with Stochastic compared to None during your testing? I have tested another Anima lora that previously worked fine with None, and it seems to do fine with both None and Stochastic, but it is a possibility to keep in mind.
Ultimately, this sounds like something that might need some ironing out, with testing of different loras across a variety of models, before it's settled.
I will let you know if I come across a lora that breaks down with both None and Stochastic in another model.

I have seen some degradation in one benchmark with the nicegirls lora, but the whole benchmark ended up being a bit noisy compared to previous tests, so I am unsure what conclusions one can draw from it:

| Metric | I8Dynamic | I8None | I8Pre | I8Stoch |
|---|---|---|---|---|
| Rel-RMSE ↓ | 0.14496 ±0.03317 | 0.13930 ±0.03441 | 0.07309 ±0.01115 | 0.15527 ±0.03282 |
| SNR dB ↑ | 21.56 ±1.48 | 21.87 ±1.36 | 25.20 ±0.93 | 19.96 ±1.21 |
| Cos-sim ↑ | 0.972138 ±0.012902 | 0.972033 ±0.014283 | 0.994138 ±0.002038 | 0.971228 ±0.014210 |

So, if there aren't any issues with hijacking this thread further for general-purpose bug reporting: making convrot checkpoints seems bugged currently. I can make non-convrot checkpoints, and convrot works fine when quantizing BF16 on the fly, but convrot int8 checkpoints produce gibberish results when I load and use them.
At least for anima.

Oh yeah, it seems I accidentally broke the comfy_quant creation. I'll get right on it.

Edit: It should be fixed now.
