Effect of training LLM adapter for Loras

#144

by Toserme - opened 10 days ago

According to the model page and circlestone_labs' own recommendation, you are not supposed to train the LLM Adapter.

However, after training around 20ish different Loras across Anima Preview 1/2/3 with and without 'llm_adapter_lr=0', I honestly can't tell which approach is better.

I've had some Loras where I liked the results more with a trained adapter and some where I liked them more with adapter training disabled. For reference, I am using sd-scripts for the Lora training and mostly anime tag captioning.

Does anyone have any further insight on this?

khanghy1000

10 days ago

with and without 'llm_adapter_lr=0'

sd-scripts

Are you sure you trained the LLM Adapter? You need to set --network_args "train_llm_adapter=True" to make sd-scripts train the LLM Adapter.

Toserme

10 days ago

•

edited 10 days ago

with and without 'llm_adapter_lr=0'

sd-scripts

Are you sure you trained the LLM Adapter? You need to set --network_args "train_llm_adapter=True" to make sd-scripts train the LLM Adapter.

I didn't specify the flag yet the Loras produce different results on the same epochs despite using identical dataset/config+seed for the training. According to Claude there is potentially some influence on the Lora training conditions even if 'train_llm_adapter=True' isn't specified. Could also just be AI hallucinations though.

There is also the case of circlestone_labs themselves pointing out to disable the LLM adapter training with 'llm_adapter_lr=0'

https://civitai.com/models/2536147/greg-rutkowski-style-anima

Of course this is for diffusion-pipe trainer so can't confirm if it also applies to sd-scripts but if we assume that the default Anima behavior is for the LLM adapter to not be trained in any capacity, then explicitly adding llm_adapter_lr=0 for Lora training would seem kinda pointless.

khanghy1000

10 days ago

Please stop asking AI and start reading the documentation properly :)
https://github.com/kohya-ss/sd-scripts/blob/main/docs/anima_train_network.md

There is also the case of circlestone_labs themselves pointing out to disable the LLM adapter training with 'llm_adapter_lr=0'

Obviously, diffusion-pipe != sd-scripts

Comuse123

10 days ago

•

edited 10 days ago

There is also the case of circlestone_labs themselves pointing out to disable the LLM adapter training with 'llm_adapter_lr=0'

https://civitai.com/models/2536147/greg-rutkowski-style-anima

Of course this is for diffusion-pipe trainer so can't confirm if it also applies to sd-scripts but if we assume that the default Anima behavior is for the LLM adapter to not be trained in any capacity, then explicitly adding llm_adapter_lr=0 for Lora training would seem kinda pointless.

Diffusion-pipe works differently from sd-scripts: in diffusion-pipe you have to specify llm_adapter_lr=0, otherwise it trains the adapter. In sd-scripts the adapter won't be trained unless you specify train_llm_adapter=True. This is stated in the anima_train_network.md docs in sd-scripts, and you can see it in the code: "train_llm_adapter = kwargs.get("train_llm_adapter", "false")" (line 239 of lora_anima.py).
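To make the default concrete, here is a minimal Python sketch of how a string flag with a "false" default can gate which modules receive LoRA layers. Only the kwargs.get call is taken from the line quoted above; parse_bool_arg, select_target_modules, and both module lists are hypothetical names for illustration, not the actual sd-scripts code.

```python
def parse_bool_arg(value) -> bool:
    """Interpret a network_args value, which reaches the network as a string."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def select_target_modules(base_targets, adapter_targets, **kwargs):
    """Return the module class names that should receive LoRA layers."""
    # Same default as the quoted line: a missing flag is read as "false".
    train_llm_adapter = parse_bool_arg(kwargs.get("train_llm_adapter", "false"))
    targets = list(base_targets)
    if train_llm_adapter:
        # Adapter blocks are included only when the flag is explicitly enabled.
        targets += list(adapter_targets)
    return targets


# Hypothetical module class names, for illustration only.
BASE_TARGETS = ["Block"]
ADAPTER_TARGETS = ["LLMAdapterTransformerBlock"]

default_targets = select_target_modules(BASE_TARGETS, ADAPTER_TARGETS)
enabled_targets = select_target_modules(
    BASE_TARGETS, ADAPTER_TARGETS, train_llm_adapter="True"
)
```

Under these assumptions, a run without the flag never creates trainable parameters for the adapter, so any difference between two such runs has to come from somewhere else.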

I didn't specify the flag yet the Loras produce different results on the same epochs despite using identical dataset/config+seed for the training. According to Claude there is potentially some influence on the Lora training conditions even if 'train_llm_adapter=True' isn't specified. Could also just be AI hallucinations though.

Loras will never be identical even if the dataset, seed, and parameters are all the same. You could maybe get deterministic results if you turned off all attention types, but that's unrealistic.

Toserme

8 days ago

•

edited 8 days ago

Please stop asking AI and start reading the documentation properly :)
https://github.com/kohya-ss/sd-scripts/blob/main/docs/anima_train_network.md

There is also the case of circlestone_labs themselves pointing out to disable the LLM adapter training with 'llm_adapter_lr=0'

Obviously, diffusion-pipe != sd-scripts

I've read the documentation :)

Documentation != code

You are free to explain in exact details how these flags interact based on the code.

khanghy1000

8 days ago

•

edited 8 days ago

You are free to explain in exact details how these flags interact based on the code.

The llm_adapter_lr flag is unused by anima_train_network.py; it is only used by the full fine-tuning script, anima_train.py. To adjust the LLM Adapter learning rate for LoRA training, you need to use something like "network_reg_lrs=.*llm_adapter.*=5e-5" instead.
You need to set --network_args "train_llm_adapter=True" to make sd-scripts train the LLM Adapter; train_llm_adapter is False by default. If train_llm_adapter is False, LoRANetwork.ANIMA_ADAPTER_TARGET_REPLACE_MODULE is excluded from LoRA creation and no LoRA modules are created for the adapter blocks: https://github.com/kohya-ss/sd-scripts/blob/502cc3fab2aa22c106580e2e05c4692cfde5e5ff/networks/lora_anima.py#L539-L540

TLDR: llm_adapter_lr flag is unused by the lora training script, and sd-scripts does not train the LLM Adapter by default.
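As a rough illustration of the "network_reg_lrs=.*llm_adapter.*=5e-5" idea, the Python sketch below maps module names to learning rates through regex rules. It is not the sd-scripts implementation: parse_reg_lrs, lr_for_module, the comma-separated rule format, and the module names are all assumptions made for the example.

```python
import re


def parse_reg_lrs(spec: str):
    """Parse 'pattern=lr' rules (assumed comma-separated) into (regex, lr) pairs."""
    rules = []
    for item in spec.split(","):
        # Split on the last '=' so the pattern itself may contain '='.
        pattern, _, lr = item.rpartition("=")
        rules.append((re.compile(pattern), float(lr)))
    return rules


def lr_for_module(name: str, rules, default_lr: float) -> float:
    """Return the lr of the first matching rule, else the default lr."""
    for pattern, lr in rules:
        if pattern.search(name):
            return lr
    return default_lr


rules = parse_reg_lrs(".*llm_adapter.*=5e-5")

# Hypothetical LoRA module names, for illustration only.
adapter_lr = lr_for_module("lora_unet_llm_adapter_blocks_0_q_proj", rules, 1e-4)
other_lr = lr_for_module("lora_unet_blocks_0_self_attn_q_proj", rules, 1e-4)
```

In this sketch only the module whose name contains llm_adapter picks up 5e-5, and everything else falls back to the default learning rate.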

Toserme

8 days ago

There is also the case of circlestone_labs themselves pointing out to disable the LLM adapter training with 'llm_adapter_lr=0'

https://civitai.com/models/2536147/greg-rutkowski-style-anima

Of course this is for diffusion-pipe trainer so can't confirm if it also applies to sd-scripts but if we assume that the default Anima behavior is for the LLM adapter to not be trained in any capacity, then explicitly adding llm_adapter_lr=0 for Lora training would seem kinda pointless.

Diffusion-pipe works differently from sd-scripts: in diffusion-pipe you have to specify llm_adapter_lr=0, otherwise it trains the adapter. In sd-scripts the adapter won't be trained unless you specify train_llm_adapter=True. This is stated in the anima_train_network.md docs in sd-scripts, and you can see it in the code: "train_llm_adapter = kwargs.get("train_llm_adapter", "false")" (line 239 of lora_anima.py).

Yeah I have reviewed the docs but given the relatively substantial differences when comparing the Loras I wasn't sure if this flag fully disabled any and all adapter training. Thanks for pointing it out.

I didn't specify the flag yet the Loras produce different results on the same epochs despite using identical dataset/config+seed for the training. According to Claude there is potentially some influence on the Lora training conditions even if 'train_llm_adapter=True' isn't specified. Could also just be AI hallucinations though.

Loras will never be identical even if the dataset, seed, and parameters are all the same. You could maybe get deterministic results if you turned off all attention types, but that's unrealistic.

Getting truly deterministic results is actually relatively easy in sd-scripts. All you have to do is modify a couple of torch-related flags in the train util config; I am still using that for my Illustrious training setup to this day. I am aware that two Loras trained the same way won't be identical 1:1 without such a setup, but typically the differences are mostly in minute details. The bigger differences that I've noticed could be something specific to Anima, though.
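For readers wondering what such torch-related flags look like, here is a sketch using PyTorch's standard reproducibility settings. The post does not say which flags were changed or where, so treat this as an assumption rather than the exact setup described; deterministic_env and enable_determinism are hypothetical helper names.

```python
import os
import random


def deterministic_env() -> dict:
    """Environment variables that should be set before CUDA is initialised."""
    # cuBLAS needs a fixed workspace size for deterministic matmuls on CUDA 10.2+.
    return {"CUBLAS_WORKSPACE_CONFIG": ":4096:8"}


def enable_determinism(seed: int) -> None:
    """Apply PyTorch's documented reproducibility settings."""
    import torch  # imported lazily so deterministic_env() works without torch

    os.environ.update(deterministic_env())
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    torch.backends.cudnn.benchmark = False  # no run-to-run autotuned kernel choice
    torch.backends.cudnn.deterministic = True
    # warn_only=True warns instead of raising for ops that lack a
    # deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)
```

Calling enable_determinism(42) once, before the model and optimizer are built, would be the typical usage. With warn_only=True, any op that still runs nondeterministically produces a warning, which is one way to check whether a given attention backend is responsible for run-to-run differences.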

Toserme changed discussion status to closed 8 days ago
