LoRA character training
Good afternoon. I've run into a problem. I train character LoRAs and have so far done it on Illustrious/NoobAI, but yesterday I started training a LoRA with the Anima standalone trainer and hit an issue: the character's white background doesn't go away even when I increase the prompt weight, and if I increase it too much I get artifacts like the ones in the picture. Can anyone help me understand what's going on? The training parameters were as follows:
gpu_ids = "0"
[training_arguments]
output_name = "Training_test_lora"
save_model_as = "safetensors"
max_train_epochs = 7
save_every_n_epochs = 1
sample_every_n_epochs = 1
log_with = "tensorboard"
learning_rate = 1
text_encoder_lr = 1
optimizer_type = "Prodigy"
optimizer_args = [ "weight_decay=0.01", "decouple=True" ]
lr_scheduler = "cosine"
lr_warmup_steps = 0
mixed_precision = "bf16"
save_precision = "bf16"
max_data_loader_n_workers = 2
gradient_accumulation_steps = 1
max_grad_norm = 1
gradient_checkpointing = true
flash_attn = false
torch_compile = false
lowram = false
blocks_to_swap = 0
persistent_data_loader_workers = true
seed = 42
cache_latents_to_disk = false
vae_batch_size = 1
cache_text_encoder_outputs_to_disk = false
disable_bucket_shuffle = true
multigpu_mode = "ddp"
deepspeed = false
use_cuda_direct = false
ddp_gradient_as_bucket_view = false
ddp_static_graph = false
use_fsdp = false
fsdp_sharding_strategy = "1"
fsdp_offload_params = false
fsdp_reshard_after_forward = false
fsdp_activation_checkpointing = false
fsdp_cpu_ram_efficient_loading = false
fsdp_backward_prefetch = ""
fsdp_forward_prefetch = false
fsdp_use_orig_params = true
fsdp_limit_all_gathers = true
fsdp_auto_wrap_policy = "NO_WRAP"
fsdp_min_num_params = 100_000_000
fsdp_transformer_layer_cls_to_wrap = ""
fsdp2_reshard_after_forward = true
fsdp2_offload_params = false
fsdp2_activation_checkpointing = false
fsdp2_cpu_ram_efficient_loading = false
fsdp2_auto_wrap_policy = "NO_WRAP"
fsdp2_min_num_params = 100_000_000
fsdp2_transformer_layer_cls_to_wrap = ""
step_profile = false
profile_microbatch = false
[network_arguments]
network_module = "networks.lora_anima"
network_dim = 24
network_alpha = 12
network_train_unet_only = true
network_dropout = 0.05
auto_resume_last_state = true
[anima_arguments]
timestep_sample_method = "logit_normal"
discrete_flow_shift = 3
weighting_scheme = "logit_normal"
You should probably stop using Prodigy: it's very likely picking an LR that's far too high, and Anima doesn't want a high LR in the first place (Prodigy almost always overshoots the LR anyway and overfits hard). Use AdamW/AdamW8bit instead.
Yes, I tried AdamW/AdamW8bit and had the same problem: the background came out white, and only occasionally did the requested background slip through. I'm also not sure whether captioning a character dataset differs from SDXL. I'm used to training with Prodigy and a cosine scheduler, and, as I usually do, I removed the backgrounds from the dataset images. Is that considered a mistake?
I captioned the images following a recommendation from a neural network that was based on Reddit comments, without the SDXL clutter:
Perstest, 1girl, perstestOutf, long hair, blue eyes, large breasts, brown hair, long sleeves, hair ornament, cleavage, very long hair, pointy ears, star pointy earrings, blue choker, virtual youtuber, beads hairclip, beads necklace, white kneehighs, red bow, clothing cutout, thigh strap, blue dress, white socks, short dress, frilled dress, cleavage cutout, single thighhigh, puffy long sleeves, blue nails, dragon horns, asymmetrical legwear, star hair ornament, dragon tail, bridal garter, uneven legwear, single sock, frilled kneehighs, demon wings, low wings, heart ahoge, heart-shaped pupils, criss-cross straps, criss-cross halter, blue skirt, plaid skirt, frilled skirt, pleated skirt, white thighhighs, cowboy shot,
You can use AdamW and lower the learning rate to around 0.00002. Captions can be written in natural language. With an image repeat count of 10, train for 10 epochs; if the result underfits, continue training. When captioning, you need to describe the background as well, otherwise the model will treat the background as part of the character's features. Some personal training suggestions: I recommend training on Linux, which can nearly double the speed. Training at a mix of 512, 768, and 1024 resolutions also speeds things up, and the results are almost identical to training at 1024 only.
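As a rough illustration of the schedule suggested above, the sketch below estimates how many optimizer steps 10 repeats and 10 epochs produce. The dataset size of 30 images is a made-up example and the function name is hypothetical; real trainers may count steps slightly differently once aspect-ratio bucketing is involved.

```python
import math


def total_optimizer_steps(num_images, repeats, epochs, batch_size=1, grad_accum=1):
    """Estimate optimizer steps for a repeat/epoch schedule (ignores bucketing)."""
    samples_per_epoch = num_images * repeats
    batches_per_epoch = math.ceil(samples_per_epoch / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


# Hypothetical dataset of 30 images with the suggested 10 repeats and 10 epochs.
print(total_optimizer_steps(num_images=30, repeats=10, epochs=10))
```

If the result underfits, raising `epochs` in this estimate shows how much extra training that adds before committing to a longer run.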
Below is the tagging prompt I personally use for large language models, which you can modify according to your needs. I personally use Gemini for tagging. You can download the accompanying script from my repository "kongbai-84/soultide_lora" at the "main" branch, and ask the large language model to explain how to use it and translate the interface into English.
Use English natural language to tag the characters in the images for the training set of an anime-specialized LoRA model. The specific rules are as follows:
- [Clothing Description] Each piece of clothing corresponds to only one independent tag, separated by English commas (e.g., white blouse, black pleated skirt, knee-high socks). Prohibit the use of wearing verbs such as "wearing", "dressed in", or "putting on"; instead, use noun phrases or prepositional phrases for direct descriptions (e.g., red scarf around neck, gloves on hands). Avoid general terms (like shoes, clothes) and use precise descriptions instead (like white platform sneakers, sheer stockings). Keep only the most appropriate one among synonymous tags for the same type of clothing, without repetition. One piece of clothing/accessory corresponds to exactly one prompt word. As long as it is a piece of clothing, it corresponds to only one prompt word. Do not use different prompt words just because the perspective changes while the clothing itself hasn't changed (do not change the prompt word for a specific piece of clothing, but also do not forcefully apply the complete prompt words to images where the corresponding clothing does not appear; adjust according to the visual cropping of the image).
- [Directional Description] Allow the use of spatial directional words for auxiliary positioning, such as on the left arm, around the waist, on the right wrist, etc.
- [Other Content] Retain tags describing character features (hair color, hairstyle, eye color, pupil shape, etc.), actions, and backgrounds. Describe actions using natural language. Add "@lhcx" at the very beginning of the prompt. Use natural language to describe the art style (do not use vague descriptions like "anime" or "exquisite anime illustration"; describe the style, painting method, and brushstrokes in detail). Keep only the most accurate synonym for character features. If the uploaded tags do not match the image or there are omissions, supplement or correct them based on the image content. When summarizing the character's face shape, put it in the "Others" category.
Use natural language to describe the character's actions in detail.
- The prompt format should be: Art style, a XXX picture of a girl named XXX, a girl named XXX has XXX appearance, a girl named XXX has XXX clothing, a girl named XXX performs XXX action. Replace XXX with the character's name and append the "(soul tide)" suffix. Include this suffix in the summary as well. Prioritize following the user's new instructions.
- [Summary] After completing the above organization, gather and arrange all clothing-related tags together.
IMPORTANT: Please strictly maintain the following format for your reply. Do not change the file name marker, and place it in a code block to prevent the '#' symbol from being swallowed, so that I can write it back to the file via a script:
FILE: filename.txt
tag1, tag2, tag3...
Output the tagging/modification logic first, then output the prompt words, and finally summarize the character and clothing features in the following format:
Character Features:
Clothing Features:
Props (leave blank if none):
Others:
Place the modified tags in a code block, and output the summary directly.
Note that classifications like expressions should be placed in "Others". Check to ensure the tags in the Plaintext are consistent with those in the summary, avoiding situations where tags exist in the summary but not in the Plaintext, or vice versa.
When tagging, use natural language to describe the character's appearance and clothing in detail. The character features in the summary should not include perspective. Expressions (eye color belongs to Character Features, while closed eyes belongs to Others) should be placed in "Others", breast size should be placed in "Others", and traits like a mole on the breast should be placed in "Character Features".
When summarizing, the character, clothing, and props should not carry the prefix "a girl named XXX"; only place the character's name at the beginning of the character summary.
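The prompt above asks the model to return each caption in a code block under a `FILE:` marker so that a script can write it back to disk. The script in the repository is not shown in this thread, so the following is only a minimal sketch of how such a write-back could work. The function names are hypothetical, and tolerating leading `#` characters before `FILE:` is an assumption.

```python
import re
from pathlib import Path

FENCE = "`" * 3  # code-fence delimiter used in the model's reply
# Assumption: the marker may or may not be preceded by '#' characters.
FILE_MARKER = re.compile(r"^#*\s*FILE:\s*(.+)$")


def parse_caption_reply(reply):
    """Collect 'FILE: name' markers and the caption lines that follow them.

    Only text inside fenced code blocks is considered, so the tagging logic
    and the summary around the block are ignored.
    """
    captions = {}
    in_fence = False
    current = None
    for raw in reply.splitlines():
        line = raw.strip()
        if line.startswith(FENCE):
            in_fence = not in_fence
            current = None  # a caption never spans two code blocks
            continue
        if not in_fence or not line:
            continue
        marker = FILE_MARKER.match(line)
        if marker:
            current = marker.group(1).strip()
            captions[current] = []
        elif current is not None:
            captions[current].append(line)
    return {name: " ".join(parts) for name, parts in captions.items()}


def write_captions(captions, dataset_dir):
    """Write each caption next to its image, one .txt file per entry."""
    for name, text in captions.items():
        # Path(name).name drops any directory part so writes stay in dataset_dir.
        target = Path(dataset_dir) / Path(name).name
        target.write_text(text + "\n", encoding="utf-8")
```

Parsing only the fenced part keeps the "Character Features" summary out of the caption files, which is the reason the prompt insists on the code block.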
Thanks for the reply. I'll try to rework my training approach, but I still don't understand the tag description. Was it a prompt for "labeling" or a guide on how to label a dataset yourself?
I'm still looking for good training parameters. The character does transfer now, but details like jewelry often transfer incorrectly. I also still have problems with the background: it stubbornly doesn't change, even when I specify something like "girl standing in bathroom". The result is often a white background, or a partial change with a plain background mixed in. Another problem I'm trying to solve is clothing sticking to the character. For example, my character has three variants in the dataset: naked, in a dress, and in a sweater. In later generations the character comes out in the correct variant, but also wearing clothing that the prompt doesn't ask for. I also can't understand why the Anima model stubbornly censors my pictures. Even if I write, for example, "the character changes clothes in the shower", it censors the chest. That would be understandable for a woman's chest, but sometimes it also covers a man's chest with a haze, so I don't yet know how to deal with the censorship or with the training problem.
The part I provided is a prompt intended for Large Language Models. You can send that prompt along with your images to Gemini or other multimodal models with vision capabilities, and retrain using the tags generated by the LLM. This should solve the background and clothing overfitting issues you are encountering. Adding tags like "nsfw" and "uncensored" and increasing their weights to 2-8 should help avoid the issue of constantly getting censored images. Alternatively, you could try changing the reference artist. Constantly generating censored images is usually caused by overfitting, which happens when all of the artist's works in the dataset are censored.

You can try using a large language model for natural language tagging (the @lhcx at the beginning is not required).
You should disable Shuffle Captions so that the grammar and semantics of the natural-language captions are not destroyed. You can also try enabling Torch Compile, which may speed up training; if it causes errors you can't resolve, just turn it off. To speed up training, the base resolution can be set to 512, 768, 1024.
Then enter the following in the Network Args field:
verbose=True network_reg_lrs=blocks.(?:[0-9]|1[0-8])..=2e-5,blocks.(?:19|2[0-7])..=8e-6
Be sure to watch the logs (console) to check that the per-layer learning rate settings were applied successfully.
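Before starting a long run, it can help to check offline which blocks each regex actually captures. The sketch below parses a `pattern=lr,pattern=lr` string and reports the learning rate per module name. How the trainer itself matches names is not shown in the thread, so the use of `re.match`, the module names, and the escaped-dot spelling of the patterns are all assumptions made for illustration.

```python
import re


def parse_reg_lrs(spec):
    """Split 'pattern=lr,pattern=lr' into (compiled regex, learning rate) pairs."""
    rules = []
    for item in spec.split(","):
        pattern, lr = item.rsplit("=", 1)
        rules.append((re.compile(pattern), float(lr)))
    return rules


def lr_for(module_name, rules, default_lr=None):
    """Return the rate of the first rule matching the start of the module name."""
    for regex, lr in rules:
        if regex.match(module_name):
            return lr
    return default_lr


# Assumed spelling with escaped dots; an unescaped '.' matches any character.
SPEC = r"blocks\.(?:[0-9]|1[0-8])\..*=2e-5,blocks\.(?:19|2[0-7])\..*=8e-6"
rules = parse_reg_lrs(SPEC)

# Hypothetical module names over the index range the two patterns cover (0..27).
for index in (0, 9, 18, 19, 27):
    name = f"blocks.{index}.attn.qkv"
    print(name, lr_for(name, rules))
```

With these patterns, blocks 0 to 18 get 2e-5 and blocks 19 to 27 get 8e-6, which is the split the console log should confirm.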
Can you tell me whether this per-layer training setting would also work on, say, the Illustrious architecture?
Yes, it's possible if your trainer supports such a feature. However, the number of layers and what they are responsible for differ between Illustrious and Anima.
I tried it. Training took 2 hours and 38 minutes. The clothing was more or less learned and can be changed, but the background is lost no matter what I try: it comes out either completely monotone or like this:
For Illustrious-based training I use Kohya_ss, which has the same "network args" field, so I take it I can paste this per-layer setting there too?
Yes, that's right. In Illustrious you can also set per-layer learning rates to reduce style overfitting. But the layers in Illustrious differ from those in Anima, so you will have to test this yourself.
I'm starting to wonder whether the problem is in my positive and negative prompts?
Positive: (masterpiece, best quality, amazing quality, very aesthetic, extremely detailed, very detailed, absurdres, newest, highres, score 9, score 8,detailed background,perfect hands, perfect anatomy, anime source)
Negative: score 1,score 2,score 3,(worst quality, bad quality:1.2),low quality,jpeg artifacts,copyright name,watermark,artist name,signature,out of frame,censored,simple background,white background,black background,
I think I've figured out what my problem was: it was in the generation settings. My Shift parameter was 1, and 2 for Hires-fix.
Try using this prompt:
masterpiece, best quality, newest, 1girl, walking, looking at viewer, calm, overgrown ruin, gothic cathedral, mossy pillar, plants, broken stained glass window, god rays, cinematic lighting, high angle, from above, the girl is situated in the center of the frame walking along the fallen mossy pillar spanning a dark chasm while massive ruined arches surround the girl.
You should disable Shuffle Captions so that the grammar and semantics of the natural-language captions are not destroyed. You can also try enabling Torch Compile, which may speed up training; if you run into errors you cannot resolve, just turn it off. To speed up training, the base resolution can be set to 512, 768, or 1024 (lower values train faster).
Then enter the following in the Network Args field:
verbose=True network_reg_lrs=blocks.(?:[0-9]|1[0-8])..=2e-5,blocks.(?:19|2[0-7])..=8e-6
Be sure to watch the logs (console) to check that the per-layer learning rate settings were applied successfully.
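To see which blocks each learning rate would hit, the two patterns can be checked against block names in plain Python. This is only a sketch under several assumptions: that the patterns were originally written with escaped dots and a trailing `.*` (characters that look as if they were lost when the post was rendered), that the trainer matches them against module names of the form `blocks.<index>.<submodule>`, and that the first matching pattern wins. The suffix `attn.q_proj` and the helper `lr_for` are made up for illustration and are not the trainer's actual names.

```python
import re

# ASSUMPTION: reconstructed form of the two patterns from the post
# (escaped dots and a trailing ".*" restored by guesswork).
RULES = [
    (re.compile(r"blocks\.(?:[0-9]|1[0-8])\..*"), 2e-5),  # blocks 0-18
    (re.compile(r"blocks\.(?:19|2[0-7])\..*"), 8e-6),     # blocks 19-27
]


def lr_for(module_name):
    """Return the learning rate of the first rule that fully matches, else None."""
    for pattern, lr in RULES:
        if pattern.fullmatch(module_name):
            return lr
    return None


# Hypothetical module names; the real names depend on the trainer.
indices = range(28)
assignment = {i: lr_for(f"blocks.{i}.attn.q_proj") for i in indices}

early = [i for i in indices if assignment[i] == 2e-5]
late = [i for i in indices if assignment[i] == 8e-6]

print("2e-5 applies to blocks", early[0], "through", early[-1])
print("8e-6 applies to blocks", late[0], "through", late[-1])
```

Under these assumptions the first group is the earlier blocks and the second group is the later blocks, which get the lower rate. The trainer's own log output remains the real check of what was applied.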
Can you tell me whether this per-layer training command would also work on, for example, the Illustrious architecture?
Yes, it's possible if your trainer supports this feature. However, the number of layers and what they are responsible for differ between Illustrious and Anima.
For Illustrious-based training I use Kohya_ss, which has the same "network args" field, so as I understand it, I can paste this per-layer hint there too?
Yes, that's right. In Illustrious you can also set per-layer learning rates to reduce style overfitting. But the layers in Illustrious differ from those in Anima, so you will have to test it yourself.
I think I've figured out what my problem was: it was in the generation settings. My Shift parameter was 1, and 2 for Hires-fix.
For the base image, a Shift value of 3-4 is recommended; for Hires. fix (upscaling) you can use 2-3.
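For intuition on why a Shift of 1 can behave so differently from 3-4, here is a small sketch of the timestep shift commonly used by flow-matching samplers, sigma' = shift * sigma / (1 + (shift - 1) * sigma). Whether Anima's sampler uses exactly this formula is an assumption here, and `shift_sigma` and `schedule` are illustrative helpers, not functions from any particular UI.

```python
def shift_sigma(sigma, shift):
    """Common flow-matching timestep shift (ASSUMED to be what "Shift" controls)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def schedule(steps, shift):
    """Linear noise levels from 1.0 down to 0.0, with the shift applied to each."""
    return [shift_sigma(1.0 - i / steps, shift) for i in range(steps + 1)]


for shift in (1.0, 2.0, 3.0, 4.0):
    sigmas = schedule(10, shift)
    # Count schedule points that are still in the high-noise half.
    high_noise_points = sum(s > 0.5 for s in sigmas)
    print(f"shift={shift}: midpoint sigma={sigmas[5]:.3f}, "
          f"points above 0.5: {high_noise_points}")
```

With shift = 1 the schedule is unchanged. Larger values keep more of the steps at high noise levels, which is generally where the overall composition of the image is settled.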
The generations have started to work, but I don't like their style; something is missing. I'll go and try a few things, maybe that will help.
For now, though, I don't understand why the character differs so strikingly from the dataset in art style, and I also haven't figured out how to move Anima's rendering away from the cartoonish look and closer to what I had in Illustrious...
Thank you very much for your patience and help ;) <3
Listen, I have a question: do you know how I can reproduce the style I used on Illustrious:
You can look for artists on this site:
Anima Style Explorer
One more thing: Anima supports higher weights; you can raise them up to 8.




