This is an early experimental LoRA that adds bbox-guided inpainting / editing to the Ideogram 4 model. It is a work in progress, so the files here are snapshots taken at different points in time while I adjust training parameters and build a better dataset.
I currently get the most stable results with the checkpoint at step 4000 of the second training run.
The dataset is very small, so do not expect any magic or precision. It is a starting point that I hope will evolve over the next few weeks as I prepare a bigger dataset and restart training with a larger rank and fine-tuned parameters.
Prerequisites
Custom Node
You can find my custom node set on GitHub at ComfyUI-bitpoet-IG4Inpaint. The necessary workflow is included in the node or can be downloaded here.
ComfyUI Changes
Check out or download the dev-ideogram4-inpaint branch of my ComfyUI fork.
Training
To train with reference images, you currently need to use a slightly adapted fork of AI-Toolkit. You can find my bitpoet-ideogram4-refimages branch here on GitHub.
It also includes a fix for the UTF-8 / ANSI encoding error that has recently been causing jobs to fail at startup on Windows.
Note that this AI-Toolkit adaptation has a switch for reference image support at the top of the dataset editor. You have to switch this on every time you open a dataset with reference images.
An example training config for AI-Toolkit is also in this repository.
I will add a small example dataset at some point.
If you want to assemble your own dataset, you might find my simple Node.js-based dataset editor handy. It is tailored especially for Ideogram 4 image-reference-prompt datasets, with a graphical bbox editor and a completion indicator.
Buzzwords (technical details)
What we changed in AI-Toolkit besides the dataset editor:
We added reference-latent token concatenation for Ideogram 4: each clean reference image is VAE-encoded and appended to the packed sequence as [text | noisy target | clean reference], with its own indicator, MRoPE time coordinate, and clean timestep. The transformer output and diffusion loss are sliced to target tokens only, while bounding-box JSON prompts provide spatial edit conditioning.
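The packing and slicing described above can be illustrated with a toy sketch. This is not the AI-Toolkit code: plain Python lists stand in for tensors, and every name (`PackedSequence`, `pack_sequence`, `target_only_mse`) as well as the concrete indicator, time-coordinate, and timestep values are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PackedSequence:
    tokens: List[float]              # stand-ins for embedded tokens
    indicator: List[int]             # illustrative ids: 0=text, 1=target, 2=reference
    time_coord: List[int]            # illustrative MRoPE time index per token
    timestep: List[Optional[float]]  # None for text (not modeled in this sketch)
    target_start: int
    target_end: int


def pack_sequence(text, noisy_target, clean_reference, t):
    """Build the [text | noisy target | clean reference] layout."""
    n_text, n_tgt, n_ref = len(text), len(noisy_target), len(clean_reference)
    return PackedSequence(
        tokens=list(text) + list(noisy_target) + list(clean_reference),
        indicator=[0] * n_text + [1] * n_tgt + [2] * n_ref,
        # Assumption: the reference sits on its own time coordinate (1 here).
        time_coord=[0] * n_text + [0] * n_tgt + [1] * n_ref,
        # Assumption: 0.0 stands for "clean"; the target carries the sampled t.
        timestep=[None] * n_text + [t] * n_tgt + [0.0] * n_ref,
        target_start=n_text,
        target_end=n_text + n_tgt,
    )


def slice_target(per_token_values, seq):
    """Keep only the values that belong to target tokens."""
    return per_token_values[seq.target_start:seq.target_end]


def target_only_mse(prediction, truth, seq):
    """Mean squared error over target tokens; text and reference are ignored."""
    pred = slice_target(prediction, seq)
    return sum((p - q) ** 2 for p, q in zip(pred, truth)) / len(truth)


seq = pack_sequence([0.1, 0.2], [1.0, 2.0, 3.0], [7.0, 8.0, 9.0], t=0.7)
model_output = [9.0, 9.0, 1.0, 2.0, 5.0, 9.0, 9.0, 9.0]
loss = target_only_mse(model_output, [1.0, 2.0, 3.0], seq)
```

The point of the slice is that whatever the model predicts at text or reference positions never reaches the loss, so the reference only acts as context for the target tokens.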
These changes have to be mirrored in ComfyUI as well:
ComfyUI core: Extended the native Ideogram 4 model to accept reference latents and reproduce the training sequence [text | noisy output | clean reference], including the separate indicator, MRoPE coordinate, clean timestep, and output-only prediction slicing.
Custom node: Ideogram4ReferenceConditioning resizes and VAE-encodes a reference image to match the target latent, then attaches it only to positive conditioning so the separate unconditional model remains unchanged.
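As a rough sketch of the "positive conditioning only" idea: ComfyUI passes conditioning around as a list of (embedding, options) pairs, so attaching a reference amounts to copying each pair and adding one entry to its options. The helper name `attach_reference` and the `reference_latents` key are assumptions for illustration and may not match what the actual node does; strings stand in for tensors.

```python
from typing import Any, Dict, List, Tuple

# Simplified stand-in for ComfyUI conditioning: (embedding, options) pairs.
Conditioning = List[Tuple[Any, Dict[str, Any]]]


def attach_reference(conditioning: Conditioning, reference_latent: Any) -> Conditioning:
    """Return a copy of `conditioning` with the reference latent attached.

    The input list and its option dicts are left unmodified.
    """
    out: Conditioning = []
    for embedding, options in conditioning:
        new_options = dict(options)  # shallow copy so the caller's dict stays intact
        existing = list(options.get("reference_latents", []))
        new_options["reference_latents"] = existing + [reference_latent]
        out.append((embedding, new_options))
    return out


positive: Conditioning = [("pos_embedding", {"pooled": "pos_pooled"})]
negative: Conditioning = [("neg_embedding", {})]

# Only the positive branch receives the reference; negative is passed through as-is.
positive_with_ref = attach_reference(positive, "encoded_reference_latent")
```

Because the negative conditioning never passes through the helper, the unconditional branch sees exactly the same inputs as it would without the reference image.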
Credits
Credits go to:
- ideogram-ai for releasing a highly interesting, high-quality new image model.
- Ostris for AI-Toolkit.
- Comfy-Org and Kijai for ComfyUI itself and day-zero support for Ideogram 4.