Comfy Use

by Ripcurlsurf - opened 15 days ago

Can this be used or is it under development with Comfy to maximize its potential? I understand some third party nodes are available but quality isn't the best.

Kijai

Comfy Org org 15 days ago

It is now available in the ComfyUI nightly version. There's no official example workflow yet, but I have a simple test workflow available in the PR: https://github.com/Comfy-Org/ComfyUI/pull/13817

The biggest quality issues are in the model itself, though. We have some workarounds such as the seam smoothing, and with the native implementation you have access to all the different samplers etc., so I'm sure we can find better ways to use the model. Still, it's going to be limited when it comes to final quality, at least without further training.

Personally, the most interesting use so far has been reference-based image generation.

Andyx1976

15 days ago

It's faster, but I don't think it's better than hidream1, except for the editing stuff, which has come a long way. But what use is that if it's blurry and stitched? Good that it's being worked on, but the competition works just fine.

Ripcurlsurf

15 days ago

@kijai Just curious: are you part of the Comfy team or a very strong, talented supporter? Your name comes up a lot.

Kijai

Comfy Org org 15 days ago

@kijai Just curious: are you part of the Comfy team or a very strong, talented supporter? Your name comes up a lot.

Started as just custom node dev, but I've been full time part of the official backend team since January now.

RuneXX

15 days ago

•

edited 15 days ago

It's really good with text.... and composition etc.
A little to be desired when it comes to overall quality perhaps (face, hands, details etc)
But this was just my first few attempts ;-) It might be possible to tweak it a bit: prompt, sampler, steps etc., or even a refiner 2nd pass (with the same model or another model)

(havent tried the ref image part much yet, but that looks really good as well)

(and the spelling in the title is all my fault.. ComfyUI, not Comfy-UI.. prompted a bit too fast.. haha.)

Andyx1976

14 days ago

•

edited 14 days ago

everything photo is blurry and low detail. Reference images seem to work fine for composition and editing. but that doesn't help a weak result. And it doesn't work better than flux.2 or qwen edit.
Also fairly samey images from the same prompt, even with the full model. Edit: that's down to the workflow not really having random seeds, only adding noise in the sampler.

Anatomy seems to be less messy than in the flux.2 models, but that's a low bar that only Ernie could beat (with a shovel)... tbh I keep coming back to qwen edit as a reliable workhorse; in all but resolution it's nearly as good, and just less fiddly than the others. All the same prompts after the first one: hidream o1 full: it's not shiny as usual, it's kinda fuzzy even when prompted as a high quality photo...
same prompt for all following:
Hidream o1, full, mxfp8 (wf says it's higher quality). If you make it small it looks OK; a tiny bit closer and everything is fuzzy (faces). Also kinda boring: the guys look almost cloned (that's the reason I changed the prompt to "different guys"):

Flux.2 klein9b. Amazing visuals, can't count arms (give the right guy the benefit of the doubt that he is jumping):

klein4b for a change the better:

hidream-i1 Full (old) hidream. i wouldn't say it is better than O1 but less fuzzy :
and for some comedy, Ernie (how has this cf of a model so many likes?) Not just the phantoms but arms almost always look weird :

RuneXX

14 days ago

everything photo is blurry and low detail.

Seems ok here. If its the best ever model, probably not.. it's really good with text, and decent compositions.
The ref image feature is the great feature though..

Been trying different samplers, undecided on what works best

Kijai

Comfy Org org 14 days ago

Some of my outputs I've liked:

Some observations, (mostly using reference images):

- Base model way better, but requires using the seam fix workaround for the tiling issue
- You can use higher res than the default
- deis or res_multistep with beta has worked nicely for me, but too many options here to choose best

RuneXX

14 days ago

Also got some good results with res_multistep. Maybe a good candidate ;-)

RuneXX

14 days ago

•

edited 14 days ago

Deis works really well, even got a bit of skin blemishes and details (that was in the prompt)

Kijai

Comfy Org org 14 days ago

Deis works really well, even got a bit of skin blemishes and details (that was in the prompt)

You can also try adding some of the dev distill as a LoRA, not too much or it will burn it: https://huggingface.co/Kijai/hidream-O1-image_comfy/blob/main/loras/hidream_o1_dev_lora_rank_64_bf16_pruned_v1.safetensors
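A rough sketch of why the strength value matters here: a LoRA stores a low-rank update, and loading it adds that update to the base weight scaled by the strength. The tiny matrices and the `apply_lora` helper below are made up for illustration and are not ComfyUI's loader code; real LoRA files also carry an alpha/rank scaling that is left out.

```python
# Illustrative only: plain-Python matrices and hypothetical helper names.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_lora(weight, lora_up, lora_down, strength):
    """Return weight + strength * (lora_up @ lora_down)."""
    delta = matmul(lora_up, lora_down)
    return [
        [w + strength * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

base = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for one base-model weight matrix
up = [[1.0], [2.0]]               # rank-1 factors standing in for the dev LoRA
down = [[0.5, -0.5]]

subtle = apply_lora(base, up, down, strength=0.25)  # small nudge toward dev
full = apply_lora(base, up, down, strength=1.0)     # full dev update
```

At low strength the result stays close to the base weights, which matches the "not too much" advice above.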

RuneXX

14 days ago

You can also try adding some of the dev distill as a LoRA, not too much or it will burn

yes that helped a bit as well

Andyx1976

12 days ago

•

edited 12 days ago

To be clear, the images as such are OK (but my three guys were still clones), as long as you don't see the fuzziness: far enough out (or small enough), like the size we see here, and apparently outside photorealism.
But it is meant to do 2048x2048. It's promising but not great. We'll see what people do with it. And thanks for working on it. Now that Qwen Image seems to go closed (and small) , alternatives are good. Except ernie...

RuneXX

12 days ago

•

edited 12 days ago

Also found that using the Gemma 4 text generate in comfy, and feeding your prompt with the instruct from HiDream, vastly improved the output.
I used the prompt instruction here https://github.com/HiDream-ai/HiDream-O1-Image/blob/main/prompt_agent.py (but i translated it to english)
It makes a json prompt that the model seems to like a lot ;-)

Andyx1976

12 days ago

•

edited 12 days ago

I opened the lady two posts up in full size. The skin is pure blur. With the black and white old guy further up, even at small size, the hair looks OK but the skin is completely blurry.
Are they just overselling the resolutions it can do? Maybe someone will make an anti-blur or skin LoRA. That seems to be by far the biggest problem, at least in photorealistic images. And it's a technical problem of the model. You can prompt as much as you like about no blur, sharpness, skin details or no DoF, or change samplers, and it still does it.
The black and white guy is a good example of prompting the hell out of it (tons of prompted skin, hair and face details to try to fix the problem), and you end up with these typical 100-year-olds, even if you prompt someone aged 40. It's the only way the poor model can cram all these prompted details into a face. But the skin is STILL blurry.

With enhanced prompts like this, btw, I find it a double-edged sword anyway. I'm not buying this new meta of short-story-sized prompts that started with Z-Image, because it's very hard to stop it from changing too much. And even Z-Image does just fine with a simpler prompt; it just always does the same thing with it. If you want a randomized / fancied-up version of a core idea, a long, flowery prompt is great. But for something precise it's often more annoying.
And if a model has problems like F2, Ernie with anatomy or HidreamO1 with blur, even a long prompt doesn't change the fundamental flaw. Hence the b/w 100 year old guy.

RuneXX

12 days ago

This model has its strengths and weaknesses... as any other model, I guess.
But it's open source, so the community will evolve it if they want: make fine-tuned models, LoRAs and whatnot ;-)

So the most important part is that it's open source.

RuneXX

12 days ago

•

edited 12 days ago

Ah i see there has been added a default workflow inside Comfy now.
With prompt enhancer and more

Try that perhaps. Gives better results

Kijai

Comfy Org org 12 days ago

@RuneXX I noticed you had Shift adjustment in one of your workflows, and realized I had a mistake in the initial ModelNoiseScale node that had two buggy behaviours with the shift adjustments:

- If the Shift node was after ModelNoiseScale, it reset the noise scale to the model default (8.0), making the node adjustment do nothing.
- If ModelNoiseScale was after the Shift node, it reset the shift to the model default.
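The two behaviours above are an instance of a general pattern: each node rebuilt the settings from defaults instead of changing only its own field. A minimal sketch of the buggy and the order-independent version, using a hypothetical `SamplingConfig` class (not ComfyUI's real classes; the shift default is an assumed value, only the 8.0 noise scale comes from this thread):

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in for a model's sampling settings.
@dataclass(frozen=True)
class SamplingConfig:
    shift: float = 3.0        # assumed default, for illustration only
    noise_scale: float = 8.0  # model default mentioned in the thread

def set_shift_buggy(cfg, shift):
    # Bug pattern: rebuild from defaults, discarding an earlier noise_scale change.
    return SamplingConfig(shift=shift)

def set_shift(cfg, shift):
    # Fixed pattern: copy the current config and change only one field.
    return replace(cfg, shift=shift)

def set_noise_scale(cfg, noise_scale):
    # Same fixed pattern for the other setting.
    return replace(cfg, noise_scale=noise_scale)

default = SamplingConfig()
a = set_shift(set_noise_scale(default, 6.0), 5.0)        # noise scale first
b = set_noise_scale(set_shift(default, 5.0), 6.0)        # shift first
lost = set_shift_buggy(set_noise_scale(default, 6.0), 5.0)
```

With the fixed pattern both orders give the same result, while the buggy setter silently puts the noise scale back to 8.0.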

PR has been merged now that fixes that and it should work both ways.

RuneXX

12 days ago

•

edited 12 days ago

Yes i was just experimenting. trying the shift to see how things improved or got worse ;-)
will try again

The model has some serious strength (composition, text, and more .. it looks really "artistic" sometimes).
It does lack a bit in the finer grain details, skin etc, but that might come with community iterations and improvements

I don't know if it's just me, but I really like some of the outputs. They remind me of the days I did black and white photography. When you do close-up photos, not everything is in focus.
Makes it look more real to me... But I do see why some say the skin is plastic etc. (but that's been said about AI images since SDXL, Flux etc.)

To me it looks a bit more like something you'd find in a photography art gallery, while Z-Image looks more like a magazine photo... or something like that ;-)

(images below are stock comfyui workflow with the fp16 full model and a small dash of Kijai's lora (0.3), with res_multistep sampler... if i remember correctly)

RuneXX

12 days ago

•

edited 12 days ago

Did they already release an updated model btw?
https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev-2604

From the "sales pitch", it sounds like it depends on the prompt refiner, but i guess thats also true for the previous ones

Kijai

Comfy Org org 12 days ago

The strength of the model is the reference image mode really, as text to image it's just too lacking as it is.

The new model is aimed to improve pose following when using something like openpose rig as one of the references, otherwise initial impression is that it just seems... worse in details, even blurrier etc... and it's dev only. I could be doing something wrong still, didn't do any extensive tests yet. Definitely does follow the pose more.

RuneXX

12 days ago

•

edited 12 days ago

(just a low res test run)

yes the ref image way is for sure good fun. And if you have a character, easy to put into different scenes, different clothing, etc etc.

realrebelai

12 days ago

•

edited 12 days ago

The strength of the model is the reference image mode really, as text to image it's just too lacking as it is.

Hey kijai,
I’ve been trying to get a "Detail Daemon" effect (per-step sigma modulation) working with the HiDream-01 dev model.
Since my nodes for the model rely on a vendored pipeline.py with custom flow-matching schedulers (FlashFlowMatch / UniPC) rather than ComfyUI's native KSampler infrastructure, standard Detail Daemon hooks completely miss it. We've tried directly modifying the denoising loop and monkey-patching SIGMA_SCHEDULE_MAP to warp the schedule, but it consistently causes stability issues and tensor blowouts.
Is it possible to natively implement support for this kind of sigma modulation directly within your custom denoising loop? Alternatively, is there a recommended, safe way to hook into the pipeline to modulate sigmas per-step without breaking the flow-matching shift math?

I feel this is something that the community could benefit from, and it will revitalize the model entirely if it can be executed properly!

Here are my nodes if you want to take a look 🤷‍♂️ Claude just isn't getting it done for me and I keep hitting limits lol. I removed the detail injector (essentially a custom-mapped Detail Daemon) because it was giving grey outputs, and I feel any pipeline changes just ruin the flow entirely. But I have the code if you want to look at that as well. I tried implementing it in the sampler node AND attempted a separate node entirely, with the same greyed results.

https://github.com/RealRebelAI/Rebels_HiDream-01_Image_Dev_NODES/tree/main

Kaleidia

12 days ago

•

edited 12 days ago

and for some comedy, Ernie (how has this cf of a model so many likes?) Not just the phantoms but arms almost always look weird :

Ernie is really good with prompted skin detail, but yes, the ghost limbs are really bad. I initially thought they were resolution dependent, which is not the case; they just break from time to time. Let's also not talk about the training data bias... But Ernie is also mostly uncensored, or can at least display normal nudity (no hardcore stuff), whereas hidream o1 has never seen a nipple... That might not be important for a lot of people, but for creating character images it is nice if the base model can do stuff like that...

Kaleidia

12 days ago

just some test images

Kijai

Comfy Org org 12 days ago

Here's the new dev checkpoint as a LoRA to experiment with, it's slightly weaker but honestly that's just better... reducing strength helps it not destroy the background too:

https://huggingface.co/Kijai/hidream-O1-image_comfy/blob/main/loras/hidream_o1_image_dev_2604_lora_avg_rankg_224_bf16.safetensors

Kijai

Comfy Org org 12 days ago

The strength of the model is the reference image mode really, as text to image it's just too lacking as it is.

Hey kijai,
I’ve been trying to get a "Detail Daemon" effect (per-step sigma modulation) working with the HiDream-01 dev model.
Since my nodes for the model rely on a vendored pipeline.py with custom flow-matching schedulers (FlashFlowMatch / UniPC) rather than ComfyUI's native KSampler infrastructure, standard Detail Daemon hooks completely miss it. We've tried directly modifying the denoising loop and monkey-patching SIGMA_SCHEDULE_MAP to warp the schedule, but it consistently causes stability issues and tensor blowouts.
Is it possible to natively implement support for this kind of sigma modulation directly within your custom denoising loop? Alternatively, is there a recommended, safe way to hook into the pipeline to modulate sigmas per-step without breaking the flow-matching shift math?

I feel this is something that the community could benefit from, and it will revitalize the model entirely if it can be executed properly!

Here are my nodes if you want to take a look 🤷‍♂️ Claude just isn't getting it done for me and I keep hitting limits lol. I removed the detail injector (essentially a custom-mapped Detail Daemon) because it was giving grey outputs, and I feel any pipeline changes just ruin the flow entirely. But I have the code if you want to look at that as well. I tried implementing it in the sampler node AND attempted a separate node entirely, with the same greyed results.

https://github.com/RealRebelAI/Rebels_HiDream-01_Image_Dev_NODES/tree/main

Detail Daemon already works with the base model in ComfyUI with the native implementation though? Just tested it and it's fine. It doesn't really work with the dev model though as that model just smooths everything out so aggressively.
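For readers unfamiliar with the term, here is a toy sketch of per-step sigma modulation as I understand the general idea: the sampler keeps stepping along the original schedule, but the model is told a slightly lower sigma at each step, so it removes a little less noise. The `modulated_sigmas` function and its sine-shaped bump are assumptions made up for illustration, not the actual Detail Daemon node code.

```python
import math

def modulated_sigmas(sigmas, amount=0.1):
    """Return the sigma values to report to the model at each step.

    The schedule used for stepping is left untouched; only the value the
    model sees is scaled down, most strongly in the middle of the run.
    """
    n = len(sigmas)
    out = []
    for i, sigma in enumerate(sigmas):
        t = i / (n - 1) if n > 1 else 0.0
        bump = math.sin(math.pi * t)   # 0 at both ends, 1 in the middle
        out.append(sigma * (1.0 - amount * bump))
    return out

schedule = [1.0, 0.75, 0.5, 0.25, 0.0]              # toy stepping schedule
seen_by_model = modulated_sigmas(schedule, amount=0.2)
```

Because the first and last values are unchanged, the start and end of sampling behave as usual and only the middle steps are nudged.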

realrebelai

11 days ago

Detail Daemon already works with the base model in ComfyUI with the native implementation though? Just tested it and it's fine. It doesn't really work with the dev model though as that model just smooths everything out so aggressively.

I understand but i was attempting to address the dev model specifically for that purpose as the model does wash everything out pretty bad. I was trying to figure out a different way to achieve the detail injection and reject some of the aggressive smoothing without causing hallucinations or forcing the smoothing regardless. It seems it doesnt work as well as is

Kaleidia

11 days ago

•

edited 11 days ago

I have a set of detailing prompts and ran them through the full model. It gives some variations, but there is still a bit of smoothing happening in the last few steps of the image generation. It also needs a bit more variety in its results, but we can prompt for that as well for the moment. So far each model I tested had its go-to face, and it always helped to prompt in some ethnicity and more detail. Unlike Chroma or Flux (as well as older models), which are limited to a certain prompt length, newer models can be given a lot of detail in the prompt.
Some more examples:

Kijai

Comfy Org org 11 days ago

Detail Daemon already works with the base model in ComfyUI with the native implementation though? Just tested it and it's fine. It doesn't really work with the dev model though as that model just smooths everything out so aggressively.

I understand but i was attempting to address the dev model specifically for that purpose as the model does wash everything out pretty bad. I was trying to figure out a different way to achieve the detail injection and reject some of the aggressive smoothing without causing hallucinations or forcing the smoothing regardless. It seems it doesnt work as well as is

It smooths everything out on the last (low) sigmas. If you end the schedule early there's a bit more detail, but also the same patch grid artifacts as with the base model. It looks to me like the dev model has been trained (either on purpose or as a side effect) to smooth out the grid artifacts, which also ends up losing a ton of normal detail. Just a theory, I don't know anything for sure. I have tried various methods to get more quality out of it, and really the only worthwhile approach seems to be some sort of hybrid using the base model and the dev as a LoRA at lower strength.
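A small sketch of what "ending the schedule early" means in terms of sigmas. It builds a linear 1 to 0 schedule, applies the time shift commonly used for flow-matching models, and then drops the last few low-sigma steps. The shift value of 3.0 and both function names are assumptions for illustration, not this model's actual settings.

```python
def shifted_sigmas(steps, shift=3.0):
    """Linear 1 -> 0 schedule with the common flow-matching time shift applied."""
    sigmas = []
    for i in range(steps + 1):
        t = 1.0 - i / steps
        sigmas.append(shift * t / (1.0 + (shift - 1.0) * t))
    return sigmas

def end_early(sigmas, skip_last):
    """Drop the final `skip_last` steps so sampling stops at a nonzero sigma."""
    if skip_last <= 0:
        return list(sigmas)
    return list(sigmas[:-skip_last])

full = shifted_sigmas(steps=10, shift=3.0)
short = end_early(full, skip_last=2)   # skips the two lowest-sigma steps
```

Stopping at a nonzero sigma leaves a little residual noise in the image, which is one reason the result can look more detailed but also less clean.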

Tophness2022

11 days ago

Can this be used or is it under development with Comfy to maximize its potential? I understand some third party nodes are available but quality isn't the best.

Works great with WAN2GP. They tend to have day 0 support for weird stuff more often than ComfyUI these days.
It's more memory efficient than comfy too, so you don't have to sacrifice on quality tradeoffs

RuneXX

11 days ago

•

edited 11 days ago

Here's the new dev checkpoint as a LoRA to experiment with, it's slightly weaker but honestly that's just better... reducing strength helps it not destroy the background too:
https://huggingface.co/Kijai/hidream-O1-image_comfy/blob/main/loras/hidream_o1_image_dev_2604_lora_avg_rankg_224_bf16.safetensors

Works nicely
with that lora ;-)

I don't really know if it helps, but the model seems to follow it if you add something like "Ultra-realistic, high detail skin texture" at the top of the prompt.

Kijai

Comfy Org org 11 days ago

Can this be used or is it under development with Comfy to maximize its potential? I understand some third party nodes are available but quality isn't the best.

Works great with WAN2GP. They tend to have day 0 support for weird stuff more often than ComfyUI these days.
It's more memory efficient than comfy too, so you don't have to sacrifice on quality tradeoffs

This model was available unofficially way before Wan2GP and officially around the same time. There are no memory issues either.
And "they"? Aren't you working on that project yourself? This is ComfyUI repository and ComfyUI thread, you basically came here to advertise, bad look.

Andyx1976

11 days ago

•

edited 11 days ago

to be fair putting a person into a new image/clothing works very well with flux.2 or qwen image. Put one or two reference images and just prompt some scene like in a normal image model. And they will use the reference person. Even more if you reference them in the prompt. People who use even just editing models (QIE2511), let alone all in ones like Flux.2 or O1, just to edit a image, are criminally underusing them.

Talking of NSFW, hidream I1 (what's with the naming scheme?) could do a lot of stuff, Flux.1 couldn't. Not outright pron of course but kinda kinky stuff. Or just simply dirty or damaged stuff. It was in generally more flexible and i liked it. Flux only caught up with KRea (underrated). Hidream I1 dev or full was just really slow on my then hardware. So i don't want O1 to fail, and i'm glad they seem to work on it.

RuneXX: Nope that skin is still pure blur, just with some color grading. This is something a lora at the very least needs to fix, not a prompt. Better a model update. Or artsifying it instead of trying photorealism. There are some examples here of non photo stuff where it seems to be quite good.

RuneXX

11 days ago

•

edited 11 days ago

non photo stuff where it seems to be quite good.

yes noticed it was very good at things that are not aiming for photo realism. And for text, schematics, infographics, advertisement shots etc
So it definitely has its use cases. And the ref.image feature i think must be close to the best out there. Havent tried it a lot in Klein and Qwen, but cant remember it being that accurate.

And since open source, some derivative models or loras might come ;-)

The odd thing is that photo-realistic shots also look great at "normal" size (in posts or sized down). But if you blow them up to 100% and peek at them at 2048x2048 you will see some smudges and blurs.
Perhaps photorealism at 2048px was aiming too far, but who knows... it's early stages, new model ;-)

Kijai

Comfy Org org 11 days ago

Another thing I noticed is that using even a little of the dev as a LoRA, even when using cfg, gets rid of the worst of the patch grid artifacts.

Examples with er_sde/beta, 30 steps, cfg 2.0, base + dev 2604 at 0.2 strength, no seam fix.

Not really optimal still, but I do think there's a good balance to be found like this.

Kaleidia

11 days ago

•

edited 11 days ago

Talking of NSFW, hidream I1 (what's with the naming scheme?) could do a lot of stuff, Flux.1 couldn't. Not outright pron of course but kinda kinky stuff. Or just simply dirty or damaged stuff. It was in generally more flexible and i liked it. Flux only caught up with KRea (underrated). Hidream I1 dev or full was just really slow on my then hardware. So i don't want O1 to fail, and i'm glad they seem to work on it.

It was sad that people did not work more with it. There was an uncensored model on civit, but the base was just overlooked and nothing more came out of it. Maybe people were scared because the stock config wanted 4 text encoders even though only the llama one was actually needed... Come to think of it, hidream I1 was the first model to work with an LLM instead of just a CLIP or T5... That part was used in other models, but sadly hidream I1 was kind of dead in the water... It was really good at multi-character prompts and such. It outshone a lot of the models at the time but was not really adopted by the community...

Kaleidia

11 days ago

the workflow with dev lora seems to be a bit better but it still produces way too smooth skin, might need to look into better prompting as well to add more micro detail and not just the macro ones like in the above pic.

if we go to full size, it still has tiling and is blocky, especially on the transitions around the characters

Andyx1976

10 days ago

•

edited 10 days ago

I agree with you on many things, but not on LLMs as text encoders. I don't think it hurts, but in my experience it makes no bloody difference (as CLIP!). Chroma, Wan or Flux Krea, all with T5, work AS well as LLM-clip models. I call BS on this LLM hype (for clip) that has taken over (each model a different LLM, of course). You prompt slightly differently for t5xl (much less effort btw), but that's it. I think Hidream1 was just better trained for new stuff; that was its advantage. Which is proven by Flux.1 Krea, still T5 clip, bringing Flux to the same level (complex compositions, dirty, wet stuff, text placement...).

tuolaku

10 days ago

I can't believe I missed such an exciting discussion! I'm officially 'debugging' this model—or 'trial and error' might be a more fitting term. Right now, I'm using the RES4LFY 'chain' scheduling method, as shown in the image below. I tried injecting a bit of eta at the initial stage and having it execute one less step at the end. This way, I got an image with a bit of grain, which I personally feel looks a bit more realistic.
By the way, I've quantized the Prompt-Refine model into GGUF format, which can be loaded and inferred using LM Studio. Link: https://huggingface.co/tuolaku/Prompt-Refine-GGUF/tree/main

tuolaku

10 days ago

Quick question for everyone: how do you handle the issue of facial consistency? I suspect it might be due to the model's limitations. When predicting pixel blocks, the patches are relatively large, which leads to less information and makes it hard to maintain consistency. However, I can't quite explain why consistency is much better with other objects.

Kijai

Comfy Org org 10 days ago

I made a node (available for testing in KJNodes) to see each step's result easier, it will also show you full resolution preview if you want, since this is pixel space model it works especially well:

Ripcurlsurf

8 days ago

For photorealism, what are the best settings for the clownshark sampler? I find the standard settings give me doll-like skin. The LoRAs set at 0.7 help a bit, but nothing like Flux standards. I have 32 GB VRAM, so I have a lot of room to play with.

Andyx1976

7 days ago

•

edited 5 days ago

I think we're fairly sure now that this is just the model's weakness. Prompting and samplers can only help so much. It needs a real fix (LoRA) for its skin issues.

I've seen Ai Toolkit has been updated to support it. Let's see what it can do with some help.

Andyx1976

3 days ago

•

edited 3 days ago

I found a simple solution in a workflow from civitai, that works remarkably well. It takes the output image from hidream and puts it through a z-image turbo run with 4 steps (i use mostly 6) and 0.35 denoising. While copying the same prompt from the hidream image in. The results are pretty good even for the blurry or shiny skin. Even with disabling the upscaling step at the end (which takes longer than the z-image run)..
i changed it for myself to avoid another custom node (and add my own ones :P) but this is the principle:

the og civitai workflow site: https://civitai.com/models/2629261/hidream-o1-dev-2604-z-image-turbo-refiner?modelVersionId=2952028
Although i'm fairly sure the negative prompt doesn't do anything on a cfg1 distilled model like ZIT.
before and after (6 step z.image, end-upscaling disabled):

realrebelai

3 days ago

I found a simple solution in a workflow from civitai, that works remarkably well. It takes the output image from hidream and puts it through a z-image turbo run with 4 steps (i use mostly 6) and 0.35 denoising. While copying the same prompt from the hidream image in. The results are pretty good even for the blurry or shiny skin. Even with disabling the upscaling step at the end (which takes longer than the z-image run)..

thats actually my dual pass workflow haha <3
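The refiner pass described above (a few steps at 0.35 denoise) can be pictured like this. As far as I understand ComfyUI's KSampler, a denoise below 1.0 builds a longer schedule and runs only its low-noise tail; the linear toy schedule and function names below are assumptions for illustration, not Z-Image's real schedule.

```python
def linear_sigmas(steps):
    """Toy schedule from 1.0 down to 0.0 with `steps` intervals."""
    return [1.0 - i / steps for i in range(steps + 1)]

def refiner_sigmas(steps, denoise):
    """Keep only the low-noise tail of a longer schedule.

    Builds int(steps / denoise) intervals, then returns just the last
    `steps` of them, so the pass starts from a partly noised image.
    """
    if denoise >= 1.0:
        return linear_sigmas(steps)
    total = int(steps / denoise)
    return linear_sigmas(total)[-(steps + 1):]

tail = refiner_sigmas(steps=6, denoise=0.35)   # 6 steps out of a 17-step schedule
```

The pass starts at roughly 0.35 of full noise in this toy schedule, which is why composition survives while surface detail such as skin gets regenerated.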

Andyx1976

1 day ago

•

edited 1 day ago

I discovered that the ZIT upscaler is more or less available as a template in Comfy, but it never crossed my mind to just attach it (and ZIT is so fast), or to realize how good it is against the new king of blurry... everything.
I remember I even have an old flux1+zit workflow somewhere to use my old Flux LoRAs. But it works great.
But btw I'm not convinced by the upscale-downscale thing at the end. I seem to get worse results with it (it gets a bit too noisy). Tried a few variations. But the main thing is usually very nearly perfect.

My new problem: in all workflows that use the checkpoint trick, my (AI Toolkit) trained LoRA doesn't work at all, even cranked up to 3.x weight. It only works in the one using the HiDream-O1 special nodes. But that HiDream-O1 LoRA loader doesn't connect to normal nodes (doubt it would help).
