Stable Diffusion: Real-time prompting with SDXL Turbo and ComfyUI running locally (reddit.com)
122 points by belltaco 9 months ago | 42 comments



Someone also improved the speed of LCM recently.

Code: https://github.com/discus0434/faster-lcm

Blog post (in Japanese): https://zenn.dev/discus0434/articles/12427b887b4082

I have no idea how this can be used, but they claim 26 fps on an RTX 3090.


I'm sorry... WHAT! I pay attention to the SD subreddit and no one saw this, or somehow it got swept under the rug lol... That's insane!

I wonder if SDXL Turbo + LCM will be a thing, to get to real-time generation.


> I wonder if SDXL Turbo + LCM will be a thing

SDXL Turbo works best (at least from my trials today) with the LCM sampler, producing better results in fewer iterations with it than it does with Euler A.


This was one of the top posts on Reddit yesterday, though not as big as Turbo.


Using the ComfyUI workflow [0], I'm getting really impressive results (obviously not as quick as single-step, but still very fast [1]) at 768x768 with 10 steps, using the LCM sampler instead of Euler Ancestral and setting CFG to 2.0 instead of 1.0.

[0] drop this image on the comfyui canvas: https://comfyanonymous.github.io/ComfyUI_examples/sdturbo/sd...

[1] On a 3080Ti laptop card


I love that they embed entire workflows into the metadata of their images.


Combined with the ComfyUI Manager extension, which provides an index of custom node packages and can install missing ones from a loaded workflow, it makes it very easy to get up and running with a new workflow.


Reading through the comments, it's hard to not think of "Pygmalion and Galatea."


I'm so glad this is now available for free, without needing to ask an artist, or be charged extra by one, for different concepts.

The barrier is really being lowered and this is beautiful.

What a great time to be alive.


Bad news for that: it is only free for noncommercial use, and this isn't just a temporary early-release thing, but the new general direction for StabilityAI:

https://twitter.com/EMostaque


Slightly off-topic, but holy crap, some of these use cases he retweeted are bonkers:

https://twitter.com/toyxyz3/status/1729922123119104476

https://nitter.net/toyxyz3/status/1729922123119104476


I mean, the fact SD has shown it's possible just means that other research groups can also use the same concept...

Someone on Reddit was actually pointing towards a model that's narrower in scope than SDXL but was trained by a single guy on an A100, so there's no reason we can't expect other groups to pop up, or maybe a consortium of freelancers from the fine-tuning community to get together and start their own base model.


Really impressive. Is that video sped up at all? Crazy fast!


Not sped up.

Here's a live demo, but you need to register an account.

https://clipdrop.co/stable-diffusion-turbo


You need to provide an email address and click on a link. Emails from grr.la work.

Yes, it is "register an account", but it is the lowest-friction option I know of.


Nope, and this is just the beginning. Imagine a year from now, once the people who brought us LCM, ControlNet, and IP-Adapter start looking at the possibilities, not to mention fine-tunes on Turbo.


It's not sped up. I tried it locally last night and it was just as fast, running on Windows 11 with an RTX 3090.


I just tried it locally with a 3070 and it was about 3 seconds per render. I'm far from great at this stuff and it was my first use of ComfyUI, so I don't know if that number could be improved on my setup.


On my machine with an AMD RX 7900 XT, it takes ~0.17s per image. Are you using the SDTurboScheduler node?


At the time I hadn't, but now I've used the sample image (the one with the fox in the bottle in the snow) to load up the workflow with sdturboscheduler, and it's still about the same time.

Edit: Even though the UI says SDXL Turbo, I notice that the command prompt is saying SDXL.

  got prompt
  Requested to load SDXLClipModel
  Loading 1 new model
  Requested to load SDXL
  Loading 1 new model
  100%|| 1/1 [00:00<00:00, 11.30it/s]
  Requested to load AutoencoderKL
  Loading 1 new model
  Prompt executed in 4.04 seconds
  gc collect
Edit 2: More info... I noticed that if I just change the steps, it takes less than a second to generate an image. If I just change the prompt, it takes 3+ seconds. I don't know enough about this to know what that means.


Have you used the 7900XT with LLMs at all?


Yup, works just fine. Overall, PyTorch on ROCm 5.6 has been working very well. I'm impressed with how stable it is, given how much hate the AMD driver stack has been getting.


Same on my end. The biggest issue is sometimes finding the correct torch/torchvision versions for different projects, since AMD users are still rare. So the AMD ecosystem is still pretty niche, and it's stable unless you pick a dev or wrong version, which can cause kernel panics! In any case, things look to be improving, but ROCm still has a lot to support (xformers and other core ML libraries).


Doesn't seem that crazy; a 3070 can do 20 steps at 768px in something like 6 to 9 seconds.


I really want to know if it would work on a CPU. I don’t have the money for a graphics card


You can already do that with existing models, but instead of taking a few seconds, generating one image will take at least a minute. Perhaps SDXL Turbo brings that down.


You can run these models on CPU but it would be much slower than the demo.


Ideally you would run SDXL Turbo with OpenVINO optimizations. I'm not aware of any project that supports both, but maybe there is something.


Any help? When loading the graph, the following node types were not found: SDTurboScheduler




Is it the model that makes it really fast?


Yes, but it's not a change to the model architecture, so you can actually subtract out the base SDXL model, merge in any existing SDXL checkpoint, and get a Turbo version of the existing checkpoint.

(In ComfyUI, this just takes loading all three checkpoints -- turbo, base, and the target one -- and using a ModelMergeSubtract node to subtract base from turbo, to get the "turbocharger" alone, and a ModelMergeAdd to add the turbocharger to the target checkpoint.)
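The subtract/add merge described above can also be sketched outside ComfyUI. A minimal illustration, using plain `{name: value}` dicts in place of real tensor state dicts; the function names are hypothetical, not ComfyUI or diffusers APIs:

```python
# Sketch of the "turbocharger" merge: extract the Turbo delta
# (turbo - base) and apply it to any other SDXL checkpoint.
# Real checkpoints hold a tensor per parameter name; floats
# stand in for tensors here.

def extract_delta(turbo, base):
    # Per-parameter difference: what Turbo training changed.
    return {k: turbo[k] - base[k] for k in base}

def apply_delta(target, delta):
    # Add the Turbo delta onto another checkpoint's weights.
    return {k: target[k] + delta[k] for k in target}

# toy one-parameter example
base   = {"w": 1.0}
turbo  = {"w": 1.5}
target = {"w": 2.0}
merged = apply_delta(target, extract_delta(turbo, base))
```

With real models, every parameter name would be processed the same way, which is exactly what the ModelMergeSubtract/ModelMergeAdd node pair does.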


So basically like what they do with the inpainting models. We need a Juggernaut XL Turbo.


Yes, SDXL Turbo can do inference in a single step.


Technically, any SD model can do inference in a single step; SDXL Turbo is just an improvement in quality at a small number of steps (per the comparisons in the announcement, at 4 steps it is a little better than SDXL base at 50 steps).

EDIT: When I say "just", I'm not saying this isn't super impressive, just that the change is in quality of low-step-count generations, not making it possible.
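As a toy illustration of that point, step count is just a sampler parameter, not a model capability. In the sketch below the "denoiser" is a made-up stand-in, not a diffusion model; it only shows that the schedule divides into any N >= 1 steps:

```python
# Toy sampler loop: the noise schedule is simply split into N steps,
# so N = 1 runs mechanically on any model. What Turbo changes is how
# good the result looks at low N, not whether low N is possible.

def sample(denoise, x, steps):
    for i in range(steps):
        t = 1.0 - i / steps           # noise level, decreasing toward 0
        x = x + denoise(x, t) / steps  # one Euler-style update
    return x

fake_denoise = lambda x, t: -x * t    # fake update pulling x toward 0

one_step  = sample(fake_denoise, 10.0, 1)  # single step: runs fine
four_step = sample(fake_denoise, 10.0, 4)  # more steps refine further
```

A base model at 1 step produces a valid but poor image for the same structural reason this loop produces a valid but crude estimate; Turbo's distillation training makes the low-step output actually look good.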


[flagged]


[flagged]


It's a tool. Some people will do something interesting with it. Most people won't.


True. For most, it will be a tool to generate ugly pictures.


True, diffusion models do not cure Sturgeon's Law.

OTOH, there is nothing special about Sturgeon's Law applying to them as well as everything else.


This could be a fact if all you have in mind is kitsch, sure.


I mean, this defeats the point of democratizing art. What's the point of driving the cost of an image to zero and replacing a craft with automated art of that level? Where exactly is the progress here? Are we expected to be impressed by speed or quality? Where is the value? The level of kitsch here is overwhelming. Should we applaud for more kitsch in the world?

"Was einer möchte und nicht kann, wird Kitsch." —Jan Tschichold


> Should we applaud for more kitsch in the world?

Of course. It gets nauseating fast and makes worthwhile messages stand out more, causing the balance to swing to the other side. You're creating something for someone; what's the point if they don't value your carefully crafted message and want kitsch instead?

> Are we expected to be impressed by speed or quality?

Realtime generation in particular makes 3D rendering in the viewport and live painting possible. [1] This kind of tooling makes a lot of difference. What you make with it, either kitsch or something worthwhile, is up to you.

[1] https://www.youtube.com/watch?v=AF2VyqSApjA


> Where exactly is the progress here?

The progress here is that driving the inference phase of AI image generation down to a much smaller time and cost stops it from holding back the rest of the creative process built around that toolset.



