Show HN: Shortbread – Create AI comics in minutes

brucethemoose2 · on Oct 6, 2023

This is all super cool.

Some random suggestions:

- I dunno what diffusion framework you are using, but the AITemplate (for GPUS) or diffusers JAX (for TPUs) backend can massively increase your diffusion throughput.

- Alternatively, I believe HuggingFace already has a JAX backend for Stable Diffusion XL, so you could run a model with much better support for large resolutions/inpainting massive images at a similar (?) speed.

- There are schemes for area prompting and subject "subset" prompting in stable diffusion, as well as using images as input. As an example of how y'all might use this, you could generate a image for Character A, an image for Character B, encode them. specify that the character A prompt latents go on the left side of the image, and the character B prompt latents go on the right side of the image. And of course you can add to these area prompts, like "jumping" on the left side and "ducking" on the right side of the image. There's also a way to specify which prompts/encoded images belong to which subjects instead of manually cutting out areas, see: https://github.com/BlenderNeko/ComfyUI_Cutoff

Fengjiao · on Oct 6, 2023

Woaaa love these inputs. Thank you! Wasn't aware of the JAX backend will check it out. Right now we're on SD 1.5. We tried SDXL but found the quality improvement to be marginal. Yes to area prompting/regional control to help people create more complex scenes. I need some design thinking first since it's easy to over build and spit out something super complicated. Immediate next step is to def add controlnet.

brucethemoose2 · on Oct 6, 2023

> JAX

Yeah, check out their post: https://huggingface.co/blog/sdxl_jax

I dunno how expensive TPU instances are these days, but the performance is insane!

> We tried SDXL but found the quality improvement to be marginal.

Yeah, the vanilla HF diffusers pipe is unimpressive to me.

Try playing with this though, turn on FreeU and specify an anime style: https://github.com/MoonRide303/Fooocus-MRE

I have never gotten such high quality results from simple prompts, even in cloud models like Midjourney/GPT4. The question is how to port even part of that magic over to the diffusers pipeline...

brucethemoose2 · on Oct 6, 2023

Also, VoltaML has a good reference GPU AITemplate SD 1.5 implementation:

https://github.com/VoltaML/voltaML-fast-stable-diffusion/tre...

The speed jump is massive on my desktop GPU, probably even more dramatic on cloud hardware, and it may support some things (weight swapping/lora swapping/resolution changing/controlnet) better than JAX.

Fengjiao · on Oct 6, 2023

My issue previously with these prebuilt backends is that you can't tweak it like sdwebui does, but to make our thing work it took a thousand tweaks. Can look into this first to see how customizable it is.

brucethemoose2 · on Oct 6, 2023

VoltaML is a relatively vanilla diffusers-based backend, so its not a hairy monster to hack like you may have seen with SAI-based UIs (like Comfy, Fooocus and Automatic)

The AITTemplate code is a lightly modified version of Facebook's example dynamic AIT script, to get rid of small issues like VRAM spikes: https://github.com/facebookincubator/AITemplate/tree/main/ex...

InvokeAI is also diffusers based, but they seem to mess with the pipeline a bit more.

Anyway, all that may be better as a reference for interesting features rather than a backend to try and adopt.

anigbrowl · on Oct 7, 2023

It's very cool. How will you keep the characters consistent? I generated a simple 3 panel page and there were some variations between the same character across two panels. I think it could be worth it to have special character design pages (like the bonus art/mini posters many comics use now).

The only negative feeling I had was that the layouts are kind of dull, no real way to do angled panels and break up the strict horizontal/vertical flow. But since this would probably be used in combination with image editor software, that is not such a big deal.

Fengjiao · on Oct 7, 2023

Yeah, the characters aren't truly consistent yet. We did what we could to make them mostly look the same. I might be adding inpainting to help address this. Right now you can just regenerate a couple times in editor and pick the most consistent ones. I also like the slanted panels like in real manga. Right now you can resize the panels. I plan to add a custom layout thing on the select layout page. + allow folks to shape their panels into non-rectangular!

lxe · on Oct 6, 2023

"A frantic person bursts open the hospital door. They talk to the receptionist, then the doctor. In the last panel we see them standing over their bed-bound partner, looking sad."

https://create.shortbread.ai/viewer/08769382-4e2f-426a-9779-...

Kerrick · on Oct 7, 2023

Is this loss?

Fengjiao · on Oct 6, 2023

I love this. I didn't know it can do complex scenes like this yet. Thank you

iamwil · on Oct 7, 2023

I tried recreating one of my favorite strips, just to see what would result. This was the result:

https://create.shortbread.ai/viewer/41e5642c-86ee-421e-b985-...

The original (SFW, despite the warning):

https://www.oglaf.com/trapmaster/

The writing obviously will need work, but I can see this being much better in six years time. The fact we can do this at all is pretty neat. And honestly, a lot of people have a hard time going from zero to one. It's much easier for most people to fiddle and edit something, than to stare at a blank page.

I suspect this kind of workflow will permeate all kinds of work, where the generative AI gives us a starting point, and we go from here. The question then, is whether starting points matter, because it's still a far gap to get to the Oglaf strip given the current starting point. Would some people fare better by random? Or we can make AIs that have enough context to know what's fresh for people?

Fengjiao · on Oct 7, 2023

I think the baseline can def be improved. But I suspect the gap between 60% and 99% would be so wide that it would still take a significant amount of time for the human creator

atleastoptimal · on Oct 6, 2023

I think the flow should be

I give summary => site gives me panel descriptions to choose from

I pick descriptions = > Site creates a "skeleton" layout option for each panel. At this point I can modify the skeleton or save them

I get skeleton layout => Site makes it look pretty

With each stage modifiable

I think this is very cool and well made.

Fengjiao · on Oct 6, 2023

Thanks - you think more like a real comics creator. Right now the flow is maximally reduced to give minimal friction. But for creators to make meaningful long form content, providing a storyboarding process is important. I plan to add some optional stuff in the flow to do this!

atleastoptimal · on Oct 7, 2023

You must be running a lot of compute for this. What are the approx costs per 1000 user sessions?

Fengjiao · on Oct 7, 2023

just did some rough math. 1k session = 750 minutes compute time with our current GPU instance = $9.375. so gladly not broke yet! (rough math might be wrong, need to double check later)

Fengjiao · on Oct 6, 2023

Author here - Didn't expect the amount of traffic rn. The wait might be EXTRA long. We're trying hard to spin up more servers!!

riffraff · on Oct 6, 2023

This is cool, but it appears the consistency issue with this stuff is not solved yet, e.g. in every panel clothing and armors on the dame character are different.

Perhaps a more simplified style might work better.

Fengjiao · on Oct 6, 2023

Yes, we didn't actually solve consistency, although the way we did it makes the same character pretty similar on the same page. If you do some more regenerates of a single panel, you can find some generations that are similar. I MIGHT be supporting inpainting to help fix incoherence details, but need to think about it more -

ianbicking · on Oct 6, 2023

Very cool. I like the editor, though I wish I could drag the bubbles without clicking on them first.

There's not a lot of plot in a single page, but even with that I wish I had more control, and for longer pieces I would absolutely want a lot more control. I'd want to see some basic text previews, and be able to control both the large scale and fine scale of the progression. I'd want to be able to control the tone of the piece, clarify points that GPT might not be picking up on, override choices, etc. I might tweak some dialog... but most of the changes I envision are before dialog, about how scenes are broken up, or the basic premise of the story/world.

Many of the other features you list (outside of more pages, of course) feel less important than the story building itself. (But that's also coming from my personal interest in the story design.)

Fengjiao · on Oct 6, 2023

This makes total sense, and that's how I work too. Comic artists, animators and directors all spend a lot of time on storyboarding before actual production. I'll share that the current design is to allow 99% of people to make at least one comic ;) I think it's possible for both end of the worlds to exist in the same product, but I need to design carefully and not add too much complexity too fast. And agreed on precise panel level control -> more coming soon

ianbicking · on Oct 7, 2023

That makes sense; I wouldn't have wanted to use it, or had any feedback at all, until I got through the first complete generation.

Fengjiao · on Oct 6, 2023

Author here. Just go to https://shortbread.ai and click on "Start Creating". No signing in or anything required.

totetsu · on Oct 7, 2023

https://create.shortbread.ai/viewer/a500fa7c-04c3-4c4c-8708-...

Fengjiao · on Oct 7, 2023

codetrotter · on Oct 7, 2023

> Fine panel-level control. Poses, control net, etc.

One thing that stuck out to me about the https://create.shortbread.ai/viewer/4566613c-7146-4ed7-9b8d-... example, and to a lesser extent the https://create.shortbread.ai/viewer/aafc2f61-d008-4f3f-aa8f-... example is that the people of opposing teams sometimes face in the same direction. Typically you'd have them physically facing each other.

But that is very much a nit-pick. And one that a human can manually fix when looking at the result by mirroring some of the panels. This is super cool!

Fengjiao · on Oct 7, 2023

Yes was talking to a creator about this. Getting the eyeliner to match actually uplifts the story a lot. I imagine some simple rotation/flipping image post processing can help a lot here

colesantiago · on Oct 6, 2023

Was looking for something like this, i'm definitely signing up!

I'm not a comic book artist by any means but now I can add this to my skills list with this.

Is there any pricing yet on this and is this backed by YC or bootstrapped?

Fengjiao · on Oct 6, 2023

- I intended for anyone to be able to use this - so def no worries - No pricing yet, everything is free since it's early beta, but I'm thinking a subscription for a bunch of credits style like Midjourney. - Yes, we are YC

anigbrowl · on Oct 7, 2023

I'd definitely pay. I can draw and have multiple comics I'd like to do, but am not so good that I could commit to the time involved for long-form. This would be ideal as an assistive technology.

Fengjiao · on Oct 7, 2023

Thank you! I'd love you to play with it as much as you can. Can you go to create.shortbread.ai, click on "login to save projects", and then send me an email at fengjiao@shortbread.ai. I'll give you a bunch of free credits to play with

anigbrowl · on Oct 7, 2023

I like your sales technique :D

Fengjiao · on Oct 7, 2023

hue hue hue (evil founder laugh)

anigbrowl · on Oct 7, 2023

I can't log in with google rn (overloaded servers?) but I'll try again tomorrow.

alex_c · on Oct 6, 2023

Very cool! I was briefly toying with something similar just last week.

Main challenge I see is character consistency. I really like the way you set up the prompts, but even so:

Outfit - a simple black sleeveless gi with white pants, a black belt tied around his waist

In two consecutive panels, the output swaps the colors (first panel gets it right, second panel has white gi, black pants).

Curious how you'll tackle this challenge!

Fengjiao · on Oct 6, 2023

Hi Alex, great catch -> We didn't solve consistency, but we saw that if you regenerate a few times, you usually get something that's visually similar. Right now AI artists all do loads of postprocessing - using AI, so later we might have a "smear" feature that inpaints the inconsistent part. Let me know if you have thoughts on this

nottheengineer · on Oct 6, 2023

People can generate a few good samples of what they want a character to look like and then interrogate clip to get a more detailed prompt that makes it more consistent. It increases the prompt sizes a lot, but I don't think there's an easy way to solve this.

Maybe you could build a UI that semi-automates this process?

mentos · on Oct 6, 2023

Could eventually use GPT4vision to review the output of the art to see if all the panels are consistent?

manoj_SprintsQ · on Oct 7, 2023

Amazing, this is the tool I have been looking for a long time. I can create a plot, but I struggle to portray it. This helps me.

ChatGPT vs Jarvis : https://create.shortbread.ai/viewer/9a646605-124d-4dfe-8dca-...

kristopolous · on Oct 7, 2023

I'm always amazed when you type in things like "Gothic Quarter of Barcelona", "at a home in Arcosanti Arizona", "busy Manhattan street", "red carpet at the Oscars", etc, it basically nails the background in any style.

Fengjiao · on Oct 7, 2023

that's way more experimental than my prompts! glad it held up. From what I know if you try to generate animals as main characters it breaks p bad lol

kristopolous · on Oct 7, 2023

right, but you can have someone walking, say, a pig on a leash and you're in business. Also try something like "man in realistic dog costume" for anthropomorphism

weitzj · on Oct 7, 2023

This looks interesting. I am trying to generate comics for my daughter to explain graphically some upcoming events so she hopefully can understand them better or is prepared.

Fengjiao · on Oct 7, 2023

Hope it helps!

cco · on Oct 13, 2023

Very cool! I wasn't able to get much other than manga though haha.

ashu1461 · on Oct 7, 2023

Do you super impose the text or make AI generate the text? Are there good tools to make AI generate text on images ?

Fengjiao · on Oct 7, 2023

Do you mean the text in the speech bubbles? Those are generated by the AI and we fill them in the bubbles. If you’re seeing text in images like a billboard, it probably isn’t that great quality. I heard Dalle3 does pretty good text in image!

ioshaan · on Oct 9, 2023

Thank you very much! I was able to create comic strips from it. :-D

ibbih · on Oct 7, 2023

Saving as a JPG gave me a cropped version that cuts off the bottom, just fyi

NaSeRneynd · on Oct 7, 2023

Congrats on the launch!

Whats the SD workflow? Which models, tools (controlnet, etc) do you use?

vimy · on Oct 7, 2023

This is really cool. :D

There’s a typo on the message founders page.

Fengjiao · on Oct 7, 2023

Ty just fixed it. whew

HanClinto · on Oct 6, 2023

This is really impressive, well done!!

Fengjiao · on Oct 6, 2023

Ty Ty

jrflowers · on Oct 6, 2023

I love “as seen on y combinator”

Fengjiao · on Oct 6, 2023

hahaha sorry the urge of putting on a nice looking logo overtook me