Show HN: Shortbread – Create AI comics in minutes (shortbread.ai)
235 points by Fengjiao on Oct 6, 2023 | 57 comments
Just go to the link and click on "Start Creating". No sign-in required.

I built Shortbread to help anyone create comics / manga series. The onboarding process kick-starts a page at 60%; from there you can use your creativity to bring it to 100% in a fully controllable editor.

Tech stack:

GPT-3.5 Turbo - comic script generation. It handles everything from layout, characters, and scenes to SD prompts and dialogue.

SD 1.5 - We put up SD servers on GCP. For every comic we generate one large image and crop it into panels; per the experiments of u/Deathmarkedadc on Reddit, this massively helps with consistency. The models are trained on anime scenes, though, and might not be so great with animals.
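Roughly, the crop step looks like this (a minimal sketch; the panel boxes here are made up, and the real geometry comes from the layout step):

    # One large SD render cropped into panels (hypothetical sizes)
    from PIL import Image

    page = Image.open("generated_page.png")            # the single large SD output
    panel_boxes = [(0, 0, 768, 384), (0, 384, 768, 768), (0, 768, 768, 1152)]
    panels = [page.crop(box) for box in panel_boxes]   # boxes are (left, upper, right, lower)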

Frontend: Next.js 13 on Vercel, React + TypeScript. We built the entire editor from scratch to compose the comic (images, panels, speech bubbles, text) like a webpage. This lets you edit and republish your comics like a website. You can dynamically generate panels as well - try resizing a panel into a long, narrow box and generating.

Backend: Firebase.

Sample comics:

A Japanese couple sits at the dinner table. The husband tells the wife a secret (https://create.shortbread.ai/viewer/debdf25c-3f95-492a-952a-...)

An army of male soldiers fighting an army of female soldiers in ancient China (https://create.shortbread.ai/viewer/4566613c-7146-4ed7-9b8d-...)

A team of girls plays volleyball against a team of boys (https://create.shortbread.ai/viewer/aafc2f61-d008-4f3f-aa8f-...)

Next steps:

- More pages

- Fine panel-level control: poses, ControlNet, etc.

- Multi-character.

- Different styles.

- Control over character design.

I'm Fengjiao Peng, founder and chief engineer at Shortbread. I was previously a webtoon artist. We want to build this into something you can create entire comic series / manga / webtoons with. Criticism and suggestions welcome!




This is all super cool.

Some random suggestions:

- I dunno what diffusion framework you are using, but the AITemplate (for GPUs) or diffusers JAX (for TPUs) backends can massively increase your diffusion throughput.

- Alternatively, I believe HuggingFace already has a JAX backend for Stable Diffusion XL, so you could run a model with much better support for large resolutions/inpainting massive images at a similar (?) speed.

- There are schemes for area prompting and subject "subset" prompting in Stable Diffusion, as well as using images as input. As an example of how y'all might use this, you could generate an image for character A and an image for character B, encode them, and specify that the character A prompt latents go on the left side of the image and the character B prompt latents go on the right side. And of course you can add to these area prompts, like "jumping" on the left side and "ducking" on the right side of the image. There's also a way to specify which prompts/encoded images belong to which subjects instead of manually cutting out areas, see: https://github.com/BlenderNeko/ComfyUI_Cutoff
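A minimal hand-rolled sketch of the masking idea against a diffusers-style UNet (not any particular library's API; mask_left is assumed to be a 0/1 tensor at latent resolution):

    import torch

    def masked_noise_pred(unet, latents, t, emb_a, emb_b, mask_left):
        # Predict noise once per prompt, then stitch the predictions spatially:
        # character A's conditioning applies where mask_left == 1, B's elsewhere.
        # mask_left has shape (1, 1, H // 8, W // 8) so it broadcasts over channels.
        noise_a = unet(latents, t, encoder_hidden_states=emb_a).sample
        noise_b = unet(latents, t, encoder_hidden_states=emb_b).sample
        return noise_a * mask_left + noise_b * (1 - mask_left)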


Woaaa, love these inputs. Thank you! Wasn't aware of the JAX backend, will check it out. Right now we're on SD 1.5; we tried SDXL but found the quality improvement to be marginal. Yes to area prompting/regional control to help people create more complex scenes - I need some design thinking first, since it's easy to overbuild and spit out something super complicated. The immediate next step is definitely adding ControlNet.


> JAX

Yeah, check out their post: https://huggingface.co/blog/sdxl_jax

I dunno how expensive TPU instances are these days, but the performance is insane!
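For reference, a minimal sketch along the lines of the HF docs (model id, revision, and prompt are assumptions; I haven't benchmarked this exact snippet):

    import jax
    import jax.numpy as jnp
    from flax.jax_utils import replicate
    from flax.training.common_utils import shard
    from diffusers import FlaxStableDiffusionPipeline

    pipe, params = FlaxStableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", revision="bf16", dtype=jnp.bfloat16
    )
    prompts = ["an anime volleyball match"] * jax.device_count()
    prompt_ids = shard(pipe.prepare_inputs(prompts))   # one prompt per TPU core
    rng = jax.random.split(jax.random.PRNGKey(0), jax.device_count())
    images = pipe(prompt_ids, replicate(params), rng, jit=True).images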

> We tried SDXL but found the quality improvement to be marginal.

Yeah, the vanilla HF diffusers pipe is unimpressive to me.

Try playing with this though, turn on FreeU and specify an anime style: https://github.com/MoonRide303/Fooocus-MRE

I have never gotten such high quality results from simple prompts, even in cloud models like Midjourney/GPT4. The question is how to port even part of that magic over to the diffusers pipeline...


Also, VoltaML has a good reference GPU AITemplate SD 1.5 implementation:

https://github.com/VoltaML/voltaML-fast-stable-diffusion/tre...

The speed jump is massive on my desktop GPU, probably even more dramatic on cloud hardware, and it may support some things (weight swapping/lora swapping/resolution changing/controlnet) better than JAX.


My issue previously with these prebuilt backends is that you can't tweak them the way sdwebui lets you, and making our thing work took a thousand tweaks. I can look into this first to see how customizable it is.


VoltaML is a relatively vanilla diffusers-based backend, so it's not a hairy monster to hack like you may have seen with SAI-based UIs (like Comfy, Fooocus, and Automatic).

The AITemplate code is a lightly modified version of Facebook's example dynamic AIT script, to get rid of small issues like VRAM spikes: https://github.com/facebookincubator/AITemplate/tree/main/ex...

InvokeAI is also diffusers based, but they seem to mess with the pipeline a bit more.

Anyway, all that may be better as a reference for interesting features rather than a backend to try and adopt.


It's very cool. How will you keep the characters consistent? I generated a simple 3-panel page and there were some variations in the same character across two panels. I think it could be worth having special character design pages (like the bonus art/mini posters many comics use now).

The only negative feeling I had was that the layouts are kind of dull; there's no real way to do angled panels and break up the strict horizontal/vertical flow. But since this would probably be used in combination with image-editing software, that's not such a big deal.


Yeah, the characters aren't truly consistent yet. We did what we could to make them mostly look the same; I might be adding inpainting to help address this. Right now you can just regenerate a couple of times in the editor and pick the most consistent ones. I also like the slanted panels like in real manga. Right now you can resize the panels; I plan to add a custom layout option on the select-layout page, plus allow folks to shape their panels into non-rectangular forms!


"A frantic person bursts open the hospital door. They talk to the receptionist, then the doctor. In the last panel we see them standing over their bed-bound partner, looking sad."

https://create.shortbread.ai/viewer/08769382-4e2f-426a-9779-...


Is this loss?


I love this. I didn't know it could do complex scenes like this yet. Thank you!


I tried recreating one of my favorite strips, just to see what would come out. This was the result:

https://create.shortbread.ai/viewer/41e5642c-86ee-421e-b985-...

The original (SFW, despite the warning):

https://www.oglaf.com/trapmaster/

The writing obviously will need work, but I can see this being much better in six years' time. The fact that we can do this at all is pretty neat. And honestly, a lot of people have a hard time going from zero to one. It's much easier for most people to fiddle with and edit something than to stare at a blank page.

I suspect this kind of workflow will permeate all kinds of work, where the generative AI gives us a starting point and we go from there. The question, then, is whether starting points matter, because it's still a far gap from the current starting point to the Oglaf strip. Would some people fare better with a random starting point? Or can we make AIs that have enough context to know what's fresh for people?


I think the baseline can definitely be improved, but I suspect the gap between 60% and 99% is so wide that it would still take a significant amount of time for the human creator.


I think the flow should be:

I give summary => Site gives me panel descriptions to choose from

I pick descriptions => Site creates a "skeleton" layout option for each panel. At this point I can modify the skeletons or save them

I get skeleton layout => Site makes it look pretty

With each stage modifiable.

I think this is very cool and well made.


Thanks - you think more like a real comics creator. Right now the flow is maximally reduced to minimize friction, but for creators to make meaningful long-form content, providing a storyboarding process is important. I plan to add some optional steps in the flow to do this!


You must be running a lot of compute for this. What are the approximate costs per 1000 user sessions?


Just did some rough math: 1k sessions = 750 minutes of compute time on our current GPU instance = $9.375. So gladly not broke yet! (The rough math might be wrong; need to double-check later.)
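Showing my work (the ~$0.75/hr rate is implied by these numbers, not a quoted GCP price):

    minutes = 750                           # compute time per 1k sessions
    rate_per_hour = 0.75                    # assumed GPU instance rate in $/hr
    cost = minutes / 60 * rate_per_hour     # = $9.375, so roughly $0.009 per session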


Author here - didn't expect this amount of traffic right now. The wait might be EXTRA long. We're trying hard to spin up more servers!!


This is cool, but it appears the consistency issue with this stuff is not solved yet, e.g. in every panel the clothing and armor on the same character are different.

Perhaps a simpler style might work better.


Yes, we didn't actually solve consistency, although the way we did it makes the same character pretty similar on the same page. If you do some more regenerations of a single panel, you can find some generations that are similar. I MIGHT be supporting inpainting to help fix inconsistent details, but I need to think about it more.


Very cool. I like the editor, though I wish I could drag the bubbles without clicking on them first.

There's not a lot of plot in a single page, but even with that I wish I had more control, and for longer pieces I would absolutely want a lot more control. I'd want to see some basic text previews, and be able to control both the large scale and fine scale of the progression. I'd want to be able to control the tone of the piece, clarify points that GPT might not be picking up on, override choices, etc. I might tweak some dialog... but most of the changes I envision are before dialog, about how scenes are broken up, or the basic premise of the story/world.

Many of the other features you list (outside of more pages, of course) feel less important than the story building itself. (But that's also coming from my personal interest in the story design.)


This makes total sense, and that's how I work too. Comic artists, animators, and directors all spend a lot of time on storyboarding before actual production. I'll share that the current design is to allow 99% of people to make at least one comic ;) I think it's possible for both ends of the spectrum to exist in the same product, but I need to design carefully and not add too much complexity too fast. And agreed on precise panel-level control -> more coming soon.


That makes sense; I wouldn't have wanted to use it, or had any feedback at all, until I got through the first complete generation.


Author here. Just go to https://shortbread.ai and click on "Start Creating". No sign-in or anything required.



LMAO


> Fine panel-level control. Poses, control net, etc.

One thing that stuck out to me about the https://create.shortbread.ai/viewer/4566613c-7146-4ed7-9b8d-... example, and to a lesser extent the https://create.shortbread.ai/viewer/aafc2f61-d008-4f3f-aa8f-... example, is that people on opposing teams sometimes face in the same direction. Typically you'd have them physically facing each other.

But that is very much a nit-pick. And one that a human can manually fix when looking at the result by mirroring some of the panels. This is super cool!


Yes, I was talking to a creator about this. Getting the eyelines to match actually uplifts the story a lot. I imagine some simple rotation/flipping image post-processing can help a lot here.
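Something as simple as this could work for the flip (filenames made up):

    from PIL import Image, ImageOps

    panel = Image.open("panel_2.png")
    flipped = ImageOps.mirror(panel)   # horizontal flip so the eyelines meet
    flipped.save("panel_2_flipped.png")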


I was looking for something like this; I'm definitely signing up!

I'm not a comic book artist by any means, but now I can add this to my skills list.

Is there any pricing yet, and is this backed by YC or bootstrapped?


- I intended for anyone to be able to use this, so no worries.

- No pricing yet; everything is free since it's an early beta, but I'm thinking of a Midjourney-style subscription for a bunch of credits.

- Yes, we are YC-backed.


I'd definitely pay. I can draw and have multiple comics I'd like to do, but I'm not so good that I could commit to the time involved for long-form work. This would be ideal as an assistive technology.


Thank you! I'd love for you to play with it as much as you can. Can you go to create.shortbread.ai, click on "login to save projects", and then send me an email at fengjiao@shortbread.ai? I'll give you a bunch of free credits to play with.


I like your sales technique :D


hue hue hue (evil founder laugh)


I can't log in with Google right now (overloaded servers?) but I'll try again tomorrow.


Very cool! I was briefly toying with something similar just last week.

The main challenge I see is character consistency. I really like the way you set up the prompts, but even so:

Outfit - a simple black sleeveless gi with white pants, a black belt tied around his waist

In two consecutive panels, the output swaps the colors (the first panel gets it right, the second has a white gi and black pants).

Curious how you'll tackle this challenge!


Hi Alex, great catch -> we didn't solve consistency, but we saw that if you regenerate a few times, you usually get something that's visually similar. Right now AI artists all do loads of post-processing - using AI - so later we might have a "smear" feature that inpaints the inconsistent part. Let me know if you have thoughts on this.
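The "smear" could be as simple as a stock diffusers inpainting pass, sketched below (model choice and mask are assumptions, not our production setup):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")
    panel = Image.open("panel_2.png")        # the panel with the swapped colors
    mask = Image.open("outfit_mask.png")     # white where the outfit gets repainted
    fixed = pipe(
        prompt="black sleeveless gi, white pants, black belt tied around his waist",
        image=panel,
        mask_image=mask,
    ).images[0]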


People can generate a few good samples of what they want a character to look like and then interrogate CLIP to get a more detailed prompt that makes it more consistent. It increases the prompt sizes a lot, but I don't think there's an easy way to solve this.

Maybe you could build a UI that semi-automates this process?
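For reference, a sketch of the interrogation step with the open-source clip-interrogator package (package choice and filename are assumptions):

    from PIL import Image
    from clip_interrogator import Config, Interrogator

    ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
    sheet = Image.open("character_a.png").convert("RGB")
    detailed_prompt = ci.interrogate(sheet)
    # Prepend detailed_prompt to every panel prompt featuring character A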


Could you eventually use GPT-4 Vision to review the output to see if all the panels are consistent?


Amazing, this is the tool I've been looking for for a long time. I can create a plot, but I struggle to portray it. This helps me.

ChatGPT vs Jarvis : https://create.shortbread.ai/viewer/9a646605-124d-4dfe-8dca-...


I'm always amazed that when you type in things like "Gothic Quarter of Barcelona", "at a home in Arcosanti Arizona", "busy Manhattan street", "red carpet at the Oscars", etc., it basically nails the background in any style.


That's way more experimental than my prompts! Glad it held up. From what I know, if you try to generate animals as main characters it breaks pretty badly lol


Right, but you can have someone walking, say, a pig on a leash and you're in business. Also try something like "man in realistic dog costume" for anthropomorphism.


This looks interesting. I am trying to generate comics for my daughter to explain some upcoming events graphically, so she can hopefully understand them better and be prepared.


Hope it helps!


Very cool! I wasn't able to get much other than manga though haha.


Do you superimpose the text or have the AI generate it? Are there good tools to make AI generate text on images?


Do you mean the text in the speech bubbles? Those are generated by the AI and we fill them into the bubbles. If you're seeing text within the images, like on a billboard, it's probably not great quality. I hear DALL-E 3 does pretty good text-in-image!
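For what it's worth, overlaying bubble text at export time can be as simple as this sketch (our editor actually composes bubbles like a webpage; the font and coordinates here are made up):

    from PIL import Image, ImageDraw, ImageFont

    page = Image.open("page.png")
    draw = ImageDraw.Draw(page)
    font = ImageFont.truetype("ComicNeue-Bold.ttf", 24)              # any comic font
    draw.ellipse((40, 30, 260, 120), fill="white", outline="black")  # the bubble
    draw.text((60, 55), "I have a secret...", font=font, fill="black")
    page.save("page_with_bubbles.png")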


Thank you very much! I was able to create comic strips with it. :-D


Saving as a JPG gave me a cropped version that cuts off the bottom, just FYI.


Congrats on the launch!

What's the SD workflow? Which models and tools (ControlNet, etc.) do you use?


This is really cool. :D

There's a typo on the message-from-the-founders page.


Ty, just fixed it. Whew.


This is really impressive, well done!!


Ty Ty


I love "as seen on Y Combinator"


Hahaha, sorry, the urge to put on a nice-looking logo overtook me.



