So I tried it with an image of a monkey that I often use for profile pictures (https://mathstodon.xyz/@OscarCunningham). This image wasn't made by Stable Diffusion. It gave me this prompt:
> a monkey plushie on a white background, photograph taken by steve buscemi from a zoom lens, studio lighting, ultrarealistic
Can someone tell me what Steve Buscemi is doing here?
CLIP Interrogator uses BLIP, an image-captioning model, and also tries a bunch of candidate prompt phrases against the image with CLIP. I guess you mean that this model uses the captioning model to generate the complete prompt? Is the code for this one available?
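For anyone curious, the CLIP half is basically ranking candidate modifier phrases by image-text similarity. A minimal sketch (the checkpoint and the tiny candidate list are just illustrative, not what the tool actually ships with):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Public CLIP checkpoint; the real tool may use a different/larger one.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("monkey.png")  # hypothetical input image
# Toy list of modifier phrases; clip-interrogator uses large curated lists.
candidates = ["studio lighting", "ultrarealistic", "oil painting", "pixel art"]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores

# Keep the highest-scoring phrases as prompt modifiers.
for score, phrase in sorted(zip(logits[0].tolist(), candidates), reverse=True):
    print(f"{score:.2f}  {phrase}")
```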
Maybe it was trained on an image set of politicians. I put in an image of Dr. Evil doing air quotes and it came up with "FirstName LastName from FirstName LastName in star trek the next generation ( 2005 ) ( 2 0 1 9 )".
FirstName LastName being the name of a politician.
This is the thing about AI that I find simultaneously wonderful and terrifying. Something to do with me as a human, noticing a hilarious detail amidst a fathomless ocean. It affirms my humanity but the backdrop is dizzying randomness.
Isn’t it just CLIP, the model that made these image-generation models possible?
It’s good at describing a picture, but it’s not reverse engineering. The predicted prompt usually has very little in common with the actual prompt, and it’s worse when you use embeddings or fine-tuned models.
What's interesting to me is that it even tries to predict the prompt on images that came straight from Stable Diffusion with no editing - which is weird because such images actually do have the prompt embedded inside of them already. (At least, that's the case for me - the prompt and parameters are stored in a tEXt chunk in the PNG file, which can be read with, for example, "pngcheck -t".)
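For example, with Pillow (assuming an AUTOMATIC1111-style PNG, where the prompt and settings live under the "parameters" key; other UIs may use different keys):

```python
from PIL import Image

img = Image.open("sd_output.png")  # hypothetical filename
# PNG tEXt/iTXt chunks are exposed via .info; AUTOMATIC1111's web UI
# writes the prompt and generation settings under the "parameters" key.
print(img.info.get("parameters"))
```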
True, images generated through some UIs have prompts in the metadata; the aim here is to work on images people find online with no metadata. So it doesn't try to read the metadata but actually predicts a similar prompt.
It is based on an image-captioning model, so it's a different approach than CLIP Interrogator, though you are correct that the aim is not to get the exact prompt back but to get a prompt that generates images in a similar style.
I'm surprised that the results seem much better (more detailed and sometimes closer to the original prompt) than the regular CLIP interrogation (at least based on my limited experimentation).
But as you say, even so, it still has little in common with the original prompt.
I'm having a lot of fun dropping in my Midjourney images, getting a more detailed prompt, then putting it back into Midjourney for more interesting variations.
So, people are commenting that it's not very accurate etc., but I love it. Delightfully quirky tool for exploring prompts.
Also I laughed out loud after putting a selfie into it and getting "mark zuckerberg's face reflected in a mirror, close up, realistic photo, medium shot, dslr, 4k, detailed"
Hee, cute, but predictable once the ... showed up. Honestly I think the ending was unnecessary; it destroys all subtlety.
Of course, this is not actually how latent space works. It's the AI's understanding of concepts, not the inherent nature of concepts; that's why every model has its own version of "latent space". Though the understanding of latent space in the story is internally consistent; given a superintelligent image generator, you could do prompt engineering like this.
I often use terms like "sexy", "risque", etc. in the process of getting images that are quite sensible (like military people playing chess). I use img2img repeatedly, looking for particular photo-film aesthetics, and tend to accumulate prompts. Anyway, this would open me to charges of sexism (or worse, "misogyny"), and it makes me uneasy about using SD.
Edit/OH: it generates prompts for things like Excel screenshots but not for images made with the img2img model at Hugging Face. Fascinating.
Why does it choke on img2img creations? This is just fascinating. It gives a plausible prompt to at least a handful of real non-AI photos from DSLRs and iPhones alike.
I mean, it does mistake Frank Sinatra for Louis Armstrong -- I guess it makes errors. But it just refuses to process my img2img images. It breaks down. Why?
It made me so agitated I made a gallery[0]. Granted, some of those images are strange, but others are just normal people doing normal things.
img2img creations are probably confusing it because they totally rearrange how the prompt maps onto what's in the latent space for a given image. If you make an image of <subject> by passing in an image of <subject>, it's going to represent <subject> in a fundamentally different way than if it relied purely on its own "imagination" for the rendition.
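Roughly, in diffusers terms (a sketch; the model ID, filenames, and strength value are illustrative):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

prompt = "a monkey plushie on a white background"

# txt2img: the structure of the result comes entirely from the prompt.
txt2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
from_prompt = txt2img(prompt).images[0]

# img2img: diffusion starts from a noised copy of an existing image,
# so much of the final structure comes from that image, not the prompt.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
init = Image.open("monkey.png").convert("RGB").resize((512, 512))
from_image = img2img(prompt, image=init, strength=0.6).images[0]
# A prompt predictor trained only on txt2img outputs sees two quite
# different "renditions" of the same subject here.
```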
Yes, it's strongly influenced by clip-interrogator, but I revamped the algorithm quite a bit. I think it could be improved even further without resorting to fine-tuning the BLIP model.
Agreed, the results are not good for this style of image. I have a model training on a much bigger dataset of image-prompt pairs, which should perform better on these.
How is this different from image captioning when the model used is a booru model? That's already a thing people do when making their training data for fine-tuning these models.
It actually works on top of an image-captioning model. SD also takes in keywords like "artstation" and "octane render", which are not covered by standard captioning; that's the difference between using an off-the-shelf captioning model and this.
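For comparison, off-the-shelf BLIP captioning looks like this, and it produces plain descriptions with none of those SD-specific keywords (the checkpoint name is just the public base model; this tool's fine-tuned weights would differ):

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)

image = Image.open("monkey.png").convert("RGB")  # hypothetical input
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
# Yields something like "a stuffed monkey on a white background" --
# no "artstation", "octane render", etc.
print(processor.decode(out[0], skip_special_tokens=True))
```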
Ok, this is very cool! I took one image from a series I generated, had it guess a prompt (very different from mine, but it doesn't matter), and had it regenerate another image that captures the same feeling: https://imgur.com/a/Jz0mBej
folks, the AI called me "beautiful" and said I look like Chris Pratt, despite being a middle-aged and overweight computer programmer. They need to monetize this immediately.
> the man with the stupid face of a homeless person, portrait photography, 1 9 7 0 s, street photo, old photography, highly detailed, hyperrealistic
The subject is actually Mary Ann Bevan (1874–1934), also known as Rose Wilmot, a woman who claimed the title of the ugliest woman in London, as she suffered from acromegaly.
The model was trained exclusively on Stable Diffusion-generated images, so it can be unpredictable with non-generated images, especially images with people in them.
Interesting... but my guess is it's using a big library of generated image-prompt pairs? So all its suggested prompts are right out of someone's 'stable diffusion prompt cheatsheet.pdf'. That is to say, it over-outputs commonly known artists and things like 'trending on deviant art'.
It works by using an image-captioning model fine-tuned on SD prompts, so it may be outputting commonly known artists based on their occurrence in the training data.
This doesn't work for anything that doesn't use the upstream default stable diffusion checkpoint. I generate a lot of images with Pastel-mix, Anything, Waifu Diffusion and Counterfeit, and none of those are giving sensible results with this tool.
The model underneath this is trained only on data from SD 1.4/1.5, so this would be expected. I have another model training which covers all the models you mentioned and should perform well on those.
Can I have some kind of direct contact with you? Email, Twitter DM, Telegram, heck, I would download some new kind of app just to have some conversations.
It called a picture of my girlfriend "unbelievably cute" so she's now a fan.
And it described my profile picture on Mastodon as "my husband from the future that looks similar to travis scott and mark owen. he is also a good boy, very!!!, and beautiful!!!"...