Hacker News new | past | comments | ask | show | jobs | submit login
Get any AI to generate an image of a glass of wine that is full to the brim (reddit.com)
63 points by aimazon 20 days ago | hide | past | favorite | 18 comments



The prompting strategies in this post made me remember a funny anecdote from this Thanksgiving. My older family members had been desperately trying to get ChatGPT to write a good poem about spume (the white foam you see in waves), and no matter how many ways they explicitly prompted it to not write in rhyming couplets, it dutifully produced a rhyming couplet poem every time. There’s clearly an enormous volume of poems in the training data written in this form, and it was practically impossible to escape that local minimum within the latent space of the model, like the half-full wine glass imagery. They only succeeded at generating a poem written the way they wanted when they first prompted ChatGPT to reason through the elements of good poetry writing regardless of style, and then generate a prompt to write a poem following those guidelines. Naturally, that produced a lovely poem on the first attempt with that prompt!

It’s pretty well known at this point, but it seems like when it comes to prompting these models, telling them what to do or not do is less effective than telling them how to go through the process of achieving the outcome. You need to get them to follow steps to reach a conclusion, or they’ll just follow the statistical path of least resistance.

Edit: the poem: https://paste.ee/d/rIbLa/0


thanks for this comment ! it clarifies the function of the llm well.

ie, use it as a template-generating search-engine helper for most common things. for uncommon things, you have to prompt-guide it to get what you want.


This aligns with my experience using image generators —- I can get them to generate really weird unique combinations of things (ie “an octopus dancing with a cat”) but when asking for a relatively common image with 1-2 unique aspects they seem to just generate the common image.

The folks who were able to get it by having ChatGPT generate a really detailed prompt is kind of interesting. When I’ve tried writing my own detailed prompts, it seems like the limit on detail is relatively low before it starts going completely off the rails.


It’s not possible if there are ~no images like this in the training set.


These AIs can generate images that are quite different to anything found in the training set. This case seems more about overfitting.


Have you read all the comments?

Some users achieved it by having the chatbot describe in detail a glass full of wine, and then use that output in a new context to generate the image.

I think that's a very interesting take away.


This is the approach I use to get ChatGPT to generate images that get around copyright. E.g.

Me: Create something in the style of Escher AI: Can't due to copyright Me: Precisely describe in detail <insert artwork here> such that I can use it as a prompt to generate an image AI: <prompt> Me: <prompt>

9 times out of 10 it works really well.


In all fairness there probably aren't any images of Gordon Ramsey riding an Ostrich on the moon in its training set either but it manages that.

I tried this prompt several times in Ideogram, both as realistic and also design-based images and it couldn't do it at all.

I haven't yet tried it with a more elaborate prompt but it's interesting to me that it can do the most incredible and amazing things but can't do something that sounds simple.


It wouldn’t be able to do that without an ostrich in the training set. There is a subtle but important difference between combining and what is being combined.


Maybe take a look through the linked thread, some did manage to do so.


I eventually went the route of describing capillary effects of water, changing the water color, changing the type of water container.

Managing the travel through the knowledge graphs is becoming a skill :)


Why would this be the case? These AI image generators can generate very weird combinations of stuff that certainly aren’t shown in a training image.


looking at the wine-red beret perched on top of the glass

https://preview.redd.it/sije3nm0imwd1.jpeg?width=1024&format...

I think the backend needs to be tuned on the fact that anything, not just hats, can have a brim


sort of nonsensical to link to old.reddit.com which shows pictures only as an `<image>` placeholder


Saved me. I am used to open a pic-link into a new tab to see pictures. The "normal" reddit is almost dysfunctional for me. (I am using noscript to narrow down to actually really needed things, but reddit wants too much and even with scripts allowed needs pletora of clicks to even only get a normal thread of comments and it seems to load half of yet another OS into my browser, which takes too long)


Everytime I somehow go to reddit without the old. subdomain I think "How does anyone use this? How is it still around." Just happened to me again yesterday.


There are various userscripts (and add-ons?) that replace the links with its inline images.

I submitted it with a normal Reddit link. @dang did some mechanism automatically replace the Reddit link with old.?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: