This would be pretty great for generating descriptions for the vision-impaired, but it doesn't provide any profound insight beyond what you can tell from a glance.

It produces a lot of "trying to sound smart" waffle. For example, here is what it had to say about some tree branches:

> A careful observer will also note the subtle variations in the thickness and texture of the branches, implying a natural, organic growth pattern.

Gee, thanks, I might've thought it was an unnatural inorganic tree otherwise.

My guess is that they had "Elaborate on some subtle details of the photo" or "What conclusions can you draw from the situation?" or something similar as an instruction in the prompt, because it seems to try this with every photo, regardless of whether there are any noteworthy details or implications in it.

I get the idea (demonstrating some "Sherlock Holmes-style" inference of hidden facts from the photo), but it gets ridiculous when there is nothing for the model to find.

Facebook has been doing basic alt-text generation for years (not the best, but better than nothing), e.g.: "May be an image of 4 people, people smiling and text"