This would be pretty great for generating descriptions for the vision-impaired, ...

xg15 · 2024-12-15T18:27:34

My guess is that they had "Elaborate some subtle details of the photo" or "What conclusions can you draw from the situation?" or something similar as an instruction in the prompt, because it seems to try this with every photo, regardless of whether there are any noteworthy details or implications in it.

I get the idea (demonstrating some "Sherlock Holmes style" inference of hidden facts from the photo), but it gets ridiculous when there is nothing for the model to find.

exadeci · 2024-12-17T00:22:08

Facebook has been doing basic alt-text generation for years (not the best, but better than nothing), e.g.: "May be an image of 4 people, people smiling and text"