This is really cool. I uploaded a photo of what I think were my great-grandparents and it explained their circumstances in fascinating ways (to the point of mentioning aged clothing, a detail I had overlooked).
I’ve been trying to figure out how to process hundreds of my own scanned photos to determine any context about them. This was convincing enough for me to consider Google’s Vision API. No way I’d ever trust OpenAI’s APIs for this.
Edit: can anybody recommend how to get similar text results (prompt or processing pipeline to prompt)?
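One possible pipeline is to batch-annotate the scans and reduce each photo to a short context string. A minimal sketch, assuming the official `google-cloud-vision` client (`pip install google-cloud-vision`) with credentials already configured; the folder name and score threshold are just placeholders:

```python
# Sketch: batch-annotate scanned photos with Google Cloud Vision labels
# (assumes `pip install google-cloud-vision` and GOOGLE_APPLICATION_CREDENTIALS set).
from pathlib import Path

def summarize_labels(labels, min_score=0.7):
    """Turn (description, score) pairs into a one-line context string."""
    kept = [desc for desc, score in labels if score >= min_score]
    return ", ".join(kept) if kept else "no confident labels"

def annotate_folder(folder):
    from google.cloud import vision  # imported lazily; needs credentials
    client = vision.ImageAnnotatorClient()
    results = {}
    for path in sorted(Path(folder).glob("*.jpg")):
        image = vision.Image(content=path.read_bytes())
        resp = client.label_detection(image=image)
        labels = [(l.description, l.score) for l in resp.label_annotations]
        results[path.name] = summarize_labels(labels)
    return results

if __name__ == "__main__":
    for name, summary in annotate_folder("scans").items():
        print(f"{name}: {summary}")
```

Note that label detection gives you tags, not the narrative prose the site produces; for that you'd feed the image (or the labels) into a multimodal model with a descriptive prompt.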
I use ChatGPT every day and I just throw in a pic and say "alt text". That gives you insane detail, but it's also limited because the prompt itself implies a shorter description for an HTML tag.
I just threw a pic in here of my gf holding a loaf she just made and part of it said "The slight imperfections on the bread's crust indicate it's freshly baked, and the woman's posture and facial expression suggest that she is very pleased with her creation."
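If you want to script that same "alt text" trick rather than paste into the chat UI, a minimal sketch with the official `openai` client (`pip install openai`); the model name is an assumption and may need updating:

```python
# Sketch of the "alt text" prompt trick via the OpenAI API
# (assumes `pip install openai` and OPENAI_API_KEY set).
import base64

def build_messages(image_bytes, prompt="alt text"):
    """Build a chat payload with the prompt plus an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }]

def describe(image_path):
    from openai import OpenAI  # lazy import; needs an API key
    client = OpenAI()
    with open(image_path, "rb") as f:
        messages = build_messages(f.read())
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content
```

Swapping the prompt for something like "describe this photo in full detail" gets around the brevity that "alt text" implies.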
One company has the capacity to maintain HIPAA compliance and the other is best known for vacuuming up the entire web and users' prompts. For something as sensitive as family photos, I know which company/product I'd prefer for this potential project.
Google's mission statement is to "organize the world's information", and the only thing stopping them is when they run into copyright laws or paywalls.
OpenAI (and indeed all the LLM providers) have gone almost as far as they can usefully go with bigger training sets, even without literally everything on the web, and are now trying to make the models smarter in other ways.
(OpenAI may also lose their current copyright lawsuits because laws don't care that both an LLM and PageRank are big matrix multiplications, they care about the impact on rights holders).
"your ancestor has this indicator of that hereditary disease, good morning your health insurance now costs you 1.5x and we don't actually have to explain why"
Yeah, I know the point of this site is to give us a dystopian shock by showing us how much information Big Tech extracts from our photos, but it's inadvertently a pretty good advertisement for Google's Vision API. It did a fantastic job of summarizing the photos I threw at it.
I mean, I wouldn't trust either entity. If you're serious about maintaining some semblance of privacy then you should opt for a local solution such as the BakLLaVA or Llama 3.2 Vision models.
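Running this locally through Ollama is one way to keep the photos off anyone's servers. A minimal sketch, assuming Ollama is installed with the vision model pulled (`ollama pull llama3.2-vision`) and the Python client available (`pip install ollama`); the prompt wording is just an example:

```python
# Sketch: describe scanned photos locally with Ollama + Llama 3.2 Vision,
# so nothing leaves your machine (assumes a running Ollama server).
from pathlib import Path

PROMPT = ("Describe this scanned family photo in detail: people, clothing, "
          "setting, approximate era, and anything written on it.")

def collect_photos(folder, exts=(".jpg", ".jpeg", ".png", ".tif")):
    """Gather scanned-photo paths with common image extensions."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.suffix.lower() in exts)

def describe_locally(folder):
    import ollama  # lazy import; requires the Ollama daemon
    notes = {}
    for path in collect_photos(folder):
        resp = ollama.chat(
            model="llama3.2-vision",
            messages=[{"role": "user", "content": PROMPT,
                       "images": [str(path)]}],
        )
        notes[path.name] = resp["message"]["content"]
    return notes
```

It's slower than a hosted API on most hardware, but for a one-time pass over a few hundred scans that trade-off seems reasonable.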