Does the vision-language-model process raw image data, or does it process OCR ch... | Hacker News

Technotroll 11 months ago | on: PaLI-3 Vision Language Models

Does the vision-language model process raw image data, or does it process OCR character output?

bigfudge 11 months ago

GPT-4V seems to do the former, at least in my experiments with it: it interprets plots and categorises images.
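
For what it's worth, ViT-style image encoders (PaLI-3 reportedly uses a SigLIP-pretrained ViT) do consume raw pixels rather than OCR text. The image is cut into fixed-size patches, each patch is flattened and linearly projected into a vector, and those vectors are passed to the language model together with the text tokens. Below is a minimal pure-Python sketch of that tokenisation step. It assumes a toy 28x28 RGB image, a 14-pixel patch size and random (not learned) projection weights. `patchify` and `embed` are hypothetical helper names for illustration, not PaLI-3's actual code:

```python
import random
from typing import List

# Hypothetical layout: height x width x channels, raw pixel values in [0, 1]
Image = List[List[List[float]]]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch x patch tiles,
    flattening each tile into one vector (ViT-style tokenisation)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            vec: List[float] = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(image[y][x])  # append all channels of this pixel
            tokens.append(vec)
    return tokens


def embed(tokens: List[List[float]], weights: List[List[float]]) -> List[List[float]]:
    """Project each flattened patch with a linear map (weights: d_in x d_model).
    In a real model these weights are learned; in this sketch they are random."""
    d_model = len(weights[0])
    return [
        [sum(t[i] * weights[i][j] for i in range(len(t))) for j in range(d_model)]
        for t in tokens
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    H = W = 28  # toy image size (assumption)
    P = 14      # patch size, as in a "/14" ViT
    C = 3       # RGB channels
    img = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
    toks = patchify(img, P)
    D = 8       # toy embedding width (assumption)
    w_proj = [[rng.gauss(0, 0.02) for _ in range(D)] for _ in range(P * P * C)]
    emb = embed(toks, w_proj)
    print(len(toks), len(toks[0]), len(emb[0]))  # 4 588 8
```

The point is that nothing in this path reads characters: any "reading" of text in a plot has to be learned from the pixel patches themselves.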
