Does the vision-language model process raw image data, or does it process OCR character output?

GPT-4V seems to do the former, at least in my experiments with it: it interprets plots and categorises images.

