I like the idea that these models are so good at some sort of specific and secret bit of visual processing that things like “counting shapes” and “beating a coin toss for accuracy” shouldn’t be considered when evaluating them.
LLMs are bad at counting things in general. It’s hard to say whether the failures here are vision-based or just an inherent weakness of the language model.