
>I could (well actually I can't)

I like the idea that these models are so good at some sort of specific and secret bit of visual processing that things like “counting shapes” and “beating a coin toss for accuracy” shouldn’t be considered when evaluating them.




LLMs are bad at counting things in general. It's hard to say whether the failures here are vision-based or just an inherent weakness of the language model.


Those don't really have anything to do with fine detail/nearsightedness. What they measured is valid and interesting, but what they concluded is unrelated.




