Entertaining is indeed the right word. Nice job identifying corner cases of models' visual processing; curiously, they're not far conceptually from some optical illusions that reliably trip humans up. But to call the models "blind" or imply that they perform poorly in general? That's trivially invalidated by just taking your phone out and feeding a photo to the ChatGPT app.

Like, seriously. One poster below whines about "AI apologists" and BeMyEyes, but again, it's all trivially testable with your phone and a $20/month subscription. It works spectacularly well on real-world tasks. Not perfectly, sure, but good enough to be useful in practice and better than the alternatives (which often don't exist).