I still can't access the hosted model at meta.ai from Puerto Rico, despite us being U.S. citizens. I don't know what Meta has against us.
Could someone try giving the 90b model this word search problem [0] and tell me how it performs? So far with every model I've tried, none has ever managed to find a single word correctly.
Both Llama 3.2 90B and Claude 3.5 Sonnet can find "turkey" and "spoon", probably because they're left-to-right. Llama gave approximate locations for each and Claude gave precise but slightly incorrect locations. Further prompting to look for diagonal and right-to-left words returned plausible but incorrect responses, slightly more plausible from Claude than Llama. (In this test I cropped the word search to just the letter grid, and asked the model to find any English words related to soup.)
Anyways, I think there just isn't a lot of non-right-to-left English in the training data. A word search is pretty different from the usual completion, chat, and QA tasks these models are oriented towards; you might be able to get somewhere with fine-tuning though.
Try and find where the words are in this word puzzle
undefined
'''
There are two words in this word puzzle: "soup" and "mix". The word "soup" is located in the top row, and the word "mix" is located in the bottom row.
'''
Edit:
Tried a bit more probing like asking it to find spoon or any other word. It just makes up a row and column.
I'm not implying anything. It's just frustrating that despite being a US territory with US citizens, PR isn't allowed to use this service without any explanation.
Image models such as Llama 3.2 11B and 90B (and the Claude 3 series, and Microsoft Phi-3.5-vision-instruct, and PaliGemma, and GPT-4o) don't run OCR as a separate step. Everything they do is from that raw vision model.
Could someone try giving the 90b model this word search problem [0] and tell me how it performs? So far with every model I've tried, none has ever managed to find a single word correctly.
[0] https://imgur.com/i9Ps1v6