_flux 3 months ago | on: Vision language models are blind

Why do you think it's probable? The much smaller LLaVA, which I can run on my consumer GPU, can also do "OCR", yet I don't believe anyone has hidden an OCR engine inside llama.cpp.