
Very curious how it performs on OCR tasks compared to InternVL. To be competitive at reading text you need tiling support, and InternVL does tiles exceptionally well.
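For readers unfamiliar with "tiling": the idea is to split a high-resolution page into fixed-size crops so small print is not lost when the image is downscaled to the vision encoder's input size. Below is a minimal sketch that is only loosely modeled on dynamic tiling schemes like InternVL's. The 448 px tile size, the 12-tile cap, and the tie-breaking rule are assumptions for illustration, not InternVL's actual code. It picks the tile grid whose aspect ratio best matches the image and returns the crop boxes in the resized image.

```python
def choose_grid(width: int, height: int, max_tiles: int = 12) -> tuple[int, int]:
    """Pick (cols, rows) with cols * rows <= max_tiles whose aspect ratio
    is closest to the image's (assumed heuristic, not InternVL's exact rule)."""
    target = width / height
    best, best_diff = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles // cols + 1):
            diff = abs(cols / rows - target)
            # On an aspect-ratio tie, prefer more tiles (more effective resolution).
            if diff < best_diff or (diff == best_diff and cols * rows > best[0] * best[1]):
                best, best_diff = (cols, rows), diff
    return best


def tile_boxes(width: int, height: int, tile: int = 448, max_tiles: int = 12):
    """Return the size to resize the image to, plus (left, top, right, bottom)
    crop boxes for each tile, in row-major order."""
    cols, rows = choose_grid(width, height, max_tiles)
    boxes = [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]
    return (cols * tile, rows * tile), boxes


if __name__ == "__main__":
    # An A4 invoice scanned at 300 dpi is roughly 2480 x 3508 px.
    size, boxes = tile_boxes(2480, 3508)
    print(size, len(boxes))
```

With these assumed parameters an A4 scan maps to a 2 x 3 grid, so each 448 px tile covers far more of the original pixels than a single downscaled 448 px image would. Some implementations also append a downscaled thumbnail of the whole page for global context; that is omitted here.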



I think CogVLM2 is even better than InternVL at OCR (my use case is extracting information from invoices).


After some superficial testing with the bad-quality scans you can find on Kaggle, I can't confirm that. CogVLM2 refuses to handle scans that InternVL-V1.5 can still comprehend.



