Hacker News new | past | comments | ask | show | jobs | submit login

I've had great luck just base64'ing images and asking Qwen 2.5 VL to both parse it to markdown and generate a title, description and list of keywords (seems to work well on tables and charts). My plan is to split PDFs into pngs first then run those against Qwen async, then put them into a vector database (haven't gotten around to that quite yet).





How does the base64 output become useful / usable information to an LLM?

No idea but Qwen 2.5 VL seems to understand it all quite well.



Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: