
Thanks for the answer.

Wikipedia has a lot of tables, so I was wondering whether content-aware sentence chunking would be good enough for it.

https://www.pinecone.io/learn/chunking-strategies/




mwparserfromhell can parse the wikitext and give you the text content without the tables.



