Hacker News

Apache Tika (http://tika.apache.org/) "detects and extracts metadata and structured text content from various documents using existing parser libraries". I use it all the time to prepare input for Solr.
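For anyone wondering what the Tika-to-Solr glue looks like, here is a minimal sketch of the mapping step only. It assumes the Tika output has already been obtained as a dict with "metadata" and "content" keys (roughly the shape the tika-python package's `parser.from_file` returns). The Solr field names (`id`, `content`, `content_type`, `title`) and the helper names are hypothetical and would need to match your own schema. Metadata key names such as `dc:title` vary by parser and Tika version.

```python
import json
from typing import Any


def tika_to_solr_doc(doc_id: str, parsed: dict[str, Any]) -> dict[str, Any]:
    """Map a Tika-style parse result onto a flat Solr document.

    Hypothetical helper: assumes `parsed` looks like
    {"metadata": {...}, "content": "..."}; field names are illustrative.
    """
    metadata = parsed.get("metadata") or {}
    # Tika may return None for content (e.g. images), so default to "".
    content = (parsed.get("content") or "").strip()
    doc: dict[str, Any] = {"id": doc_id, "content": content}
    # Copy a couple of common metadata keys when present; keys differ by parser.
    if "Content-Type" in metadata:
        doc["content_type"] = metadata["Content-Type"]
    if "dc:title" in metadata:
        doc["title"] = metadata["dc:title"]
    return doc


def solr_update_payload(docs: list[dict[str, Any]]) -> str:
    # Solr's JSON update handler accepts a JSON array of documents.
    return json.dumps(docs)


# Usage with a hand-written, Tika-shaped result (no Tika call is made here).
parsed = {
    "metadata": {"Content-Type": "application/pdf", "dc:title": "Report"},
    "content": "\n\nQuarterly numbers...\n",
}
doc = tika_to_solr_doc("report-1", parsed)
payload = solr_update_payload([doc])
print(payload)
```

The resulting JSON string would then be POSTed to the core's `/update` endpoint with `Content-Type: application/json`. Keeping the mapping separate from the HTTP call makes it easy to test without a running Tika or Solr instance.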



