Maxious on April 12, 2012 | on: Amazon CloudSearch - Start Searching in One Hour f...

http://tika.apache.org/ "detects and extracts metadata and structured text content from various documents using existing parser libraries". I use it all the time to prepare input for Solr.
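The comment doesn't say how the Tika output reaches Solr. One common route, not necessarily the one used here, is Solr's ExtractingRequestHandler ("Solr Cell"). It runs Tika on the server and indexes the extracted text and metadata as a document. The sketch below assumes a local Solr at the default port 8983, a hypothetical core named `mycore`, and the extraction handler enabled at `/update/extract`. The helper names are made up for illustration. `literal.id`, `resource.name`, and `commit` are real Solr Cell request parameters.

```python
import urllib.parse
import urllib.request

# Hypothetical core URL; assumes Solr on its default port with a core "mycore".
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"


def build_extract_url(core_url, doc_id, filename=None, commit=True):
    """Build a Solr Cell (/update/extract) URL for one uploaded file."""
    params = {"literal.id": doc_id}  # unique key stored on the indexed document
    if filename is not None:
        # Optional hint that Tika can use when detecting the file's MIME type.
        params["resource.name"] = filename
    params["commit"] = "true" if commit else "false"
    return f"{core_url}/update/extract?{urllib.parse.urlencode(params)}"


def index_file(path, doc_id, core_url=SOLR_CORE_URL):
    """POST a raw file to Solr Cell, which parses it with Tika server-side.

    Needs a running Solr with the extraction handler configured.
    """
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        build_extract_url(core_url, doc_id, filename=path.rsplit("/", 1)[-1]),
        data=body,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # Solr's response body (XML or JSON by config)
```

For example, `index_file("reports/q1.pdf", "q1-report")` would upload one PDF. The other route is to run Tika yourself and send plain fields to Solr's regular `/update` handler. That keeps parsing off the search server, at the cost of more client-side code.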