"We're eager to hear what you think, ..."

I think I will stick with pdftohtml, pdftotext, and pdfimages from Poppler (https://en.wikipedia.org/wiki/Poppler_(software)). These take seconds, not minutes.
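
For anyone who wants to try the same route, here is a minimal sketch of that workflow in Python. It assumes the Poppler command-line tools are installed and on PATH; the function names, the `out` directory, and the choice of flags (`-layout`, `-noframes`, `-png`) are illustrative, not a recommended configuration.

```python
import shutil
import subprocess
from pathlib import Path


def poppler_commands(pdf: str, out_dir: str) -> list[list[str]]:
    """Build the three Poppler command lines for one PDF (nothing is run here)."""
    out = Path(out_dir)
    stem = Path(pdf).stem
    return [
        # Plain text, keeping the physical page layout where possible.
        ["pdftotext", "-layout", pdf, str(out / f"{stem}.txt")],
        # HTML without a frameset; the output name is given without a suffix,
        # on the assumption that pdftohtml appends ".html" itself.
        ["pdftohtml", "-noframes", pdf, str(out / stem)],
        # Embedded images, written as PNG files that share this prefix.
        ["pdfimages", "-png", pdf, str(out / stem)],
    ]


def extract(pdf: str, out_dir: str = "out") -> None:
    """Run each tool in turn, skipping any that is not installed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in poppler_commands(pdf, out_dir):
        if shutil.which(cmd[0]) is None:
            print(f"skipping {cmd[0]}: not found on PATH")
            continue
        subprocess.run(cmd, check=True)
```

Calling `extract("paper.pdf")` would then write the text, HTML, and images under `out/`, all locally and without uploading anything.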

From a user's perspective, I don't understand why you don't release the source code and let people compile a native application. (Did I miss the link to the source code?) Instead, it looks like this is just a means of collecting free data (metadata, more training data, data from submitted papers by default) every time someone submits a paper.