Yup, right now we use GROBID, do some post-processing, and combine the output with other extraction techniques. For instance, we use a model to extract document figures [1], so that we can render them in the resulting HTML document.
Also, we're working hard on a new extraction mechanism that should allow us to replace GROBID [2].
There are a lot of really smart people at AI2 working on this. I'm excited to see the resulting improvements and the cool things (like this) that we build with the results!
[1]: https://api.semanticscholar.org/CorpusID:4698432
[2]: https://api.semanticscholar.org/CorpusID:235265639