Or, if published, we could develop tools to normalize the data and improve the s...

chejazi · on March 15, 2016

This is the right approach, like what OCR does for digitizing volumes of literature.

dekhn · on March 15, 2016

OCR for historical literature is a really hard problem. Beyond the fact that OCR quality is pretty low, perception of the meaning of the papers isn't addressed by that. So it makes it easier to access (can read papers on the net rather than going to a library) but that's about it.

chejazi · on March 15, 2016

> it makes it easier to access (can read papers on the net rather than going to a library) but that's about it.

I disagree. The text is indexed and searchable.

> the meaning of the papers isn't addressed by that.

I agree. I wonder if research papers that have lots of symbolic logic (e.g. math proofs) would be the easiest starting point for a system like this.