
Tangentially related: is anyone here familiar with automated analysis of research papers? I feel this is a field that would perfectly lend itself to machine learning, or similar approaches.

It would require extraction of meaningful data, but therein lies the rub. There are several large projects looking into collecting/sharing scientific data, mainly focused on persuading and empowering researchers to share their full data sets.

But what can we do where the raw data is not available? Is anyone working on ways to reliably extract data from the millions of PDFs in research databases?

There are several open-ended questions here. If anyone knows of work in these areas, or is interested in this, I'd be very keen to talk.




It's only materials-related, but http://www.citrine.io/ does something like that. I recently attended a talk given by one of the team members (Bryce), and they seemed quite open to discussions.


Andrej Karpathy (OpenAI) does this to an extent with machine learning papers from arXiv: http://www.arxiv-sanity.com/


The field is called text mining or literature data mining.

https://en.wikipedia.org/wiki/Text_mining
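
For anyone who wants to experiment: here's a minimal sketch (Python, scikit-learn) of the kind of first pass this usually means in practice. It assumes the text has already been pulled out of the PDFs (e.g. with pdfminer.six), and the abstracts and query below are made-up placeholders rather than real data.

    # Minimal text-mining sketch: TF-IDF over (placeholder) paper abstracts,
    # then a cosine-similarity query against them. Assumes PDF text extraction
    # has already happened upstream.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical extracted abstracts, standing in for real PDF text.
    abstracts = [
        "We measure the band gap of doped oxide thin films.",
        "A convolutional network for classifying arXiv paper topics.",
        "Dataset sharing practices in materials science research.",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(abstracts)          # documents x terms matrix

    query = vectorizer.transform(["machine learning on research papers"])
    scores = cosine_similarity(query, X).ravel()     # similarity to each abstract

    for score, text in sorted(zip(scores, abstracts), reverse=True):
        print(f"{score:.2f}  {text}")

TF-IDF plus cosine similarity is roughly the kind of baseline that tools like arxiv-sanity build on before anything fancier.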



