
Accuracy. Every solution I've seen that relies on automatic crawling will eventually hit a parsing error when someone changes the sentence structure of a press release.

It's not so obvious when you're watching breaking releases for a few stocks or companies, but historical records contain at least one error per stock per year.




So split your stream:

    1. Data matching expectations (you do have a definition of correct, right?)

    2. Everything else: log for manual review -> manual inserts or corrections, which go back into the queue for (1)

Monitor (2). When manual inserts start trending up, it may be time to update your parsing logic. There's a sketch of the split below.
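A minimal sketch in Python of how that split might look. Everything here is invented for illustration (the field names, the ticker regex, the shape of a parsed record); is_valid is the "definition of correct", and the review list is the thing you monitor:

    import logging
    import re

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("review-queue")

    # The "definition of correct": required fields plus sanity checks.
    # These are placeholder rules, not anyone's real schema.
    EXPECTED_FIELDS = {"ticker", "eps", "revenue"}
    TICKER_RE = re.compile(r"^[A-Z]{1,5}$")

    def is_valid(record: dict) -> bool:
        """Return True only if the parsed record matches expectations."""
        if not EXPECTED_FIELDS <= record.keys():
            return False
        ticker = record["ticker"]
        if not isinstance(ticker, str) or not TICKER_RE.match(ticker):
            return False
        # The financial fields should at least be numbers.
        return all(isinstance(record[k], (int, float)) for k in ("eps", "revenue"))

    def route(record: dict, good: list, review: list) -> None:
        """Stream (1): data matching expectations. Stream (2): manual review."""
        if is_valid(record):
            good.append(record)
        else:
            log.info("manual review needed: %r", record)
            review.append(record)

Records corrected by hand get appended to the good stream, closing the loop from (2) back to (1), and the growth rate of the review list is the trend line that tells you the parser is going stale.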


I came up with a similar idea for a company several years ago where we had a team of people doing data entry from faxed documents. I wanted to build something that would do all the OCR it could and then display the results to users to verify, which should have been roughly a tenfold efficiency increase, not to mention gains in speed and accuracy.

The idea was rejected; they wanted either a perfect solution or nothing. I don't know why, but for some reason the idea of computers removing humans is acceptable to management, while computers augmenting humans isn't.


Right, it depends on what the reward is for high precision vs. high recall.
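To make that tradeoff concrete, here's a toy calculation. All of the counts and payoffs below are made up; the point is just that once a false positive (bad data entering the pipeline) costs much more than a miss, a strict parser plus a manual-review queue comes out ahead:

    def expected_value(tp: int, fp: int, fn: int,
                       reward_tp: float, cost_fp: float, cost_fn: float) -> float:
        """Net payoff of a parser given outcome counts and per-outcome values."""
        return tp * reward_tp - fp * cost_fp - fn * cost_fn

    # Strict parser: high precision, lower recall (rejects odd sentence structures).
    strict = expected_value(tp=90, fp=1, fn=10, reward_tp=1.0, cost_fp=50.0, cost_fn=1.0)
    # Loose parser: high recall, lower precision (accepts almost everything).
    loose = expected_value(tp=99, fp=10, fn=1, reward_tp=1.0, cost_fp=50.0, cost_fn=1.0)

    print(f"strict: {strict:.0f}, loose: {loose:.0f}")  # strict: 30, loose: -402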



