Accuracy. Every solution I've seen that relies on automatic crawling will eventually hit a parsing error when someone changes the sentence structure of a press release.
It's not so obvious when you're looking at the breaking releases for a few stocks or companies, but historical records have at least 1 error per stock per year.
The practical approach is a pipeline:
1. Validate that the data matches expectations (you do have a definition of correct, right?)
2. Log failures for manual review -> manual inserts or corrections get placed back into the queue for (1)
3. Monitor (2). When manual inserts start trending up, it may be time to update your processing logic.
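A minimal sketch of that loop in Python, assuming a hypothetical parsed-record format and field names (`ticker`, `eps`, `revenue`, `period`) purely for illustration:

```python
import logging
from collections import deque

logger = logging.getLogger("press_release_pipeline")

# Hypothetical validation rule: a parsed release must carry these fields
# with sane values. Your own "definition of correct" goes here.
REQUIRED_FIELDS = {"ticker", "eps", "revenue", "period"}

review_queue = deque()      # records awaiting manual review
manual_fixes_this_week = 0  # trend this counter to spot format drift


def matches_expectations(record: dict) -> bool:
    """Step 1: check the parsed record against your definition of correct."""
    return REQUIRED_FIELDS <= record.keys() and record["revenue"] >= 0


def ingest(record: dict) -> None:
    """Step 2: accept good records; route failures to manual review."""
    if matches_expectations(record):
        store(record)
    else:
        logger.warning("failed validation, queued for review: %r", record)
        review_queue.append(record)


def apply_manual_fix(corrected: dict) -> None:
    """Corrected records re-enter the pipeline through the same check (1)."""
    global manual_fixes_this_week
    manual_fixes_this_week += 1  # step 3: monitor this; a rising trend
                                 # means the source format has changed
    ingest(corrected)


def store(record: dict) -> None:
    print("stored:", record)  # stand-in for a real database insert
```

The point of routing corrections back through `ingest` is that your validation rules, not a human's memory, stay the single definition of correct.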
I pitched a similar idea at a company several years ago, where we had a team of people doing data entry from faxed documents. I wanted to build something that would do as much OCR as it could and then display the results for users to verify, which should have been a 10x efficiency increase, not to mention gains in speed and accuracy.
The idea was rejected; they wanted either a perfect solution or nothing. I don't know why, but for some reason the idea of computers replacing humans was acceptable to management, while computers augmenting humans wasn't.