
Shouldn't they be running each scan through Mechanical Turk multiple times to catch errors? I guess that would double or triple their costs.



This is the dirty secret of Mechanical Turk: much of the documentation says you're supposed to run each task through multiple times, but people don't, because at that point it becomes expensive enough that it's no longer an attractive option.

I work at a company that has considered it many times for various things, but between the Mechanical Turk API/tech being pretty terrible and the results being expensive and low quality, we always end up doing one of two things: getting a temp in for a day or two to sit in front of Excel and tidy up the data, or, for bigger processes, outsourcing to a data processing company in Bangladesh, where we have dedicated people on our account who sit in a shared Slack channel and whom we can train.


How expensive has Mechanical Turk gotten over the years, though? I thought running tasks multiple times to reduce human error, plus random spot checking, was standard practice as well.

What is the current cost per HIT compared with previous years?


The last time I looked at this, a few years ago, it was ~$0.10 per HIT, with 2-3 runs needed, and that was for very simple data processing. We have quite complex data processing requirements, with multiple interdependent fields and a UI, which would have increased the processing time, so I'd have guessed $1 total per item processed, plus extensive integration time.

Our outsourcing gives us far better communication and the ability to train the staff doing the processing over time, give them feedback on their performance, and help them get better. I don't know the figures, but I suspect it's a similar price with far better accuracy. We do have enough consistent work for this to make sense, though; if our demand were spikier, it might not.


My experience with MTurk is that 3 isn't enough runs if you need the data to be correct and can't afford to pay someone (who ISN'T from MTurk) to validate every entry.

We regularly ran into these two situations:

- All three workers got different answers

- Two of the three workers agreed on the wrong answer

I think five or more runs may be necessary for data transcription on MTurk.
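
To make that concrete, here's a rough sketch (my own illustration, nothing MTurk gives you) of majority-vote aggregation over N runs. It flags exactly those two failure modes, and shows why 2-of-3 agreement is a much weaker bar than 3-of-5:

    from collections import Counter

    def aggregate(answers, min_votes=3):
        """answers: one item's answers across N runs, e.g. 5 workers.
        Returns (answer, status); anything other than 'ok' should go
        to manual review."""
        counts = Counter(answers)
        top, top_n = counts.most_common(1)[0]
        if top_n >= min_votes:
            return top, "ok"
        if top_n == 1:
            return None, "no agreement"   # all workers differed
        return top, "weak majority"       # could still be the wrong answer

    # With 3 runs, accepting 2-of-3 agreement includes two workers
    # agreeing on the same wrong answer. With 5 runs and min_votes=3,
    # that failure needs three independent matching mistakes.
    print(aggregate(["A", "B", "C"]))            # (None, 'no agreement')
    print(aggregate(["A", "A", "B", "A", "C"]))  # ('A', 'ok')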


You should consider using qualifications / simplifying the requests.

The error rate I get for data entry tasks is around a 0.5%-1% discrepancy between double entries. If you use each worker's prior reliability to tie-break between who's right, it drops to a <0.1% error rate.
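
A minimal sketch of what I mean by the tie-break, with made-up names (illustrative, not our actual pipeline):

    def resolve(entry_a, entry_b, reliability_a, reliability_b):
        """Double entry: two workers transcribe the same field.
        reliability_* is each worker's historical accuracy (0..1),
        e.g. their past agreement rate with verified answers."""
        if entry_a == entry_b:
            return entry_a  # the ~99%+ case in practice
        # Discrepancy: trust the historically more reliable worker.
        return entry_a if reliability_a >= reliability_b else entry_b

    print(resolve("42 Elm St", "42 Elm St", 0.98, 0.91))   # agreement
    print(resolve("42 Elm St", "42 Helm St", 0.98, 0.91))  # tie-break -> A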


Does the MTurk API let you identify, rank, and exclude workers? By identifying, I mean getting some common key across all of a given worker's submissions, etc.
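
i.e. something like this, if I'm reading the boto3 MTurk client docs right (the HIT, qualification, and worker IDs below are made up):

    import boto3

    # Needs real AWS credentials and IDs to actually run.
    mturk = boto3.client("mturk", region_name="us-east-1")

    # Identify: every assignment carries a stable WorkerId, so all of a
    # worker's submissions share that key.
    resp = mturk.list_assignments_for_hit(
        HITId="YOUR_HIT_ID", AssignmentStatuses=["Submitted", "Approved"])
    for a in resp["Assignments"]:
        print(a["WorkerId"], a["AssignmentId"])

    # Rank: attach a score to a worker via a custom qualification type,
    # then require a minimum score in future HITs' qualification requirements.
    mturk.associate_qualification_with_worker(
        QualificationTypeId="YOUR_QUAL_TYPE_ID", WorkerId="SOME_WORKER_ID",
        IntegerValue=87, SendNotification=False)

    # Exclude: block a worker from all of your future HITs.
    mturk.create_worker_block(WorkerId="SOME_WORKER_ID",
                              Reason="High error rate on double entry")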


I mean, this is an issue in any annotation exercise. Most annotation work heads south due to a failure to design a discrete and exhaustive workflow/classification scheme up front.


yep, and then using this type of analysis to determine who is good and who is not: https://en.wikipedia.org/wiki/Inter-rater_reliability
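
The simplest version of that analysis, as an illustrative sketch of my own: score each worker by how often they agree with the per-item majority answer, then rank or exclude on that score.

    from collections import Counter, defaultdict

    def worker_agreement(labels):
        """labels: {item_id: [(worker_id, answer), ...]} for items
        answered by several workers. Returns each worker's rate of
        agreement with the per-item majority answer -- a crude
        inter-rater reliability score for ranking/excluding workers."""
        agree = defaultdict(int)
        total = defaultdict(int)
        for answers in labels.values():
            majority = Counter(a for _, a in answers).most_common(1)[0][0]
            for worker, answer in answers:
                total[worker] += 1
                agree[worker] += (answer == majority)
        return {w: agree[w] / total[w] for w in total}

    print(worker_agreement({
        "item1": [("w1", "A"), ("w2", "A"), ("w3", "B")],
        "item2": [("w1", "C"), ("w2", "C"), ("w3", "C")],
    }))  # {'w1': 1.0, 'w2': 1.0, 'w3': 0.5}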



