My experience with MTurk is that three runs aren't enough if you need the data to be correct and can't afford to pay someone (who ISN'T from MTurk) to validate every entry.
We regularly ran into these two situations:
- All three workers got different answers
- Two of the three workers agreed on the wrong answer
I think five or more runs may be necessary for data transcription on MTurk.
You should consider using worker qualifications and/or simplifying the requests.
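As a rough illustration of why five runs help, here's a minimal sketch (Python, with hypothetical values) of taking the majority answer across redundant transcriptions and flagging anything without a clear consensus for manual review. With only three workers, two agreeing on the wrong answer wins; with five workers and an agreement threshold of three, a wrong pair is much less likely to carry the vote.

```python
from collections import Counter

def consensus(answers, min_agreement=3):
    """Return the majority answer across redundant transcriptions,
    or None if no answer reaches the agreement threshold."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count >= min_agreement else None

# Example: five workers transcribed the same field (hypothetical data)
transcriptions = ["1234 Elm St", "1234 Elm St", "1234 Elm St.",
                  "1234 Elm St", "1284 Elm St"]
result = consensus(transcriptions)
if result is None:
    print("No consensus -- send for manual review")
else:
    print("Accepted:", result)
```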
The error rate I get for data entry tasks is around a 0.5%-1% discrepancy between double entries. If you use the worker's prior reliability to tie-break who's right, it drops to a <0.1% error rate.
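A minimal sketch of what that tie-break might look like (the reliability score here is a hypothetical per-worker accuracy estimate, e.g. historical agreement with gold-standard items, not anything MTurk provides directly):

```python
def resolve_double_entry(entry_a, entry_b, reliability_a, reliability_b):
    """Compare two independent transcriptions of the same field.
    On agreement, accept the value; on disagreement, fall back to
    whichever worker has the better historical accuracy."""
    if entry_a == entry_b:
        return entry_a, "agreement"
    winner = entry_a if reliability_a >= reliability_b else entry_b
    return winner, "tie-break"

# Hypothetical example: two workers disagree on an amount field
value, how = resolve_double_entry("$1,050.00", "$1,060.00", 0.992, 0.974)
print(value, how)  # -> $1,050.00 tie-break
```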
I mean, this is an issue in any annotation exercise. Most annotation work heads south due to a failure to create an entire, discrete, and complete workflow/classification.