The question to ask is how inaccurate the submitted data can be while still passing the CAPTCHA challenge. People solving it at scale with bots or farms don't care whether their answers are correct, only whether they pass.
I agree, but since you don't know which part of the challenge is used for validation and which part is used for training, and since you can assume such a system only accepts answers that reach consensus, the system tends to accept only valid data. That's the beauty of it.
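
To make that concrete, here is a rough sketch of how I imagine such a scheme could work. It is not how any real CAPTCHA service is implemented; the class name, the method names, and both thresholds are made up for illustration. Each challenge pairs a control item (answer already known) with an unknown item, and the solver can't tell which is which.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds, chosen only for this illustration.
MIN_VOTES = 3         # passing answers needed before a label is considered
MIN_AGREEMENT = 0.75  # share of votes the most common answer must reach


class ConsensusLabeler:
    """Toy model: control items validate the solver, unknown items collect votes."""

    def __init__(self, known_answers):
        self.known_answers = known_answers   # control_id -> correct answer
        self.votes = defaultdict(Counter)    # unknown_id -> Counter of answers

    def submit(self, control_id, control_answer, unknown_id, unknown_answer):
        """Return True if the challenge is passed; count the vote only then."""
        if self.known_answers[control_id] != control_answer:
            return False  # failed validation, so the unknown answer is dropped
        self.votes[unknown_id][unknown_answer] += 1
        return True

    def accepted_label(self, unknown_id):
        """Return the consensus label for an unknown item, or None if there is none yet."""
        counts = self.votes[unknown_id]
        total = sum(counts.values())
        if total < MIN_VOTES:
            return None
        answer, top = counts.most_common(1)[0]
        return answer if top / total >= MIN_AGREEMENT else None


if __name__ == "__main__":
    labeler = ConsensusLabeler({"c1": "street"})
    labeler.submit("c1", "street", "u1", "morning")
    labeler.submit("c1", "xxxxx", "u1", "garbage")  # fails the control item
    labeler.submit("c1", "street", "u1", "morning")
    labeler.submit("c1", "street", "u1", "morning")
    print(labeler.accepted_label("u1"))
```

In this toy version, a solver who gets the control item right but types junk for the unknown item still passes, yet the junk only becomes a label if enough other passing solvers happen to type the same junk.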