The data seems to show the opposite to me: despite scores being all over the place, the mean is very reliable. When a score of 2 or lower is considered a fail, candidates who consistently average ~2.5 fail about half of their interviews, while those who consistently average ~3.0 fail only 10%. Of course, the probability that a candidate fails at least one interview approaches 1 as they sit more and more interviews. That the test produces both false negatives and false positives does not invalidate it. In fact, the test being accurate on average despite those false positives and false negatives ought to strengthen confidence in it, not weaken it. And if a single bad interview rules a candidate out at company A, that doesn't mean they won't go on to pass all of their interviews at company B.
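To make the "approaches 1" point concrete, here is a rough sketch. It assumes each interview is an independent trial with a fixed per-interview fail rate. That independence is a simplifying assumption, not something the data establishes. The 0.5 and 0.1 rates are the ones quoted above. The function name `prob_at_least_one_fail` is just an illustrative name.

```python
def prob_at_least_one_fail(p_fail: float, n: int) -> float:
    """Chance of failing at least one of n interviews.

    Assumes (hypothetically) that every interview is an independent
    trial with the same per-interview fail rate p_fail.
    """
    # P(pass all n) = (1 - p_fail) ** n, so P(at least one fail) is its complement.
    return 1 - (1 - p_fail) ** n


# Per-interview fail rates taken from the comment above.
rates = [("~2.5 average", 0.5), ("~3.0 average", 0.1)]

for label, p in rates:
    for n in (1, 3, 5, 10):
        print(f"{label}, {n} interviews: {prob_at_least_one_fail(p, n):.2f}")
```

Under this model, even the ~3.0 candidate is more likely than not to fail at least one of ten interviews, because 1 - 0.9^10 is roughly 0.65. A single fail is therefore weak evidence on its own, while the per-interview mean still separates the two groups cleanly.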