Top papers claim 99.5% accuracy. Since that is measured on the 10,000-sample test set, it works out to exactly 50 misclassified samples.
I'd love to see a table of them. Back in the day, papers claiming top performance on MNIST would simply enumerate all of the misclassified test samples, e.g. https://www.researchgate.net/figure/All-of-the-misclassified... from 2002. Newer papers have whittled this down to five misclassified digits; see §4.4 of https://arxiv.org/pdf/2001.09136v6.pdf for this particular flourish, though it isn't quite standard protocol (a sample is counted as correct if any model in the ensemble predicts the right label, rather than by majority vote).
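For anyone who wants to reproduce such a table: here's a minimal NumPy sketch, assuming you already have arrays of predicted and true labels for the 10,000 test images (the names `preds`, `ensemble_preds`, and `y_test` are illustrative, not from either paper). The second function mimics the looser any-model-correct counting rather than a majority vote.

    import numpy as np

    def misclassified(preds, y_test):
        # (index, true label, predicted label) for every test error;
        # at 99.5% accuracy on 10,000 samples this is exactly 50 rows.
        wrong = np.flatnonzero(preds != y_test)
        return [(int(i), int(y_test[i]), int(preds[i])) for i in wrong]

    def any_correct_accuracy(ensemble_preds, y_test):
        # Looser protocol: a sample counts as correct if *any* ensemble
        # member predicts it right, instead of requiring a majority.
        hits = np.any([p == y_test for p in ensemble_preds], axis=0)
        return float(hits.mean())

By construction, any-correct accuracy is at least as high as any single model's or a majority vote's, which is worth keeping in mind when comparing headline numbers.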