It's 88% correct classification on a validation dataset of 154 patients diagnosed with ASD. Their validation dataset did not contain a "typically developing" population, since there's only one dataset for that population and it was used up in generating their model. On their training data, 5% of controls were misclassified - that is, 5% of patients who were not diagnosed with ASD by current tests would be classified as having ASD under this scheme.
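To put rough numbers on those percentages, here's a back-of-the-envelope sketch in Python. The 88%, 154 and 5% figures are the ones quoted above; the control group size of 1000 is purely hypothetical (I haven't given the paper's control count here), and since 88% is a rounded figure the patient counts are approximate.

```python
# Figures quoted above.
VALIDATION_ASD_PATIENTS = 154
CORRECT_RATE = 0.88          # share of diagnosed patients classified as ASD
CONTROL_MISCLASS_RATE = 0.05  # share of training controls classified as ASD

# Hypothetical number, for illustration only (not from the paper).
HYPOTHETICAL_CONTROLS = 1000


def approx_count(rate, total):
    """Approximate head count implied by a rounded percentage."""
    return round(rate * total)


correctly_classified = approx_count(CORRECT_RATE, VALIDATION_ASD_PATIENTS)
missed = VALIDATION_ASD_PATIENTS - correctly_classified
false_positives = approx_count(CONTROL_MISCLASS_RATE, HYPOTHETICAL_CONTROLS)

print(f"Diagnosed patients classified as ASD: ~{correctly_classified} of {VALIDATION_ASD_PATIENTS}")
print(f"Diagnosed patients missed: ~{missed}")
print(f"Controls flagged as ASD per {HYPOTHETICAL_CONTROLS}: ~{false_positives}")
```

So roughly 18 of the 154 diagnosed patients would be missed, and about 1 in 20 undiagnosed children would be flagged, with the caveat that the 5% comes from training data rather than a held-out control set.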
It's not a genetic test, and it isn't a test for something that defines autism, so you're begging the question IMO. Sure, human observers are imperfect, but that doesn't mean you can choose an arbitrary test that is precise and make it the standard just because. That's "We need something better than human judgement, and this is something, thus it is better."
The full paper is here: https://onlinelibrary.wiley.com/doi/10.1002/btm2.10095