Huh, where does it say that in the article? I don’t think I spotted that.
All the same, it feels sort of beside the point to me. It just doesn’t feel right to take a medical diagnostic tool - whose intended purpose is for communication among doctors - and treat it as a test score. That’s just... not what it was designed for.
On page 3: "Readers rated each case using the forced BI-RADS scale, and BI-RADS scores were compared to ground-truth outcomes to fit an ROC curve for each reader. The scores of the AI system were treated in the same manner (Fig. 3)."
This isn't as clear as I'd like it to be, but Fig. 3 shows both an "AI system" and an "AI system (non-parametric)" ROC curve. My understanding is that the former is fit from the discrete BI-RADS classes, and the latter is computed from the "raw" output.
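To make the distinction concrete, here's a rough sketch of how the same scores give a coarser ROC once they're binned into a handful of ordinal classes. Everything in it is made up for illustration: the data, the `to_category` cutoffs, and the function names are my own assumptions, not anything from the paper. It also only computes empirical ROC points in both cases, so it doesn't reproduce whatever curve-fitting the authors actually used.

```python
def roc_points(scores, labels):
    """Empirical ROC: one (FPR, TPR) point per distinct threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Sweep thresholds from strictest to most lenient.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def to_category(score, cutoffs=(0.2, 0.4, 0.6, 0.8)):
    """Hypothetical binning of a continuous score into 5 ordinal classes (1..5)."""
    return 1 + sum(score >= c for c in cutoffs)


# Made-up scores and outcomes (1 = cancer, 0 = no cancer), illustration only.
raw_scores = [0.05, 0.15, 0.30, 0.35, 0.55, 0.62, 0.70, 0.85, 0.90, 0.95]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

raw_curve = roc_points(raw_scores, labels)
binned_curve = roc_points([to_category(s) for s in raw_scores], labels)

print(len(raw_curve), "operating points from raw scores")
print(len(binned_curve), "operating points from binned classes")
```

With only a few classes you get only a few operating points, which I assume is why a curve has to be fit through them for the readers, while the raw output can be swept threshold by threshold.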