This is correct. However, the distinction between labeling and training is artificial; it probably arises from the fact that ML came out of academia, where it was not part of the business process.
I.e. a modern ML system should just plug into the business process from day 0, where the ML task is performed by a human and recorded by the machine.
After a while, the machine would train on this recorded data, and start replacing the humans.
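To make that concrete, here is a minimal sketch of such a record-then-train loop. Everything in it is illustrative: the `ShadowLearner` name, the scikit-learn model choice, and the thresholds are assumptions, not a reference to any existing framework.

```python
from dataclasses import dataclass, field

@dataclass
class ShadowLearner:
    """Sits inside the business process: humans do the task as usual,
    the system records their decisions, periodically retrains on the log,
    and only answers by itself once it is confident enough."""
    model: object = None             # any classifier with fit/predict/predict_proba
    log: list = field(default_factory=list)
    retrain_every: int = 1000        # illustrative retraining cadence
    confidence_cutoff: float = 0.95  # hand over to the machine above this

    def handle(self, features, human_decision_fn):
        # If a trained model exists and is confident, let it answer.
        if self.model is not None:
            confidence = max(self.model.predict_proba([features])[0])
            if confidence >= self.confidence_cutoff:
                return self.model.predict([features])[0]
        # Otherwise the human performs the task; we just record the pair.
        decision = human_decision_fn(features)
        self.log.append((features, decision))
        if len(self.log) % self.retrain_every == 0:
            self._retrain()
        return decision

    def _retrain(self):
        # Illustrative model choice; any supervised learner would do.
        from sklearn.linear_model import LogisticRegression
        X, y = zip(*self.log)
        self.model = LogisticRegression(max_iter=1000).fit(list(X), list(y))
```

The only design point that matters here is that labeling is not a separate phase: the training data is a by-product of the humans doing their normal work.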
> a modern ML system should just plug into the business process from day 0, where the ML task is performed by a human and recorded by the machine.
Ah, this is a typical thing I hear people in the Valley say: just push it all ... somewhere. No.
If we digitized all microscopy slides, it would require YouTube-scale storage several times over. People think genomics is big. People think reconnaissance imaging is big. They're big, but there's only so much of them.
IF it were all digitized, there would be far more pathology whole-slide imaging data generated every day than either of those. I did some estimates at one point and had to pad the genomics numbers by a couple of orders of magnitude just to make them competitive at enterprise scale.
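For a sense of how such an estimate might go, here is a back-of-envelope comparison. Every input (slides scanned per day, bytes per slide, genomes per day, video upload rate) is an assumed round number for illustration, not the parent's actual figures or a measured statistic:

```python
# Back-of-envelope daily data volumes; all inputs are assumed
# order-of-magnitude placeholders, not measured statistics.
TB = 1e12

# Pathology: assume ~3 million slides scanned per day at ~10 GB per slide.
pathology_per_day = 3_000_000 * 10e9

# Genomics: assume ~10,000 whole genomes sequenced per day at ~100 GB raw each.
genomics_per_day = 10_000 * 100e9

# Video uploads: assume ~500 hours uploaded per minute at ~1 GB per hour.
video_per_day = 500 * 60 * 24 * 1e9

for name, volume in [("pathology WSI", pathology_per_day),
                     ("genomics", genomics_per_day),
                     ("video uploads", video_per_day)]:
    print(f"{name:>14}: {volume / TB:10,.0f} TB/day")
```

With these placeholder inputs the fully digitized slide stream comes out one to two orders of magnitude above the other two; the point is only that the conclusion is driven entirely by the per-slide size and slide count you assume.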
And keep in mind, we're talking clinical medicine. We want the data now. We're looking at the slides while the glue is still wet. You don't have the bandwidth, no one has the bandwidth, to do some of this stuff the way you propose and maintain the current "business process" of clinical medicine.
Building models and iterating, the old-fashioned way, is the only way this makes sense.
They're fast, sure. But not very efficient in certain problem domains, specifically the ones where humans are efficient (for reasons that are IMHO historical, not innate).
I.e. a modern ML system should just plug into the business process from day 0, where the ML task is performed by a human and recorded by the machine.
After a while, the machine would train on this recorded data, and start replacing the humans.
Rinse and repeat.