I'm going to sound like a skeptical jerk here, but 490,000 heartbeats from how many patients? From what I recall, these public ECG datasets are like 20 patients who underwent longitudinal ECGs. 500k heartbeats is something like 5 person-days of ECG recordings.
Ninja Edit: N=~30 patients. For something like ECGs, which are readily available, they really should have tried to get more patients. A single clinic anywhere does more than 30 EKGs per day. Suggesting this is clinically applicable is ridiculous. It's way too easy to overfit. Chopping up a time series from one patient into 1000 pieces doesn't give you 1000x the patients.
I actually think this approach will probably work. It's very reasonable given recent work from Geisinger and Mayo. But why are ML people doing press releases about such underwhelming studies?
Yes, Table 5 shows that N is 18 without CHF and 15 with CHF. These come from separate data sets that have EKG data sampled at different frequencies.
Basically, they took 18 electrocardiographic tracings (sampled at 128 Hz) from participants without CHF, 13 of them from women. They compared these to 15 electrocardiographic tracings (sampled at 250 Hz) from participants with CHF, 4 of them from women.
A lot of machine learning people don't really understand study design or power or things like that. It's gotten a little better over the past decade or so, but this is an area where the field has a lot of room to improve.
I disagree. Machine learning education almost always involves a lot of focus on design of experiments, causal inference, A/B testing and related topics.
I could agree with your claim if you meant bootcamp programs or data science sorts of coursework, but machine learning is generally grounded in both measure-theoretic probability theory and a robust understanding of applied statistics before moving on. After that come the basics of pattern classification, clustering, regression and dimensionality reduction. Last of all come very domain-specific tools for NLP, computer vision and audio processing involving e.g. deep neural networks.
There are plenty of academic and academic center trained physicians that understand study design and are competent in research. They aren’t typically primary care/general practitioners so you just don’t encounter them as much. And yes they are the minority. But it’s not the totality.
Clinical research that isn’t making ridiculous claims tends to get much less press.
Furthermore, of all places to lap that crap up... this hacker news site is frankly one of the worst.
I mean, look at this submission. Yes, it's true people are pillorying it here (including some doctors), but I don't recall much interesting, well-designed medical research being discussed here (though arguably this isn't the place for it).
I mean, the issue would be in the structure of the cross-validation approach. Say you set training = 29 patients, test = 1 patient, build your models, and ask how well you did on the held-out one. Rinse, wash hands, repeat 30x. That's your cross-validation error rate.
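A minimal sketch of that leave-one-patient-out scheme, assuming a per-heartbeat feature matrix `X`, labels `y`, and a `patient_id` array marking which of the ~30 patients each beat came from (all names and the synthetic data here are illustrative, not from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_beats, n_features, n_patients = 3000, 16, 30
X = rng.normal(size=(n_beats, n_features))          # stand-in per-beat features
patient_id = rng.integers(0, n_patients, n_beats)   # which patient each beat belongs to
y = (patient_id < 15).astype(int)                   # stand-in CHF label, constant per patient

# Hold out every beat from one patient, train on the other 29, repeat 30x.
logo = LeaveOneGroupOut()
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         groups=patient_id, cv=logo)
print(f"leave-one-patient-out accuracy: {scores.mean():.2f} over {len(scores)} folds")
```

The key point is that the split is by patient (the `groups` argument), not by heartbeat; splitting randomly across beats would leak the same patient into both training and test sets and inflate the apparent accuracy.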