$3 million machine learning prize

Rhapso · on Jan 29, 2011

So, essentially, this is a contest to make a way to predict who is most at risk for going back to the hospital.

While this sounds nice, there are some issues.

1. How can this do anything but hurt people? Medical professionals do all they can to keep people from returning to the hospital, explaining to patients what they should be doing in a medical sense. The only real use is to deny insurance or increase rates on "high risk" people.

2. Should they implement the winning solution, then act on it by sending additional "how to be healthy" propaganda or otherwise attempting to keep those people from returning, the pattern of behavior will change accordingly, thus likely breaking the predictive capability.

This is not like the Netflix "present better suggestions" problem. This does not need to be as fast, as efficient, or as creative. Just taking a large set of statistics from the dataset (which seems rather small) and building a large Bayesian network to crunch out the probability of needing medical care in a given time frame seems to be the best solution to the problem.
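As a rough illustration of the "statistics plus Bayesian network" idea, here is a minimal sketch of the simplest such network, a naive Bayes model in which the outcome is the only parent of every feature. The records, the feature names (`age`, `diabetic`) and the helper functions are all made up for the example; nothing here comes from the contest data.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (features, readmitted within the time frame).
RECORDS = [
    ({"age": "65+", "diabetic": True}, True),
    ({"age": "65+", "diabetic": True}, True),
    ({"age": "65+", "diabetic": False}, False),
    ({"age": "<65", "diabetic": True}, True),
    ({"age": "<65", "diabetic": False}, False),
    ({"age": "<65", "diabetic": False}, False),
]


def fit(records):
    """Collect the counts a naive Bayes model needs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)  # (label, feature name) -> value counts
    values = defaultdict(set)              # feature name -> values seen anywhere
    for features, label in records:
        class_counts[label] += 1
        for name, value in features.items():
            feature_counts[(label, name)][value] += 1
            values[name].add(value)
    return class_counts, feature_counts, values


def predict_proba(model, features):
    """Return P(label | features), using add-one (Laplace) smoothing."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(label)
        for name, value in features.items():
            seen = feature_counts[(label, name)][value]
            score *= (seen + 1) / (count + len(values[name]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


model = fit(RECORDS)
print(predict_proba(model, {"age": "65+", "diabetic": True}))
```

A real network would add edges between features; this version assumes the features are independent given the outcome.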

I am interested in seeing other views on these points. Heavens, I might learn something about a field I am a dilettante in from a master. (Ironically, this is more the goal than being "right" is.)

alextp · on Jan 29, 2011

You should read the New Yorker story http://www.newyorker.com/reporting/2011/01/24/110124fa_fact_... . It answers your questions, mostly. For (1), if the health insurer is forced to treat those patients and knows who they are, it can spend a bit of money on preventive and follow-up care and save a lot of money on hospitalization, surgeries, etc. For (2), this is true, but if the algorithm is retrainable (and it should be, as it's machine learning) there's the possibility that all you have to do is a bit of domain adaptation to keep things going; if this doesn't work, another contest 5 years from now will probably pay for itself.

The problem with your proposed solution is precisely that there seem to be far too few data points and far too many variables. Not only that, but I expect most of the information to be in the interactions between variables and in clever features that capture them. Most ways of learning Bayesian networks don't work very well when you have to model interactions. I'd bet on the usual winning approaches for this sort of thing, which are clever boosting, matrix decomposition, and random forests, all of which can model interactions and somewhat deal with incomplete data.
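To make the point about interactions concrete, here is a small sketch on invented data where the outcome depends only on the combination of two hypothetical risk factors (an XOR pattern). Per-variable statistics see nothing, while statistics over the joint values recover the signal completely. The data and function names are illustrative assumptions, not anything from the contest.

```python
from itertools import product

# Hypothetical toy data: the outcome is 1 only when exactly one of the two
# made-up risk factors (a, b) is present, i.e. a pure interaction effect.
DATA = [((a, b), a ^ b) for a, b in product([0, 1], repeat=2)] * 25


def marginal_rate(data, index, value):
    """P(outcome = 1 | feature[index] == value), ignoring the other feature."""
    outcomes = [y for x, y in data if x[index] == value]
    return sum(outcomes) / len(outcomes)


def joint_rate(data, a, b):
    """P(outcome = 1 | both features), which is what a tree split can learn."""
    outcomes = [y for x, y in data if x == (a, b)]
    return sum(outcomes) / len(outcomes)


for index in (0, 1):
    for value in (0, 1):
        print("feature", index, "=", value, "->", marginal_rate(DATA, index, value))

for a, b in product([0, 1], repeat=2):
    print("a =", a, "b =", b, "->", joint_rate(DATA, a, b))
```

Every marginal rate comes out at 0.5, so a model that looks at one variable at a time has nothing to work with. The catch is that the number of joint cells grows exponentially with the number of variables, which is where a small dataset starts to hurt.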

mhb · on Jan 29, 2011

Why this will save money:

The Hot Spotters - Can we lower medical costs by giving the neediest patients better care? by Atul Gawande

http://www.newyorker.com/reporting/2011/01/24/110124fa_fact_...

On HN: http://news.ycombinator.com/item?id=2154579

bengebre · on Jan 29, 2011

The benefits of finding these folks are many:

http://kottke.org/11/01/controlling-healthcare-costs-by-focu...

mv · on Jan 29, 2011

"training dataset includes several thousand anonymized patients and will be made available"

That seems like an awfully small dataset. It also doesn't look like it is limited to one disease, which would make the search space enormous, especially if not all the patients had the same labs drawn!

If it was completely standardized data several thousand may be sufficient to train, but I think they are looking for something more 'magic' than that.

sesqu · on Jan 29, 2011

The dataset sounds so small that I'd expect the winning answer will be extensively seeded by medical doctors. Diagnostic data is usually very difficult to approach with AI, and practising doctors have good heuristics. That suggests, to me, that the best one can hope for is using this dataset to refine those heuristics.

indigoviolet · on Jan 29, 2011

Oooh. This is just begging for a privacy firestorm when someone de-anonymizes the data, which I'm guessing won't be super hard given the kind of medical features they'd need to provide to make this task useful.

tocomment · on Jan 29, 2011

This is why we can't have nice things, HN.

Dilpil · on Jan 29, 2011

I'm not sure about that: unlike social network data, there isn't a publicly available dataset containing the names of all the people in here which could be used to de-anonymize this data.

indigoviolet · on Jan 29, 2011

You're perhaps right. Of course, these people are on things like Facebook, Twitter and blogs and if you can narrow down age, gender, location and medical condition, you might be able to correlate with public posts.

You can also look for specific people in there if you have certain kinds of prior information about them. For example, Arvind Narayanan was able to de-anonymize some part of the Netflix set [http://arxiv.org/abs/cs/0610105]. Maybe that won't translate to this data.
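One way to gauge that risk before releasing a dataset is to count how many records share each combination of quasi-identifiers; a combination that appears only once is the kind of record that could be matched against outside information. The sketch below uses invented records and field names (`zip3`, `condition`, and so on) purely for illustration.

```python
from collections import Counter

# Hypothetical "anonymized" records; all values are invented.
RECORDS = [
    {"age": 34, "gender": "F", "zip3": "941", "condition": "asthma"},
    {"age": 34, "gender": "F", "zip3": "941", "condition": "asthma"},
    {"age": 71, "gender": "M", "zip3": "100", "condition": "rare disorder X"},
    {"age": 52, "gender": "F", "zip3": "606", "condition": "diabetes"},
]
QUASI_IDENTIFIERS = ("age", "gender", "zip3", "condition")


def group_sizes(records, keys):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def unique_records(records, keys):
    """Records whose combination of fields appears exactly once."""
    sizes = group_sizes(records, keys)
    return [r for r in records if sizes[tuple(r[k] for k in keys)] == 1]


sizes = group_sizes(RECORDS, QUASI_IDENTIFIERS)
print("smallest group size (the k in k-anonymity):", min(sizes.values()))
print("records that stand alone:", len(unique_records(RECORDS, QUASI_IDENTIFIERS)))
```

The more detailed the medical features, the more records end up in a group of one.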

tocomment · on Jan 29, 2011

What makes solving this problem worth three million?

chaosmachine · on Jan 29, 2011

"The winning algorithm will be able to predict patients at risk for an unplanned hospital admission with a high rate of accuracy."

Algorithm says no insurance for you.

dkarl · on Jan 29, 2011

Yeah, can they anonymize my entry and my prize? I don't think I'd want my friends and family knowing I helped these guys out.

earl · on Jan 29, 2011

Thankfully, the US passed the recent national health care law which, I believe, bans such discrimination as of 2014 and brings us that much closer to being a civilized country.

ylem · on Jan 29, 2011

However, the lawsuit by the states (for example Virginia) is to have the statute about individuals being required to purchase healthcare declared unconstitutional. If they win, they will then use this to argue that the whole law is unconstitutional (there is no severability clause in the law). I heard this on NPR last night in an interview with the attorney general of Virginia about their strategy....

l3amm · on Jan 29, 2011

Presumably if they can identify patients with high likelihoods of returning they can take extra precautions while they are in the hospital the first time. Potentially this could be used to stop unnecessary hospital visits (via education, proactive treatment, outpatient care, etc).

abhaga · on Jan 29, 2011

Without a legal protection in place which disallows something like this for deciding the insurance rates, this sounds like something which can get abused.

But I think it is better that it happens in public via an open competition rather than in a private research group funded by an insurance company. At least everyone will immediately know what can be predicted, rather than finding it out through a class action suit years later.

dantheman · on Jan 29, 2011

yeah because pricing insurance correctly is a bad idea....

earl · on Jan 29, 2011

In the US at least, we do now have such protections -- viz community rates and obamacare.

wladimir · on Jan 29, 2011

Hm sounds like an interesting challenge, can anyone register for this, or do you have to be US-based?

nazgulnarsil · on Jan 29, 2011

Sorry, I don't think being born in the West entitles you to millions of dollars of medical care at others' expense when a million dollars means hundreds of lives saved.

maeon3 · on Jan 29, 2011

Doctors could tell you which patients will be back. The problem is they can't say so, because that would be discrimination, which would be grounds for burning the doctor at the stake. The software which does exactly the same thing, however, can't be burned at the stake for discrimination, because in the event that the guilty party cries foul, you simply print out the math. It's genius.

You 1% (repeat sickly offenders) causing 30% of the medical care costs better get ready to pay your increased share to acquire that care. If it can be determined that one human would likely need 10 million dollars of medical care (on account of heavily defective DNA) and another human will likely need only 200 thousand (flawless DNA), the one who is likely to need more should be paying more.

JoachimSchipper · on Jan 29, 2011

Wait, should? I didn't pick my parents... (I happen to be perfectly fine, but that's at least part good luck.)

tel · on Jan 29, 2011

Unfortunately, despite popular myth, not every consequence applied to a person is a result of their choices and actions. The American Dream might be equality of people, but the harsh reality is that there's actually quite a lot of variance.

So we get to answer the really interesting question of exactly how much do we, as a country, want to spend to support a dream against reality?

Confusion · on Jan 29, 2011

  repeat sickly offenders

Wow, you have to be one of those extremists who buy everything Ayn Rand wrote without blinking an eye. Fortunately, most of us don't mind paying for health insurance, fully expecting that we will never need it, but still acknowledging that others who need it should receive the care they need.