So, essentially, this is a contest to build a model that predicts which patients are most at risk of going back to the hospital.

While this sounds nice, there are some issues.

1. How can this do anything but hurt people? Medical professionals already do all they can to keep people from returning to the hospital, explaining to patients what they should be doing medically. The only real use I can see is to deny insurance or raise rates on "high risk" people.

2. Should they implement the winning solution and then act on it, by sending additional "how to be healthy" propaganda or otherwise trying to keep those people out of the hospital, the patterns of behavior will change accordingly, likely breaking the model's predictive capability.

This is not like the Netflix "present better suggestions" problem; it does not need to be as fast, efficient, or creative. Taking a large set of statistics from the dataset (which seems rather small) and building a large Bayesian network to crunch out the probability of needing medical care in a given time frame seems like the best solution to the problem.
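
To make that concrete, here is a toy sketch of the Bayesian approach using naive Bayes, the simplest possible Bayesian network. Everything in it is invented for illustration: the feature names, the records, and the labels are made up, not drawn from the contest data.

    # Toy sketch: naive Bayes (a degenerate one-layer Bayesian network)
    # over binary patient features. Feature names and records are
    # hypothetical; the real contest data is far richer than this.
    from collections import defaultdict

    records = [
        # (features, readmitted_within_year)
        ({"diabetic": 1, "over_65": 1, "prior_admissions": 1}, 1),
        ({"diabetic": 0, "over_65": 1, "prior_admissions": 0}, 0),
        ({"diabetic": 1, "over_65": 0, "prior_admissions": 1}, 1),
        ({"diabetic": 0, "over_65": 0, "prior_admissions": 0}, 0),
    ]

    def train(records):
        # Count positive feature values separately for each label.
        counts = {0: defaultdict(int), 1: defaultdict(int)}
        totals = {0: 0, 1: 0}
        for feats, label in records:
            totals[label] += 1
            for name, value in feats.items():
                counts[label][name] += value
        return counts, totals

    def p_readmit(feats, counts, totals):
        # P(readmit | features) via Bayes' rule with Laplace smoothing.
        scores = {}
        for label in (0, 1):
            p = (totals[label] + 1) / (sum(totals.values()) + 2)
            for name, value in feats.items():
                p_feat = (counts[label][name] + 1) / (totals[label] + 2)
                p *= p_feat if value else (1 - p_feat)
            scores[label] = p
        return scores[1] / (scores[0] + scores[1])

    counts, totals = train(records)
    print(p_readmit({"diabetic": 1, "over_65": 1, "prior_admissions": 0},
                    counts, totals))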

I am interested in seeing other views on these points. Heavens, I might learn something about a field I am a dilettante in from a master. (Ironically, that is more the goal than being "right" is.)




You should read the New Yorker story http://www.newyorker.com/reporting/2011/01/24/110124fa_fact_... . It mostly answers your questions. For (1), if the health insurer is obligated to treat those patients and knows who they are, it can spend a bit of money on preventive and follow-up care and save a lot on hospitalizations, surgeries, etc. For (2), this is true, but if the algorithm is retrainable (and it should be, as it's machine learning), it may be that a bit of domain adaptation is all it takes to keep things going; if that doesn't work, another contest 5 years from now will probably pay for itself.
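
As an illustration of what "a bit of domain adaptation" might look like, here is a hedged sketch of one standard recipe, covariate-shift reweighting. The data, features, and split into "old" and "new" cohorts are all synthetic assumptions; a real contest entry might do something entirely different.

    # Covariate-shift reweighting sketch: retrain the old model on old
    # data, but weight each example by how much it resembles the new,
    # post-intervention population.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_old = rng.normal(0.0, 1.0, size=(500, 5))  # pre-intervention patients
    y_old = (X_old[:, 0] + X_old[:, 1] > 0).astype(int)
    X_new = rng.normal(0.5, 1.0, size=(500, 5))  # shifted patient mix

    # 1. Train a classifier to tell old patients from new ones.
    domain = LogisticRegression().fit(
        np.vstack([X_old, X_new]),
        np.r_[np.zeros(len(X_old)), np.ones(len(X_new))])

    # 2. Weight each old example by P(new)/P(old): examples that look
    #    like the new cohort count more during retraining.
    p_new = domain.predict_proba(X_old)[:, 1]
    weights = p_new / (1.0 - p_new)

    # 3. Retrain the readmission model on the reweighted old data.
    model = LogisticRegression().fit(X_old, y_old, sample_weight=weights)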

The problem with your proposed solution is precisely that there seem to be far too few data points and far too many variables. Not only that, but I expect most of the information to be in the interactions between variables, and in clever features that capture them. Most ways of learning Bayesian networks don't work very well when you have to model interactions. I'd bet on the usual winning approaches for this sort of thing: clever boosting, matrix decomposition, and random forests, all of which can model interactions and deal somewhat with incomplete data.
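
For illustration, here is a minimal sketch of the boosting approach with modern tooling, scikit-learn's HistGradientBoostingClassifier (available in recent versions), which learns feature interactions via trees and accepts NaN for missing entries directly. The synthetic data merely stands in for real claims records.

    # Gradient-boosted trees on data with an interaction effect and
    # missing entries; the dataset here is synthetic, for illustration.
    import numpy as np
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 10))
    # Label depends on an interaction between two features, the kind of
    # structure a flat linear or naive-Bayes model misses.
    y = ((X[:, 0] * X[:, 1]) > 0).astype(int)
    # Knock out 20% of entries to mimic incomplete records.
    X[rng.random(X.shape) < 0.2] = np.nan

    clf = HistGradientBoostingClassifier(max_iter=200)
    print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())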



