Are Human Experts Less Prone to Catastrophic Errors than Machine-Learned Models? (anand.typepad.com)
25 points by breily on May 25, 2008 | 12 comments



This is a fascinating discussion, particularly in light of human brain architecture. Back in school, my professor used to draw a distinction between the left and right brain in terms of the different computing approaches each half uses. The right brain seems more in line with the statistical machine-learning approach: effectively a black-box data processor that produces intuitive results from large quantities of data, and it works well when the data fall in a normal distribution. The left brain is the logical half; it reasons through language, can override the right brain, and can handle novel and unexpected situations, in other words the statistical outliers that give the right brain trouble.


Is there any evidence for these claims about the left and right brain? Was the prof trained in neuroscience, or is this just folk-tale stuff?


Nothing folksy about the left/right brain distinction. Check out this tour-de-force TED talk by a Harvard neuroscientist on the differences between the brain's two halves: http://www.ted.com/talks/view/id/229


The idea that the brain performs several different types of computation on the same data set and then compares the results is interesting and worth following up on.


A human expert may be better when you're in unknown territory (i.e., when something happens that doesn't fall within the domain of expertise), because then experience acquired over a lifetime may prove more useful than textbook or algorithmic knowledge.

But more importantly I _do_ know that statistical models are better at diagnosis and prediction than are human experts. Sad to say, doctors, lawyers, judges and other foolish people keep us from using them.

FuturePundit summary article "Statistical Prediction Rules More Accurate Than Experts": http://www.futurepundit.com/archives/001558.html

The FuturePundit article reviews the paper: "50 Years of Successful Predictive Modeling Should Be Enough: Lessons for Philosophy of Science" by Michael A. Bishop and J. D. Trout http://www.google.com/search?hl=en&q=%2250+Years+of+Succ...


Peter Norvig is "... now taking a short leave of absence from Google to update his AI textbook."

This is news to me, and potentially quite exciting (even if I have only just acquired my own copy). Does anyone know anything more about this, e.g. what sort of new material?


This comment will be a little roundabout, but it has a real conclusion.

It seems like the subject of this Google Tech Talk keeps coming up over and over again.

http://video.google.com/videoplay?docid=-2469649805161172416

Here's a summary. Yann Le Cun discusses his research on deep learning. The basic problem with standard learning (SVMs, neural networks and the like) is that it is limited to shallow templates for classification: more or less fancy linear and nonlinear combinations of weights. The number of such templates you need in a high-dimensional space like computer vision is exponential (think about how much data you need to represent one object across different combinations of distance, lighting, orientation, focus, etc.).

Instead, he suggests, we need to learn how to get past shallow template-matching and train, essentially, features of features or networks of networks. This gives us a shot at discovering highly abstract features of the data we have and doing real learning.
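
To make the "features of features" idea concrete, here's a minimal numpy sketch (an illustration of the stacking idea only, not Le Cun's actual architecture): the second layer's features are defined purely in terms of the first layer's outputs.

    import numpy as np

    rng = np.random.default_rng(0)

    def feature_layer(x, W, b):
        # one nonlinear feature layer: matches patterns in whatever it is fed
        return np.tanh(x @ W + b)

    x = rng.normal(size=(4, 64))   # 4 toy image patches, 64 raw pixels each

    W1, b1 = 0.1 * rng.normal(size=(64, 32)), np.zeros(32)   # first-level features (edges, blobs, ...)
    W2, b2 = 0.1 * rng.normal(size=(32, 16)), np.zeros(16)   # features OF those features (parts, shapes, ...)

    h1 = feature_layer(x, W1, b1)    # shallow template matching against raw pixels
    h2 = feature_layer(h1, W2, b2)   # more abstract features, defined only in terms of h1

    print(h2.shape)   # (4, 16): each patch summarized by 16 higher-level features

In a real deep network the weights would of course be trained rather than random; the point here is just that each layer builds on the representation below it.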

If you have any interest in the subject, I strongly suggest you carve out an hour for the video.

TD-Gammon is another example. It is a learning program that plays backgammon. Training on simple board features, it eventually derived what you might call first-order expert features through shallow learning. But when similar first-order expert features (presence of an anchor, etc.) were added to the initial state representation by hand, TD-Gammon derived deeper expert features, to the point that it and similar programs became better than the best players in the world.
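
For flavor, a toy temporal-difference update over hand-picked board features might look like the sketch below. The features and the linear value function are my own illustrative assumptions; the real TD-Gammon used a neural network and TD(lambda).

    import numpy as np

    alpha = 0.1       # learning rate
    w = np.zeros(3)   # weights for 3 toy board features

    def features(board):
        # e.g. [pip-count advantage, number of blots, presence of an anchor]
        return np.asarray(board, dtype=float)

    def value(board):
        # estimated chance of winning from this position (linear stand-in)
        return features(board) @ w

    def td_update(board, next_value):
        # nudge this position's value toward what came next: either the value
        # estimate of the following position, or the final game outcome
        global w
        td_error = next_value - value(board)
        w += alpha * td_error * features(board)

    # suppose the game was eventually won from this position (terminal value 1)
    td_update(board=[0.5, 1, 1], next_value=1.0)
    print(w)   # weights shift credit toward the features present in that position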

The point here is that there was no way to transition from the shallow features to the deep features without recoding the system from scratch. That obviously won't scale to the kinds of AI problems that are interesting.

Big finish: The advantage humans have over computers (at the moment) is that we work on those multiple levels of abstraction all the time. We do deep learning in every field of human endeavor. That's what the deeply connected neural networks in our brain are all about. In fields where such expertise is possible, humans have it all over the computers. In fields where brute-force calculation can win, the computers have it all over the humans.

What that implies to me is that we can't train computers to think as deeply about catastrophes as we do. It requires a new paradigm of learning to get the computer to that point.


Although this doesn't involve machine-learned models, this story highlights the worth of good human judgment, something that currently isn't replicable in machines.

http://en.wikipedia.org/wiki/Stanislav_Petrov


Failure would rarely be "catastrophic" in the context of web search as described in the article. If the ML model makes an incorrect prediction on new data, you get a bad search result. No big deal: just feed the new data back into the model and retrain.

If the data set is large enough, the ML model may find patterns that escape a human expert. When it comes to finding patterns in very large datasets, machines scale much better than humans. Given a large enough dataset, an ML approach should be less susceptible to the Black Swan phenomenon than human experts are.

On the other hand, if failure of the system really could be catastrophic, then a human could always be kept in the loop. In these cases, output from ML models could be one of the inputs the expert considers before coming to a final decision. E.g., you wouldn't want an ML model doing medical diagnosis by itself, but it could be very useful for identifying patients who should be double-checked, scanning diagnoses for errors, etc.
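
A minimal sketch of that human-in-the-loop setup (the threshold, the record fields and the toy model are assumptions made up for illustration):

    def flag_for_review(patients, risk_model, threshold=0.3):
        """Return patients whose model-estimated risk warrants a human double-check."""
        flagged = []
        for patient in patients:
            risk = risk_model(patient)   # model output is advisory, not a diagnosis
            if risk >= threshold:
                flagged.append((patient["id"], risk))
        return flagged

    # toy stand-in model and records
    toy_model = lambda p: 0.9 if p["abnormal_ecg"] else 0.1
    patients = [{"id": 1, "abnormal_ecg": True}, {"id": 2, "abnormal_ecg": False}]

    print(flag_for_review(patients, toy_model))   # [(1, 0.9)] -- a clinician reviews these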


It's possible to think of the process airline safety regulators have gone through (observing crashes, guessing at the cause, then suggesting fixes) as a type of learning problem.

For goog's case it would be fun to try to build a supervised learning system whose sole purpose is to identify the queries that a human observer would consider a "catastrophic failure".
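
Something like the following could be a starting point. All the features and labels here are invented for illustration; a real system would need human-rated query/result pairs.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # toy per-query features: [fraction of results clicked, seconds on results page, query reformulated?]
    X = np.array([
        [0.6, 45.0, 0],   # looks healthy
        [0.0,  2.0, 1],   # user bailed and rephrased -- rater called it a catastrophic failure
        [0.4, 30.0, 0],
        [0.0,  1.0, 1],
    ])
    y = np.array([0, 1, 0, 1])   # 1 = human rater judged the result set a catastrophic failure

    clf = LogisticRegression().fit(X, y)
    new_query = np.array([[0.05, 3.0, 1]])
    print(clf.predict_proba(new_query)[0, 1])   # estimated probability this query failed badly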

I've also heard of stories where decision trees supposedly outperformed human cardiologists in making diagnoses. (I'm skeptical of this claim but let's assume that it's true) If this type of advance is real then it could save a lot of lives. Unfortunately, if goog's engineering team has this kind of doubt about the machine, then I imagine that it would be easy to persuade a random jury that installing such a poorly understood black box is negligence.


"I've also heard of stories where decision trees supposedly outperformed human cardiologists in making diagnoses. (I'm skeptical of this claim but let's assume that it's true)"

If you substitute "Statistical Prediction Rules (SPRs)" for "decision trees" in your statement, then it's true in every tested field of medicine. Read the FuturePundit article (linked above) and its links. The system _always_ outperforms the experts in its field, without fail, within the domain of expertise. Even the _best_ experts. Always.


This article is very interesting, thank you for posting it. I haven't read it yet, but it seems possible that the review suffers from a kind of selection bias, in that the only SPRs that have appeared in the literature are those that were more successful than human experts. This does not mean that a widespread switch to SPRs would outperform human experts, because the average implementation could be inferior to human experts. Consider how difficult it is for IT departments to follow best practices in security.

Another aspect of these systems that people may find repugnant is that they allow one to weigh the costs and benefits of different treatment decisions in a consistent way. Personally, after watching people I loved suffer in an oncology ward, I feel that denying treatments which are very expensive but have a small probability of success would be a good thing, but I know other people don't feel this way, and there would be outrage.



