Stanford computer diagnoses breast cancer more accurately than human doctor (extremetech.com)
146 points by mrsebastian on Nov 10, 2011 | 49 comments



I'm a pathologist and an avocational programmer. This is pretty neat material and very relevant to me, as I have been trying to bone up on my math chops with Khan Academy videos so that I can tackle some computer-vision-related work in pathology.

With regard to the study, I will just point out that the system is not diagnosing breast carcinoma, but rather is producing a score which reflects the prognosis as it relates to the patient's overall survival. So it is not as impressive as the somewhat hyperbolic HN title makes it out to be; a better title would be 'Stanford computer analyzes breast cancer more accurately than human doctor', which is not surprising at all given the well-documented interobserver variability in breast cancer scoring, particularly in moderately-differentiated tumors. That is to say, the computer already knows the tissue it is looking at is a tumor and not benign breast tissue. Furthermore, it is really only providing a histologic (or morphologic) score, which is to say it is attempting to predict how aggressively the tumor will behave based upon how well or poorly it is differentiated (how ugly it is). These days, this score is actually less useful in clinical practice than other information, such as whether or not the tumor cells are expressing estrogen/progesterone hormone receptors, whether the tumor is over-expressing Her2-neu protein (these are possible paths for cancer therapy, anti-estrogen drugs vs. Herceptin, in addition to being prognostic indicators: tumors which express ER/PR generally behave better than tumors which are ER/PR negative and express Her2), and the stage (how far the cancer has already spread) at the time the patient is diagnosed. There are a bunch of companies which are getting FDA approval for computer-vision algorithms for scoring immunohistochemical assays for ER/PR/Her2 [1].

So I am actually far less concerned about a computer doing my job very well, which is actually looking at a piece of tissue on a slide and making a tumor versus not-tumor distinction. This is very hard to do and I think will continue to be even harder for computers/computer-vision/AI to do for a long time to come. I am far more concerned about molecular diagnostics. That is the true future for making cancer determinations and may even eliminate the part of my job where I tell you if something is benign vs. malignant.

[1] http://www.aperio.com/pathology-services/analyze-slides-auto...


> So I am actually far less concerned about a computer doing my job very well, which is actually looking at a piece of tissue on a slide and making a tumor versus not-tumor distinction. This is very hard to do and I think will continue to be even harder for computers/computer-vision/AI to do for a long time to come.

As a machine learning researcher, I do not disagree that it's "very hard" to make a single tumor-versus-not-tumor classification. However, machine-learned classifiers are becoming much more adept at making these types of classifications, and will only improve with time. I think most of the problem right now is in creating a corpus with enough attributes and precise/reliable measurements to accurately train a classifier. Presently, computer vision algorithms are quite sensitive to noise; the human eye is much better at separating this noise from the actual signal.

Right now, I'm doing a much different kind of research: applying machine learning algorithms to legal documents (court opinions, statutes, patents, etc.) to generate legal analyses. To not worry about a computer "doing your job very well" is, in my opinion, a naive perspective. It's indisputable that one day, computers will be doing classification-type jobs very well. Embracing the technology will keep you ahead of the curve; dismissing it is suicide.


Thanks. I probably should have qualified that statement. I'm not particularly worried about it in the short-term timeframe in which I expect to be earning my bread as a practicing surgical pathologist. I absolutely expect that at some point machine learning will have progressed to the point of being able to perform my job as well as I can. I just think that using it in a clinical setting is a long way off, and well before we get to that point, we will be using pure molecular techniques to diagnose and classify cancer. If I had to list the tasks that I would imagine machine learning would need to do at an expert level before my eye/brain combination could be replaced for diagnosing cancer on glass slides, it might look like:

  - Segregate tissue from non-tissue (i.e. blank slide space)

  - Classify the tissue (epithelial versus mesenchymal/stromal, mucin/matrix, exogenous material, etc.)

  - Decide if it is normal or abnormal

  - Decide if it is native to whatever organ it reportedly came from or is a metastasis from somewhere else

  - Select appropriate immunohistochemical stains or other ancillary tests (e.g. molecular assays)

  - Appropriately interpret the immunohistochemical stain (e.g. is it a nuclear stain or cytoplasmic, or both, or membranous? Is the staining true staining or artifactual?)

  - Interpret all of that in the clinical context of the patient's history

And it has to do all of those things while correcting and accounting for bad preparations from the laboratory, inconsistent staining from day to day and slide to slide, non-representative biopsies from the surgeons, etc.

I think they will eventually haul my carcass off of my microscope at the end of my career (hopefully!), 40 to 50 years hence. I imagine that by then we will have made a ton of progress with computer vision in pathology, and will be using it in an adjunct fashion, to help with analyzing results and other tedious tasks that computer vision is well-suited for. But a lot of medicine, and especially pathology, is making intuitive connections and conclusions from inconsistent and incomplete evidence in day-to-day work on individual patients.


Yeah, that's exactly the type of 'noise' I was referring to. Improper histological staining, poor imaging technique, etc. are all things that the human mind can immediately recognize. The human mind is smart enough not to rely on the assumption that all of the random variables involved are conditionally independent of each other. Computers, on the other hand, use algorithms rooted in these assumptions.
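
To make that concrete, here's a minimal sketch (toy data, scikit-learn's GaussianNB standing in for "algorithms rooted in these assumptions") of how a naive Bayes classifier scores each class by multiplying per-feature likelihoods, i.e. it assumes the features are conditionally independent given the class:

    # Toy illustration of the conditional-independence assumption: Gaussian
    # naive Bayes multiplies per-feature likelihoods, so it ignores the
    # correlation deliberately baked into these two fake "stain" features.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(42)
    cov = [[1.0, 0.8], [0.8, 1.0]]  # strongly correlated features
    X_tumor = rng.multivariate_normal([2.0, 2.0], cov, size=200)
    X_benign = rng.multivariate_normal([0.0, 0.0], cov, size=200)
    X = np.vstack([X_tumor, X_benign])
    y = np.array([1] * 200 + [0] * 200)

    clf = GaussianNB().fit(X, y)
    print(clf.predict_proba([[1.0, 1.0]]))  # class posteriors under the independence assumption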


> Right now, I'm doing a much different kind of research: applying machine learning algorithms to legal documents (court opinions, statutes, patents, etc.) to generate legal analyses.

I hope that one day we can apply your (or similar) classification technology to run semi-automated regression tests on new legislation. For example, each time a new law is passed, the classification software flags any court cases that may have a different outcome as a result.


You say "particularly in moderately-differentiated tumors" -- I have a few questions.

1. Why did you specifically point out "moderately-differentiated?" I ask because my wife has a tumor classified as poorly-differentiated. I'm wondering if the middle ground is unique/harder to diagnose in some way.

2. Would something like this come into play during the initial biopsy or after the tumor is removed? I ask because she has a mastectomy next week. I kind of assumed the nature of the cancer was already figured out with the estrogen+ and her2/neu tests. We never really received a "score" - just a breakdown of the good and the bad characteristics and the suggested treatment plan post surgery - which includes both anti-estrogen drugs AND herceptin.

3. Does it make sense, at this point, to try to get her into Stanford for this C-Path test?

Any input is appreciated.


First, my sympathies for your wife; I wish the best outcome for you both. Second, please do not take any of the following as medical advice; I intend to speak generally.

I mostly said moderately differentiated because, for a lot of pathologists, if you give us a three-tiered system for grading some type of cancer (and there are systems for almost every type of cancer), we'll put most things in the middle. I personally believe two-tiered systems work better for most everything. Most studies have shown that breast cancer scoring (in the US, most use the Nottingham modification of the Bloom-Richardson system) is only moderately reproducible anyway [1][2].

I tend to only fully grade the tumor after it has been resected, because there is not much point in grading it on the biopsy (i.e. it won't change management; most patients are still going to have surgery), and sampling error means the biopsy grade may end up discrepant from the final grade.

I always tell friends and family that if they have any medical procedures, and most especially those for cancer, to always get copies of the operation note and the final pathology interpretation. The operation note will be written by the surgeon and will detail everything she did during the operation, what was removed, what was placed, etc. The final pathology report will be the best place to get detailed information about what the tumor is, where it is, the pathologic stage, etc. Your discussions with all of the other doctors will basically be dictated by this report. There may be multiple of them, one for each procedure. So get the biopsy pathology report, the pathology report from the mastectomy, etc. They should also report out the results of the ancillary testing too (ER/PR/Her2), since they are the ones who did it. Most of the cancer reports (in the US anyway) should be written in accordance with the protocols from our professional organization and can be found online [3]. You'll probably find them somewhat tedious, but there is a wealth of information in there [4]. You are entitled to those reports and you really owe it to yourself to get a copy. If your doctor or the doctor's staff won't get you one, then you could contact the pathology group directly to obtain one; don't hesitate.

In general, the grade of the tumor is far, far less important than the stage of the tumor at diagnosis (most importantly the status of the axillary lymph nodes) and also the ER/PR/Her2 status of the tumor.

[1] http://www.ncbi.nlm.nih.gov/pubmed/15920556

[2] http://www.ncbi.nlm.nih.gov/pubmed/7856562

[3] http://www.cap.org/apps/cap.portal?_nfpb=true&cntvwrPtlt...

[4][pdf] http://www.cap.org/apps/docs/committees/cancer/cancer_protoc...


Thank you very much for this info. I would not have known to ask for the reports and greatly appreciate the time you took to respond.


I also feel like an upvote is not sufficient to express my appreciation that you took the time to write this empathetic and informative post.


Again, best wishes for your family.

I think that C-Path is interesting, but it is not well validated at present. It is also not clear how to use the results to plan treatment.

HER2 status is your ideal marker of risk because it gives information about prognosis AND how to treat (give herceptin). A bad stroma score or whatever you want to call it on C-Path doesn't necessarily tell you what to do about it. I would say that C-Path would add little to your wife's care if the cancer is HER2 positive, as treatment in this case is usually indicated.


With your knowledge as a pathologist, would you agree that the success of this approach depends on the quality of the biopsy (i.e. obtaining a sample that faithfully represents the entire volume of tissue being investigated)? From my experience with prostate cancer screening research, I know that biopsy scoring of prostate tumors can be very inaccurate compared to pathology scoring after a prostatectomy.


Not the OP, but a pathologist in training. Yes, quality is essential because depending on how the sample is prepared the image can be very difficult to interpret. Biopsy scoring of prostate tumors is difficult because a lot of the scoring takes into account features that may not appear on the biopsy because of insufficient sampling.


> So I am actually far less concerned about a computer doing my job very well, which is actually looking at a piece of tissue on a slide and making a tumor versus not-tumor distinction. This is very hard to do and I think will continue to be even harder for computers/computer-vision/AI to do for a long time to come.

Can you elaborate?


I've seen a lot of doctors chime in on various threads and say their jobs couldn't possibly be done by machine learning. The same thing was said about self-driving cars before the DARPA challenges; when some profs actually put their minds to it, it was done in a couple of years. If the data were available, there are probably quite a few people who could actually detect cancer in slides.


That's not at all what he said. He said that this particular research involved a simpler (for CS) problem than either the title or his day to day job tackles.

Do you have a background in bio or medicine or computer vision? It's very interesting to see two informed people disagree about applied computer science, so I'd love you to contribute something more specific to the thread.


Actually, he also said that he is not concerned that machine learning could do the initial detection of cancer anytime soon. Also, if you've been following machine learning trends recently (past 5 years), you'll see that deep learning methods (Hinton, LeCun, Ng, Bengio) have actually made a huge leap over what came before, and are believed by some to be, in some sense, that "final" algorithm that can tackle any learning problem. They just haven't spread widely enough yet.


As a computer vision researcher, I'm not at all convinced that deep learning methods will be "final" in any sense. I know that in the past, neural networks were "final", and then graphical models were "final", and so on.

And while deep learning methods have indeed shown remarkable improvements recently, they're not yet state-of-the-art on the most important/relevant computer vision benchmarks.


As a computer vision researcher, it must pain you to see that all your learnings are for naught when faced with deep learning methods which can get amazing performance from raw pixels (see the MNIST results, for example). Also see Ronan Collobert's "Natural Language Processing (Almost) from Scratch" paper, which handily beats the past few decades of NLP research in parsing (in terms of efficiency, and probably performance soon too). Or see the Microsoft Research speech recognition work, which has beaten all previous results by a significant margin using deep learning.


Not at all! I'd love for vision to be solved, no matter what the method. I'm more than happy to move onto another field if that's the case.

But I don't think it is. MNIST data is not particularly challenging. It's great that deep learning methods work there -- they must be doing something right.

Come back and taunt me when deep learning methods start getting state-of-the-art results on, e.g., Pascal VOC: http://pascallin.ecs.soton.ac.uk/challenges/VOC/


Getting the best results on the harder vision challenges is simply a matter of letting the computers run long enough. Collobert's work, for example, took 3 months of training. I don't see why vision challenges should be any different. Perhaps the vision researchers, of whom there are many more than the few deep learning groups, should try it.


Cars can currently drive themselves in certain limited environments, on tracks at specially designated competitions. How long do you think it will be before the country has the physical and legal infrastructure to support general-purpose automated cars?

Two thought experiments: 1) Do you think the general public would support the use of self-driving cars on public streets as they operate today, even after seeing the DARPA results? 2) Do you think the general public would support the use of computers to diagnose cancer without involving human doctors anytime within the next 50 years?

Remember that when [specialized worker X] says their job can't be done by [new technology Y], they aren't just referring to the technology being unable to fulfill the task. There is a whole economic, political and sociological matrix on top of the job market that prevents technology from displacing workers, and certain regulated industries are more sheltered than others. The hospital is probably one of the most insulated working environments for technological advances (just take a poke at any of their EMR systems to see what I mean.)


Google's self driving car[1] has logged almost 200,000 miles on real roads. It has a better record than the average driver. A judge in California has deemed that Google is allowed to test on the road as long as they are responsible for the damages. Nevada has already passed laws saying that self driving cars are legal. So in answer to your question, we already have the physical and legal infrastructure to support general-purpose automated cars, and we have the technological capacity.

This shouldn't be a question of the general public supporting it; it should be a statistical question: are our silicon counterparts better equipped to do the job? If so, then we should have them do it. The day when computers can diagnose cancer better than humans is not far off, and we should welcome it as an indicator of more precise identification rather than shun it out of fear.

[1] http://news.discovery.com/autos/how-google-self-driving-car-...


> So in answer to your question, we already have the physical and legal infrastructure to support general-purpose automated cars, and we have the technological capacity.

That is such a stretch from the four sentences before it. You are discussing 1) a prototype vehicle that is not available to consumers and requires supervision by a cadre of engineers and 2) a recent law in just one of the least populous states of the country. How about a few choice details from that article you cited:

"... with only occasional human intervention."

"Before sending the self-driving car on a road test, Google engineers [have to] drive along the route one or more times to gather data about the environment."

"...there are many challenges ahead, including improving the reliability of the cars and addressing daunting legal and liability issues."

You must have read it with unrestrained optimism. I also applaud your idealistic notion that statistics matter more than public opinion, but the country isn't run by scientists and mathematicians (that's actually a good thing in certain respects). The reality is the general public does have to support changes that affect society, like laws and the development of physical and legal infrastructure, and there are many ways of formulating reasonable policy arguments with or without statistics.


I think you are misunderstanding the nature of the problem. There is no easy way to assess how good some mythical algorithm will be at interpreting pathology slides. Therefore, you are in essence asking doctors and patients to accept another non-human opinion about what is going on. So why should I accept your algorithm's opinion? I would rather have a human who has enough insight to say they are not sure and can discuss the case with me, and also understands that life changing decisions are being made on the basis of what they say.

Anyway, pathologists are most useful in unusual or difficult cases, which by definition have little available data. You want me to trust an algorithm trained with some kind of statistical machinery on a dataset to interpret an edge case?


Please: there ought to be a rule on HN that instead of linking to a popsci re-interpretation of a scientific result, you instead link directly to the paper.

http://stm.sciencemag.org/content/3/108/108ra113.full

If it is behind a paywall, as it probably is for this paper if you are not at a university, perhaps look around for the least hyperbolic re-interpretation of the paper, and link that instead.

http://www.genengnews.com/gen-news-highlights/image-analysis...

Even if you do not have access to the full paper because of the paywall, you should be able to still read the abstract and pick the popsci article that is fairest to what the paper actually says.

Otherwise, this happens: http://www.phdcomics.com/comics.php?f=1174 and I really don't like it when HN blindly follows the hivemind in furthering that phenomenon.

Let's examine what the authors actually say.

> To directly compare the performance of the C-Path system to pathological grading on the exact same set of images, we applied standard pathological grading criteria to the TMA images used in the C-Path analysis. [...] the pathologist grading the images was blinded from the survival data. Although the C-Path predictions on the NKI data set were strongly associated with survival, the pathologic grade derived from the same TMA images showed no significant association with survival (log-rank P = 0.4), highlighting the difficulty of obtaining accurate prognostic predictions from these small tumor samples.

That's it. They make one remark about it, and do not focus on this at all elsewhere in the paper, because it was not the point of their study and the methodology for this little result is far from robust. Note they used one pathologist to run this little test. Also note that a high p-value is not evidence that the null hypothesis is true--it is quite possible that there is a relationship but the study is underpowered; this is a frequent point of confusion.
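
A quick simulation sketch of that underpowered-test point (assuming exponential survival times with no censoring and using lifelines' logrank_test; all numbers are made up for illustration):

    # Even with a real survival difference (hazard ratio 1.5), a small two-arm
    # comparison frequently fails to reach p < 0.05, so a non-significant
    # log-rank result is weak evidence that no association exists.
    import numpy as np
    from lifelines.statistics import logrank_test

    rng = np.random.default_rng(0)
    n_per_arm, hazard_ratio, trials, rejections = 30, 1.5, 1000, 0
    for _ in range(trials):
        t_low = rng.exponential(scale=1.0, size=n_per_arm)                  # better-prognosis arm
        t_high = rng.exponential(scale=1.0 / hazard_ratio, size=n_per_arm)  # worse-prognosis arm
        if logrank_test(t_low, t_high).p_value < 0.05:
            rejections += 1
    print(f"power at n={n_per_arm} per arm: {rejections / trials:.2f}")  # typically well below 0.8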

Let's please keep scientific statements in context. The original paper says nothing tantamount to computers diagnosing breast cancer more accurately than doctors. It is, principally, about a new morphological feature that the researchers believe is tied more strongly to survival according to their computational model.


I'm a colleague of the author, and I have talked to him quite a bit about C-Path since I work on something related. While I agree with much of what you are saying, I disagree slightly with your comment that the article is "principally, about a new morphological feature that the researchers believe is tied more strongly to survival according to their computational model."

The significance here is that he extracted 6000 low-level morphological features without any pre-conception about their usefulness. He then used GLMNET (logistic regression with L1-regularization) to automatically pick which of these features was important. Then, the craziest part is that the most informative features were not even cancer cells, but rather, surrounding stromal tissue. To quote from the paper, "Pathologists currently use only epithelial features in the standard grading scheme for breast cancer and other carcinomas. Our findings suggest that evaluation of morphologic features of the tumor stroma may offer significant benefits for assessing prognosis." He essentially took a completely blinded, machine learning technique to find features that have been relatively ignored in pathology.

I think this is more indicative of a new paradigm in computer vision and machine learning in general: that finely-tuned, human-crafted features can be beaten by more automatic methods. Whereas before we have tried to program features that characterize what we see, now we are finally looking at image features that can characterize what we're missing.
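
For readers curious what the feature-selection step looks like in practice, here is a minimal sketch in the spirit of the GLMNET approach described above (scikit-learn's L1-penalized logistic regression standing in for GLMNET, random numbers standing in for the ~6,000 morphological features; this is not the authors' actual pipeline):

    # L1 (lasso) regularization drives most coefficients to exactly zero, so
    # the surviving nonzero coefficients act as an automatic feature-selection step.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_samples, n_features = 500, 6000
    X = rng.normal(size=(n_samples, n_features))
    true_coef = np.zeros(n_features)
    true_coef[:10] = 2.0            # pretend only 10 features carry prognostic signal
    y = (X @ true_coef + rng.normal(size=n_samples) > 0).astype(int)

    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    selected = np.flatnonzero(clf.coef_[0])
    print(f"{selected.size} of {n_features} features kept:", selected[:20])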


The cottage industry of drawing fanciful conclusions from tiny data sets needs to die in a fire. It is pure noise.


Reminds me of a comment by _delirium a few days ago, in the "AI is killing jobs" thread:

> dermatologists are fairly worried that "upload a photo and we'll diagnose your mole", either via software or outsourcing, will cut out a significant percentage of their business.

http://news.ycombinator.com/item?id=3204466


Also announced today: http://news.techeye.net/science/ai-used-to-hunt-dinosaur-bon...

those darn robots are everywhere!


"We may never have to dig again" -Dr. Grant, Jurassic Park


One thing the article doesn't point out is that AI methods in the past did a poor job of detecting breast cancer. This is an important new result and big step for the field.


Unfortunately this still doesn't look like it's detecting breast cancer; it's scoring the severity of tumors, given prior knowledge that it's being presented tumors to analyze (but tumors of unknown severity).


You can learn from some of the experts in the field:

http://ai-class.com - Peter Norvig and Sebastian Thrun

http://ml-class.com - Andrew Ng

I signed up 4 weeks into Andrew's class. Both of these are excellent.

Btw, the technique in the article is a classification problem, right? :-)


Is there any reason to watch the videos if one has AIMA? Could one do the assignments without ever watching a video?


A multiclass classification, but yes. I am enrolled as well ;)


This kind of development is not that new, conceptually at least (obviously, implementing it is another thing). One of the most consistently verified findings is that human experts can be very inconsistent, and in some cases expertise can be a detriment when it leads to overconfidence. Radiologists, for example, have been found to render two different verdicts about 20% of the time when looking at the same X-ray at two different times.

And of course there's the recent study showing how judges were consistently more likely to deny parole during hearings that happen in the afternoon: http://blogs.discovermagazine.com/notrocketscience/2011/04/1...


They kind of hint at it in the article, but it sounds like automatic feature detection really helped here. That's really the future (or present): learning algos discovering better features automatically, significantly cutting down the labor involved in feature engineering. Pretty neat.

Here's a video talking about current research into this at Stanford: http://www.youtube.com/watch?v=ZmNOAtZIgIk&feature=youtu...


This should not be surprising for anyone who understands software. And software identifying breast and skin and other cancers is going to continue getting better.

Now the bad news is that human medicine moves super slowly due to very strict regulations, and I expect doctors to resist this.

I hate to say this, but I think health insurance companies are our best hope to push this technology into greater use.


I agree. A lot of these technologies will lead to improved outcomes at lower costs (for certain sets of basic problems).

Guild professions (like doctors) are inclined to keep doing even the basic simple-minded aspects of their job because they get paid "economic rents" for doing so thanks to the regulations that insist even basic tasks must be done by someone with 10+ years of education.

Insurance companies on the other hand ultimately have to respond to employer demands for lower premiums (unfortunately this process is slow and HR departments are usually horrible at keeping costs under control). Insurers and to an extent employers are going to be the impetus for a lot of improvements in effectiveness and affordability.

Read Clayton Christensen's "The Innovator's Prescription" for more on how this might play out.


Although he's not listed as an author, I find it interesting that Andrew Ng has been using "malignant or benign tumor" as an example of a machine learning classification problem at http://www.ml-class.org
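
As a toy version of that exercise, here's a short sketch using scikit-learn's bundled Wisconsin breast cancer dataset (tabular measurements, a far simpler setting than whole-slide images):

    # Malignant-vs-benign classification on tabular features, in the spirit
    # of the lecture example mentioned above (not the C-Path image pipeline).
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")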


"...but imagine if you had a C-Path machine in every town or city."

How near-sighted of the author. An article about computer vision, AI, machine learning, and no thoughts of tele-pathology or remote reading?


hat tip

My thought was that such a machine would basically have a built-in, automated biopsy component. Like, the machine really would do everything from taking the biopsy, to looking at the sample, to making the diagnosis.

Tele-pathology is cool, but you still need to get the tissue sample somehow -- which I imagine is hard in a town/village without a hospital.

You could have some kind of automated biopsy unit that then sends data back to a central C-Path server, I guess.


There's an Indian company selling biopsy-helper robots. The robots simplify the process and enable technicians to take the biopsy instead of doctors, and the whole process is faster.

See: http://www.perfinthealthcare.com/Procedure_Videos.html

Looking at the videos, it seems that the whole process could be done by the robot, but maybe patients and other stakeholders feel this way is safer/more economical.


The funny thing is, most rads physicians would probably welcome this, as mammography is often the least preferred sub-specialty (my wife is a radiologist and feels this way, and I've encountered many radiologists with the same outlook).


The system in this article is not interpreting mammograms, which is something that radiologists do. This is looking at tissue samples on slides, the domain of pathologists.


> mammography is often the least preferred sub-specialty (my wife is a radiologist and feels this way, and I've encountered many radiologists with the same outlook).

Why? Because breast cancer is hard to diagnose? Because nobody likes looking at floppy old-lady boobs all day?


Wow! The medicalese description (abstract of the original paper) seems completely unrelated to the computerese description.

I need to learn medicalese ...

or train some ML algos.


This does not address the issue of feature detection, but it's interesting nonetheless, inasmuch as well-trained human doctors are quite bad at probability, specifically in the case of breast cancer diagnoses. See the first 1/8th or so of Eliezer's layman's Bayes document:

http://yudkowsky.net/rational/bayes
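
For anyone who doesn't want to read the essay, the core point fits in a few lines (these are the essay's illustrative screening numbers, not real clinical rates):

    # Bayes' theorem on the classic mammography example: 1% prevalence,
    # 80% sensitivity, 9.6% false-positive rate.
    prevalence, sensitivity, false_positive_rate = 0.01, 0.80, 0.096

    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    p_cancer_given_positive = sensitivity * prevalence / p_positive
    print(f"P(cancer | positive mammogram) = {p_cancer_given_positive:.3f}")  # ~0.078, not ~0.8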


Looks like weak AI is making inroads all over the medical industry. Here is predictive AI technology for heart failure that works better/faster than human diagnosis.

http://en.m.wikipedia.org/wiki/Multifunction_cardiogram



