FDA permits marketing of AI-based device to detect diabetes-related eye problems (fda.gov)
274 points by nradov on April 17, 2018 | 97 comments



As a founder of another company in this field, let me start by saying that this approval is a big deal. Kudos to IDx. This is the very first time the FDA has approved a fully automated CADx (computer-aided diagnosis) device. Eyenuk is also on its way to an FDA approval, and it is a lot of work conducting the prospective clinical trials.

There are some misconceptions on the thread, so let me help clear them up. A screening test is indicated on an annual basis for anyone with diabetes who does NOT have visual symptoms. Diabetic retinopathy (DR) progresses without any symptoms and is preventable if detected early, but despite its preventable nature, DR is the leading cause of blindness in working-age adults even in the developed world.

The test is for screening rather than providing a full diagnosis and is not intended to replace a dilated ophthalmologist examination. You don't need a specialist to screen, but you need a specialist to diagnose and treat. Sensitivity is the percentage of times the test correctly identifies the presence of more than mild diabetic retinopathy (in this case, 87.4 percent of the time), and specificity is the percentage of times the test correctly identifies those patients who do not have more than mild diabetic retinopathy (in this case, 89.5 percent of the time). Note that neither sensitivity nor specificity implies accuracy. The sensitivity and specificity generally compare well to those achieved by humans.


> DR is the leading cause of blindness in working-age adults even in the developed world.

It is changing though - it's not been true for the UK since 2014 - https://www.gov.uk/government/news/diabetes-no-longer-leadin...


That's correct. The UK is the only major (in some vague sense) country where diabetic retinopathy is not the leading cause of preventable blindness in adults. This is very likely because they are able to screen more than 80% (nearing 85%) of their diabetic population, an impressive feat. This leads to another issue: they need to consistently grade the retinal images of over 2.2 million patients with diabetes. This is where AI could help, by improving consistency and turnaround time, and we are working with the NHS UK to explore this.


What is the screening like? Is it possible there will be automated self-service 'kiosks' for that?


I believe self-service kiosks would be entirely feasible. There are two key components: (1) automated non-mydriatic (not requiring dilation of the pupil) retinal imaging and (2) automated grading of images using AI.

The technology is there but there would be more work needed for a self-service kiosk to be FDA approved. Another thing that is not clear is whether it is commercially a good idea at this time, given that only a single disease (diabetic retinopathy) is approved. I can see a future where one can use such kiosks to look for multiple conditions and assess risks for various diseases including cardio-vascular disease, neurodegenerative diseases, stroke, and hypertension.


Since it's not dilated, it's possible


>> In one clinical trial that used more than 900 images, IDx-DR correctly detected retinopathy about 87 percent of the time, and could correctly identify those who didn’t have the disease about 90 percent of the time.

I read that as 0.87 accuracy, 0.9 specificity (True Negative Rate). However, I can't find the sensitivity (recall, or True Positive Rate) in the link provided in the article above.

I'm guessing it goes a bit like this (assuming perfectly balanced classes which in reality they aren't):

           Predicted + Predicted - Total
  Actual + 378         72          450
  Actual - 45          405         450
  -------------------------------------
  Total    423         477         900
  
  Accuracy:             0.8700
  Error:                0.1300
  True Positive Rate:   0.8400
  True Negative Rate:   0.9000
  Precision:            0.8936
  Recall (TPR):         0.8400
  F-Score:              0.8660
I'm not sure how good or bad a 10% false positive rate and a 16% false negative rate are for that kind of diagnosis. The linked trial page says that 40% of diabetes patients have some degree of diabetic retinopathy (DR), that early treatment reduces vision loss "by as much as 52%" and that only some 50%-60% of people with diabetes have a yearly eye exam.

Off the top of my head, it looks like automated screening will do some good and probably more good than harm, but without knowing how doctors judge good vs harm there's no way to know for sure how useful this device will really be.


"For moderate or worse DR, the majority decision of ophthalmologists had a sensitivity of 0.838 and specificity of 0.981"

Note that this is "majority decision" of multiple ophthalmologists, which is going to be better than the average ophthalmologist's single diagnosis.

http://www.aaojournal.org/article/S0161-6420(17)32698-2/abst...


This is not completely correct. Sensitivity (87.4% in this case) is not the same as accuracy: it is the true positive rate, and specificity (89.5% in this case) is the true negative rate.

Sensitivity generally gives an indication of the safety of a medical device, and specificity gives a general indication of its effectiveness.


The link changed since I posted my comment :)

The original article on Verge reported "87% accuracy" and 90% what sounded like TNR. The new link points to an FDA page that makes it more likely that ".87" is actually sensitivity:

IDx-DR was able to correctly identify the presence of more than mild diabetic retinopathy 87.4 percent of the time and was able to correctly identify those patients who did not have more than mild diabetic retinopathy 89.5 percent of the time.

So, I guess, something like this:

           Predicted + Predicted - Total
  Actual + 393         57          450
  Actual - 47          403         450
  -------------------------------------
  Total    440         460         900
  
  Accuracy:             0.8844
  Error:                0.1156
  True Positive Rate:   0.8733
  True Negative Rate:   0.8956
  Precision:            0.8932
  Recall (TPR):         0.8733
  F-Score:              0.8831
Closest I can get with exactly 900 cases :0
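
For anyone who wants to double-check, here is a minimal sketch in plain Python that recomputes the same metrics from that confusion matrix (the counts are my guesses from the table above, not published figures):

  # Guessed counts, assuming 450 actual positives and 450 actual negatives.
  tp, fn = 393, 57
  fp, tn = 47, 403

  sensitivity = tp / (tp + fn)                 # true positive rate / recall
  specificity = tn / (tn + fp)                 # true negative rate
  accuracy = (tp + tn) / (tp + fn + fp + tn)
  precision = tp / (tp + fp)
  f_score = 2 * precision * sensitivity / (precision + sensitivity)
  print(sensitivity, specificity, accuracy, precision, f_score)
  # ~0.873, ~0.896, ~0.884, ~0.893, ~0.883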


Thanks for this analysis. The balance in the real world is more like 20-80, i.e. 20% of typically screened patients would have referable retinopathy (screen positive).


Ultimately, you will need a doctor somewhere along the diagnostic process so that someone is there to assume liability for incorrect diagnoses.


What's tricky is this: if AI is better than doctors by some significant degree, what do we do when the doctor and the AI disagree? Say doctors are right 85% of the time, but the AI is right 90% of the time.

I guess we treat it as another doctor? Like if we have 4 opinions that agree, we go with that one regardless of the source of those opinions (as long as they meet some minimum competence threshold).


As others have said - perform more tests. I've worked around molecular pathology testing for cancer (which comes up with a diagnosis based on data analysis after running DNA or RNA sequencing). If the molecular report differs from what the surgical pathologist saw when looking under the microscope, it's typically not 'it's cancer' vs 'it is not cancer', but more 'this is molecularly non-small cell lung cancer' vs 'this appears to be prostate cancer'. So what will happen is they'll do more staining on the sample, specific to what the molecular report came out with - and a lot of times - bam. That tumor in someone's prostate is actually lung cancer and needs to be treated that way.


Perform more tests. Start preliminary treatment and continue to monitor. Medical care isn't a 1-bit decision process.


That's an interesting thought.

Currently, what happens is that if a diagnostic test comes back and it suggests something serious, say cancer, and the doctor does not pursue it, then the doctor would be liable if it did turn out to be cancer.

So if a machine disagreed with a doctor, then I would assume that the doctor will grudgingly have to investigate further until there is enough evidence to rule out that diagnosis.

#headache

What I can see happening is that patients will go to this machine for a second opinion. And if an opinion then comes back that contradicts the primary physician's, an entire can of (legal) worms will be opened.

--

Addendum:

To elaborate further, there is sometimes what's called the benefit of history.

Say a patient visits 10 doctors. The 10th doctor has an unfair advantage over the first 9 simply because he/she will have prior knowledge of which diagnoses and treatments were incorrect.

Similarly, for an AI vs. human doctor situation, incorporating additional information (for the AI) would require a considerable amount of data to train on in order to recognize prior history, failed treatments, and such.

Image-specific diagnoses (e.g. recognizing melanoma or retinopathy) lend themselves to AI very nicely. Diagnoses that involve a significant amount of, shall we say, "human factors" lend themselves to it much less.


Doctors aren't liable for failing to predict the future or for making an imperfect diagnosis.

If a doctor reviews the available data, reasonably concludes that it shouldn't be pursued further, and it later does turn out to be cancer, then that by itself does not mean that the doctor is liable for anything. Malpractice requires actual culpable negligence, such as missing something obvious, not interpreting a questionable situation in a manner that turns out to be wrong. The existence of a second, contrary opinion doesn't change that.


This isn't a new issue; there have been CAD systems that outperformed average clinicians (on very specific tasks) since at least the mid-90s. At the end of the day, liability drives the resolution process in some jurisdictions, efficiency in others.


>> Like if doctors are right 85% of the time, but the AI is 90%.

There is always the possibility that the doctors and the device are both right 90% of the time, but not the same 90% of the time.

Or that either the doctors or the device are right most of the time for the most severe cases but the other party is right only for the milder cases, etc.

It's not easy to look at absolute numbers here.


I know that this is a totally whack-job comment, but the TV show "The Good Doctor" is kind of leaning this way. Instead of relying on a ton of personal bias, the main character generally diagnoses things more the way an ML model would. Obviously there's no way to establish any real foundation for this since it's based on a TV show, but it offers a vision of what you're saying, except that instead of an AI it's an individual with savant syndrome making better judgments than the rest of the doctors. That said, I would imagine a savant being placed in a role like that is less likely than the show portrays, so where does that place AI?


This is definitely going to be an issue. Even in cases where you're measuring your tool against "expert consensus" (often 3-5 physicians), there's a reasonable likelihood that the consensus may be wrong in certain types of cases.

Though even in those cases, you might be looking to show that your tool agrees with physicians at least as often as physicians agree with each other. Malpractice is usually about failing to offer the standard of care, and if you can show a reasonable level of concurrence with the standard of care in research and trials, you may be able to move forward and reach those higher levels of accuracy.


In practice? Pessimistic answer:

a) Usage still requires the presence of the Doctor.

b) The doctor does nothing but relay the AI’s message.

c) The doctor continues to charge the same and treat the same number of patients.

d) Everyone who expresses “hey isn’t the doctor redundant now? Shouldn’t we be treating more patients for cheaper” gets ridiculed as “one of those people”.

e) Edit: Also, the doctors’ association devotes significant resources to come up with memetically virulent reasons why the world would end if we took doctors out of the loop.

I mean, that’s how a lot of obviated jobs are currently treated...


The doctor isn't going to be making decisions without access to the computer diagnosis.

The computer aided diagnosis isn't another doctor, it's another stethoscope.


Insurance companies assume the liability for the doctor's diagnoses. I'm not sure why they'd be unwilling to do the same for the software's diagnosis. Somewhere, an actuary is willing to estimate that risk.


The company is saying you don't need a specialist, but after applying Bayes' theorem (using 90% TN, 87% TP, and a prior of 200,000 with the complication out of 29,100,000 with diabetes), the chance you have this condition after the machine says you do is 0.83%.


They are marketing the device strictly to make referrals to specialists. The FDA press release says as much anyway:

https://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/u...

If the images are of sufficient quality, the software provides the doctor with one of two results: (1) “more than mild diabetic retinopathy detected: refer to an eye care professional” or (2) “negative for more than mild diabetic retinopathy; rescreen in 12 months.” If a positive result is detected, patients should see an eye care provider for further diagnostic evaluation and possible treatment as soon as possible.

Are you interpreting the Verge or talking about some other statement?


Yeah from the article:

> IDx-DR founder Michael Abràmoff told Science News. “It makes the clinical decision on its own.”

But I would guess a specialist would still need to be involved since it's not a fool-proof system. A specialist might take other symptoms or variables into account when making the diagnosis or order further tests. While this tool might be useful for blanket screening considering that it is harmless, it seems like it's hardly going to be "making the decision on its own" and prescribing treatment.


It's specifically making the clinical decision I quote from the FDA press release, whether or not to refer to a specialist for further diagnosis.

It's not something primary care doctors currently do.


I'm more worried about the false-negatives.

What if you have the condition, the GP runs the test, and it turns out negative, and then the GP decides to not send you to a specialist ...

Can anybody infer what the confusion matrix looks like from reading the text?


I think you might have inverted the parameters. I think the numbers given are TN and TP.


I think the number is closer to 6%.

90% of people without the disease are accurately detected as not having it, i.e. a 10% false positive rate. So roughly 3M people would be falsely flagged as having the complication, versus the ~200k who actually have it.

10% isn't a great number, but it isn't clear from this coverage whether this complication is generally asymptomatic or not. If there are symptoms to go with it, the numbers may be far better.


Also keep in mind that if there are positives, general practitioners would refer to a specialist anyway for treatment. These specialists would be more than equipped to detect false positives. Teleretina imaging is becoming more and more prevalent as well, with eyePACS and Welch-Allyn having dedicated interpretation services, so patients wouldn't necessarily have to go somewhere for verification.

I'm more worried about only 87% being accurately detected as having the disease, i.e. a 13% false negative (FN) rate. I don't know how many general practitioners would actually send a patient to a specialist if the device did not detect changes.

The retina seems well suited to AI approaches, though, so I'd be interested in what comes next from companies like this, DeepMind, and other researchers/organizations (look out for Lee et al. over at the University of Washington).


I guess the baseline for comparing the 13% would be the nonspecialist rate of the same.


Perhaps; that's one way to assess non-inferiority. Few if any primary care physicians take the time to look in one's eyes (they don't know how to use a direct ophthalmoscope and don't have specialized fundus cameras in the clinic). Given that, if this tool gets them to take retinal photographs of all patients, maybe we could detect diabetic disease before it is usually seen.

The standard practice today is that if a patient is determined to be diabetic, then they get referred for an ophthalmologist visit once a year. In that case, would comparing those rates of diagnosis be useful?


Why are specialists immune to those same statistics?


They aren't. With some luck their errors do not correlate with the machine's, which gives us the best of both worlds: cheap diagnosis for healthy people and precise diagnostics for unhealthy ones.


Keep in mind that the test wouldn't be administered on any randomly chosen diabetic patient but presumably rather on those who already exhibit some type of vision impairment consistent with the disease. As a result, the prior of 200k / 29M is not quite right, and I'm guessing that the true prior is likely much higher.


>Keep in mind that the test wouldn't be administered on any randomly chosen diabetic patient but presumably rather on those who already exhibit some type of vision impairment consistent with the disease.

Not true. The aim is to treat patients before they become symptomatic. Outcomes are much worse otherwise.

2min promo on Diabetic Eye Screening: https://www.youtube.com/watch?v=PK1Y-1BKFn0


I didn't know that. Thanks for pointing this out!


Any machine or specialist that wants to do a diagnostic based on a test result will have to consider other factors apart from the result.

The Bayesian false-positive math is a good illustrative example, but reality is more complicated. And no doctor will base their diagnosis solely on one number.


Yeah I was wrong on the math:

P(H) = .0069

P(D|H) = .87

P(D|H')= .1 (10% chance the test says you have it but you don't)

P(H|D) = 0.057, or about a 6% probability that you have the complication after the machine says you do
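
For clarity, the same calculation as a minimal Python sketch (the prior is the assumed 200k / 29.1M figure from upthread, not an established prevalence):

  # Posterior probability of the complication given a positive test (Bayes' rule).
  p_h = 0.0069              # assumed prior: has the complication
  p_d_given_h = 0.87        # sensitivity: positive test given the complication
  p_d_given_not_h = 0.10    # false positive rate: positive test without it

  p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
  p_h_given_d = p_d_given_h * p_h / p_d
  print(round(p_h_given_d, 3))   # ~0.057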


Results are reported as .87 accuracy, not true positive rate, at least in the Verge article.

I guess they may be misreporting sensitivity as accuracy.


Since a lot of the discussion was objecting to the title and/or exaggerated coverage, and the press release is more factual (let that sink in for a second), we changed the URL from https://www.theverge.com/2018/4/11/17224984/artificial-intel... to the press release.


Does the FDA conduct code reviews? And how do they guarantee that the code or the training data does not change over time without them knowing?


Generally, the FDA (or the government body responsible for certifying medical devices) does not conduct code reviews in the sense of looking at the code and trying to find bugs.

The way it works is: the manufacturer of a medical device assesses the harm that can be caused by a software malfunction, and assigns it a safety classification (class A, B or C). Class A is used when no injury is possible, and class C is used when death or serious injury is possible (e.g. a surgical robot). The manufacturer also provides a "failure modes and effect analysis" document that looks at everything that could go wrong, what is the likelihood of the failure happening and what is the effect on the patient.

Based on the safety classification, IEC 62304 requires different levels of rigour. For example, the standard only requires blackbox testing for class B software, whereas for class C software it requires whitebox tests as well.

The manufacturer also needs to come up with a software development plan that ensures that all of the requirements of the standard are met, and an "argument" (supported by test reports, process documentation, source control history, etc) that the software was developed according to the plan.

And that is what the FDA audits: they look at the development process of a given feature and they check that the plan was followed. I think they rarely delve into the details of the implementation and are generally just checking that the safety arguments are sound and supported by evidence.


Worth noting that this is a rough description of how IEC 62304 looks at the problem, but adherence to that standard is not required by many regulatory bodies (including FDA, although there is guidance). It's a good approach to this, but there are others.

More generally, the regulatory body will be looking for you to have a formal engineering process in place and be able to demonstrate its efficacy. Part of that will be looking at how you do hazard and risk analysis, how you handle CAPA (corrective and preventative actions, in FDA-speak), how you do system trace, design history file generation, and so on, and that you have a software development plan and can demonstrate how you follow it.

So they aren't really interested in code reviews per se, but they are very interested in how you view code reviews, how you perform them when you do, what gets documented, how you perform trace and V&V, etc.


> class C software it requires whitebox tests as well

What does this mean?


Test the internal components, not just the externally visible performance.


Seems like they are using this to get a proper referral to a specialist rather than as a sole diagnosis. The core model code itself is probably 15 lines using TensorFlow or another framework, I'm guessing, but I could be wrong.
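
To illustrate what "15 lines" might look like, here is a hypothetical sketch using tf.keras with a generic pretrained backbone; this is an assumption about the general approach, not IDx's actual architecture or code:

  import tensorflow as tf

  # Hypothetical binary classifier: referable diabetic retinopathy vs. not.
  base = tf.keras.applications.InceptionV3(
      include_top=False, weights="imagenet", pooling="avg",
      input_shape=(299, 299, 3))
  model = tf.keras.Sequential([
      base,
      tf.keras.layers.Dense(1, activation="sigmoid"),
  ])
  model.compile(optimizer="adam", loss="binary_crossentropy",
                metrics=[tf.keras.metrics.AUC()])
  # model.fit(retina_images, labels, ...)  # data collection and tuning not shown

The model definition really is short; as the replies below point out, the hard part is the data, the validation, and the regulatory evidence around it.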


In that case, shouldn't the training data be open source? Seems like somebody got an unfair advantage here.


In that case, the training data is the product.

They intentionally built an unfair advantage so that they could sell it.


Oddly enough that's what they're aiming to claim a monopoly over.

Here's [1] a patent they have filed on the system. Claims 1-18 and 20 are focused on the training of the neural network. From looking at PAIR [2], it looks like Claims 1-18 are going to be granted soon, largely in that form.

[1] https://patents.google.com/patent/US20160292856A1/en?q=AI,ar... [2] https://portal.uspto.gov/pair/PublicPair


If it is that simple I am wasting my time in my current job.


I got a book on TF last year, saw that all the examples were fairly short, and was impressed with the power. However, it took me a good three months before I could reason about why each line was there. It's pretty complex stuff that takes a while to even understand, let alone actually write.


After I wrote the above comment I remembered that just because there are few lines of code doesn't mean it's trivial to compose those actual lines.

Some of the most elegant code is terse, but it takes years of education, experience, and intelligence to be able to produce that logic.


A lot of machine learning in particular is like that. The best practitioners have very strong intuitive understandings of the data and the modeling problem, which is what allows them to construct effective models (w.r.t. capacity, training algorithm, topology, and use of certain optional training procedure additions). I would liken their job to that of a doctor writing a prescription: it doesn't take much to write the prescription itself, but it takes mastery to analyze the problem and write the best prescription.


The algorithm could be simple. Evaluating all failure cases, analysing them, and getting the whole lot approved certainly isn't simple.


They generally treat computing devices as black boxes. Performance is all that matters. But once an artifact is submitted, it's locked down; that and only that will be approved. Change a resistor or an if-then, and it goes back to the FDA.


What about learning? Would an update with more data and updated coefficients require going back to the FDA?


Yeah that was what I was getting at.


See other comment about code reviews.

In general, though, approval to market for particular indications is for one fixed configuration of a product, so your model parameters won't change.

All of this is in the process of being hashed out, but I expect for a while at least if you are doing on-line learning it will be in non-clinical configurations only and you will end up releasing an update periodically. Depending on the changes this may need a new 510(k) or not, but would definitely need a formal release.


I wonder if doctors will become like pilots, i.e. almost redundant, but essential, at the same time?


As someone in this field (AI+medicine), I think this is the best analogy. Though a key distinction is that the human body is also a person whereas a plane is not. Physicians are there to promote the health of the person not just their mechanical parts. I've seen doctors fudge billing codes to help poor patients afford care. I've seen doctors pick up on domestic violence situations based on small social cues. There is a certain degree that healthcare relies on the humanness and empathy of physicians to promote human flourishing.


I'm a doctor and very much pro ML/AI. I'd love to have an autopilot I could watch in awe. There are still a lot of practical tasks, though, that will be much harder to automate. And for the first few generations of AIs, I guess someone will have to babysit them.


Ah, but who's going to treat the next few generations of AI, when there are no more doctors left who can "fly" without their "autopilot"?


I think a lot of commercial pilots would take offense to this. Commercial pilots can still fly without autopilot. Just as much as you can drive on the highway without cruise control.

https://pilotjobs.atpflightschool.com/2016/07/27/how-much-do...


I would imagine having such an autopilot would make one less proficient in any particular subject.


Or perhaps it could lead to new insights?


For those interested in the research side of this, Google Brain actually published a study in JAMA on the same topic. They did clinical trials in India and should be publishing those results eventually.

https://static.googleusercontent.com/media/research.google.c...

In terms of how well the "experts" perform: "For moderate or worse DR, the majority decision of ophthalmologists had a sensitivity of 0.838 and specificity of 0.981"

http://www.aaojournal.org/article/S0161-6420(17)32698-2/abst...

And here's a video that describes what's going on in plain English:

https://www.youtube.com/watch?v=oOeZ7IgEN4o


There are several comments about the accuracy of the algorithm, but doctors also struggle with diagnosis. In the DeepMind study on DR, they found that for about 20% of referable cases, doctors disagreed on the diagnosis about 40-50% of the time. To combat this, they had at least 3 and up to 7 ophthalmologists grade each image.

Source: https://jamanetwork.com/journals/jama/fullarticle/2588763


I wonder what factors made this decision possible? I love the idea of automated diagnosis but the performance rates are 87% true positive and 90% true negative in the article. Seems a bit low.

Maybe people aren't getting diagnosed at very high rates? That would be a reasonable justification for deployment with somewhat less than perfect accuracy. Anyone have any insight?


For new stuff, FDA performs a holistic cost/benefit analysis.

In this case, they might weigh:

* How many new cases are caught by expanding access to specialist tools
* What fail-safes exist in the current course of care: how does a false negative result in a worse outcome for a patient than if they had had no diagnostic at all
* etc.

The summary of their decision is public record, but not the detailed analysis.


I used to work in medical startups and twice we were approved for simply being better than standard of care.

If you can show a statistically better chance of a good outcome with a small chance of significantly worse outcome, the FDA will often approve.


It also seems like a quick way to screen for this condition, vs the level of concern that a primary care doc might need to have before referring someone to an ophthalmologist. In other words, you could screen more people, even those that are asymptomatic (but have diabetes) and potentially catch retinal disease earlier. Even without mind-blowing sensitivity / specificity, this could preserve a lot of people’s vision.


The status quo is screening at the eye doctor. This enables screening at primary care visits. People mostly go to primary care more often than the eye doctor.

The medical risk is that people will forgo other screening for 12 months when given a negative result. The cost of additional screening for false positives is the other big downside (all the machine does is recommend a specialist visit or a rescreen in a year).


> People mostly go to primary care more often than the eye doctor.

Really? Standard advice is 1/yr for both.


> But of course, not having a specialist “looking over the shoulder,” as Abràmoff puts it, raises the question of who will be responsible when the diagnosis is wrong

Ultimately, accountability and transparency will be the Achilles heel.


Ultimately it has to be the company providing the technology that is liable, but I bet you they have a terms of use clause that puts the responsibility on the clinics using the technology.


The more important question is what they're actually held to. In the case of a diagnostic test like this, the most reasonable thing is to hold them to the false negative and false positive rates that they say the product has. No real-world diagnostic test is perfect. Instead, we design the systems around the tests to accommodate the fact that they do have errors. As long as the device manufacturer is correct about their probabilities of errors, then we can incorporate them into a system that works better than what we had before. Or we can know that we cannot use them in a useful way, and just ignore it.


I feel like this has the same promise as self driving cars - raise the floor for the quality of service while also experiencing random unexplained failures that its human supervisors fail to notice in time.


Based on my experience watching doctors (good ones) trying to diagnose "weird things", they are just manually executing an expert system algorithm anyway. They aren't doing what an engineer might expect -- working from a basic understanding of how the body works and looking for plausible explanations. They're instead simply pattern matching against a database of facts.


A lot of diagnostic skill in complex cases is based on clinical intuition developed over many years of practice. That's qualitatively different from an expert system executing defined rules against facts.


You'd think, but honestly in my experience it wasn't like that. An expert system would have done as well, or better. Possibly my experiences were colored by the cross-specialization nature of the issue -- specialists seem reluctant to engage in any thinking outside of their area, I found. Like a software engineer would never consider that the problem they're investigating is caused by memory bits flipping randomly.


The company's board (politicos) and syndicate (there is none) are a bit weird for an FDA approval... maybe there are some caveats or scope notes that I have not seen.


"The 207,130 images collected were reduced to the 108,312 OCT images (from 4686 patients) and used for training the AI platform. Another subset of 633 patients not in the training set was collected based on a sample size requirement of 583 patients to detect sensitivity and specificity at 0.05 marginal error and 95% confidence. The test images (n = 1000) were used to evaluate model and human expert performance." --


Jeff Dean spoke about this topic in his keynote for the TensorFlow Dev Conference: https://youtu.be/kSa3UObNS6o?t=27m32s


I don't see the big deal. We've been using 'AI' in medical equipment for nearly a decade. Look at da Vinci surgical devices. It's just that only recently has the technology been called AI.


>AI software that helps doctors diagnose diabetic retinopathy like specialists is approved by FDA (theverge.com)


Well, sure, but the software's architecture probably isn't too particular to this use-case; it's just computer vision. Given FDA approval for using CV for this, we'll probably quickly see many other companies attempting to drive similar technologies to market.

Or, rather, we would if there were any existing companies in this space that could take advantage of this. The creator of this device, IDx (https://www.eyediagnosis.net), seems to be rather unique in being an entrepreneurial medical-device manufacturer; MDMs are a rather hide-bound lot.

Honestly, if there's going to be a wave of innovation in this space, I might expect it to come from the inorganic-chem-focused pharma companies, since they have the expertise in both materials science and machine-learning (from doing novel small-molecule detection studies) required to come up with the innovations. I expect they'd likely partner with one or more of the MDMs to build the hardware, but they would write the software.


I think the title is fine, but a lot of the comments are applying what the software does to medicine in general.

What this software is used for is very specific, but also very useful in that it addresses a common medical problem. It is used only to help diagnose diabetic retinopathy (i.e. eye damage caused by diabetes).

This is AI vision software used to analyze a photograph of someone's retina to detect damage. In essence it is much like the recently published programs that analyze chest X-rays to detect pneumonia. Where this is useful is that it can probably cut out a lot of human work in diagnosing retinopathy; however, it is an incremental step. Even when I was a resident in a primary care clinic years ago, the process was somewhat automated like this: our medical assistants would take a photograph with a special machine, and the photograph would be digitally sent to a specialist (I presume an ophthalmologist, but I could be wrong; maybe optometrists can be licensed to do this) for interpretation.

What this isn't, is diagnosing a patient based on taking a history and inputting examination findings and labs, etc... We are still quite a bit of a way from that but I'm sure people are working hard on that as well.

EDIT: In my opinion, where AI could really make a huge difference for my work as a hospitalist (a doctor who admits and rounds on hospital patients) is in voice recognition software, with eventual language processing to help me write notes faster. First, give me a program like Dragon Dictate, but one I can use in the patient's room (obviously one would have to figure out the HIPAA compliance issues), that transcribes my voice and the patient's or family member's into a readable text file I can review when I write their note.

The next step would be for that same program to give me its attempt at summarizing our interview into a reasonable note, which I can edit for accuracy. This would be, in effect, an AI scribe. A scribe, for those who don't know the medical jargon, is a hired person whose only job is to listen to a doctor interview a patient and help write the medical notes; they are usually young pre-medical students. It's a relatively new position that was created as the burden of documenting in electronic medical records limited the amount of time providers could spend with patients. It's very common in emergency medicine, where high output is needed, and sometimes also in primary care.

The next step after that is that you have a company with all this protected medical transcription data and eventual medical outcomes, and you use ML to try to tease out which variables ended up being the most useful for accurate diagnosis. Before that, you could have the program prompt the doctor with questions it thinks would be helpful, etc. Again, there are huge medico-legal barriers to this, but there is a roadmap to becoming a billionaire, in my opinion.


Eric Schmidt (former Google chairman) proposed building an AI scribe during his HIMSS 2018 conference keynote address. He claimed it should be possible within 10 years. I hope he's right, but I suspect he underestimates the difficulty of building reliable clinical systems.

http://www.himssconference.org/education/sessions/keynote-sp...


Thank you for the link, this is what I'm interested in.


> What this isn't, is diagnosing a patient based on taking a history and inputting examination findings and labs, etc... We are still quite a bit of a way from that but I'm sure people are working hard on that as well.

This is one of the classic AI application domains -- for example, this is why Stanford's Knowledge Systems Lab was near the medical school and why the mainframe Zork was developed on was owned by MIT's Medical Decision Making group.


PHI is a real barrier to your wish list; it's harder to get right than most people think, and it is very, very hard to get right in a distributed environment. And all the current advances in these general areas (voice recognition, transcription, semantic reasoning, etc.) lean heavily on distributed processing.

It's definitely a well identified market and there are people working on it, but I haven't seen much progress (not that I'm looking terribly closely at the moment).

From the point of view of the last step, the ML one, I suspect the consenting is as much an issue as the PHI. Getting any real traction on this will likely require massive data sets and significant clinician time to aid training.


The barrier is high enough that I heard Microsoft is working on this with UW using pretend patient encounters, in an attempt to bypass PHI restrictions and build a database.


Hmm... I wonder how many patients actually diagnose themselves using the Internet (or have their relatives do so)?

Source : personal experience.


Jerome K. Jerome identified the problem some people have with self-diagnosis - and the Internet only makes it worse.

http://three-men-excerpt.pen.io/



