Use of artificial intelligence for image analysis in breast cancer screening

honksillet · on Sept 2, 2021

Some background info on mammograms (x-rays of the breast). For whatever reason breast cancer is highly political in the US. As a result there are federal laws regarding communication of results and scheduling follow-up exams that apply to mammograms only and no other radilody or other medical test. To comply with these laws radiologists set up a uniform grading system with highly specific jargon. As a result the formatting and language used in mammogram reads by radiologist will be much more consistent than, say, a chest x-ray screening for lung cancer. This makes mammograms the ideal studies to be tackled by AI

tedivm · on Sept 2, 2021

This is actually why it isn't ideal for AI to tackle. The fact is there's no market for the product.

Right now much of the mammogram reading is extremely straight forward. Radiologists can fly through those exams with high quality, especially when you consider the macros they have setup that allow them to write most of the report with with a two second voice command.

This is one of the biggest things radiologists complain about- from their perspective most AI is solving problems that aren't actually useful to radiologists. I think the companies that will do best are the ones that augment radiologists with a focus on making them more accurate and efficient, rather than trying to cut them out of the loop (I'm a bit bias though as my previous company, Rad AI, is taking the augmentation approach and it has worked out rather well).

tw04 · on Sept 2, 2021

> This is actually why it isn't ideal for AI to tackle. The fact is there's no market for the product.

There's a difference between: not ideal to tackle and not ideal to build a business around.

It is absolutely ideal to tackle which will hopefully build confidence in a system that can then be expanded to solve harder problems. If you can't even solve this problem I don't see how you're getting public confidence in the harder ones.

antupis · on Sept 3, 2021

Yeah this also third world needs this kinda solutions where you can replace specialist with algorithm.

earthboundkid · on Sept 3, 2021

Why? The defining feature of the third world is cheap labor.

The third world might need more effective training, so they can convert more cheap labor into specialists, but I don't see why per se they would prefer to use a computer sold at Western prices to a local specialist.

Siira · on Sept 4, 2021

"Cheap effective training;" Pick two.

ska · on Sept 2, 2021

> The fact is there's no market for the product.

That's empirically not true; the first such product approved for clinical use entered the market in the late 90s, and they've had small-to-medium success since. Definitely none of them changed things radically, but it's pretty common CAD functionality now and has both detractors and champions.

> Radiologists can fly through those exams with high quality,

This is also not entirely true, one of the biggest problems for screening (not diagnostic) programs is that the false negative rate for radiologists is high. They do fly through these scans, but that's to make it economically feasible. Also, the average radiologist performance on screening mammo is not very good - most have to do it regularly (probably 100s per month) to do it well.

You are right that augmentation is the more plausible path to clinical success, but that's been what most of the clinical systems aim to do anyways, provide a sanity check on the radiologist.

ethbr0 · on Sept 2, 2021

I have no idea why everyone tries to solve the hardest problem (true positives, no false positives) with machine learning, instead of solving the much easier problem (true negatives, no false negatives) that often delivers similar value.

E.g. If a system could tell me which 967 results out of 1000 are normal, and send the other 33 for human review.

True positives require solving every problem in the solution space. True negatives only require solving the most frequent problems.

ska · on Sept 3, 2021

If I remember correctly this was tried in Europe ages ago, liability concerns made it impractical in US.

hellbannedguy · on Sept 3, 2021

Why do I feel the AMA was a bigger problem than liability?

Plus--wouldn't the software company be liable? (I don't think the tech is ready for office use, but when it is--I still think it will be a battle.)

Kinda like all medical devices. Whenever there's a problem, the equipment is looked at first?

Personally, I don't think the AMA wants computers taking away their doctor's precious income, and will only acquiesce when the tech makes the doctor more money, or is so good politicians/insurance companies start demanding it.

ethbr0 · on Sept 3, 2021

Insurance companies will be the ones to push it. And they'll push it when the government (through Medicare) mandates it / cost reductions.

But IMHO medical law in general has a big device / software blind spot. It's become decent at figuring out liability for humans, while permitting normal operations.

But all that machinery and case law doesn't really exist for black box software systems (no human in the loop). And if there's one thing that medical providers and insurance companies hate, it's unbounded liability.

version_five · on Sept 2, 2021

Just commenting to agree with this. I've done a lot of ML use case ideation work, and what happens so often is the ML folks latch on to something that ML may be able to do, but that doesn't move the needle at all in terms of doing the work faster or better.

Often, the ML provides an extra layer that slows things down, even if it's working at state of the art AUROC or whatever we're trying for.

version_five · on Sept 2, 2021

I'm really surprised by the pushback you're getting. I'd be curious the background of the people that have come back with replies about how it's an ideal use case, etc. This misunderstanding of the problem domain is exactly why these projects always fail - people come up with a "solution" that solves something that isn't the problem, despite whatever clever horse analogies people can come up with.

antasvara · on Sept 2, 2021

Is there an argument to be made that decreasing the workload on radiologists could improve their results in more difficult cases? I don't think it's a stretch to imagine that less tired/busy doctors would be able to perform their job more effectively.

There's also the fact that radiologists get paid a pretty decent salary. If I could devise a system that can do a highly paid worker's job faster, cheaper, and at a similar level of effectiveness, why wouldn't I? It also opens up that skilled worker's schedule and allows them to tackle more difficult to interpret scans.

tedivm · on Sept 2, 2021

That argument is exactly the one made by my former company, Rad AI. Rather than focusing on trying to "solve" radiology they're focused on augmenting radiologists. Their main goal is to reduce the amount of time a radiologist spends on each report while increasing accuracy and reducing errors.

The point I'm making is that focusing on mammograms is not the way to do that. It's already a very optimized specialty. If AI researchers worked more closely with practicing radiologists (not academic ones) I imagine there's all sorts of areas where existing technology today can improve things.

cameronh90 · on Sept 3, 2021

"If AI researchers worked more closely with practicing radiologists (not academic ones) I imagine there's all sorts of areas where existing technology today can improve things."

That's almost certainly true, and surely there must be people in the medical industry already working on this. In hospitals, there are internal organisations that vaguely attempt to link up technology workers with practicing doctors to improve medicine using technology. One of my most competent former developer colleagues retrained as a doctor and worked in medical digitisation, as well as being a practicing doctor.

However:

* Often coming at a problem from another angle and ignoring existing professionals can be useful too. Existing workers are very often stuck in their ways and unwilling to accept the way they look at a problem may not be the best. I have heard from many in medicine, that doctors are extremely distrustful and hesitant towards technology, as well as fiercely protective of their jobs.

* The medical industry is internally highly dysfunctional and trying to work with practicing doctors can be extremely politically difficult. The hospital bureaucracy often actively tries to prevent practicing doctors from working directly with technologists, preferring instead to silo them and direct all communication via senior management - who are often neither technology nor medical experts. Consequently it seems more likely that major changes will come from outside.

* AI has occasionally (rarely?) shown great success in solving problems without needing domain experts. See AlphaGo.

xvilka · on Sept 3, 2021

> AI has occasionally (rarely?) shown great success in solving problems without needing domain experts. See AlphaGo.

Games are always orders of magnitudes easier then analyzing reality. They are highly constrained to the narrow domain and space of possibilities. Reality doesn't. There is work on understanding physical laws by the AI but it's very long path and they are much less successful than their artificial reality colleagues.

sdenton4 · on Sept 3, 2021

In just a few decades we've gone from 'chess is ai complete' to 'go is ai complete?' to 'games are too easy to be meaningful.'

In game domains we have the ability to generate infinite meaningful training data and have easily measured outcomes. Waymo is generating near infinite data for driving, and doing quite well with it. In medicine, we have piddly ass datasets because of privacy concerns. 'Reality is hard' had nothing to do with it.

danielmarkbruce · on Sept 3, 2021

Reality is hard is still a thing. Google has the ability to spends billions (or 10's) on something like Waymo and they still have to spend billions (or 10's) more to properly commercialize it.

This isn't going to be a commonly done thing.

tedivm · on Sept 3, 2021

> The medical industry is internally highly dysfunctional and trying to work with practicing doctors can be extremely politically difficult. The hospital bureaucracy often actively tries to prevent practicing doctors from working directly with technologists, preferring instead to silo them and direct all communication via senior management - who are often neither technology nor medical experts. Consequently it seems more likely that major changes will come from outside.

This is one of the areas where working with radiologists actually makes things easier. Radiology groups are normally independent practices that contract with multiple hospitals- they link in with hospital systems but mostly run their own thing. They also tend to have really solid IT departments and enough funding to make things work. Working with hospitals is a nightmare though, as you say.

antasvara · on Sept 2, 2021

I think my point is that mammograms are a (relatively) easy to solve problem using today's technology. Why wouldn't it be worthwhile to solve that problem such that it has very limited human involvement?

It's the reason professional baseball is experimenting with electronic strike detection, despite the fact that umpires are very accurate (~94%) calling pitches.

I get that there are useful applications of AI that could augment a radiologist. I'm simply pointing out that if one could completely remove a radiologist from the equation, that could be a super helpful tool. I suppose my view is that removing/limiting the human element in routine inspections is a very worthwhile goal.

There's also the side benefit of increasing trust for these technologies. I think a lot of people are rightfully scared of AI technologies; showing that they can be as effective as a radiologist for routine screenings would be very beneficial.

908B64B197 · on Sept 2, 2021

> I don't think it's a stretch to imagine that less tired/busy doctors would be able to perform their job more effectively.

You are suggesting something that's against the groupthink for that profession. You can't graduate from an MD program while holding these thoughts.

I'm being 100% serious here.

danachow · on Sept 3, 2021

> You can't graduate from an MD program while holding these thoughts.

By your logic I must not exist then.

Like who do you think is labeling your training images?

Look at the authors of pretty much all cited studies in that meta-analysis. Sure see a lot of MD and MD-PhD there.

didntknowya · on Sept 3, 2021

problem is unless this AI is 100% perfect it will not provide any value to a radiologist. it is their reputation on the line with each interpretation.

to have them read an AI's diagnosis before having to assess a mammogram themselves to confirm the AI's finding is adding an unnecessary step to their workflow.

quietbritishjim · on Sept 2, 2021

> This is actually why it isn't ideal for AI to tackle. The fact is there's no market for the product.

I think you're disagreeing with a point that the parent comment didn't make.

It's worthwhile in general to continue AI research. The properties of this application make it ideal for testing and progressing AI technology. But if AI doesn't end up being used in this particular area in the short or medium term, that's another matter entirely.

kn1ght · on Sept 2, 2021

Hey I've also worked in a previous company that was doing exactly this for DK, UK, etc. As far as I remember one of the bigger things there was tissue density and masking effects where it might become not as clear cut for a radiologist. We saw a lot of opportunity in that area and in prognosis- the whole health economics aspect of frequency of visits.

jacquesm · on Sept 2, 2021

> tissue density and masking effects

But those are also harder for software based systems.

nl · on Sept 3, 2021

There are a bunch of countries outside the US where access to high quality radiologists is a significant problem.

But yes, augmenting human intelligence instead of substituting it often does work better.

scoopertrooper · on Sept 2, 2021

> Right now much of the mammogram reading is extremely straight forward. Radiologists can fly through those exams with high quality, especially when you consider the macros they have setup that allow them to write most of the report with with a two second voice command.

That to me sounds like an ideal automation use case. You have an entire class of highly skilled professionals performing a repetitive task. Sure, they may be able to do it quickly, but I imagine they must have to do a bunch of them each day.

You could imagine a world where hospitals didn't have radiologists on staff, but rather they acted in a consulting capacity for difficult cases.

acituan · on Sept 3, 2021

> That to me sounds like an ideal automation use case. You have an entire class of highly skilled professionals performing a repetitive task

Walking on variable surfaces is also a highly repetitive task and automation is yet to master it.

throwaway4220 · on Sept 3, 2021

Very few hospitals have radiologists on staff

bob33212 · on Sept 2, 2021

Good point. Similar to intellisense while coding.

908B64B197 · on Sept 2, 2021

But coding is a creative task. Breast cancer screening is more like classifying breeds of dogs or determining if there's a fire hydrant in the picture.

908B64B197 · on Sept 2, 2021

[flagged]

caraffle · on Sept 2, 2021

This shows clear ignorance of what radiologists do, it's like a fish telling someone how to improve land transportation.

For some reason CS people fixate on radiology for automation just because it is imaging, but there's a lot of context behind it. There's a reason a radiologist/pathologist is called a doctor's doctor, and no one in the field is worried about automation.

For perspective: Training to be a radiologist is 5+1 years, your family doc trains for 3.

908B64B197 · on Sept 2, 2021

> For some reason CS people fixate on radiology for automation just because it is imaging

Precisely because it's imaging. Training data is abundant, and has the potential to be well labelled. And one of the most active field of AI research is... computer vision. So it's no wonder the low hanging fruit would be medical imaging.

> There's a reason a radiologist/pathologist is called a doctor's doctor, and no one in the field is worried about automation.

Spoken like Garry Kasparov.

> For perspective: Training to be a radiologist is 5+1 years, your family doc trains for 3.

What I love is how the immediate knee jerk reaction isn't to explain why the ML approaches won't work but to immediately retreat behind gatekeeping, in this case the tittle and number of years of schooling.

ipaddr · on Sept 2, 2021

They are a specialist. A family doctor would order tests that a radiologist would read.

Replacing them with AI would seems like a straightforward process.

What behind the scenes context would make it impossible?

kspacewalk2 · on Sept 2, 2021

> What behind the scenes context would make it impossible?

The MD context. One first needs to train AI to perform the job of a general MD before you can get into stuff like radiology (that is, their real job, not what some novice CS grads, or not so novice AI experts like Hinton imagine it to be - I.e. not segmenting things into funky shapes or running some funky black box magic that spits out "tumor/not tumor" with no context whatsoever, no. Actually diagnosing real people, where a life is on the line, and if you fuck up enough times, your career).

908B64B197 · on Sept 3, 2021

> The MD context. One first needs to train AI to perform the job of a general MD before you can get into stuff like radiology

That's true. Waymo had to train their AI to perform the job of a firefighter for them to recognize a fire hydrant and a firetruck.

> I.e. not segmenting things into funky shapes or running some funky black box magic that spits out "tumor/not tumor" with no context whatsoever, no.

... But that's the goal right? There's a tumor right there at these coordinates or not.

jmhmd · on Sept 3, 2021

No, identifying the tumor is only step 1, and is the easiest step. Most non-radiologists can identify whether a tumor is present. The harder part (and the true value of radiologist reads) is everything that comes after finding the tumor: what structures are the tumor invading? Is there spread to lymph nodes? Are there secondary findings that might affect the diagnosis or treatment?

These questions and their relevance changes for every individual case, and while each question by itself may be approachable with AI, getting a detailed and relevant report without meaningless noise from an AI ensemble is a very very hard problem.

908B64B197 · on Sept 3, 2021

Finally an answer that's not just throwaway accounts flagging a submission!

These are all interesting problems where I could see an AI struggling. I guess the next step, once tumor identification becomes a solved problem, will be to train the AI on treatment data and follow-up, ie, this is an example where there was spread to lymph nodes.

Interesting times ahead!

mattkrause · on Sept 3, 2021

> But that's the goal right?

Part of it.

Human doctors will not only tell you if there's something funky in the image, but will also interpret it in light of a patient's medical history, symptoms, possible diagnoses, etc.

Subtle shading near some structure involved in one of two possible conditions might be very important, but an obvious cyst in an unrelated organ likely means nothing. People are weird close-up!

Saptarishi · on Sept 3, 2021

This is an excellent point. An AI may be able to do a diagnosis to you, but a doc could do diagnosis+post diagnosis, with the foresight of medical history. Theoretically an AI could possibly do this as well, but we're nowhere close.

mcguire · on Sept 2, 2021

"Replacing them with AI would seems like a straightforward process."

And yet....

908B64B197 · on Sept 3, 2021

"A machine will never beat a human at Go, the problem space is just too vast."

908B64B197 · on Sept 2, 2021

The years of schooling, of course!

throwaway4220 · on Sept 3, 2021

This AMA cabal/Luddite radiologist narrative is ridiculous. Many work on ML actively, we would love to have assist technologies, radiology residents are scared shitless about AI. I would really recommend you speak with more practicing radiologists outside of academia.

divbzero · on Sept 2, 2021

Though it’s plausible that ML could eventually take on some screening or diagnostic tasks from highly trained radiologists or pathologists, there is the hurdle of accountability that will need to be tackled. As with self-driving technology, the models will have to be good enough to take on the liability risk at scale.

908B64B197 · on Sept 2, 2021

> the hurdle of accountability

The medical field is notorious for delivering "efforts", not results (unlike engineering). No SLA here.

divbzero · on Sept 2, 2021

It depends on the specific situation. Screening mammography for instance carries significant financial (and emotional) risk of missing a call that ends up being cancer.

908B64B197 · on Sept 2, 2021

Risk for the radiologist or the patient?

markus92 · on Sept 4, 2021

Patient outcome, financially for the radiologist.

minsc__and__boo · on Sept 2, 2021

Liability will be delegated away from the hospital and onto the service provider, same way they are with diagnostics on clinical lab equipment. Hospitals love to continue to get paid while shouldering less risk.

scoopertrooper · on Sept 2, 2021

They can destroy as many textile mills as they like, but the economic incentives of technology will win the day.

bjornsing · on Sept 2, 2021

Probably. But it can take a long time.

Calloutman · on Sept 2, 2021

I'm puzzled at how breast cancer could be a political issue. How so?

randcraw · on Sept 2, 2021

Breast cancer isn't political. But there's continuing debate in the US medical establishment over the proper way to screen for it -- how to test, how frequently, at what age range, what constitutes a positive result, and how should the doctor follow up if the test is positive? The recommended answers to these questions seem to change every few years, and the substantiation behind the answers is often inconsistent.

The same issues bedevil prostate cancer screening.

himinlomax · on Sept 3, 2021

The science is pretty well established on *generalized* breast cancer screening not improving health outcomes despite being costly. (In short: cancer is rare, tests have false positives that trigger biopsies, biopsies are invasive and can be harmful, overall biopsies and pointless surgery on slow growth cancers do as much harm as cancer.)

But since screening has been framed as caring for women, pointing out the flaws with screening is automatically seen as hating women.

317070 · on Sept 2, 2021

While Ronald Reagan was in office, he had a colectomy for a colon tumor in 1985 (turned out it was not cancer) [0], and his wife had a mastectomy for breast cancer in 1987 [1]:

While Reagan did a lot to try to defund cancer research regardless, the first lady's mastectomy drew a lot of attention to a previously taboo topic.

[0] https://www.nytimes.com/1985/07/16/us/reagan-s-illness-medic... [1] https://www.latimes.com/archives/la-xpm-1987-10-18-mn-15261-...

dekhn · on Sept 2, 2021

It kills large numbers of middle-to-rich people who are highly motivated and make advocacy groups that are effective lobbyists.

Back when I was a life scientist the best advice I got was to work on diseases the family members of congresspeople had.

kaesar14 · on Sept 2, 2021

Americans have a weird relationship with breasts. Also issues that only affect women (mostly, I know men can get it in rare cases?)

ars · on Sept 2, 2021

Breast cancer is the cancer that raises the most money (that pink ribbon you see everywhere). That money goes somewhere.....

This is not a "weird relationship" thing, rather women got organized and: https://en.wikipedia.org/wiki/Breast_cancer_awareness

ben_w · on Sept 2, 2021

While all that is true, America absolutely has a weird relationship with breasts.

lostlogin · on Sept 2, 2021

> While all that is true, America absolutely has a weird relationship with breasts.

It’s more broad than that, it’s the relationship with women’s health as a whole.

darkhorse13 · on Sept 3, 2021

What is exactly a "normal" relationship with breasts for a country to have?

teekert · on Sept 3, 2021

Things like: It's normal to breast feed in public. Or the nation does not go bananas when a nipple is exposed on TV. Or how normal it is to sunbathe topless. Arguably this also relates to how normal it is for kids to walk around naked and families to be naked around each other.

If you're really interested, watch some older Dutch movies to see the normalness of naked, like Turkish Delight [0]. Or even have a closer look at the relation between the Professor and Raquel in the recent Casa the Papel. It's different from US series. More respectful, mature wrt women if you ask me. More emphasis on intelligence. Or on a beautiful woman in her 40s with normal wrinkles. The US, like on many other topics, seems to be polarized, caught between the hyper sexuality of Cardi B and the prude nature of American culture in general.

[0]: https://en.wikipedia.org/wiki/Turkish_Delight_(1973_film)

gweinberg · on Sept 2, 2021

That depends on what you think a normal relationship with breasts would look like, I suppose.

ben_w · on Sept 3, 2021

In particular, the pearl-clutching gender-specific moralising: it is absurd that an organ whose specific purpose is to feed infants by being inserted into their faces is censored specially from minors, whereas the non-functional copy of the organ in men which can’t even do that is apparently acceptable (modulo dress code, weather, etc, but you can walk around or swim or sunbathe topples as man, not as a woman).

teekert · on Sept 3, 2021

I'd say it's only the nipple. The contrast in response to side boob and nipple seems to be off the chart in the US.

yholio · on Sept 3, 2021

> While all that is true, THE WORLD absolutely has a weird relationship with FEMALE BODIES.

Fixed that for you.

thomasjudge · on Sept 2, 2021

Also the largest and best known breast cancer fundraising organization, Susan G Komen, has, in the words of wikipedia, "been mired by controversy over pinkwashing, allocation of research funding, and CEO pay."

https://en.wikipedia.org/wiki/Susan_G._Komen_for_the_Cure

SpicyLemonZest · on Sept 2, 2021

I don’t think “political” is meant to mean “controversial” here. There were just some extremely successful awareness campaigns for it in the 80s and 90s (to the point where it’s the stereotypical example of an awareness campaign for many of us), so people care a lot about it.

tialaramex · on Sept 2, 2021

One thing that will cause mammograms to be political, in several countries, is that there's a difference in perception of the downsides vs upsides of such screening programmes, and we're bad at communicating the trade to a population who lack the statistical literacy to understand it intuitively.

So our best shot from a public health perspective is to say "Here's what we recommend for everybody" and pay for that.

All screening programmes have two difficulties, which must be balanced against the benefit, and this trade is somewhat personal, so when the balance is quite fine the arguments can be vociferous as a result.

1. The screening itself may seem unpleasant. One woman may find it a very mild annoyance, a drive ten minutes out of her way, the staff are very pleasant, the scan itself is far less traumatic than a bra fitting, and she receives easy to understand results after not very long and isn't anxious about them; but for another maybe it's an hour's bus journey to the city hospital, the staff there are short-tempered and say she has the wrong paperwork, then another hour in a queue, she feels like she's just meat, squashed around for the convenience of the machine for what seems like forever, and then after anxiously waiting for what seems to be too long the results are confusing to her and she has to have a friend interpret them.

2. Over-treatment is always a problem. Screening by definition detects something that isn't causing noticeable symptoms. If you have a noticeable lump, or mysterious bleeding, you don't need screening you need a doctor's appointment. So a positive screening result might be nothing important. However either you've now got the burden of a diagnosis you ignored or, you accept the medical advice and are treated, even though it's possible (not likely, but possible) that you would have been just fine without treatment.

So, screening programmes are set up based on guessing how to trade these factors plus a third, how much should we spend on this medical intervention? After all, in some sense every dollar doing breast cancer screening is a dollar you don't have to cure blindness in poor orphans (or of course, to bomb somewhere)

If your experience of a screening programme is that it's a minor inconvenience at most, and yet you know people who died of undetected disease, more screening seems like a no brainer. Particularly if you live somewhere where screening stops at age 50, and somebody you know died of undetected disease aged 54, you might reason that the screening should go to age 55 or 60 to detect such cases, no matter the public cost.

On the other hand if your experience is that it's an awful ordeal even when negative, and you know people who spent their last years horribly scarred by surgery as a result of suspected disease but then they died in their sleep from something else anyway, you may feel that there's already too much screening and it should be trimmed back, not to save money but the extra money for other programmes is welcome.

bluGill · on Sept 2, 2021

The other issue is the screening isn't risk free. I don't recall the exact numbers, but for every 3000 cases of breast cancer detected one is in someone who wouldn't have got breast cancer at all if she wouldn't have got all those screenings. It is still worth doing because it saves a lot of lives, but the more you do it the more cases you will cause so you need to find the right balance.

cogman10 · on Sept 2, 2021

The screenings right now are something like once every year. Doesn't seem like you'll get a lot of cancer from that.

I certainly don't worry about the cancer I could develop as a result of an xray in a regular dental cleaning (even though that's a possibility).

bluGill · on Sept 3, 2021

Most women don't get breast cancer, the 1 case in 3000 (again I don't know the exact number, but this is reasonable for range) caused by xrays includes screenings of all women, including those who never get it.

However if you are a doctor trying to figure out how often to screen women, the danger if xrays is a strong reason to not do it too often. Daily screenings would catch breast cancer a lot sooner on average, but just isn't worth the risk even if the screening was otherwise free.

cwmma · on Sept 2, 2021

male doctors have not historically treated female patients great.

ars · on Sept 2, 2021

That's not really true in this case. There was no effective treatment for a long time, until suddenly there was. See: https://en.wikipedia.org/wiki/William_Stewart_Halsted

cratermoon · on Sept 2, 2021

We're not even talking about treatment: just regular screening and detection, the same as you get when you go to your doctor and he sticks a finger up your butt and tells you to cough.

mrguyorama · on Sept 2, 2021

>sticks a finger up your butt and tells you to cough

You may want to consider finding a different medical professional!

cratermoon · on Sept 2, 2021

Not at the same time! Referring to the two common tests that doctors routinely perform on men, every physical checkup

CyberDildonics · on Sept 2, 2021

Whoever did that to you was not a medical doctor

cratermoon · on Sept 2, 2021

Not at the same time! Referring to the two common tests that doctors routine perform on men, every physical checkup

cratermoon · on Sept 2, 2021

There's a lot of literature about the systematic biases in medicine that exclude a lot of women's health care from serious consideration.

hellbannedguy · on Sept 3, 2021

It's definitely not brest cancer now. It's very well funded now.

caycep · on Sept 2, 2021

also, maybe there's more number of scans, so higher N datasets to train on?

Our hospital bought an AI stroke detector (viz.ai). The "AI" part is laughable but it allows the hospital to collect some extra HHS/medicare fee for using "imaging algorithms" or something. I suspect that company is potentially using anonymized scans to continue training their data, because the initial product was trained on some tiny sample of brain CT scans, like under 500.

The one plus for actual physicians and nurses is that at least they wrote a non-sucky PACS imaging interface for the iPhone so we can just pull up the plain scans and view them easier.

cratermoon · on Sept 2, 2021

Wow, so a medical device which doesn't work is being used in what is essentially an unsupervised, non-random, non-blind trial?

Why would that even pass the FDA smell test?

inter_netuser · on Sept 2, 2021

non-invasive devices are way easier to get approved, they are a lot less tightly regulated. you'd be surprised.

beerandt · on Sept 2, 2021

Is the xray not considered part of the device, or is a dose of radiation considered non-intrusive?

caycep · on Sept 4, 2021

It piggybacks onto the standard CT software - no additional scans require, it just pulls data off of the PACS.

crazygringo · on Sept 2, 2021

Is this what you're referring to?

https://en.wikipedia.org/wiki/Mammography_Quality_Standards_...

rhcom2 · on Sept 2, 2021

https://en.wikipedia.org/wiki/Mammography_Quality_Standards_...

256lie · on Sept 2, 2021

Additionally, breast cancer screening is a high-volume and low-prevalence task and CAD applications has been developed for decades (although not with the performance of latest CNN algorithms).

ska · on Sept 2, 2021

> This makes mammograms the ideal studies to be tackled by AI

This looks good from 70,000 ft, but in practice you need labels much more specific than BIRADS.

TravelPiglet · on Sept 2, 2021

Isn’t most science political in the US?

YossarianFrPrez · on Sept 2, 2021

This type of evaluation (both the meta-evaluation and the underlying studies themselves) are the exact sort things AI enthusiasts should want researchers to do. It cuts through the hype, and gives everyone a clear assessment of where the technology currently stands.

Clearly there is room for improvement. Maybe this study will also spur the development of new types of systems which augment human/radiologist decision-making.

snarf21 · on Sept 2, 2021

AI isn't a stand-alone panacea for this. We've done lost of studies with other radiologists. Taking the union of two "bad" radiologists outperforms a single "good" radiologist. Humans aren't machines, we have bad days. I think the long term benefit of this kind of AI is as an audit AFTER the radiologist has done their assessment. If there is a mismatch, it should trigger a follow-up with a different a radiologist (without telling them the result of the 1st or the AI) to build consensus.

ajmurmann · on Sept 2, 2021

This! I wish we did this type of usage a lot more in a variety of areas. I previously proposed this at a job to enhance filtering out of false-positive security scan results. I think it doesn't get used because it would increase the immediate cost, not reduce it.

tsjq · on Sept 3, 2021

>I think it doesn't get used because it would increase the immediate cost, not reduce it.

indeed that is the problem in many areas. Almost always mgmt looks with "tech = fewer human = instant cost savings" expectations.

and anything that increases cost in the short term are shot down. sad . all blame on the cost-center vs profit-center mindset.

YossarianFrPrez · on Sept 2, 2021

I completely agree. In fact, one wonders if the biggest use of AI/ML in medicine will actually be in medical notes and record keeping -- voice transcriptions, medical scribing, and interpreting writing, free-form text into standard diagnostic categories.

Aspos · on Sept 2, 2021

AI which is slightly less effective than a well-educated, well-trained, expensive radiologist is better than nothing. There are places in this world which could afford a $XXK x-ray machine, but don't have access to a good radiologist.

divbzero · on Sept 2, 2021

Agreed! It’s easy to get sucked into the (intriguing) human vs. machine question and forget that the best solution is often both or some orthogonal improvement.

spockz · on Sept 2, 2021

I think getting a second “opinion” is pretty standard for biopsies?

TuringNYC · on Sept 2, 2021

My first step in doing such a meta-study would be to throw away any paper with <1000 (or choose your threashold) training sets. If the n value is insufficient to train the number of parameters on the model, the paper isn't good.

When I was CTO of DocHuddle (ML + Radiology), we reviewed every paper and preprint we could find and a huge number appeared to be overfit on tiny datasets.

Any meta-study which doesnt throw away obviously poor attempts will find bad-skewed meta-findings.

YossarianFrPrez · on Sept 2, 2021

That's a real shame. It's been years since "The Unreasonable Effectiveness of Data" has been published and out in the world. You'd think that using a large N for statistical learning methodologies would be standard practice by now.

Do Hospitals keep radiological images? Are they researching this stuff? Presumably they'd have a large enough sample size.

TuringNYC · on Sept 2, 2021

Yes, hospitals keep imaging on their PACS systems and VNAs (Vendor Neutral Archives). Large systems have millions of images, though not necessarily same body part. Very large systems have millions of the same body part alone.

Getting images is one thing. Getting labels is harder. Getting annotations on regions-of-interest is yet harder. Often the labels are stuck in unstructured data (notes.)

When I was doing my startup (https://www.dochuddle.com/) we trained our classifier and object detector on 1.2 million images. We worked with a large sovereign on getting the images, reports, and annotations.

Just dealing with 20+ TB of images was a job in its-self...but it was indeed unreasonably effective. The success was only technical and ML success.

Even harder -- commercializing it effectively after accounting for legal/contract costs. Yet harder -- getting past conflicts of interest inherent in the US medical system. We were not successful on this front. Possibly too early (we started in 2014)

YossarianFrPrez · on Sept 2, 2021

Interesting; thanks for sharing. It also seems like AI/ML could be used to extract labels from unstructured notes.

TuringNYC · on Sept 2, 2021

Yes, you can use NLP. One trick would be to use a medical vocab like https://www.snomed.org/ if anyone is doing this

https://en.wikipedia.org/wiki/SNOMED_CT#Semantic_tag

bgandrew · on Sept 2, 2021

Actually, a real shame is a total lack of really big (millions of records) data sets on pretty much anything medical. And this is of course due to our love with privacy. Probably millions of people died too soon because of that, and many more will follow in the future.

TuringNYC · on Sept 2, 2021

I'd agree that millions suffer due to a lack of care, or a lack of affordable care, or a lack of early diagnosis.

I'd disagree this is due to privacy. IMHO it is due to conflicts of interest in major medical systems (certainly in the US) where incentives are skewed away from efficiency towards more billing.

tialaramex · on Sept 2, 2021

I remember a friend in an allied discipline talking to me about what they did for cervical cancer where again the initial AI outcomes are weak. I don't know if this is now gold standard clinical practice, or still an experimental, but here's how I remember it:

We screen people with a cervix for cancer by scraping away a tiny sample of the cells periodically and having that examined at a laboratory for anomalies. If caught early, cervical cancer isn't fun but it's extremely survivable. You don't technically need a cervix (after all about 50% of humans don't have one) so in the worst case hysterectomy (removal of the womb and cervix) is an option.

Machines aren't very good at looking at a slide full of cells from the human cervix and giving it a score like 1-5 where 1 is "Fine" and 5 is "Cancer". This is a standardised task that her (human) team do every day, and she works with other bodies across the continent to ensure they're all doing a roughly similar job by looking at each others examples and checking they get the same number, so this way a doctor in Paris and one in Birmingham should get interchangeable results despite using different labs to process the samples.

However, it turns out the machines are stellar at a related task. "Is this sample infected with HPV?". Humans do not find this task easy and historically this test wasn't done anyway.

So maybe a human would do the first thing, and, if it was a bit borderline, then they'd check the second thing. Cervical cancer is almost always caused by HPV, so if you don't have HPV then you almost certainly don't have cervical cancer. But since the machines are great at that second problem you can reverse things. The machine processes every sample for HPV and then a human only looks at the positive ones, rating those.

colechristensen · on Sept 2, 2021

What seems to be coming up repeatedly is that machine learning techniques are pretty good at being almost adequate replacements for recognition problems. Able to catch blunders but also missing at a rate high enough that you need the expert human anyway. In reality just replacing human kinds of error with machine kinds of error.

Automation though can also make experts stupid depriving them of the practice to keep skills sharp.

256lie · on Sept 2, 2021

It's known as automation bias and a problem in pilots as well as doctors. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7651899/

randcraw · on Sept 2, 2021

That's why most uses of AI in medicine that I've seen have focused on performing a sanity check on the physician (e.g. checking a new prescription for contraindications like conflicting meds or maladies). Medical insurance companies have embraced this practice for obvious economic reasons.

bakuninsbart · on Sept 2, 2021

Doing this early also helps people get used to the idea of AI replacing/complementing doctors in the future. Objectively, it would be better to be diagnosed by a machine instead of a human doctor, if the machine is better on average at giving the right diagnosis. In practice, there is a large legal and emotional gap to close to get acceptance for this.

The gains could be incredible though, not only would we get better diagnosis, it could be done faster, cheaper, and remotely.

cratermoon · on Sept 2, 2021

> Objectively, it would be better to be diagnosed by a machine instead of a human doctor, if the machine is better on average at giving the right diagnosis.

"On average"?

What if the machine is better than average for common things but consistently, 100%, misses uncommon conditions with very high short-term mortality rates?

FiberBundle · on Sept 2, 2021

I'm also generally sceptical of the current AI hype, but I don't see exactly how these results cut through the hype. Why isn't it seen as an accomplishment that 6% of those systems performed better than an actual radiologist?

256lie · on Sept 2, 2021

Meta analysis is more common in medical journals than computer science conferences.

The evaluation of AI medical devices is determined by regulatory agencies like the FDA.

Traster · on Sept 2, 2021

I think this headline is bad. When you read the abstract you find that what this is really saying is that there are essentially no studies that are fit for purpose to evaluate AI systems for breast screening in general and those that are the closest to being suitable show the worst results. That's the real take-away here - we're not even at the point where we're collecting good data about AI accuracy, let alone actually producing accurate AI.

I would suggest these two things might be linked - the only way you get publishable results is by failing to do rigorous studies.

dang · on Sept 3, 2021

We've since changed the title. The submitted title ("94% of 36 AI systems evaluated were less accurate than a single radiologist") broke the site guidelines by editorializing. Submitters: please don't do that.

"Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html

jszymborski · on Sept 2, 2021

You can get away with publishing models with worse accuracies if you can pitch a more fair evaluation scheme, but it's way harder, and you don't get to do flashy presentations where you state you're SotA.

I hope I don't offend any other people in the field, but I think historically fields like histology/radiology/MRI screening via AI have been subject to less scrutiny by virtue of their multi-disciplinary nature... particularly in the past, it was hard to find reviewers that both (A) understood the biological/clinical validation (B) the technical validation.

I think things have drastically changed over even the last 5 years, which is why you're more likely to see headlines like these. We have a lot of discussions like these during our lab's journal clubs. The optimists among us (e.g. myself) argue that this opens up a lot of opportunities whereby more fair validations/benchmarks allows us to compete with "SotA" methods that are actually fragile and easy to surpass on even ground. More pessimistic members are quick to note that it's far harder to introduce new, fairer validations and that reporting lower metrics is far less buzz worthy (all true points).

This reminds me of a fundamental error that was recently found in methods that tried to predict protein-protein interactions. It was found that the train/test/validation method used by ostensibly every paper was leaking huge amounts of data [0]. When we plugged that leak, we saw much more modest metrics, but focusing on regularisation allowed us to beat the competition [1].

The kicker? The information leak were identified in 2012, yet you'll see papers written every year that have the same leak and report >90% accuracies, and >0.9 ROCs.

[0] https://www.nature.com/articles/nmeth.2259

[1] https://www.biorxiv.org/content/10.1101/2021.08.13.456309v1

joe_the_user · on Sept 2, 2021

The headline communicates that AI has effectively failed so far. That's a good thing to communicate.

256lie · on Sept 2, 2021

Also publications are not what determines if AI get deployed in clinical practice. That's the job of the FDA and million of dollars spent on validation like clinical trials and quality management systems.

cratermoon · on Sept 2, 2021

I might be misreading your comment, but, are you saying that this study is not rigorous? There's a bit of a disingenuous tone in that assertion.

Traster · on Sept 2, 2021

No, this paper is a review of other studies, and what I'm saying is it finds that there are quite a few problems with the studies they're reviewing - they point out that the smaller studies that show stronger results are actually less likely to be generalizable, and the studies that are broader show weaker results but even then they still have flaws in the method. There is quite a good table (figure 2) that shows how they rate their "concerns" for a number of metrics, and it's a sea of red where they have lots of concerns.

Jonanin · on Sept 2, 2021

This implies that there were a couple of AI systems that actually beat a radiologist, which I take as extremely promising for the field of AI radiology.

Like any domain in applied AI, there will be a lot of approaches that miss the mark, or are simply stepping stones to better approaches. There are thousands and thousands of papers on language modeling, but we only needed one superior approach (GPT) to change the game entirely.

The search through any cutting edge problem space is messy and full of failure, and that's fine. You only need one breakthrough.

titzer · on Sept 2, 2021

> This implies that there were a couple of AI systems that actually beat a radiologist,

Without any more details about the error rates, we can't be sure how likely this is due to chance. I would caution making any conclusion about AIs without better understanding the underlying statistics.

FTA:

> Thirty four (94%) of 36 AI systems evaluated in these studies were less accurate than a single radiologist, and all were less accurate than consensus of two or more radiologists.

So yeah, no AI system beat consensus of two radiologists. That's pretty damning.

curioushacking · on Sept 2, 2021

Depending on how correlated the verifications between the human and AI system are, this could be used as a verification system to determine if consensus needs to happen. I.E. Always run the ML system and only ask for a consensus if the ML system disagrees with the diagnosis. This could still provide a lot of value I would assume.

spyder · on Sept 3, 2021

Not a single AI model is better, but what about the consensus of the 36 AI models? Ensembling different models is a common technique to improve machine learning models, did they test that?

computerphage · on Sept 2, 2021

> That's pretty damning.

Indeed. And we all know how quickly radiologists are improving at their job. At this rate the 6% of AI systems that beat one radiologist will be down to 0% in no time.

arcticbull · on Sept 2, 2021

I'd push back on the 6% of AI systems being better than a radiologist and calling that a success, but you are right in the meta.

It's fair to say that yes, AI systems aren't good enough yet. On the other hand, it's pretty clear some technological approach will outperform a radiologist at pattern recognition at some point in the future - whether that's "AI" or "if statements" or some third option.

It's just a matter of time.

thethimble · on Sept 2, 2021

Another interesting subtlety is that there are only a finite amount of radiologists and they’re generally concentrated in wealthy countries/areas.

AI based analysis - whether it’s better than a human radiologist or not - is far more scalable and cost effective. Even if used as a screening mechanism to be escalated to a human radiologist, this approach will be very helpful to much of the world.

1024core · on Sept 2, 2021

> I'd push back on the 6% of AI systems being better than a radiologist and calling that a success,

How much time does it take to train a radiologist to that level of performance?

How much time does it take to clone that ML model?

arcticbull · on Sept 2, 2021

I just meant that it's not clear from this that the 6% are 'overall better' just that the 94% are 'overall worse.' More data is needed, but it does appear that progress is being made, and I'm excited by that.

After all, no AI beat two radiologists.

netcan · on Sept 2, 2021

It depends on context.

Here, the context appears to be a somewhat arbitrary selection of published algorithms. All they've really determined is that at least 94% are not ready to replace radiologists.

That's pretty much confirmation of the default assumption. If they were, they'd all be trying to get these into hospitals, and they're not.

okl · on Sept 2, 2021

You could also say: There exist at least some radiologists that can be beaten by an AI system. :D

I guess a good questions could be: Will they be more reliable than radiologists in scenarios other than the ones studied?

fighterpilot · on Sept 2, 2021

And how do we know that this small handful that beat the radiologists didn't just get lucky? You really need to know the sampling distribution of what's being measured here.

TaupeRanger · on Sept 2, 2021

It doesn't imply anything of the sort. Until a well powered randomized controlled clinical trial shows an overall mortality benefit from an AI screening program, the field hasn't contributed meaningfully to medicine. I'm not saying it won't happen, but we are almost certainly very far from that goal.

256lie · on Sept 2, 2021

Clinical AI (which is currently regulated as a CAD medical device by the FDA) won't replace radiologists but treated as an additional clinical vendor application integrated into existing software. Similar to speech recognition diction that has been provided by Nuance for decades.

duckmysick · on Sept 2, 2021

The study mentions that all of them were "less accurate than consensus of two or more radiologists".

meow_mix · on Sept 2, 2021

anon gets fooled by randomness

rafaelturk · on Sept 2, 2021

Tks for this wellwriten, fact driven comment.

nickysielicki · on Sept 2, 2021

I think the problem with this is that the AI is being used on metrics and images and tests that have meaning to humans. But if we started to take diagnostics that had higher dimensions and resolution, and just trained blackbox AI on that data, it probably stands to reason that it could do a better job. Especially with a tight feedback loop that would identify interesting regions and rescan them instantly, instead of having to bring patients back into the room at a later date.

It's the same problem I have with self-driving cars. We are teaching cars to behave around (and like) humans, instead of just fencing off the highways and inventing an autonomous system of vehicles that doesn't have this much tougher constraint. I think AI can do a lot of good things today, if the people trying to apply it looked to completely redesign existing systems around AI, instead of trying to replace the humans operating the existing systems. The latter is much more difficult.

mcguire · on Sept 2, 2021

"But if we started to take diagnostics that had higher dimensions and resolution, and just trained blackbox AI on that data, it probably stands to reason that..."

...that we would have no way to evaluate the systems' accuracy because no other non-statistical system can understand the input data?

The Oracle at Delphi is always correct. Even when the Oracle is wrong, the Oracle is correct.

"...just fencing off the highways and inventing an autonomous system of vehicles that doesn't have this much tougher constraint...

Kind of expensive to build a second interstate highway system, though.

nickysielicki · on Sept 2, 2021

> we would have no way to evaluate the systems' accuracy because no other non-statistical system can understand the input data

No, we can always evaluate the accuracy based on our current human-understandable images and scans and biopsies.

bluGill · on Sept 2, 2021

We can evaluate over time. If the system repeatedly predicts people you will get lung cancer in 10 years based on scans, and in 10 years we finally detect that cancer that says there must be something even if we don't know what. Finding that what then becomes important (along with figuring out how to treat lung cancer 10 years before we can currently detect it)

Of course this is all assuming there is something there. It might be there is nothing to find. I wouldn't be surprised either way.

Swizec · on Sept 2, 2021

> instead of just fencing off the highways and inventing an autonomous system of vehicles

As a popular joke likes to say: The best place for AI cars is on special roads. You can then make those cars drive very close togehter, even touching. Oh and you can then move the special road behind buildings instead of the front. Give room for more pedestrian friendly pleasant streets. Then you may as well move the road underground.

You have invented the subway.

Zababa · on Sept 2, 2021

We have automated subway where I live, and while they're nice and modern, I understand why people would want the privacy and comfort of a personal car. That's one of the points that isn't mentionned in your joke.

Another is that you could have your car use its AI in the special roads in the city and highways, and the rest of the time drive yourself, where AI is harder to implement. A bit like some hybrid cars where you can do most of the day to day commute on electricity, but for longer road trips you can use gas.

cratermoon · on Sept 2, 2021

Sort of. I think there's a good argument to be made for part-time self-driving cars with specially-designed highways in places where it makes sense. So you'd drive your car the old-fashioned way from your home on the residential streets until you get to the Automatic Interstate. There, you maneuver into the "pattern" lane and the road and the AI pull you into traffic and drive you to your exit. At that point you're directed out of the pattern and control is given back to you, where you take the side streets to wherever you were going.

It has the advantage that it covers the very common use case where someone says they can't take public transportation because they need mobility at their destination.

_lqaf · on Sept 2, 2021

I kind of agree, in that I think there's a great argument to ban them from anywhere pedestrians are found.

cratermoon · on Sept 2, 2021

Pedestrians and police cars, apparently https://edition.cnn.com/2021/08/30/business/tesla-crash-poli...

deelowe · on Sept 2, 2021

Is there not a space between "fully automated subway" and "automationless road" where practical solutions exist? Case in point, what if we changed the materials composition of road paint? Or, what if cars were designed to better alert other cars of their location and speed?

mandevil · on Sept 2, 2021

Are you going to attach these transmitters to every deer and every pedestrian? If not, we're back to 'elevated or buried road so we don't have to worry about animals and people- why not just build a train?' or 'AI which can solve the whole problem on its own' or 'If I recognize something unusual hand off back to a human, hope that they were paying attention and know what to do.'

And that still leaves aside tricky things like construction sites, double parked cars, etc.

Commercial airline autopilot has a lot more straightforward job: at normal operating altitudes there are no animals to worry about, no construction or other unexpected obstacles, and few other aircraft all of which can be essentially guaranteed to be squawking transponder codes of their own, and they still are mandated to have two pilots in them at all times. This is not because airlines like it- look how fast the flight engineer disappeared once flight management and navigation software became sufficient. It is instead because airlines have learned over the decades, from a lot of blood, that sometimes you need to have a pilot ready to intervene RIGHT THIS INSTANT, not 3 minutes from now when you've regained situational awareness after zoning out for a bit- and the AF447 disaster shows that they are right about that. And the only way to make sure you have such focus and alertness over many hours is to have two pilots, so they can trade off the responsibility of maintaining SA.

And that's a much easier problem which has been actively attacked for decades.

deelowe · on Sept 2, 2021

Are pedestrians and deer an issue today? I thought it was more specific than this.

coryrc · on Sept 2, 2021

Nobody poops in my car, busks, or gropes women. LMK when American governments can and do enforce minimal behavioural and safety standards in public.

coryrc · on Sept 2, 2021

Let's call street harassment what it is: an epidemic. According to a survey of subway riders conducted by the Office of the Manhattan Borough President in 2007, roughly two-thirds of respondents, mostly women, experienced sexual harassment on the subway; 69% reported feeling sexually threatened

https://www.mic.com/articles/120898/7-simple-ways-to-stop-se...

01100011 · on Sept 2, 2021

> instead of just fencing off the highways

Well, if you did that, you probably wouldn't even need AI for the control system. If you know the parameters and have some control over unknowns, you want a deterministic control system. "AI" will hopefully handle the cases where you are in an uncontrolled environment. Current "AI" doesn't seem quite up to the task of self-driving. It will probably get there, but when is anyone's guess.

cratermoon · on Sept 2, 2021

> you probably wouldn't even need AI for the control system

Wouldn't need it, no, but could use it to analyze traffic patterns and optimize flow for fuel efficiency, travel time, and safety. Those are much more constrained problems for which we have some useful tools.

Zababa · on Sept 2, 2021

As with most things, trying to automate everything is really hard because things are rarely built to be easily automated. And sometimes it's very hard to extract the easy to automate part from the whole. Driving is a very good example.

cratermoon · on Sept 2, 2021

So you're saying, "If we make up new measures that the AI can score well on, AI will score well"?

nickysielicki · on Sept 2, 2021

Sort of, yeah. What I'm mostly saying is, "We don't (yet) take measures that humans can't understand."

cratermoon · on Sept 2, 2021

What's the point of taking measures that humans can't understand? How can they be evaluated for correctness?

nickysielicki · on Sept 2, 2021

By simultaneously taking measures that humans can understand.

cratermoon · on Sept 2, 2021

So, they just become proxies, but then we have to figure out how good they are as proxies.

randcraw · on Sept 2, 2021

> It's the same problem I have with self-driving cars. We are teaching cars to behave around (and like) humans, instead of just fencing off the highways and inventing an autonomous system of vehicles ... to redesign existing systems around AI, instead of trying to replace the humans ...

Of course, fencing off existing roads to exclude human-driven cars is an economic impossibility, like converting all our electrical systems to use compressed water instead.

Regardless, today's autocars are clearly NOT ready to fly solo in the absence of humans, as the Tesla that crashed into an overturned truck on a clear day a few months ago so clearly demonstrated.

Hermitian909 · on Sept 2, 2021

I think the tight feedback loop for multiple scans is a good idea worth exploring, but the larger idea of building alternate systems has a problem that there's no clear path to a viable end-state.

With diagnoses, collating a consistent set of data is not a requirement you can impose because of how fractured the healthcare system is. If you had great AI systems in place maybe they could be convinced to invest in these systems, but you don't and so creating them is quite difficult.

Similarly, in the case of cars having their own highways and roads: how do we do this in a way that people find acceptable? If you prevent current highways from being driven on by normal cars, you cut off the majority of people from their daily commute which is a nonstarter. Making new highways in most places will also be a nonstarter for both cost and space reasons. At best you might hope that decades from now every single car has the means for self-driving in the limited environment and then one day the government turns it on, but that also feels iffy to me.

Building parallel systems incrementally creates harder technical problems but allows you to side step even harder coordination problems.

Saptarishi · on Sept 3, 2021

When AlphaGo went against Lee Sedol, it often made moves in the later stages of the game which were completely inexplicable to almost all game analysts and commentators, it seemed like a bad/stupid move, or at the very minimum, it seemed like a move which was hard to build upon. But it eventually won. I think we should let an AI model do its thing, even if it possibly seems absurd to traditional human understanding. The aim shouldnt be to make them like us(or better than us), it should rather be to make them the best form of themselves( which I hope will eventually be better than us )

ZeroGravitas · on Sept 3, 2021

Also, presumably xrays are harmful if done too often or too strongly, so being able to replace those with something less harmful (like some kind of 3-D sonogram) but that AI can interpret to the same degree would be a net benefit potentially becoming a home device for people with suseptability.

If you shift the curve you could detect earlier and reduce false positives.

256lie · on Sept 2, 2021

It's not an issue of resolution but of generalizability. Populations and scanners shift over time and the biggest issue in clinical AI is the changing data distribution, such as data acquired at different times at different institution. Medical devices (which AI software is considered) is also more regulated than self-driving cars.

toast0 · on Sept 2, 2021

> I think the problem with this is that the AI is being used on metrics and images and tests that have meaning to humans.

Who would submit to a test that nobody knows what it means? What ethics department would let you run that test?

Now, if you want to use metrics already on file that radiologists don't generally consider, that might be interesting.

ClumsyPilot · on Sept 3, 2021

"just fencing off the highways and inventing an autonomous system of vehicles"

And then they can platoon together to reduce air drag 4x, be electric and be built on a special surface to reduce friction 4x compared to 'normal' road?

I think that's called a train

macksd · on Sept 2, 2021

I was a bit confused by exactly what the percentage in the headline was, so here:

>> In two of the largest retrospective cohort studies of AI to replace radiologists in Europe (n=76 813 women), all AI systems were less accurate than consensus of two radiologists, and 34 of 36 AI systems were less accurate than a single reader

I'm still a little unclear how you measure the accuracy of a consensus of radiologists, without relying on the consensus of other radiologists. Maybe just a bigger consensus, or by using known end results but looking at imagery that could have been an early warning?

Areading314 · on Sept 2, 2021

The methodology in this paper compared the results of the radiology report to a follow up biopsy.

matchagaucho · on Sept 2, 2021

Or inversely, can this be interpreted as 2 of 36 AI systems are better than humans?

Therefore Hospitals should be investing in those 2 systems?

dbt00 · on Sept 2, 2021

Only if you started with a hypothesis that those two AIs were the best. If you did not, then you're just p hacking.

macksd · on Sept 2, 2021

Not necessarily. If you were running a casino and someone was able to significantly beat the odds on 34 of your 36 tables, would you assume they cheated at all other tables but not those two, or just that other players at those tables were really good?

It could be that 2 of those systems are actually better. It could also be that the state of the art in deep learning just isn't there yet for radiology. But this is a strong suggestion that research is being dramatically overstated, so I would get real careful with my investments in this area overall.

pessimizer · on Sept 2, 2021

Unless the next time you test, it's a different 2.

cratermoon · on Sept 2, 2021

Or that 2 of those systems got lucky, none of them actually work for their stated purpose, and investing in radiology AI is about as wise as investing in the Keely Motor.

dang · on Sept 3, 2021

We've since changed the title. The submitted title ("94% of 36 AI systems evaluated were less accurate than a single radiologist") broke the site guidelines by editorializing. Submitters: please don't do that.

"Please use the original title, unless it is misleading or linkbait; don't editorialize."

https://news.ycombinator.com/newsguidelines.html

honksillet · on Sept 2, 2021

Well, to train the AI you would be using radiologist reads anyways so I'm not sure that's really a problems.

cratermoon · on Sept 2, 2021

Oh yes, just that easy! Here, AI, he's some radiologist reads. Learn them, we're going live in Q1 2022.

michaelmrose · on Sept 2, 2021

You could simply see if the person had actually gotten cancer with the benefit of perspective.

qwertox · on Sept 2, 2021

Today I got my chest x-rayed to search for a broken rib. When we looked at the x-rays, I was wondering how the hell someone could recognize anything there. The doctor didn't find anything and just said if it still hurts in 3 weeks, we'll do another one, maybe we missed it.

Same last year, when I clearly broke a rib: first x-rays revealed nothing, one week later a second set revealed a really nice oblique displaced fracture, which was still hard to find.

Today the tibia also got x-rayed and it was a pleasure to look at, but as soon as it has to do with the chest, where all the organs are messing with the x-ray, one can really hope that AI will finally solve this issue, since every city has a couple of doctors with x-ray machines, contrary to MRI/CTs.

jcims · on Sept 2, 2021

Come on, most of us are just sitting around burnishing our fingertips on a small piece of glass. Any more details on how you wound up looking at an xray of your ribs twice in ~1 year?

qwertox · on Sept 2, 2021

I started mountain biking around 2 years ago. I think my front suspension is too soft for my weight, so both times the front wheel got in a trough(?) and the suspension gave in too much, making me fly over the handlebar. The tibia was because the bike landed on my leg after flying over me.

I'm reconsidering my life choices regarding MTB, since I'm already in my 40s and made zero exercise for at least 20 years.

But this time I learned. I will never risk it in the summer, since those sunny days are way to valuable for spending the time on the bike instead of recovering. I told me this last year, and this time it definitely sunk in. "Be careful, you're not a teenager".

But it makes so much fun. It is so good to breathe in that fresh air and to exercise hard, I always end up with a smile. Too bad I didn't start with this 20 years ago. I had the chance.

jcims · on Sept 3, 2021

Thank you!!! I'm in your same boat, maybe worse b/c I'm deep into my 40's, but just never really did much outdoor activity. When I was a kid there was all sorts of federal lands around that we could really roll around in however we liked. Where i'm at now it's a 40 minute drive to trails, so it's just streets and other people's property. :/

Just now getting set up to get more time outside . Sounds like i should take a lesson from your pain. :) Hope you get feeling better soon and get back out there (if maybe a tick lower kinetic energy :)

(also i meant to put a smiley in my first reply, sorry for coming across as a dick)

tsjq · on Sept 3, 2021

>one week later a second set revealed a really nice oblique displaced fracture

was that taken in different angle / perspective ?

908B64B197 · on Sept 2, 2021

Stage 0: "A computer will never beat a human at chess, the game is simply too complex"

Stage 1: "So one managed to do it, but 96% of implementations still can't"

Stage 2: "Well, looks like chess is pretty much dominated by computers. But the Go Game is still way too complex for a mere machine to tackle: only the human mind can"

Stage 3: "It was one tournament that AlphaGo won and really..."

Stage 4: "Yeah, the AI can play better Go and Chess than humans. These are solved problems."

I guess we're at stage 1 for self driving and apparently for mammograms as well. Only difference is, chess players didn't have a legalized cartel working for them in the 90's.

n8henrie · on Sept 2, 2021

I wonder how long it took for the written interpretation in each case? I work in a rural ER where the "stat" reads often take about an hour, sometimes up to 5 hours, and the quality is... such that I've gotten a lot of practice at doing my own reads.

Something that is "not as good as a radiologist but maybe better than me," and with results in minutes instead of hours, would be HUGE, if even to just prompt me to review the images myself before (instead of after) seeing the next patient.

xyzelement · on Sept 2, 2021

I actually have a bet with a radiologist that AI will be as good or better than radiologists at reading images in a fairly near future.

The reason I think this (as a non-specialist) is this: when I upload a random photo to Google or Facebook, those systems recognize the faces of people in those pictures without any prompt. This seems like a much more constrained problem (given a pretty specific set of images of chest cavities, state the probability of a cancerous tumor.)

I am guessing that there's nothing inherent in this problem that is much more difficult than other image recognition problems. I suspect that this outcome is because we have fairly new technology competing against highly specialized humans doing a very specific task and doing it well.

I suspect that will happen is that the technology will relatively quickly catch up to humans and do equally well, after which point it's just a speed of response and economics question, where the technology wins over the person.

mmanulis · on Sept 2, 2021

I'm with @greazy on this one. The assumption made is that a set of labeled images is enough. Was the training set validated against a diagnosis based on biopsy of some kind? I.e. the images that were trained on, marked by some radiologist (more likely multiple radiologists), how were they validated to contain cancer or be cancer free? Did someone follow up a few years later and confirm the diagnosis was correct?

Also, different modalities will produce different quality images, how was that accounted for in training models? Did they use all the images for a single scan or a subset of the images of a single scan?

The problem is you're trying to train a model where you have many images for a single scan, like slices. Depending on the modality, you'll get different resolutions, different "visual" inclusions, etc. etc. So labeling the individual images and labeling the entire collection is really hard.

greazy · on Sept 2, 2021

I'll take that bet against you.

> I am guessing that there's nothing inherent in this problem that is much more difficult than other image recognition problems.

This is factually wrong. You're assumption I guess is that its a data issue: if we can just get more data we'd solve this issue.

The reality is that these diseases are complex, their presentation is complex, and compounded by different technology. There's also the issue of data complexity, x-rays contain less information compared to a picture of your dog.

We need better algos, not more data.

colinmhayes · on Sept 2, 2021

Human faces are complex too. Yet computer vision has no trouble recognizing you out of the billions that exist.

greazy · on Sept 3, 2021

You've misunderstood my point I think. How a disease presents is different issue than the complexity of faces. A photo of a face has more information encoded than an x-ray image. This lack of information is part of the problem that I think more data doesn't solve and why humans are able to identify cancer. We have better image recognition 'software'. And that's my point, better algos, not more data

inshadows · on Sept 3, 2021

What sort of better algos?

rcarmo · on Sept 2, 2021

There are two sides to this: enhance human decision (by accelerating it) or try replacing it.

I played around with a number of image segmentation models a couple of years back and I can only imagine how much more subtle densitometry stuff can be, but overall I'm not surprised accuracy can be way off because _any model_ can be way off depending on what it's presented with.

Most commercial Machine Learning doesn't actually learn on the job, and is actually positioned for the triage/faster handling scenario. It will spot what it was trained for, but _most likely only in the conditions it was trained for_. Change the angle, change the patient's age, add any other visible medical conditions, etc., and ratios are likely to drop.

Then again, maybe the study could be a little more systematic (as other comments point out). I'd like to see larger numbers, if only because it would help statistic significance.

karmasimida · on Sept 2, 2021

But they are probably like 90% cheaper, and 99% faster than a radiologist, not even going to talk about the throughput.

It seems really good news to me actually.

jacquesm · on Sept 2, 2021

If only it was as simple as that. In medicine a wrong diagnosis can have very bad real world consequences so 90% cheaper and 99% faster has a very real possibility of being a net negative.

TameAntelope · on Sept 2, 2021

There are lots of ways of lightening the load with AI aid, though without fully removing the need for human eyes.

jacquesm · on Sept 2, 2021

Yes, and these are typically the best systems, where the doctor is assisted rather than replaced. I've looked at several over the last years and the comparison doctor+augmentation tool to doctor alone and tool alone is invariably superior. For now I think that is the most productive and the safest route.