Hacker News
FastMRI initiative releases neuroimaging data set (facebook.com)
125 points by moneil971 on Dec 12, 2019 | 59 comments



> Apply for Access

> The application process includes acceptance of the Data Sharing Agreement (found below) and submission of an online application form. The application must include the investigator’s institutional affiliation and the proposed uses of the data. NYU fastMRI data may be used for internal research or educational purposes only as described in the data use agreement and may not be redistributed in any way without prior permission.

> Read and agree to the data use agreement below to apply for access.

Seriously?!?!?!?!?! You call that open source?!??!?! OK, let's leave the semantics aside for a moment. Yes, a dataset like that would be very interesting and I would happily play around with it for a few weeks and see what I can come up with. If it's something worth exploring further, I'll happily document it and open source it. But that isn't something I can really estimate without being able to explore the data and see first hand what it is. For that reason alone, I (and dozens of others off the top of my head) will roll my eyes and pretend it doesn't exist.

False claims, over-hyping with no real understanding, and bureaucratic crap like this are what is slowing everyone and everything down; the sooner people understand it, the better.


>Yes, a dataset like that would be very interesting and I would happily play around with it for a few weeks and see what I can come up with.

Spare-time tinkerers aren't really the intended audience here:

>>The neuro data set will allow researchers to test their models with data from additional machine types, new sequence types, and different coil configurations that were not present in the previously released fastMRI knee data set. Radiologists also look for different diagnostic properties (such as contrast in texture between different neural tissue) in brain MRIs. These differences present an interesting and challenging machine learning problem to solve and will help researchers develop models that generalize to more clinical settings.

It's unlikely in the extreme that anyone in that audience will be stopped by a simple data sharing agreement. And it's also unlikely in the extreme that anyone outside that audience will know what to do with a bunch of raw k-space MR datasets. Domain knowledge is an absolute necessity with this data.


Domain knowledge, while useful, is not a necessity - I'm fine with contacting a specific domain expert for help and even paying for his or her services (with my own money at that), though none of the ones I've contacted ever wanted anything for the time they spent on a problem I presented them with. To put it as a question: do I really need to explain how open source works in theory and practice on a place like Hacker News? Especially given that most of the world's infrastructure in practice runs thanks to the collaboration of millions who have built most of what we use in their "spare time" as you call it.

Same applies to DNNs (if not more so) - take any large DNN with the papers, data set and code to the author and ask them to give an explanation as to why it works as well as it does while it performs terribly on a different data set, even a similar one. "Well yeah, it's curve fitting which works here but doesn't work there". Why? ¯\_(ツ)_/¯

My rant concerns a different problem - if you have such data and you want to share it, just go ahead and do it. 1 in every 10000 might do something useful with it, but we aren't talking about nuclear experiments where something can blow up, are we? Worst case scenario, someone's CPU or GPU might overheat, big deal. Just ditch the entire bureaucracy crap, we have enough of that as it is in our daily lives.


This is human subject data and there are good reasons to set legal limits on how that data is used, even if anonymized. Your "rant" is not well informed.


>Domain knowledge, while useful, is not a necessity - I'm fine with contacting a specific domain expert for help and even paying for his or her services (with my own money at that),

Nah, it's the other way around. Specifically in this domain, innovation is definitely driven by domain experts (researchers, medical equipment manufacturers), and they contract out or hire technical expertise as they need it.

>To put it as a question: do I really need to explain how open source works in theory and practice on a place like Hacker News?

Your definition of open source is much too narrow and does not include what these researchers meant by it.

>Especially given that most of the world's infrastructure in practice runs thanks to the collaboration of millions who have built most of what we use in their "spare time" as you call it.

Certainly not in medicine.

>Same applies to DNNs (if not more so) - take any large DNN with the papers, data set and code to the author and ask them to give an explanation as to why it works as well as it does while it performs terribly on a different data set, even a similar one. "Well yeah, it's curve fitting which works here but doesn't work there". Why? ¯\_(ツ)_/¯

You've identified the precise reason why all this "machine learning" stuff has been such a dud when it comes to medicine. That is not good enough and never will be, particularly the "works so well on the dataset you overfitted on while performing so terribly on other datasets".

>My rant concerns a different problem - if you have such data and you want to share it, just go ahead and do it. 1 in every 10000 might do something useful with it, but we aren't talking about nuclear experiments where something can blow up, are we? Worst case scenario, someone's CPU or GPU might overheat, big deal. Just ditch the entire bureaucracy crap, we have enough of that as it is in our daily lives.

They spell out the reasons for requiring a data sharing agreement in the data sharing agreement itself. They want you to cite it, they don't want you to sell it, they want you to confirm that you understand you're getting a dataset with no warranties, FDA approvals, etc.

And as I said before, the chances of you doing something useful with it may be 1/10000, but the chances of you doing something useful with it while not being motivated enough to sign a simple agreement and wait for the download link is essentially 0.


These are technically medical datasets. Unlike the weather, there are serious privacy laws in play here.


Though there is something to your claim, the fact that it's been made "public" suggests that participants have signed a consent form (possibly multiple).

This is not an isolated case - a large portion of those "open source" data sets are "Please sign up, send us an email and mail a DNA sample". And I'm talking about data sets which hold 0.0 personal or classified information.


I will reiterate my comment on these types of projects. The regulatory pathway established by the FDA for these types of products is woefully inadequate and they are very, very hard to properly validate.

I think any application of deep convolutional neural networks should be alongside a radiologist. If we speed up scans and make up for it with convnets it is very hard (practically speaking: impossible) to properly validate that they will not hallucinate away rare abnormalities. It will also be impossible for radiologists to spot errors like this in the wild because of the reduction in quality of the scan.

What happens when the scanners change their behavior in some subtle way that is unaccounted for by FastMRI? It could start erasing a ton of subtle abnormalities and this would not be possible to check for since the original scan will be lower quality.


>It could start erasing a ton of subtle abnormalities and this would not be possible to check for since the original scan will be lower quality.

Radiologists are notoriously conservative for that very reason. Dreamy-eyed image processing and computer vision researchers have been trying to get radiologists to abandon some of their caution for many decades. All in vain. "Hard pass", they say most of the time. The dream usually doesn't get as far as their malpractice insurers, but it certainly dies there if it makes it that far.


> I think any application of deep convolutional neural networks should be alongside a radiologist. If we speed up scans and make up for it with convnets it is very hard (practically speaking: impossible) to properly validate that they will not hallucinate away rare abnormalities. It will also be impossible for radiologists to spot errors like this in the wild because of the reduction in quality of the scan.

The benchmark for this technology is not perfection. The benchmark is human radiologists. Yes, this technology will miss things, so do humans. But if it's performing better than the humans we should prefer it, even if it's not perfect.


>"The benchmark is human radiologists."

I think you have misunderstood the application, possibly because of the way the parent framed it.

This is not a project to interpret MRI data, it is a project to apply ML to accelerated scanning, i.e. inferring data that is not actually measured.

So it's a real problem: if a systematic bias attenuates some signals that would be interesting, there will be nothing there for a radiologist (or other ML system) to work with.

Think of this as more of an "algorithmic super-resolution" approach.


I would probably use the term "data-driven machine hallucination" - which is pretty awesome. Though, I can see why radiologists would be wary of such an approach.


I will not say I understand it, but when an MRI acquires an image, what it records is a rich k-space dataset. This data is then reduced to a single number (i.e. intensity) for each voxel in an MR image. Usually the k-space data is discarded after it is used to predict the voxel's intensity.
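For anyone wondering what that reduction looks like: in the simplest fully sampled Cartesian case it is roughly an inverse Fourier transform followed by taking the magnitude. A toy sketch, assuming NumPy and a made-up synthetic image (real scanners also deal with multiple coils, filtering, and more, none of which is shown here):

```python
import numpy as np

# Hypothetical 2D "anatomy": a bright rectangle on a dark background.
image = np.zeros((64, 64))
image[24:40, 20:44] = 1.0

# Simulated fully sampled k-space: the 2D Fourier transform of the image,
# shifted so that the low spatial frequencies sit in the centre.
kspace = np.fft.fftshift(np.fft.fft2(image))

# Reconstruction: undo the shift, inverse FFT, keep one magnitude per pixel.
recon = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))

print("k-space dtype:", kspace.dtype, "| image dtype:", recon.dtype)
print("matches original:", np.allclose(recon, image))
```

The complex k-space array carries phase as well as magnitude, which is part of why it is "richer" than the final image.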


This is already theoretically a bit of a problem with conventional acceleration, which is arguably manageable through understanding of the physics.

I agree the validation and regulatory path is really problematic for less common presentations.

It's probably more interesting to use for artifact detection / QC issues, especially when you do it quickly enough to initiate a re-scan. More problematic, but still interesting, is artifact removal. Acceleration is tricky though, for the reasons you mention.


The theoretical problems with deep convnets are far worse than with traditional reconstruction algorithms. They have a far greater capacity to hallucinate or have unexpected behavior on new inputs (see GANs, or the ImageNet-C dataset, where simply adding a little motion blur nuked ResNet performance).


We are agreeing, I think.


Got it! I’ve heard points similar to yours raised in defense of these sorts of projects. “What about sparse reconstruction? Why aren’t you worried about that?”


I've come across this issue in some neuroimaging tools which allow you to make sub-voxel super-resolution volumetric inferences based on a trained model. This is all fine and good if your experimental population matches the training population, but that is rarely the case in disease/disorder/developmental neuroscience. If you can't see it in your data, then it's only possible if the assumptions hold.


Most places call this type of thing "clinical decision support". Nobody in their right mind wants to remove the human doctors from the process... yet.


And yet, for the reasons I stated, this is exactly what FastMRI aims to do. Speed up the scan. There will be no way for Radiologists to oversee the reconstruction and make sure subtle abnormalities are preserved.


Ideally a proper DNN reconstruction would learn the mapping from the raw-space to image-space. See, for example: https://www.nature.com/articles/nature25988 .

There is just too much redundancy in MRI data, and initiatives such as FastMRI are fundamental for us to learn what the limits are of feasible acceleration. Also, some MRI scans take forever and cannot be used in vulnerable populations because of, e.g., breath holds, the need to stand still, etc. The image quality, perhaps counter-intuitively, in some situations improves with acceleration.


Can you explain why mapping from raw space gets rid of any of the concerns I raised?

It’s interesting research for sure. I hope it stays far away from actual clinical use for a while, for the reasons I highlighted. I’d like to see convnets work alongside radiologists for a while and prove robustness to scanner changes in the wild before we start shoving them deep in the stack where radiologists can’t review what’s happening.


Radiologists don’t usually oversee reconstruction currently even when it’s real-time (whatever that means in MR, generally it means immediately after acquisition).


Exactly and that’s why this is not a good first place for convnets to be used in the workflow of a radiologist. They should be working alongside the radiologist, not somewhere where it’s impossible for the radiologist to examine what happened.


I think the point is that radiologists and technicians never see the k-space data, just the image reconstruction of each voxel's intensity.

So I am not sure the risks are the same as with, say, convolutional nets reconstructing large brain structures.


Yes we do (I’m an MR tech) and it’s a part of troubleshooting on GE scanners. There is also a weird bug on a release I use of the Philips platform that allows a visualisation too. Working with the data in this form is not an everyday thing but it is useful.

K-space data is also saved for reconstructions and processing later on, though everyone prefers to avoid that as it’s horrible and lots of storage is required.

I’ve also worked at a university site where the raw data was collected and used on a daily basis, but that is presumably less common.


I meant radiologist in the sense of a physician who inspects MR images for abnormalities.


I'd even want to use similar tech on a personal level. I can't do a brain scan myself, but I can take a picture of a mole. I'm at high risk for skin cancer, but I honestly wouldn't go see a doctor just for a change in a skin lesion that I'm not sure about. There are too many, and even with an above-average knowledge of what to look for, I have little confidence in my own observation. But if a neural network could tell me it's a higher risk, I'd for sure not wait until my next physical. Gotta be careful that people who don't understand the risk & statistics don't depend on negative diagnoses too heavily, though.


I think you’ve misunderstood. This is not for diagnosis. This is so that they can run a crappier lower quality scan and then “enhance” it using neural nets.


No, I'm replying to the parent comment, which is about clinical decision support vs. removing doctors from the decision process. I don't believe they were commenting on FastMRI vs. full MRI.


Haha "yet". Removing human doctors from the process is sci-fi, and not really 'hard' sci-fi either, as it stands now. Lawyers and CEOs will be far easier to automate.

Rather than dreaming about that, the focus should definitely be on "clinical decision support", i.e. "something useful that will save a radiologist some time and won't just get in the way". Not too many examples of that exist right now. Even speech-to-text is not a solved problem in their domain.


Current MR physicist / data scientist here. There seems to be a lot of misapprehension in this thread.

First, this work is about taking data in the sensor domain ("k-space") and reconstructing it into an image. Doing this with partial k-space data and hand-coded heuristics is a completely standard part of the MRI research agenda and has been for quite some time. See, for example, http://mriquestions.com/k-space-trajectories.html. Further, several of these techniques have already made it into routine clinical work, and this acquisition-side stuff generally happens before the radiologist even sees the image (reliable acquisition is in the interaction of radiographer with the scanner manufacturer's software).
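To make "partial k-space data" concrete, here is a rough sketch of the naive baseline that both hand-coded and learned methods try to beat: skip some phase-encode lines, fill the gaps with zeros, and reconstruct anyway. It assumes NumPy, a synthetic disc phantom, and an arbitrary sampling pattern I made up (a fully sampled centre plus a random quarter of the remaining lines); it is not the fastMRI sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical phantom: a filled disc in a 128 x 128 field of view.
n = 128
y, x = np.mgrid[:n, :n]
image = ((x - n / 2) ** 2 + (y - n / 2) ** 2 < (n / 4) ** 2).astype(float)

# Fully sampled k-space, with low frequencies in the centre rows.
kspace = np.fft.fftshift(np.fft.fft2(image))

# Sampling mask over rows: random ~25% of lines plus the 16 central lines.
mask = rng.random(n) < 0.25
mask[n // 2 - 8 : n // 2 + 8] = True
undersampled = kspace * mask[:, None]   # unmeasured rows become zero

# Zero-filled reconstruction: pretend the missing rows really were zero.
zero_filled = np.abs(np.fft.ifft2(np.fft.ifftshift(undersampled)))

acceleration = n / mask.sum()
error = np.linalg.norm(zero_filled - image) / np.linalg.norm(image)
print(f"acceleration ~{acceleration:.1f}x, relative error {error:.2f}")
```

Everything beyond this baseline, whether heuristic or learned, amounts to a better guess at the rows that were never measured.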

There's also various claims here that seem to imply learned reconstruction inherently implies the risk of hallucinations without recourse. Naturally, one should be careful about this, but it's just a matter of careful cross validation: hold out examples of abnormal anatomy for the test set. There's other ways to attack this problem too: training can be done partly or mostly on synthetic data because we have reasonably good forward models of the physics. In this case, one could choose a wide variety of arbitrary synthetic anatomies during training, to further allay the fear of always hallucinating the "typical human brain" from any scan.
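One possible reading of the hold-out idea, as a minimal sketch: keep every case labelled abnormal out of training, so the test set measures how reconstruction behaves on anatomy the model has never seen. The case list, the labels, and the function name are all hypothetical; a real protocol would also need to split by patient and by abnormality type.

```python
import random

# Hypothetical case list: (case_id, has_abnormality). One in ten is abnormal.
cases = [(f"case{i:03d}", i % 10 == 0) for i in range(200)]


def split_with_abnormal_holdout(cases, test_fraction=0.2, seed=0):
    """Return (train, test) such that no abnormal case appears in train."""
    rng = random.Random(seed)
    normal = [c for c in cases if not c[1]]
    abnormal = [c for c in cases if c[1]]
    rng.shuffle(normal)
    n_test_normal = int(len(normal) * test_fraction)
    # All abnormal cases go to test, alongside a share of normal ones.
    test = abnormal + normal[:n_test_normal]
    train = normal[n_test_normal:]
    return train, test


train, test = split_with_abnormal_holdout(cases)
print(len(train), "training cases,", len(test), "test cases")
print("abnormal cases in training:", sum(flag for _, flag in train))
```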

Slow acquisition and image artifacts in MRI are a fact of life for people in the field and I believe there's huge scope for improvement if we had more intelligent reconstruction and acquisition. Ideally the reconstruction would feed dynamically back into the acquisition to gather more context as necessary; the MR machine is, after all, one giant programmable physics experiment. This is already done in a limited way, but in what I've seen it relies on a lot of hand-coded heuristics. And guess what's the logical step after hand-coded heuristics? Yes, learned models where you objectively optimize for a final result, rather than hand-coding based on a few examples.

Final note - publicly releasing human data is a massive effort in data cleaning and careful anonymization. Not to mention that the acquisition of each sample is extraordinarily expensive. So bravo to these guys for going to the effort.


This is a typical misapplication of machine learning. It's important to realize what information can possibly be learned by a trained network. In this case, the only thing that can be learned is which components of Fourier space can be ignored for a given imaging problem. Such a question is far more rationally addressed by a deterministic algorithm, if the goal is to speed up acquisition for any specific anatomy. But to generally assume that an imaging protocol can ignore parts of frequency/phase space without generating artifacts is not only wrong, but very dangerous for patients. I can easily play with Fourier space and generate the appearance of pathological conditions that don't exist -- or disappear ones that do! Not good!


If it is as easy as you say, then releasing examples where you do that using their training data would probably earn you some acclaim, and I personally would applaud your effort.


I'll spend time on this if I can, but for the moment just imagine that in frequency space, where MRI acquisition occurs, all I need to do to blank out an imaging anomaly is discard spatial frequencies that correspond to the principal spatial frequencies of the anomaly. This is analogous to blurring your speech by notch filtering the principal frequencies of your voice. With appropriate filtering, I can make your voice nearly indistinct from background noise, so that it appears not to be there.
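A 1D toy version of that mechanism, assuming NumPy and a signal constructed specifically so that the "anomaly" lives in a narrow frequency band (a localized ripple on a slowly varying background). It only illustrates the filtering argument; it says nothing about what a trained reconstruction network actually does.

```python
import numpy as np

n = 512
x = np.arange(n)

# Slowly varying "normal anatomy" at 3 cycles per field of view.
background = np.cos(2 * np.pi * 3 * x / n)

# Hypothetical anomaly: a ripple near 60 cycles, localized around x = 300.
envelope = np.exp(-((x - 300) / 40.0) ** 2)
anomaly = 0.5 * envelope * np.cos(2 * np.pi * 60 * x / n)
signal = background + anomaly

spectrum = np.fft.fft(signal)
freqs = np.abs(np.fft.fftfreq(n, d=1.0 / n))   # cycles per field of view

# Discard the band of spatial frequencies where the anomaly lives.
notch = (freqs > 45) & (freqs < 75)
spectrum[notch] = 0
filtered = np.fft.ifft(spectrum).real

residual_before = np.max(np.abs(signal - background))
residual_after = np.max(np.abs(filtered - background))
print(f"anomaly amplitude before: {residual_before:.3f}")
print(f"anomaly amplitude after:  {residual_after:.6f}")
```

The background survives untouched because none of its energy falls inside the discarded band.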


As someone with an undiagnosed neurological illness, it would be fun if I could run my MRI backups through it.


I believe the main aim of this project is to accelerate the imaging process itself and not diagnostics.

It's a natural research direction to accelerate imaging; MRI is fundamentally limited in its data acquisition rate. You are essentially giving protons a whack into an excited state and extracting spatial information by listening to how they decay. There is a time constant associated with this decay that just comes from the physics and can't be reduced, and you have to wait for the decay to be more or less complete before you can give another whack and acquire another slice.
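For reference, the time constants in question are usually written T1 and T2. The block below gives the standard textbook relaxation equations for the idealized case of a 90-degree excitation (not anything specific to fastMRI), plus the arithmetic behind "more or less complete":

```latex
% Longitudinal recovery and transverse decay after a 90-degree excitation
M_z(t) = M_0 \left(1 - e^{-t/T_1}\right), \qquad
M_{xy}(t) = M_{xy}(0)\, e^{-t/T_2}

% Fraction of longitudinal magnetization recovered after waiting 3 T_1 and 5 T_1
1 - e^{-3} \approx 0.950, \qquad 1 - e^{-5} \approx 0.993
```

So waiting about three T1 periods gets you roughly 95% of the way back, and practical sequences trade some of that recovery for speed.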

MRI already relies on classical compressed sensing to overcome this limitation and make imaging feasible at all, but there is good reason to think that you can do better than classical compressed sensing by making use of prior knowledge of the physics of the imaging process and the structure of natural tissue.


I've been thinking about this type of use case recently. I'm curious, how many MRI scans do you have? How do you store and view them? Have you applied any computer vision techniques to them?

Thanks!


Sorry for the delayed response. I have multiple MRIs of my brain and the entirety of my spine, most of them at 3T. I have them stored as ISO copies of the CD they made for me. I have no idea where to begin with applying computer vision. If you can point me in that direction, I would be happy to try.

EDIT: As for viewing, they come with a proprietary viewer loaded on the CD. I do have the ability to export them as JPG image stacks though.


Thanks for the info! Regarding using computer vision, the question to ask is: what do you want to know about the content of your scans? For example, if you have multiple scans over time, do you want to see how the structures are changing? Or if it's just one or two points in time, do you want to know the names of the structures?

I've been working in medical imaging and deep learning for years, and have recently become disillusioned by the technology's potential to disrupt the radiology industry. But I wonder if there aren't alternative use cases for e.g. educational purposes. I'd love to know more about what you wish you could do or know! Please feel free to email me.


How do you know you have an illness if it's undiagnosed?


When you have chronic (for years) decreased sensation in the limbs, coordination and balance problems, half of the throat paralyzed, constantly pissing oneself, cognitive issues, etc., and they can't find a reason for it, then it's an illness with verifiable symptoms that has not been diagnosed.

People with spine issues, MS, Lyme, and Lupus for example usually refer to it as "limbo land". Neurological diseases are notoriously difficult to diagnose and often go without an official diagnosis for months, years, and even decades.


Fantastic!! Accelerating MRI with ML is an idea I've had in my little idea book for years and I'm delighted to see it getting some mainstream attention!

It's a serious technical challenge but the benefits could be enormous.

IIRC the vast majority of the cost of an MRI is the amortized cost of the imager, so faster scans should hopefully directly reduce the cost to patients, perhaps to the point that regular full-body MRI scans for preventative healthcare could be feasible.


> perhaps to the point that regular full-body MRI scans for preventative healthcare could be feasible.

This is an interesting problem. Speaking with a number of physicians among the family, there's a perspective that having a population performing all these tests can possibly cause more problems than they solve. If the goal is to holistically make a person healthy and happy, discovering diagnoses that have no practical effects on someone's well-being can result in reducing their well-being just by knowing about it. Humans are notoriously bad this way. Tell someone their liver is somewhat different from an average adult liver and they'll start assigning symptoms to it.


It’s worse than this. Unnecessary biopsies, extra tests with associated risks, misdiagnosis, etc. are all part of any screening procedure. Whole body MR scanning also lacks the resolution and quality of dedicated body site scans. As with everything, if you ask a good question you get a better answer. ‘Is there a problem in this person?’ is not a good question.


I don't know, that line of thought seems myopic to me.

I'm more thinking, wow you could really do a lot in the way of automated diagnoses if you had longitudinal data sets like that.


Yeah. Certainly not saying it's an open and shut case on the right way to test. Obviously we do whole-population preventative medicine for many things.


"IIRC the vast majority of the cost of an MRI is the amortized cost of the imager"

This is certainly false for clinical MR imaging in Canada. The biggest component is the cost of interpreting the scan, i.e. the radiologist's time.

"regular full-body MRI scans for preventative healthcare could be feasible."

What's the point? What are you looking for? What is the false positive rate on that? Screening for the sake of screening is a mirage[0].

[0] https://sciencebasedmedicine.org/a-skeptical-look-at-screeni...
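On the false positive question, a quick sketch of why screening a mostly healthy population is hard, using Bayes' rule. The function name and every number here are made up for illustration; they are not measured figures for whole-body MRI.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive screening result is a true positive."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs: 0.5% of those screened have the condition, and the
# test catches 95% of real cases while clearing 95% of healthy people.
ppv = positive_predictive_value(
    prevalence=0.005, sensitivity=0.95, specificity=0.95
)
print(f"Share of positive results that are real: {ppv:.1%}")
```

With inputs like these, fewer than one in ten positive results corresponds to real disease, and every one of the rest triggers follow-up work.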


Would they be filling in missing data with AI? Would they be doing something similar to DeepFovea? If so, I would be concerned about accuracy.

https://ai.facebook.com/blog/deepfovea-using-deep-learning-f...


Is this accessible only to people affiliated with some research institution as it seems[1]?

[1] https://fastmri.med.nyu.edu/


No, it's available for all researchers, and you don't need to have a certain affiliation. Thanks!


All 'researchers' or everyone?


More details here: https://fastmri.med.nyu.edu/


This was a very simple question and instead of answering it directly you linked a page with 2000 words of disclaimers and license agreements.


From TFA: "The application must include the investigator’s institutional affiliation and the proposed uses of the data. NYU fastMRI data may be used for internal research or educational purposes only as described in the data use agreement and may not be redistributed in any way without prior permission."

Take 10s to read yourself instead of complaining about others not reading for you. It's at the top of the page currently, under a heading, "Apply for Access".


Huh, I was part of a study during my brain MRIs at NYU Langone. I wonder if my brain is in there.


... are these people aware these images of their bodies are publicly available?


Was this at NeurIPS?


Get ready to learn to code, diagnostic radiologists.



