Hacker News
Deep interpolation – denoising data by removing independent noise (github.com/alleninstitute)
65 points by gtsnexp on Oct 18, 2020 | 28 comments



Interesting. When I was in high school, I wrote some software to denoise datasets with a genetic algorithm (it's what we had before deep learning was a thing ;). I showed it to my math teachers and they were horrified; they told me that you cannot change data once it's been collected.

I realize now that their take is kind of wrong -- pretty much every instrument in the world performs filtering on what it samples before it presents you with a number. Saying you can't filter data kind of misses the point, because almost every real-world system is bandwidth-limited and acts as a low-pass filter. But sometimes there are different noise sources, and you can probably write an algorithm to attack them without being "inaccurate". (For example, ever measure very low voltages at relatively high frequency? You'll see a nice 60 Hz component in there, because you are surrounded by a 60 Hz electric field. That's noise, not data. Apparently averaging through multiple power-line cycles is fine, but writing a program to do the same thing is wrong. Seems weird to me.)
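(For the curious, here's a quick numpy sketch of why line-cycle averaging works; the sample rate and amplitudes are made up:)

    import numpy as np

    # A 60 Hz hum integrates to ~zero over an integer number of its periods,
    # so averaging over whole power-line cycles leaves only the signal.
    fs = 6000.0                              # sample rate in Hz (placeholder)
    t = np.arange(0, 10 / 60, 1 / fs)        # exactly ten 60 Hz cycles
    signal = 0.001                           # tiny DC voltage we actually care about
    hum = 0.05 * np.sin(2 * np.pi * 60 * t)  # much larger 60 Hz pickup
    measured = signal + hum
    print(measured.mean())                   # ~0.001: the hum averages away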

Anyway, I'm still bitter about it. (That, and how in 4th grade I got a C in science because we had to make a flipbook about earthquakes that had to be titled "quakin' shakin' earth" and I spelled "quakin'" and "shakin'" wrong. Still don't know how to spell either of those contractions, or what flipbooks have to do with science. They don't come up in real life much. But I sure am bitter, 26 years later.)


All of what you said is true. Scientists try to find "original sources" for data, but they're usually analyzing measurements that have been altered in some way before they receive them... usually in the measurement hardware itself. One can't really avoid it completely. The flip side is that noise can actually be helpful, especially in situations where the signal itself is too weak to reach some detectable threshold on its own.

I generally try hard to remove bias/noise from data before analyzing it, but I also add noise back in later on... e.g. jittering data for plots, or putting a noise feature into a model to establish a threshold for useless features. I'm also a bit miffed that I had to discover this all on my own... Noise is useful and interesting! In business-oriented analysis, noise can be even more important, since changes in "data noise" can indicate important changes in business patterns, or problems with how they're collecting data (companies seldom share this information with analysts beforehand). I'd go so far as to say that understanding the noise in a dataset is the primary concern for most of my analyses these days.
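(If anyone wants to try the noise-feature trick, here's a rough scikit-learn sketch; the data and model are stand-ins, not anything from a real analysis:)

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))                 # placeholder features
    y = 2.0 * X[:, 0] + rng.normal(size=500)      # only feature 0 matters here
    X_plus_noise = np.column_stack([X, rng.normal(size=500)])  # append a pure-noise column

    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_plus_noise, y)
    noise_baseline = model.feature_importances_[-1]
    # Any real feature that can't beat the random column is probably useless.
    useful = [i for i, imp in enumerate(model.feature_importances_[:-1]) if imp > noise_baseline]
    print("features beating the noise baseline:", useful)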

Here's a good book with more pop-sci noise examples: https://www.amazon.com/Noise-Bart-Kosko/dp/0670034959


Regarding power supply noise, specifically, the easiest way to get rid of it is pretty simple: use a battery instead of AC.


The wires in the walls provide some of the noise. Your body, when you hold the oscilloscope probes, works as an antenna for them.

The real remedy is using a differential input that cancels out identical noises induced in both wires of a pair.


Any education before college is training, not teaching. They train you, like a dog, to pass a set of exams. Learning something in the process is irrelevant for the "teachers".


Not really. I should have gone to the physics den instead of the math office to show it off, though.

In math, there is no uncertainty. The real world has annoying quirks, however. (When I was in college, they built a new engineering laboratory complex, and everyone moved to it. Some people doing laser experiments had to go back to the old building, though, because the new one vibrated too much.)


Reminds me of a story a friend told me, about how their university management wanted to save money and shut the AC off during the holiday break. This almost destroyed a $1M research laser; after the break, the researchers came back to their lab, only to discover that the air inside was just a little short of hitting the dew point and covering all the expensive optics in water.

(AFAIK that same team also had a problem when moving to a new building, because it wasn't constructed level enough.)


Isn’t this just a specific application of an autoencoder though? Because random noise isn’t learnable, it gets filtered out if you teach the network to compress/decompress a sample with itself as target.

https://blog.keras.io/building-autoencoders-in-keras.html

With that said, it’s amazing how effective this concept is for cleaning up scientific data. Many have also used variational autoencoders to take it one step further and also cluster data by latent space features. I’ve used it myself to uncover various groups of behaviors in time-series from other cellular processes.
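(Roughly the idea from the Keras post above, as a sketch; the layer sizes and data are placeholders, and DeepInterpolation itself differs in the details:)

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    x_noisy = np.random.rand(1000, 128).astype("float32")    # placeholder noisy samples

    inputs = keras.Input(shape=(128,))
    encoded = layers.Dense(16, activation="relu")(inputs)     # bottleneck
    decoded = layers.Dense(128, activation="linear")(encoded)
    autoencoder = keras.Model(inputs, decoded)
    autoencoder.compile(optimizer="adam", loss="mse")

    # Train with the noisy sample as both input and target; the bottleneck can't
    # memorize independent noise, so reconstructions come out smoother.
    autoencoder.fit(x_noisy, x_noisy, epochs=10, batch_size=32)
    denoised = autoencoder.predict(x_noisy)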


I think if you take "just" out of your comment you're spot on. :) Also, I know what you mean, but for completeness, random noise is learnable; we do this all the time. You can identify the onset and offset not just of noise but of different types of noise. If you mix blue and brown noise, listen to it for a while, and then drop one of the noise channels, you definitely notice it. You aren't learning the specific values of any part of the noise (as you say, that's unpredictable), but the frequency components are consistent enough that we constantly subtract them from our sensory streams.

Noise tolerance is one of the most fascinating aspects of biological intelligence IMHO :)


> Also, I know what you mean but for completeness, random noise is learnable, we do this all the time.

You are absolutely correct. The fact that brains learn randomness is _mind blowing_, and it has immense implications for our perception of reality.

Let me take fellow HNers on a journey. Consider the following snippets about human vision:

> However, there’s very little connectivity between the retina and the visual cortex. For a visual area roughly one-quarter the size of a full moon, there are only about 10 nerve cells connecting the retina to the visual cortex. These cells make up the LGN, or lateral geniculate nucleus, the only pathway through which visual information travels from the outside world into the brain.

> “But the brain doesn’t take a picture, the retina does, and the information passed from the retina to the visual cortex is sparse.”

> While the cortex and the retina are connected by relatively few neurons, the cortex itself is dense with nerve cells. For every 10 LGN neurons that snake back from the retina, there are 4,000 neurons in just the initial “input layer” of the visual cortex — and many more in the rest of it.

Source: https://getpocket.com/explore/item/a-mathematical-model-unlo...

> However, the retina only accounts for a small percentage of LGN input. As much as 95% of input in the LGN comes from the visual cortex, superior colliculus, pretectum, thalamic reticular nuclei, and local LGN interneurons.

Source: https://en.wikipedia.org/wiki/Lateral_geniculate_nucleus

In other words, contrary to popular belief, there are scarce few connections between your brain and your retinas. Our retinas may themselves have immense resolution, but the bandwidth of the connection between our retinas and our visual cortex is the neural equivalent of dial-up internet.

Very little "real" information is being passed up to the brain. The implication is that what we "see", or rather what we perceive as vision, is almost entirely fabricated. We don't directly see the information coming from our eyes. Instead, we only see what our brain predicted we would see. The limited information coming from our retinas is, most likely, only used to verify and adjust our brain's internal fantasy.

Now to bring this back on topic and blow some minds. I'm sure everyone has looked at static on a TV before, right? Black and white noise. But wait ... we know our vision doesn't have the bandwidth to actually see that noise. Noise is, after all, pure entropy, incompressible, unpredictable. Yet we _do_ see it. We do perceive it. Therefore, that noise that _you_ see, it's completely faked by your brain. The noise you perceive is different from the noise someone else looking at the same screen might perceive.

Weird, right? The only reason this isn't common knowledge is ... how would you know? Take two people and sit them in front of the same static filled TV. How would they feasibly compare their perceptions? How would they know they aren't seeing the same thing? We can't faithfully memorize and draw the static we see. There's no way to compare our perceptions of noise. So it's no wonder that this "glitch" of human perception isn't commonly known.

This of course only occurs with very noisy situations. A coin flip is also random, but it's only a single bit of information, so it's easy for our brains to quickly adjust their model to reality and absorb that bit. That's true of a lot of our world; our brains spend decades perfecting the ability to predict the world, and in so doing requiring as few bits as possible from our senses to correct mispredictions.

Further proof/mind-blowingness: There are recorded medical cases of people encountering sudden, complete damage to their vision, yet having absolutely no clue that such a thing has occurred. I'm talking things like their optic nerves completely failing, so that there is absolutely zero visual information coming into their brain. Yet they act as if they have perfect sight. For a little while, at least. (NOTE: This differs from blindsight, a related but different phenomenon where there is still some connection between the retinas and the brain.) Hell, even my own eye doctor has seen patients who went completely blind in one eye without ever noticing (until he told them so).

What's more, it's likely all of our senses are this way. The totality of your experience is top-down, derived primarily from the internal model of the world your brain has built, with your physical senses acting more like error inputs than the bottom-up inputs we believe them to be. It's why hallucinations are so convincing. It's why we're able to have vivid dreams.


We should remember that there is a lot of computing going on in the retina already, and that rod cells of the retina are so sensitive that they can detect even single photons [1]. The density of cells also depends on the location within the FoV that we are talking about. We have many more connections and computational power reserved for the fovea than for our peripheral vision.

With that said, I'm pretty sure that our perception of noise is not as dissimilar as you suggest. By doing the simple experiment of freezing on a static noise frame and asking participants to tell me whether a pixel was black or white, I'd expect most people to get it right.

[1] https://www.cns.nyu.edu/csh/csh04/Articles/Rieke1998.pdf


> With that said, I'm pretty sure that our perception of noise is not as dissimilar as you suggest. By doing the simple experiment of freezing on a static noise frame and asking participants to tell me whether a pixel was black or white, I'd expect most people to get it right.

Right, my suggestion was that the phenomenon would occur if the noise exceeded our visual "bandwidth". If you freeze the frame then you're giving your mental model enough time to "catch up" with reality, so to speak.


The difficulty here is that there's an implicit assumption: 'noise' is implicitly defined to be anything that isn't learned by the network.

Now in some cases that may indeed be actual process noise! But there are many fully predictable data series that are difficult for deep networks to learn, and in those cases actual signal will be silently removed.


Exactly!

Also, this broad approach of learning noise vs signal during interpolation is well known in the geostats world. There, you typically accomplish it by allowing a non-zero "nugget" when you fit the variogram. However, folks in that community often prefer removing noise in post-processing despite having a robust statistical method to interpolate only the signal and not the noise. The reason is that the noise vs signal decision is highly application dependent, and it's easier to post-process differently than it is to run full solutions differently.

In other words, gaussian processes are also often used to interpolate signal and noise separately. Folks on that side are often very hesitant to do so, despite having well-established methods for it.

It's easy to remove things that are real and also easy to overfit noise. The same dataset needs different signal vs noise classifications depending on the end use case.

Put in other terms: One person's noise is another person's signal.
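(As a concrete sketch of the GP version with scikit-learn; the kernel choice and data here are arbitrary, and the fitted WhiteKernel plays the role of the nugget:)

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(80, 1))
    y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=80)   # smooth signal + independent noise

    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    # At new locations the white-noise term contributes nothing to the cross-covariance,
    # so the posterior mean interpolates the smooth component rather than the noise.
    X_new = np.linspace(0, 10, 200)[:, None]
    signal_only, std = gp.predict(X_new, return_std=True)
    print(gp.kernel_)   # the fitted noise_level is the estimated "nugget"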


Regarding the license:

> Allen Institute Software License – This software license is the 2-clause BSD license plus a third clause that prohibits redistribution and use for commercial purposes without further permission.

I hadn't seen this before. If I'm interpreting it correctly, it's basically saying "Look but don't touch, except you can run it for verification/reproducibility purposes". Right?

I suspect that even a non-profit running this code or a derivative of it on their own servers would run afoul of the additional restrictions.

This means that a researcher who uses any of this code or its derivatives in their own research is planting a huge downstream landmine in their own code.


I think this is the related paper: Removing independent noise in systems neuroscience data using DeepInterpolation

https://www.biorxiv.org/content/10.1101/2020.10.15.341602v1....


It's a bit bizarre that they didn't even link to their own paper from the github repo



As someone who has worked on both interpolation and image compression in a geoscientific context, the issue is specifically what the model for the signal is. If the model is accurate enough, then yes, removing independent noise is of great value. In my worldview, the big problem is what the physics behind the signal is, and where precisely it changes from one characteristic class to another. Those are the hard problems.


Indeed, to apply any form of "denoising" is to presuppose what the "true" underlying sample looks like. All of these strategies that people like to apply, whether they know it or not, directly impose a model onto the initial dataset.

Once measurements are digitized, the uncertainty within the dataset is baked in and cannot be removed. Without external information, the best estimation of the distribution from which the data are drawn may be found within the data themselves.


Allen Institute Software License – This software license is the 2-clause BSD license plus a third clause that prohibits redistribution and use for commercial purposes without further permission.

That's a big pity.


https://github.com/AllenInstitute/deepinterpolation/blob/725...

Is it really that hard to use relative paths? This kind of messy programming is why it's so hard to reproduce so many scientific research papers.
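(For reference, something like this keeps paths relative to the repo instead of a specific machine; the file and directory names below are just placeholders:)

    from pathlib import Path

    # Resolve data relative to this script's location rather than hard-coding
    # an absolute path that only exists on the author's machine.
    REPO_ROOT = Path(__file__).resolve().parent
    training_file = REPO_ROOT / "sample_data" / "example_movie.h5"
    print(training_file)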


Just use a simple linear non-causal filter, i.e. low-pass filter forward in time and then backward in time. Do this for any desired bandwidth and iterate as many times as you want. No need for neural networks here.
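(In SciPy terms that's just filtfilt; a minimal sketch with a made-up sample rate and cutoff:)

    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 30.0                        # sampling rate in Hz (placeholder)
    cutoff = 3.0                     # desired bandwidth in Hz (placeholder)
    b, a = butter(4, cutoff / (fs / 2), btype="low")

    t = np.arange(0, 10, 1 / fs)
    noisy = np.sin(2 * np.pi * 0.5 * t) + 0.3 * np.random.randn(t.size)
    smoothed = filtfilt(b, a, noisy)  # forward pass then backward pass: zero phase shift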


Allen Institute produces many nice things.

One thing I don't quite understand about this project is what kind of noise it removes. Any kind of noise from any kind of dataset?

I wish there were a more beginner-friendly explanation in the README.


Their preprint is here: https://www.biorxiv.org/content/10.1101/2020.10.15.341602v1....

On ephys data, they try to remove thermal and shot noise.


The noise in question is present in experimental images obtained from rodent and human brains. The nature of this noise is quite diverse: part of it comes from blood circulation, part from movement, and part from camera/photomultiplier noise.


Is it possible that it is learning a Kalman filter indirectly?


Interesting development by the Allen Brain Institute.



