Hacker News
Using Deep Learning to Predict the Olfactory Properties of Molecules (googleblog.com)
140 points by dyslexit on Oct 25, 2019 | 39 comments



Summary:

- They created an entirely new dataset of ~5000 molecules, all hand-labeled by perfume experts.

- They held a competition (presumably Kaggle or a similar platform) to classify this dataset, and used the results as a strong baseline.

- Their GNNs get results comparable to (slightly better than, but not statistically significantly different from) the winning random forest model of the competition.
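One common way to check whether a small gap like this is statistically significant is a paired bootstrap over the test molecules: resample molecules with replacement, recompute the metric difference between the two models each time, and see whether the resulting interval excludes zero. A minimal sketch, assuming per-molecule 0/1 correctness flags and plain accuracy as the metric; the data is invented for illustration, and the paper may well have used a different metric and test.

```python
import random

def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(A) - accuracy(B) on the same test molecules.

    correct_a / correct_b are parallel lists of 0/1 flags, one per test molecule.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same molecules")
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample molecule indices with replacement; pairing between A and B is kept.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical test set of 100 molecules: both right on 50, only A right on 12,
# only B right on 10, both wrong on 28. Accuracy: A = 0.62, B = 0.60.
model_a = [1] * 50 + [1] * 12 + [0] * 10 + [0] * 28
model_b = [1] * 50 + [0] * 12 + [1] * 10 + [0] * 28
low, high = paired_bootstrap_ci(model_a, model_b)
print(f"95% CI for accuracy difference: [{low:.3f}, {high:.3f}]")
```

If the interval straddles zero, as it does for this made-up data, a two-point accuracy gap is not significant at the 5% level.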

The embeddings show promise, but I'm curious why they omitted a simple "fully connected layer" on the Morgan bit descriptors as a baseline classifier. Seems like that would outperform the random forest.
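A single fully connected layer on fingerprint bits, with one sigmoid output per odor descriptor, is multi-label logistic regression. Below is a minimal, dependency-free sketch of that baseline; the 6-bit vectors and the two labels are toy placeholders (real Morgan fingerprints would come from a cheminformatics toolkit and are typically 1024 or 2048 bits long), and nothing here says how it would actually fare against the random forest.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class DenseMultiLabel:
    """One fully connected layer with a sigmoid per odor label (multi-label logistic regression)."""

    def __init__(self, n_bits, n_labels, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_bits)] for _ in range(n_labels)]
        self.b = [0.0] * n_labels

    def predict_proba(self, bits):
        # bits: list of 0/1 fingerprint bits for one molecule
        return [sigmoid(sum(wi * xi for wi, xi in zip(w_row, bits)) + b)
                for w_row, b in zip(self.w, self.b)]

    def fit(self, X, Y, lr=0.5, epochs=200):
        # Plain per-example SGD on binary cross-entropy; each label is trained independently.
        for _ in range(epochs):
            for bits, labels in zip(X, Y):
                probs = self.predict_proba(bits)
                for k, (p, y) in enumerate(zip(probs, labels)):
                    grad = p - y  # derivative of BCE with respect to the logit
                    self.b[k] -= lr * grad
                    row = self.w[k]
                    for j, xj in enumerate(bits):
                        if xj:  # only set bits contribute to the gradient
                            row[j] -= lr * grad
        return self

# Hypothetical toy data: 6-bit "fingerprints", 2 labels (say "fruity", "woody").
X = [[1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 0, 1]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
model = DenseMultiLabel(n_bits=6, n_labels=2).fit(X, Y)
print([[round(p, 2) for p in model.predict_proba(x)] for x in X])
```

Stacking a hidden layer in front of the sigmoid outputs would turn this into the small MLP baseline the comment has in mind; the single-layer version is shown only because it fits in a few lines.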


The baseline isn't that strong, tbh; it's from a very small competition held a long time ago. https://www.synapse.org/#!Synapse:syn2811262/wiki/78388


So how well does the network perform on unseen molecules? Does olfactory data allow extrapolation?


Graph convolutional networks are really cool and widely applicable. The fastest intro to the field for me was actually not a paper or blog post, but the docs of PyTorch Geometric. The definition of their message passing framework [0] gets you into the right frame of mind, after which there are well-documented, cited implementations of various papers that you can reuse [1].

[0] https://pytorch-geometric.readthedocs.io/en/latest/notes/cre...

[1] https://pytorch-geometric.readthedocs.io/en/latest/modules/n...
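The message passing idea in [0] updates each node i as x_i' = γ(x_i, AGG over neighbors j of φ(x_i, x_j)), where φ builds messages, AGG is a permutation-invariant aggregation such as sum, mean, or max, and γ updates the node. Below is a minimal pure-Python sketch of one such round on a toy molecular graph, with hand-picked φ and γ standing in for the learned neural networks; it is an illustration of the concept, not PyTorch Geometric's API, and it assumes messages have the same length as node features.

```python
def message_passing_step(features, edges, message, update):
    """One round of message passing with sum aggregation.

    features: dict node -> list[float]
    edges: list of (src, dst) pairs; add both directions for undirected bonds
    message(x_i, x_j) -> list[float]; update(x_i, aggregated) -> list[float]
    """
    dim = len(next(iter(features.values())))
    aggregated = {node: [0.0] * dim for node in features}
    for src, dst in edges:
        msg = message(features[dst], features[src])  # x_i = receiver, x_j = sender
        aggregated[dst] = [a + m for a, m in zip(aggregated[dst], msg)]
    return {node: update(features[node], aggregated[node]) for node in features}

# Toy graph: ethanol's heavy atoms C0-C1-O2, features one-hot [is_carbon, is_oxygen].
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = [(0, 1), (1, 0), (1, 2), (2, 1)]
result = message_passing_step(
    features,
    bonds,
    message=lambda x_i, x_j: x_j,                               # phi: pass neighbor features along
    update=lambda x_i, agg: [a + b for a, b in zip(x_i, agg)],  # gamma: add to own features
)
print(result)  # {0: [2.0, 0.0], 1: [2.0, 1.0], 2: [1.0, 1.0]}

# Sum readout: one vector for the whole molecule, the kind of input a property head would use.
graph_vector = [sum(col) for col in zip(*result.values())]
print(graph_vector)  # [5.0, 2.0]
```

Stacking several rounds lets information travel several bonds, which is why a node's final vector can reflect its wider chemical neighborhood.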


Neat. Can you point to any particularly compelling applications? I'm looking into a graph representation for something myself and this looks incredibly helpful.


The applications that speak to me most are those involving predicting properties of molecules, and also properties of biochemical networks, though I appreciate that’s not what many others would find compelling! Sorry not to be of more help.


>>> it should be possible to directly predict the end sensory result of an input molecule, even without knowing the intricate details of all the systems involved

Maybe we're missing the most interesting aspect. Olfactory receptor genes in humans comprise ~1% of the total genome. The benefit here is in understanding how environmental changes trigger beneficial mutations and enhance sensory features.

https://en.wikipedia.org/wiki/Evolution_of_olfaction


I doubt the olfactory complex evolves via simple mutations directly in the receptor genes; I'd guess it evolves via other DNA constructs, like retrotransposons, that can quickly (and badly) replicate genes.


If anybody could answer questions like that definitively, it would be a great advance. There is fairly strong evidence that olfaction evolution occurs by gene duplication followed by selection.


This is quite similar to a project that we did in college as part of an introduction to data science course. The professor built a sensor out of an array of different smoke detectors, pumped air over different liquids (coffee, Coke, OJ, etc.) and then through the sensor array, and captured the signal strength in a text document. We used different classification techniques to determine the composition of unknown liquids.
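A setup like this is often called an electronic nose. One classification technique that fits it is k-nearest neighbors on the per-sensor signal strengths. A minimal sketch; the 4-sensor readings and labels below are invented, and the comment doesn't say which classifiers the class actually used.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (readings, label); readings holds one signal value per sensor."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical readings from a 4-sensor array, two samples per liquid.
train = [
    ([0.82, 0.10, 0.45, 0.30], "coffee"),
    ([0.79, 0.12, 0.50, 0.28], "coffee"),
    ([0.20, 0.65, 0.15, 0.40], "Coke"),
    ([0.22, 0.70, 0.18, 0.38], "Coke"),
    ([0.40, 0.30, 0.80, 0.10], "OJ"),
    ([0.38, 0.33, 0.77, 0.12], "OJ"),
]
print(knn_predict(train, [0.80, 0.11, 0.47, 0.29]))  # → coffee
```

In practice, readings would usually be normalized per sensor first, since one detector with a large dynamic range would otherwise dominate the Euclidean distance.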


I know I may regret saying this*, but have they considered vaporising/heating/burning the air sample to produce a spectrograph and then running image recognition on it to compare against known smells? A bit like how you can produce a spectrograph of an MP3 and get an almost instantaneous hit if the track is already known. Subtract out the known smells, and whatever remains is unknown. Then find similarly shaped molecules and return their likely traits (musky, floral, almonds, etc.) as a visual keyword cloud.

It is two different things: you either recognize a scent OR you identify a scent. Why not speed things up by running it as a two-step process? Recognize = rapid results; Identify = best guess at what it might smell like. I’m not sure how a human determines that there is a smell of gasoline, freshly cut grass, and a hint of something else/unknown in the air, rather than thinking, ‘Hmmm, I do not recognize a scent that has aspects of cut grass AND gasoline AND an unknown, therefore the whole scent is unknown.’

*Non-experts who go ‘gosh, why didn’t those idjuts with multiple PhDs think of that?’
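The recognize-then-identify split can be sketched as library matching: compare the measured spectrum against stored reference spectra with cosine similarity, report a match above a threshold (recognize), and otherwise return the closest references as a best guess (identify). A minimal sketch; the 5-bin spectra, the 0.95 threshold, and the library entries are all hypothetical, the "subtract out known smells" step is not modelled, and real spectral library search is considerably more involved.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize_or_identify(spectrum, library, threshold=0.95, top_n=2):
    """Step 1: recognize a known spectrum; step 2: fall back to the nearest references."""
    scored = sorted(((cosine(spectrum, ref), name) for name, ref in library.items()),
                    reverse=True)
    best_score, best_name = scored[0]
    if best_score >= threshold:
        return ("recognized", best_name)
    return ("unknown; resembles", [name for _, name in scored[:top_n]])

# Hypothetical binned intensity spectra.
library = {
    "gasoline": [0.9, 0.4, 0.1, 0.0, 0.0],
    "cut grass": [0.0, 0.2, 0.9, 0.3, 0.0],
    "almond": [0.0, 0.0, 0.1, 0.5, 0.9],
}
print(recognize_or_identify([0.88, 0.42, 0.12, 0.0, 0.0], library))  # a near-copy of gasoline
print(recognize_or_identify([0.45, 0.3, 0.5, 0.15, 0.0], library))   # a gasoline + grass mix
```

The second call returns the two closest references rather than a single label, which mirrors the "gasoline, cut grass, and something else" description.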


What you're describing is GC-MS, which indeed returns a spectrographic signature that is used to identify a molecule. As noted by @Terr_, that tells you what a molecule is made of but not what shape it is. That being said, the smaller the molecule the easier it is to accurately predict what shape it will be. When computational methods become questionable, you move on to NMR (and a whole host of other techniques).

The real issue here is that olfaction involves a small molecule interacting with multiple different proteins for your body to "read" it. We're not so great at the whole predicting drug-protein interaction thing just yet - some closely related fields include virtual screening, molecular dynamics, protein crystallography, X-ray diffraction, and cryoEM (among others).


I think a spectrograph tells you more about the constituent elements in the sample, instead of the various shapes of various molecules that existed before you heated it.


Interesting point. Are there sensors/tools other than a spectrograph that could help? Some clever use of a camera or something?


I won't claim it's impossible, but... olfaction boils down to a small molecule interacting with a protein, similar to drug discovery. The scales in question are far too small for typical imaging devices; these objects are far smaller than the wavelength of visible light. CryoEM and X-ray diffraction are used in these domains, but don't apply in the manner you appear to have in mind. I suppose CryoEM technically counts as "clever use of a camera", though.


Your comment led me down an interesting rabbit hole. I stumbled across this, which might count: "Photoacoustic spectroscopy has become a powerful technique to study concentrations of gases at the part per billion or even part per trillion levels." https://chem.libretexts.org/Bookshelves/Physical_and_Theoret...


When I was a teenager interested in chemistry 30 years ago, the question that bothered me all the time was whether it is possible to predict physical characteristics such as color, phase diagrams, and, yes, smell from chemical composition and structure. I tried to do that with pen and paper, but did not get much farther than concluding that acids smell acidic, alkalis smell alkaline, and salts largely do not smell unless they easily dissociate. I quickly realized that the most interesting part of this problem is in organic compounds, but that was well beyond my reach. Thinking about it now, I wonder about the choice of “variables”. To me it looks like we are trying to describe complex smells in terms of combinations of other, probably also complex, smells. Is that the right basis? If I were researching this, I would try to find the bases: either chemical compounds with the simplest structure, like benzol, or compounds that each trigger the minimal number of receptors, with different and disjoint sets of receptors at that. Is there any research in that area?
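The "minimal, disjoint sets of receptors" idea can be sketched as a greedy selection: take the most selective compounds first and keep one only if it activates no receptor already covered. A minimal sketch over hypothetical activation data (the compound names and receptor IDs are placeholders); greedy is only a heuristic here, since finding the largest family of pairwise-disjoint sets is the NP-hard set packing problem.

```python
def greedy_disjoint_basis(activations):
    """Pick compounds whose receptor sets don't overlap, preferring the most selective.

    activations: dict compound -> set of receptor IDs it activates (hypothetical data).
    """
    chosen, covered = [], set()
    # Fewest receptors first; ties broken by name so the result is deterministic.
    for name, receptors in sorted(activations.items(), key=lambda kv: (len(kv[1]), kv[0])):
        if receptors and covered.isdisjoint(receptors):
            chosen.append(name)
            covered |= receptors
    return chosen, covered

activations = {
    "compound_a": {"OR1", "OR2"},
    "compound_b": {"OR3"},
    "compound_c": {"OR2", "OR3", "OR4"},
    "compound_d": {"OR5", "OR6"},
    "compound_e": {"OR1", "OR5"},
}
basis, covered = greedy_disjoint_basis(activations)
print(basis)  # ['compound_b', 'compound_a', 'compound_d']
```

Note that OR4 ends up uncovered: disjointness and full coverage pull against each other, so a real search would have to trade them off.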


I like it. I wonder if any faux meat companies are doing things like this?


It's a little surprising to me (not necessarily bad, just unexpected) that Google is researching this.

Yes, it's good for companies to do some R&D, and sometimes the R part of the R&D gets pretty theoretical, which is usually a good sign that a company is trying to really innovate.

But usually also there's some indirect way to tie it back to some kind of application that could possibly somehow make the company money in the long term. Otherwise, you're just a for-profit company spending money on something because it's interesting.

So what's the application here? Are there novel ideas or techniques here that can be applied to other AI problems? Is there some kind of application for smell in a Google product?


Google invests fairly heavily into research and there is a fair amount of freedom among engineers and researchers to do projects with Google resources, even when there is no direct product application.

For example, I ran an idle-cycle-harvesting service at Google called Exacycle that ran problems like protein folding, protein design, drug discovery, telescope discovery, and more. The only pushback I got was to run problems where the results could reasonably be considered "useful" (scientifically).

One way to think about it is that many of the people with power at Google really like science and have the resources to support it. Once you've built things like TPUs, it would be a waste not to dedicate some amount of their resources to problems that people otherwise wouldn't be able to address.

Another way to think about it is that these things have indirect effects: even if Google didn't want to make some sort of product with smell (like a phone with a built-in GC-MS?), publishing this work gets the attention of scientists, who will then read the paper and consider Google Cloud as a place they'd like to do their work.


The backbone of this work is graph convolutional neural networks. These have widespread applicability in predicting molecular properties (useful to Google Health), building recommendation systems (useful in many obvious ways to Google), and modelling phenomena in social networks, to name a few off the top of my head.

So while scent identification doesn’t seem directly relevant, nailing the underlying techniques is valuable.


Google Cloud wants to be known as the best place for all kinds of machine learning. Google may not want to get into perfume manufacturing, but it wants Unilever (?) to run all their ML research on their cloud. This kind of research strengthens that part of their brand.

(complete speculation, but seems plausible)


If we're feeling positive about the world, since it's Friday:

Google wants to help the world by reducing our reliance on farming animals for meat, and thought that research like this would help humanity's nascent artificial meat endeavours.


This is building a fundamental component for automated generalized molecule detectors.


From the second paragraph: "Solving the odor prediction problem would aid in discovering new synthetic odorants". Reading between the lines: a cheaper way to produce more realistic industrial aromatics.


Unrelated, but there's a theory that the sense of smell is actually influenced/determined by quantum effects. See this:

https://en.wikipedia.org/wiki/Vibration_theory_of_olfaction


I read Luca Turin's book, "The Secret of Scent", years ago, and it certainly seemed engaging. Reviews on Amazon seem to be mixed.


I wonder if a similar technique could be used to predict a molecule's psychedelic potential.


There are some non-ML methods, like for example: https://en.m.wikipedia.org/wiki/Lipinski%27s_rule_of_five
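The rule of five is simple enough to write down directly. Poor oral absorption or permeation is more likely when a compound has more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, a molecular weight above 500 Da, or a log P above 5, and a compound is usually said to pass if it has at most one violation. A minimal sketch; computing the descriptors themselves (normally done with a cheminformatics toolkit) is out of scope, and the example values are invented rather than taken from any real compound.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mol_weight: float  # daltons
    log_p: float       # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int

def lipinski_violations(d: Descriptors) -> int:
    # Each True counts as one violation.
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_rule_of_five(d: Descriptors) -> bool:
    # At most one violation is the usual cutoff for "drug-like" oral absorption.
    return lipinski_violations(d) <= 1

# Hypothetical descriptor values for illustration only.
small = Descriptors(mol_weight=320.0, log_p=2.8, h_donors=1, h_acceptors=4)
large = Descriptors(mol_weight=720.0, log_p=6.1, h_donors=6, h_acceptors=12)
print(passes_rule_of_five(small), passes_rule_of_five(large))  # True False
```

As the replies point out, this only screens for oral drug-likeness; it says nothing about whether a compound is psychoactive.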


IANA drugologist, but that rule is about oral bioavailability rather than psychoactivity. Open to correction, though.


You're correct. It doesn't help much with predicting psychedelic activity, other than acting as a negative filter for unlikely compounds.


Yes, it's just one piece of the puzzle. The OP might also be interested in David E. Nichols' research into structure-activity relationships of psychedelics. See: https://web.archive.org/web/20080704190159/http://www.heffte...


My chemical intuition says that yes, it could be done. However, I imagine psychedelic states are much more neurologically complex, and the molecules more chemically complex, than the analogous experiences vis-à-vis scent (although we certainly do tie smells to memories and other secondary and tertiary experiences).


If we can also predict side effects, maybe we could find the perfect drug?


How would you quantify that?


Great question. One could use existing known chemicals as a starting point. It might also be possible to use real-time fMRI readings on a model organism to generate data.


Compelling. I wonder what else this could be applied to besides psychedelics? Anti-anxiety and other sensory-affecting drugs?

If you wanna get Black Mirror-esque, perhaps a Soma-like medication from Brave New World (essentially pacifies/zombifies you by creating endless bliss) could be made. Or the "bliss" drug episode of Doctor Who.


Sentiment analysis of trip reports of known compounds on Erowid?

Shulgin gives ratings to compounds in PiHKAL. Those could be used as well.
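Turning PiHKAL-style ratings into training labels could be as simple as mapping each symbol of the Shulgin Rating Scale (minus, plus/minus, plus one through plus four) to a number and averaging per compound. A minimal sketch; the numeric spacing is an assumption chosen for illustration, and the compounds and ratings below are hypothetical placeholders, not entries from PiHKAL.

```python
from collections import defaultdict
from statistics import mean

# Assumed numeric mapping for the Shulgin Rating Scale symbols; the spacing is our choice.
SHULGIN_SCALE = {"-": 0, "±": 0.5, "+": 1, "++": 2, "+++": 3, "++++": 4}

def labels_from_ratings(reports):
    """reports: iterable of (compound, rating_symbol); returns the mean score per compound."""
    scores = defaultdict(list)
    for compound, symbol in reports:
        scores[compound].append(SHULGIN_SCALE[symbol])
    return {compound: mean(vals) for compound, vals in scores.items()}

# Hypothetical placeholder reports.
reports = [("compound_x", "+++"), ("compound_x", "++"), ("compound_y", "±"), ("compound_y", "-")]
labels = labels_from_ratings(reports)
print(labels)  # {'compound_x': 2.5, 'compound_y': 0.25}
```

Sentiment scores from trip reports could be averaged per compound the same way, giving a second, noisier label to compare against.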


Deep Smelling



