This Chemical Does Not Exist (thischemicaldoesnotexist.com)
135 points by optimalsolver on June 28, 2021 | 82 comments



It appears that the molecules on the page are generated with a machine learning algorithm trained on a small organic molecules dataset that is heavily biased towards cyclic, particularly nitrogen heterocyclic, structures (I only sampled about 40 of them, so it could also have just been my luck).

The model seems to have learnt chemistry pretty well because, as many have already pointed out, most of the molecules generated do actually exist (or are very likely accessible if they haven't already been documented). Even the ones with strange bond angles otherwise have a perfectly normal number of bonds. The only molecules I get that cannot possibly exist are those with overlapping atoms, which just defy known physics.

Addendum: it is worth noting that the model might actually have been trained with data that contain bond lengths, or even spatial information if no post-generation geometry optimisation is performed before a molecule is rendered.


For demos like this, it's pretty common to just run RDKit for a structural check before serving the user the actual chemical. I don't know what kind of model this is, though.

They could have limited SMILES (popular choice) to have a more stable generative space or they might have introduced validity into the loss. I think the coolest part is how the builder got all the rendering to work well!

Or it could also be a rule-based fragment model guaranteed to hit a valid structure, that works too.


I did exactly this once: trained a simple language model (Karpathy's Char-RNN iirc) on a txt file with SMILES strings.

After validating the output, it was easy to plot a (2D) skeletal formula. I never got around to 3D renders, but I guess a SMILES -> 2D -> 3D pipeline with some molecular mechanics structure energy minimization for the 3D part is cheap to do.

I found the output surprisingly diverse. The model was very good at adding branched lipid tails that kept rambling on forever, though...
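For anyone curious what a first-pass validation of generated SMILES strings can look like, here is a minimal sketch. It only checks syntax (balanced branch parentheses and paired ring-closure labels) and is an assumption about one cheap pre-filter, not what the commenter or the site actually ran. The function name `smiles_syntax_ok` is made up for illustration. A real pipeline would hand the survivors to a cheminformatics toolkit for valence and aromaticity checks.

```python
def smiles_syntax_ok(smiles: str) -> bool:
    """Cheap syntactic pre-filter for a SMILES string.

    Checks only that branches '(' ')' are balanced and that every
    ring-closure label is opened and closed. It does NOT check valence,
    aromaticity, or element symbols, so passing is necessary, not sufficient.
    """
    if not smiles:
        return False
    depth = 0            # current branch nesting level
    open_rings = set()   # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms such as [13CH4] so their digits
            # are not mistaken for ring-closure labels.
            end = smiles.find("]", i)
            if end == -1:
                return False
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring-closure label, e.g. %12
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not label.isdigit():
                return False
            open_rings ^= {int(label)}
            i += 3
            continue
        elif ch.isdigit():
            open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not open_rings


if __name__ == "__main__":
    for s in ["c1ccccc1", "CC(C)O", "C1CC", "CC(C"]:
        print(s, smiles_syntax_ok(s))
```

Strings that fail here can be discarded before any more expensive check; strings that pass can still be chemically impossible.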


So https://thischemicalprobablyexists.com might be a better title?

I'm not a chemist, I've an undergrad degree in physics/mathematics. Intuitively, your answer sounds right, but I'm not in a position to judge for sure.



Chemistry is just the integral of physics (he said, extremely sophomorically), so you should be able to work it out.


"This chemical could probably be made but doesn’t exist" seems most accurate.

https://www.chemistryworld.com/opinion/chemical-space-is-big...


Those overlapping atoms are just artifacts of the energy minimization they used, which was not able to find a proper conformation. They may have filtered out molecules with highly unfavorable energies to avoid that.


The thing with randomly generating molecules is that, unlike with faces or cats, there is a good chance that a real molecule is generated. Unless they screen the molecule against a database and exclude matches?


Yeah, unlike https://thispersondoesnotexist.com/, this could really use a small About section or ? to hover over.

Either they do something clever to exclude real molecules, my understanding of chemistry is too limited (100% possible), or it's more like "this molecule might not exist"...


They return PDB format files for molecules:

https://www.thischemicaldoesnotexist.com/molecule.pdb

There appear to be several repositories of this format. Maybe they just randomly generate until they find one with a hash that doesn't exist? (Though it's not clear to me how much order of the lines in the format matters).


Since molecule equivalence is more or less a graph isomorphism problem it could be a bit tricky to hash them in a way that is physically meaningful.
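To make the hashing point concrete, here is a rough sketch of an order-independent structure hash using Weisfeiler-Lehman style label refinement on the bond graph. Everything here is an assumption for illustration: the function name `wl_hash`, the `(i, j, order)` bond tuples, and the number of rounds are not from the site. Note the caveat that this kind of refinement can give the same hash to some non-isomorphic graphs, so it is a fast filter rather than a proof of identity; in practice people tend to rely on canonical SMILES or InChIKeys from a toolkit.

```python
import hashlib
from collections import defaultdict


def wl_hash(atoms, bonds, rounds=3):
    """Order-independent hash of a molecular graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples with atom indices and bond order
    rounds: number of neighborhood refinement passes (hypothetical default)
    """
    neighbors = defaultdict(list)
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    labels = list(atoms)
    for _ in range(rounds):
        new_labels = []
        for idx in range(len(atoms)):
            # Sorted multiset of (bond order, neighbor label) pairs, so the
            # result does not depend on how atoms or bonds were listed.
            env = sorted(f"{order}{labels[nbr]}" for nbr, order in neighbors[idx])
            text = labels[idx] + "|" + ",".join(env)
            new_labels.append(hashlib.sha256(text.encode()).hexdigest()[:16])
        labels = new_labels

    # Sorting the final labels removes any dependence on atom numbering.
    return hashlib.sha256(",".join(sorted(labels)).encode()).hexdigest()


if __name__ == "__main__":
    # Heavy-atom graphs only: ethanol written in two atom orders, and dimethyl ether.
    ethanol_a = wl_hash(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
    ethanol_b = wl_hash(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
    ether = wl_hash(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
    print(ethanol_a == ethanol_b, ethanol_a == ether)
```

This also shows why hashing the raw PDB file would not work: the same molecule with atoms listed in a different order yields different bytes but the same graph.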


More likely it’s not stable or there is no way to synthesize it. Complex molecules have internal “stress” that needs to be weaker than the individual bonds. Making explosives is often maximizing that stress while still making a viable molecule, kind of like a mouse trap.


Nah. These are all predominantly branched and/or cyclic carbon chains, with a few heteroatoms scattered in for effect. They probably burn well, they might smell nasty, but they are not going to be explosive. Explosives (or "energetic materials" as they are known in the trade) are generally all about stuffing as many nitrogen and oxygen atoms as possible into your molecule; take TriNitroToluene as an example, it has 7 carbons, 3 nitrogens and 6 oxygens, and that's fairly mild.

What these structures remind me most of is what you would find in a sour heavy crude oil. In fact, I can guarantee the person who named this website has never looked at a high-resolution mass spectrometry analysis (like FT-ICR MS) of any type of petroleum, or they would have named it "this chemical is probably being pumped out of the ground right now".


Yes, practically this means shoving as many nitro groups together as possible. But what makes nitro groups explosive to begin with? From what I remember, the large electron shells of the oxygen atoms repel each other, increasing the O-N-O bond angle by a few degrees. It's very easy to break these stressed bonds, and oxidize another carbon compound with the liberated oxygen, creating CO2 gas (the explosive bit).

Some of the most powerful explosives are made by attaching nitro groups to stressed ring or cage structures.


Whenever I hear of "nitrogen stuffing" I remember https://blogs.sciencemag.org/pipeline/archives/2013/01/09/th...


Haha! Perfect example of the "mouse trap".


Nothing that has 14 nitrogen atoms to only 2 other atoms in it can be considered a trap. It is bound to be exciting:) (see? "bound", haha, just barely)


Yeah, you are right that it's very easy to break apart nitro groups, but it's not much to do with the (O-N-O) bond angle. The bonds themselves in -NO2 groups are very weak, because nitrogen doesn't provide enough bonds to make things stable. Remember oxygen wants to "hold on with two hands" as we said in high school chemistry, that is, to make a double bond or two single bonds. In NO2 each oxygen only gets one and a half bonds (quantum mechanically it is in a superposition between having a single bond to the first oxygen and a double bond to the second, and vice versa). So this part becomes electronegative. Nitrogen, on the other hand, wants to have three bonds in total, but is forced to have four: 2x 1.5 to the oxygens, and one to the rest of the molecule. So this part becomes electropositive. Chemists will speak here of a resonance structure, which you can imagine is something that is quite easy to excite.

When the nitro group breaks apart, the nitrogen finds another nitrogen from another NO2 and forms N2 gas, which is highly stable due to its triple bond, so this part of the reaction releases a lot of energy and produces a lot of gas, and is very fast since it does not depend on any other sub-reactions. And stuffing lots of just nitrogen (without oxygen) into a molecule is in itself a way to make it very explosive - see azidotetrazolate salts.

In NO2 decomposition, the oxygen then goes on to find carbon to make CO2, and hydrogen to make H2O, releasing more energy and producing more gas. But this first requires breaking down the relatively stable bonds in the hydrocarbon, so it actually consumes energy from the NO2 decomposition, before it releases more energy than it consumed.

Then stoichiometrically you want to ensure that you have enough oxygen for all your carbon and (ideally) hydrogen, or you'll end up producing a lot of "unburnt" stuff which is inefficient. Notice that for each carbon in a linear hydrocarbon chain (-CH2-) you need 1.5 NO2 groups to get a complete reaction into CO2 and H2O. If you only have 1 NO2 group, CO2 will be formed and you will have excess H2 which is not combusted.
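The bookkeeping in that paragraph is usually expressed as an "oxygen balance". Here is a small sketch, assuming the common convention for a CaHbNcOd compound where every carbon goes to CO2, every hydrogen to H2O, and nitrogen leaves as N2. The function name and the rounded atomic masses are my own choices for illustration.

```python
# Rounded standard atomic masses in g/mol.
MASS_C, MASS_H, MASS_N, MASS_O = 12.011, 1.008, 14.007, 15.999


def oxygen_balance_percent(c, h, n, o):
    """Oxygen balance (percent by mass) for a compound CcHhNnOo.

    Zero means exactly enough oxygen to turn all carbon into CO2 and all
    hydrogen into H2O. Negative means the molecule is oxygen deficient.
    Fractional atom counts are allowed, for per-unit bookkeeping.
    """
    molar_mass = c * MASS_C + h * MASS_H + n * MASS_N + o * MASS_O
    # Oxygen atoms needed (2 per C, 1 per 2 H) minus oxygen atoms present.
    oxygen_deficit = 2 * c + h / 2 - o
    return -100.0 * MASS_O * oxygen_deficit / molar_mass


if __name__ == "__main__":
    # TNT, C7H5N3O6: strongly oxygen deficient, roughly -74%.
    print(round(oxygen_balance_percent(7, 5, 3, 6), 1))
    # One -CH2- unit with 1.5 NO2 groups: exactly balanced.
    print(oxygen_balance_percent(1, 2, 1.5, 3))
    # One -CH2- unit with only 1 NO2 group: oxygen deficient.
    print(oxygen_balance_percent(1, 2, 1, 2) < 0)
```

The second call reproduces the 1.5 NO2 per -CH2- figure: three oxygen atoms are needed (two for CO2, one for H2O) and 1.5 nitro groups supply exactly three.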

Now as you say there are some stressed rings or cages that are hideously sensitive explosives, precisely because the hydrocarbon bonds have also had their stability reduced. But these typically are not practical explosives. For that you want the stuff to be a solid at a wide range of temperatures, you want it to be non-sensitive to friction and impact, and you want it to have a low vapor pressure. These are all details which depend strongly on the internal structure of the molecule.


I think most of their energy simply comes from the fact that diatomic nitrogen and carbon dioxide are very stable.


As someone who knows nothing about chemistry, this was a great explanation of how explosions work. Thanks.


For a quick check they could run the generated compounds through these databases: ZINC15 (https://zinc15.docking.org/) and ChEMBL (https://www.ebi.ac.uk/chembl/).

Also as an aside I believe there's a current trend to generate chemical compounds by creating SMILES strings using BERT which is a cool way to incorporate language and chemistry (An example of a team doing that https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6)


Are we sure that's the case? I have no idea what the combinatorics are like for molecules of this size. They seem small enough that it would occasionally generate molecules that have existed at some point, but that's based on some really fuzzy intuition.


I tried a few times, and on the second load of the page I got a single hydrogen atom, so I think it's safe to say they aren't excluding things.

There was recently a link, I think on the front page here, to an article about how many chemical compounds there are [1]. Based on that link we're looking at probably trillions to quadrillions of potential structures with atomic weight under 300, which would cover the structures I saw in my few reloads of the page.

Chemistry is wild. For an example close to home, taking table sugar (sucrose, a single type of molecule) and applying heat to caramelize it results in hundreds to thousands of different end products from at least half a dozen qualitatively different classes of chemical reactions.

[1] https://www.chemistryworld.com/opinion/chemical-space-is-big...


I never got far enough in chemistry to really figure out if it has explanatory or predictive power.

If you ask a CS grad "What will happen if you run this program?" they should be able to predict it. If they've gone through nand2tetris they can explain it all the way -- compiler, OS, machine language, ALU / registers / bus, logic gates.

If you ask a chemistry grad "What happens when you apply heat to this molecule?" can they predict it? Can you explain it all the way -- from molecules to atoms to electrons to quantum fields?

If we can't predict "Okay this is what will happen if I mix these two substances together," how do we have a good scientific theory? I guess chemistry says we always end up with the same atoms we started with (unless you start to go nuclear by using energetic particles to modify the nucleus), but can we predict which of the zillions of possible rearrangements will actually happen? We know by experiment that H2SO4 is an acid, and that H2SO4 is a "legal" molecule in a way that HSO3 or H5S7O9 are not. Is there a way to figure this out from first principles? Can you figure out by inspecting the chemical formula that H2SO4 will be an acid if you didn't already know that ahead of time? Can you figure out that H2SO4 will be a "legal" molecule but H5S7O9 will not? Can you look at a reaction and tell whether it will "compile" and what it does, the same way you can look at a program and figure out if it will compile and what it does? If you can't, why not?

And what use is a theory of chemistry that can't make concrete predictions? If you just have a list of known substances and reactions, is that even a theory, or is it just experimental data?


> Can you figure out by inspecting the chemical formula that H2SO4 will be an acid if you didn't already know that ahead of time?

Strictly speaking, the answer is "no", because it's not the formula that matters but the shape. And something like C₆H₁₂O₆ doesn't tell you how all of the bonds hook up--are we looking at esters, alcohols, ketones, aldehydes, carboxylic acids? There's several distinct molecules that have that formula, and those distinctions matter for chemistry.

But given the actual structure of the compound? Yeah, we can compute a lot of stuff. That list of words that probably meant nothing to you--that's different kinds of functional groups, and functional groups tend to react in very similar ways across different compounds. And organic chemistry is basically all about identifying these groups and the ways in which they react.

> Is there a way to figure this out from first principles?

"First principles" in this case would basically be a large dose of molecular orbital theory, derived from quantum mechanics. And yes, we can develop a good deal of explanations by recourse to molecular orbital theory--for example, why aromatic and antiaromatic compounds exist, despite the fact they superficially look like the same structure.

> Can you look at a reaction and tell whether it will "compile" and what it does, the same way you can look at a program and figure out if it will compile and what it does? If you can't, why not?

Typically, the difficulty is in figuring out how selective reactions are. If you've got a molecule with a couple different C=C bonds in it, and you're doing an addition reaction across those bonds, predicting how many of those bonds, and which ones specifically, change in the reaction is more of a crapshoot. So it's not foolproof, but it is generally reliable enough at this point that organic synthesis has moved from "here's a Nobel Prize for figuring how to synthesize vitamin B12" to "congratulations on being hired; why don't you synthesize this molecule while we ramp you up on the job."


Computational chemistry answers some of these questions.

When you look at just a molecule by itself “what happens if you apply heat” is somewhat simple. Covalent bonds just break because the molecule is vibrating too much - think of a covalent bond as a flexible strut, if you put too much pressure on it, it snaps. This can result in the temporary formation of unstable molecules that then recombine. You could predict which particular bonds in a molecule are unstable based on the total structure, angles, electronegativity, polarity, etc.

But of course those small unstable molecules can further breakdown, react with each other, and react with the parent molecule to form new stuff. So basically the parent molecule is part of some huge “power set” of potential molecules all interacting with each other.


the Maillard Reaction?


The Maillard reaction is protein browning, separate from sugar caramelization.


Both are very interesting from a chemistry standpoint as well as a tasty standpoint.


Exactly.


I like this but it would be even more fun if it displayed the IUPAC names of the chemicals it generated because they would be absolutely ludicrous.


As it happens, I know of no current system, free or proprietary, which can correctly generate arbitrary IUPAC names. All of them I have seen mess up things such as the nested enclosing mark pattern, stereodescriptor placement (and resulting enclosing mark modifications), lack of enclosing marks, lack of stereodescriptors for undefined stereocenters, lack of omission of locants, locants on substituent groups, use of multiplicative vs. substitutive nomenclature, nomenclature for aldehydes and polynuclear noncarbon oxoacids, cyclic phane nomenclature, italicized letters, and a myriad other parts of IUPAC nomenclature that the Blue Book specifies very clearly.

Putting that rant aside, most of these chemicals are small enough to have rather tame IUPAC names. As an example, these are the IUPAC names of the first five molecules I loaded (disregarding stereochemistry):

  N-{1-[(2-fluorophenyl)methyl]-1H-pyrazol-4-yl}-2-[(pyridin-3-yl)oxy]benzamide
  N-ethyl-1-{5-[(2-methylcyclopentyl)methoxy]pyridin-2-yl}ethan-1-amine
  2-fluoro-N-{4-[3-(hydroxymethyl)azetidine-1-carbonyl]phenyl}benzamide
  N-(8-methoxy-2H-[1,3]dioxolo[4,5-c]quinolin-7-yl)-2-methylcyclopropane-1-carboxamide
  1-ethyl-N-(2,3,5-trimethylcyclohexyl)piperidin-3-amine

If you want an example of an absolutely ludicrous IUPAC name, I'd suggest the one I put onto https://en.wikipedia.org/wiki/Maitotoxin earlier this month.


At what point does a 'name' become an 'infodump'? :-D

Or more aptly, what constitutes a single molecule, rather than a complex structure built from a repetitive pattern of almost-equal components?


> Or more aptly, what constitutes a single molecule, rather than a complex structure built from a repetitive pattern of almost-equal components?

The bonds. Not a chemist, but if I remember enough chem 101, all molecules are bonded together by either covalent or ionic bonds between the individual atoms. There are names for some of the common "building blocks" of molecules; e.g. a methyl group is a single carbon with 3 hydrogens bound onto it, and an arbitrary atom on the 4th side.

A "complex structure built from a repetitive pattern of almost-equal components" sounds more like a crystal. There are still individual molecules in a crystal, but the molecules are arranged in a precise and repeating pattern.


Hmm... the first molecule I got is in PubChem as cpd 5216868. I guess the space of possible molecules with under, say, 25 heavy atoms, while large, is much smaller than that of possible faces.


I noticed a pharmaceutical come up a little while back when I tried this out myself. Maybe a better name would be "This Chemical May Not Exist".


I am not sure this site explores all possible molecules. Rather, it shuffles a couple of different real groups in different combinations.


I understand why "This person does not exist" is interesting. Generating a fake person seemed hard until we could do it.

But why is this interesting? Minus the animation, isn't this something any smart high-schooler could do with pen and paper?


Heh, just found out there's one for numbers https://thisnumberdoesnotexist.com/


It did reliably generate numbers that produced zero google search results. Maybe that's what they meant?

Though it did also occasionally spit out a "number" with a letter in it, like "q29199.951301068788".


I just laughed for a full thirty seconds.

What's most interesting about that is somebody actually bothered to put it together.


The logic for that generator is particularly basic: https://thisnumberdoesnotexist.com/js/main.js


> Generating a fake person seemed hard until we could do it.

This is also something a talented high-schooler could do well before now. The interesting part there is teaching a computer to generate plausibly-human faces, not the act of generating fake people in general. This is the same way.

This seems more interesting because it has practical implications. Generating chemicals that haven't been investigated is the first step towards a programmatic pipeline that can identify and investigate novel chemicals. These seem of particular interest because they're of the "doesn't currently exist" variety, rather than the "cannot exist under the laws of physics" variety.


I think the hard part is that you can't draw something that already exists


Is it guaranteed not to exist? I would have assumed the interesting part is that these obey the laws of chemistry. But even then it seems like you could do something fairly simple algorithmically to achieve this.


Yeah, it seems like the only way to guarantee it doesn't exist (rather than that it simply hasn't been catalogued) would be to draw something impossible -- which seems less interesting than drawing something possible.


> Minus the animation, isn't this something any smart high-schooler could do with pen and paper?

Our 8 year old does this with pen and paper, and sometimes also with some website that lets you draw molecules (there's a few, forget which one(s) he uses). That said, his molecules aren't always possible (he understands valence but sometimes he makes mistakes with it or just stops caring about it).


That's an interesting way to phrase it. I get that this is a play off of the "This X doesn't exist.", where X is a machine-generated entity such as a picture of a cat or the bio for a person.

If by "chemical" you mean "substance (as represented by this molecule)" and if by "exist" you mean "hasn't been made yet," then it might make sense.

"This substance (as represented by this molecule) has not been made yet" doesn't have the same ring to it, though.

Molecules are abstractions. Leaky ones at that.


I'm not a professional chemist, but I'm pretty sure Hydrogen exists: https://i.imgur.com/X1Lo62h.png


I'd like to see a "This Chemical Is Not What You Think It Is" site, that explains what chemicals like the infamous dihydrogen monoxide really are.


Sounds like a dangerous chemical. Luckily we have stuff with electrolytes.


It also frequently contains deuterium hydroxide, a chemical used in nuclear reactors and the manufacture of thermonuclear bombs.


It's what plants crave!


Previous submission with comment from creator is here: https://news.ycombinator.com/item?id=26937223


Many of the chemicals generated actually do exist.


Now someone get thisphotodoesnotexist.com and show an image of random pixels.


This Chemical Probably Does Not Exist


First one I pulled up I recognized as actually existing. Seems like they need a blacklist of real chemicals?


Should rename to 'This Chemical Might Exist'


Somebody hook this up to a 3d bio printer thing... that'll go well right?


Not sure if the "3d bio printer thing" that you're thinking of can synthesize stuff at the molecular level though. Maybe it would be fun to have it print scale models of the chemicals with colored filament.


If you can find an efficient, generic, and universal method for synthesizing molecules of arbitrary shape and complexity, then you would receive the Nobel Prize for Chemistry and possibly Physics and Medicine too. You would also likely receive the Turing award (if your method is algorithmic and not using ML blackboxes since the search space for biochemistry is absolutely immense) and there may be entire prizes named after you.

Whoever can find such an algorithm will put Corey and Woodward out of a job and the entire field of organic chemistry will study your name and life in future.


> If you can find an efficient, generic, and universal method for synthesizing molecules of arbitrary shape and complexity, then you would receive the Nobel Prize for Chemistry and possibly Physics and Medicine too.

I hate to burst your bubble, but that is almost certainly not literally true. Any device with those capabilities will be the end result of a heck of a lot of engineering (much of it incremental), but whatever scientific discoveries are necessary will be so far removed that they will hardly seem relevant.

Theoretically, the 2016 Nobel for chemistry might prove to be the relevant foundation for the necessary engineering, but time will tell.

Anyway, discoveries like CRISPR, that can be immediately (relatively speaking) deployed as a revolutionary tool, are by far the exception.

As an exercise, feel free to identify the Nobel prizes that were awarded for the invention of xerography (aka photocopying), digital laser printing, and thermoplastic 3D printing (or any other extant additive manufacturing method).


The type of "incremental advances" you see in recent Nobels are the exception in history, not the norm. Stuff like CRISPR is what the Nobel is designed for.


I've seen it sometimes generates pentagon and heptagon rings. Are these structures actually possible?


Is there a significance of this molecule?


Well, it seems to be generated randomly; similar to the "this person does not exist" site(s)

https://www.thispersondoesnotexist.com/


I missed that it changed


Guessing that site's showing composite-images?

Many of the images seem reasonable. They can have odd asymmetries that may give an unnatural vibe, though most don't seem to have majorly overt issues.

Most of the more overt issues seem to be melding facial wear (like glasses and ear-rings) into skin.

The most overt oddity was a woman with "stuff" splattered on her face.. I'd be curious how/why that'd be something that could be generated..

A lesser oddity was a man who had a mustache that appeared to be shaven on one half, but not the other.


It's using generative adversarial networks to create these images from a model.

Well explained here: https://www.youtube.com/watch?v=SWoravHhsUU


It's not composite any more than human imagination is composite. It's based on existing images, but only in the sense that they formed a basis for learning how a human face generalizes.


(It changes every time you refresh the page)


Ahh I missed that


The model might need some more training. Seeing a Fluorine bonded to 4 other atoms is a bit ... odd
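A sanity check for this kind of thing is cheap to sketch. The following assumes a molecule given as a list of element symbols plus `(i, j, order)` bond tuples, and uses the usual maximum valences for neutral atoms. The table and function name are illustrative assumptions, and the check deliberately ignores charged species (ammonium nitrogen legitimately has four bonds) and hypervalent elements.

```python
# Usual maximum number of bonds for common neutral atoms (simplified).
MAX_VALENCE = {"H": 1, "F": 1, "Cl": 1, "Br": 1, "I": 1, "O": 2, "N": 3, "C": 4}


def valence_violations(atoms, bonds):
    """Return (index, symbol, bond_order_sum) for every over-bonded atom.

    atoms: list of element symbols
    bonds: list of (i, j, order) tuples with atom indices and bond order
    Elements missing from MAX_VALENCE are skipped rather than flagged.
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [
        (idx, symbol, used[idx])
        for idx, symbol in enumerate(atoms)
        if symbol in MAX_VALENCE and used[idx] > MAX_VALENCE[symbol]
    ]


if __name__ == "__main__":
    # Carbon tetrafluoride: carbon has four bonds, each fluorine has one.
    cf4 = (["C", "F", "F", "F", "F"], [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)])
    # The odd case: a fluorine bonded to four carbons.
    bad = (["F", "C", "C", "C", "C"], [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)])
    print(valence_violations(*cf4))
    print(valence_violations(*bad))
```

If the odd fluorine comes from the rendering or geometry step rather than the generated graph, a check like this on the graph would pass and the problem would have to be caught on the 3D coordinates instead.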


Pretty bold title. Seems like it would be more "not known to exist."


If it has never been made, does it exist if the pathway does actually exist?


Does it have thiotimoline?


I got a lone Hydrogen atom


Atomic hydrogen definitely exists


Duplicate..seriously? Was posted two months ago here: https://news.ycombinator.com/item?id=26937223


From the FAQ:

> Are reposts ok?

> If a story has not had significant attention in the last year or so, a small number of reposts is ok. Otherwise we bury reposts as duplicates.

It's intentionally unclear what significant attention means, but the last submission has (3 points | 64 days ago | 3 comments), which is not very significant.



