> Finally: every kid can draw up novel structures. Then: how do you actually fabricate these (in the case of real novel chemistry and not some building-block stuff)? No one has a clue!
I personally have a clue, and the entire field of organic chemistry has a clue: given enough time and money, most reasonable structures can be synthesized (and QED + SAScore + similar filters, followed by a human pass, are often enough to weed out the compounds that will be unstable or hard to make; a minimal sketch of that triage follows below). In fact, even some state-of-the-art synthesis prediction models can propose decent routes when the compounds are relatively simple [0].

The real issue is that in silico activity/property prediction is often not reliable enough to justify the effort of designing and executing a synthesis, especially because the predictions tend to become less reliable as the molecules grow more dissimilar to known compounds with the given activity. In practice, you would just spend three months of your master's student's time on a pharmacological dead end. Conversely, some of the "novel predictions" from ML pipelines that include de novo structure generation can be very close to known molecules, which makes the measured activity somewhat of a triviality [1].

For these reasons, it makes sense to spend the budget on building-block-based "make on demand" structures: fulfillment is around 90%, compounds arrive 1-2 months after the order is placed, and the cost per compound is significantly lower, so you can iterate faster. Recent work on large-scale docking has shown that this approach works decently for well-behaved systems [2]. On the other hand, some truly novel frameworks are not accessible via the building-block approach, which can also matter for IP.
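For concreteness, here is a minimal sketch of the triage step mentioned above, using RDKit's QED and the contrib SA score, plus a Tanimoto check against known actives to flag "novel" candidates that are really near-duplicates. The thresholds and helper names are my own, for illustration only:

```python
# Minimal sketch (not a production pipeline) of the triage step above.
# Assumes RDKit is installed; sascorer ships in RDKit's contrib directory.
# Thresholds (0.5 QED, 4.5 SA) are illustrative, not established cutoffs.
import os
import sys

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, QED, RDConfig

sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer  # synthetic accessibility: ~1 (easy) to ~10 (hard)


def triage(candidates, min_qed=0.5, max_sa=4.5):
    """Keep candidates that look drug-like and plausibly synthesizable."""
    kept = []
    for smi in candidates:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # unparseable SMILES
        if QED.qed(mol) >= min_qed and sascorer.calculateScore(mol) <= max_sa:
            kept.append(smi)
    return kept


def max_similarity_to_known(smiles, known_actives):
    """Max Tanimoto similarity to known actives; values near 1.0 mean the
    'novel' candidate is essentially a known compound in disguise."""
    fp = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), 2, nBits=2048)
    return max(
        DataStructs.TanimotoSimilarity(
            fp,
            AllChem.GetMorganFingerprintAsBitVect(
                Chem.MolFromSmiles(s), 2, nBits=2048))
        for s in known_actives)
```

After this kind of cheap computational filter, the human pass only has to look at the survivors, which is why it scales at all.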
More fundamentally, of course you are correct, and I agree with you: having a lot of structures is in itself not that useful. Getting closer to physically meaningful, fundamental processes, and speeding them up as far as possible, can generate far more transparent and reliable activity and novelty.
There's a lot that can be learned from building-block-based experiments. If you run a building-block-based experiment, train a model on it, and then predict new compounds, the models do generalize meaningfully outside the original set of building blocks to other sets (including variations in how the building blocks are linked). Granted, that's not the "fully novel scaffold" test, but it suggests there should be some positive predictive value on novel scaffolds as well.
We've done work in this area and will be publishing some results later in the year.
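In the meantime, here is a hypothetical sketch of that kind of held-out-building-block evaluation with off-the-shelf tools (RDKit fingerprints plus a scikit-learn random forest). The data layout, split, and model choice are placeholders for illustration, not a description of our actual unpublished pipeline:

```python
# Hypothetical sketch of a held-out-building-block evaluation: train on
# compounds assembled from one set of building blocks, then score compounds
# assembled from a disjoint set of building blocks.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

N_BITS = 2048


def fingerprint(smiles):
    """Morgan (ECFP4-like) fingerprint as a numpy array."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=N_BITS)
    arr = np.zeros((N_BITS,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr


def evaluate_generalization(train_set, test_set):
    """train_set/test_set: lists of (smiles, is_active) pairs, where the
    test compounds use building blocks never seen during training."""
    X_train = np.stack([fingerprint(s) for s, _ in train_set])
    y_train = np.array([y for _, y in train_set])
    X_test = np.stack([fingerprint(s) for s, _ in test_set])
    y_test = np.array([y for _, y in test_set])

    model = RandomForestClassifier(n_estimators=500, random_state=0)
    model.fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

An AUC well above 0.5 on the disjoint-building-block test set is the kind of signal that suggests the model learned more than a lookup table over the original blocks.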
[0] https://www.sciencedirect.com/science/article/pii/S245192941...

[1] http://www.drugdiscovery.net/2019/09/03/so-did-ai-just-disco...

[2] https://www.nature.com/articles/s41586-021-04175-x.pdf