> this is what I mean by "a shitty answer fast" - structure prediction isn't a c...

roughly · 2024-10-09T21:26:17 1728509177

> ends up at the same point as a completed folding simulation.

Well, that's the hope, at least.

> Or more representative sequences and enough variants by additional metagenomic surveys, for example. Of course, this might not be easily achievable.

For sure, but for ostensibly profit-generating enterprises, it's pretty much out of the picture.

I think the reason an actual computational solution for folding is interesting is that the existing set of experimentally verified protein structures covers only proteins we could isolate and crystallize. That's also the training set for AlphaFold, so that's pretty much the area where its predictions are strongest, and even within that, it's only catching certain conformations of the proteins. Even if you can get a large set of metagenomic surveys and a large sample of protein sequences, the limitations of the methods for experimentally verifying protein structure mean we're restricted to a certain section of the protein landscape. A general-purpose, computationally tractable method for simulating protein folding under various conditions could be a solution for those cases where we can't actually physically "observe" the structure directly.

dekhn · 2024-10-09T21:27:50 1728509270

Most proteins don't fold to their global energy minimum; they fold to a collection of kinetically accessible states. Many proteins fail to reach the global minimum because energy barriers separate it from the states that are easily reached from the unfolded state.
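
To make the kinetic-trapping point concrete, here is a toy sketch. It is not a protein model: the 1D tilted double-well `energy` function, the temperature `kT`, the step size, and the starting point are all made-up assumptions, and Metropolis Monte Carlo with small local moves stands in for physical dynamics. The walker starts on the high-energy "unfolded" side and settles into the nearest basin rather than the deeper one across the barrier.

```python
import math
import random


def energy(x):
    """Hypothetical tilted double well: global minimum near x = -1,
    shallower local minimum near x = +1, barrier around x = 0."""
    return 5.0 * (x * x - 1.0) ** 2 + 0.5 * x


def run_trajectory(x0=2.0, kT=0.2, step=0.1, n_steps=5000, seed=0):
    """Metropolis Monte Carlo with small local moves (illustrative only)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        dE = energy(trial) - energy(x)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE / kT).
        if dE <= 0 or rng.random() < math.exp(-dE / kT):
            x = trial
    return x


if __name__ == "__main__":
    x_final = run_trajectory()
    print(f"final x = {x_final:.2f}, E = {energy(x_final):.2f}")
    print(f"deeper minimum sits near x = -1, E = {energy(-1.0):.2f}")
```

With these settings the barrier is many multiples of `kT`, so the walker is overwhelmingly likely to stay in the basin near x = +1 even though the basin near x = -1 is lower in energy. Raising `kT` or running far longer changes that, which is exactly the cost being complained about.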

Attempting to predict structures using mechanisms that simulate the physical folding process wastes an immense amount of energy and time sampling very uninteresting areas of space.

You don't want to use a supercomputer to simulate folding; it can be done with a large collection of embarrassingly parallel machines much more cheaply and effectively. I proposed a number of approaches on supercomputers and was repeatedly told no because the codes didn't scale to the full supercomputer, and supercomputers are designed and built for codes that scale really well on non-embarrassingly parallel problems. This is the reason I left academia for Google: to use their idle cycles to simulate folding (and do protein design, which also works best using embarrassingly parallel processing).
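
For readers unfamiliar with the term, "embarrassingly parallel" means each run needs nothing from any other run. A minimal sketch, assuming a hypothetical `trajectory` function (here just a seeded 1D random walk standing in for one independent simulation) and using only the standard-library `concurrent.futures` module:

```python
import random
from concurrent.futures import ProcessPoolExecutor


def trajectory(seed, n_steps=10_000):
    """Hypothetical stand-in for one independent simulation run:
    a seeded 1D random walk that returns its final position."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(n_steps):
        x += rng.uniform(-1.0, 1.0)
    return x


def run_independent(seeds, executor):
    """Run one trajectory per seed. Workers never exchange data,
    so no fast interconnect between them is needed."""
    return list(executor.map(trajectory, seeds))


if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        finals = run_independent(range(8), pool)
    print(len(finals), "independent trajectories finished")
```

Because the only communication is handing out a seed and collecting a result, the same pattern spreads across unrelated machines as easily as across local cores, which is why it fits idle commodity hardware better than a tightly coupled supercomputer.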

As far as I can tell, only extremely small and simple proteins (like ribonuclease) fold to somewhere close to their global energy minimum.

chermi · 2024-10-11T18:05:23 1728669923

Except, you know, if you're trying to understand the physical folding process... There are lots of enhanced sampling methods out there that get at the physical folding process without running just vanilla molecular dynamics trajectories.