Those databases are used to derive the evolutionary couplings and distance matri...

COGlory · on July 28, 2022

It's all about boosting signal by finding other proteins that are similar, until you get to the point that you can identify a fold to assign to a region of the protein. That's why some are structural, and some are not.

>Furthermore, AlphaFold can function with only a MSA as an input, without retrieving a single PDB coordinate.

Yes, it has a very nice model of what sequences should look like in 3D. That model is derived from experimental data. So if I give AlphaFold an MSA of a new, unknown protein fold (substantively away from any known fold), it cannot predict it.

flobosg · on July 28, 2022

> Yes, it has a very nice model of what sequences should look like in 3D.

A structural model, you would say.

> That model is derived from experimental data.

That doesn’t make it a template-based model, or a homology one.

> if I give AlphaFold an MSA of a new, unknown protein fold (substantively away from any known fold), it cannot predict it

That will depend on the number of effective sequences found to derive couplings. Domains with novel folds usually have few remotely homologous sequences, and it is for that reason that the method will fail, not just because they are novel.
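
As a rough illustration of what "number of effective sequences" means, here is a minimal sketch of one common definition: each sequence in the MSA is down-weighted by the number of sequences that are nearly identical to it, and the weights are summed. The function names and the 0.8 identity threshold are assumptions chosen for illustration, not AlphaFold's own code or settings.

```python
def fraction_identical(a: str, b: str) -> float:
    """Fraction of alignment columns in which both rows carry the same symbol.

    Simplification: a column where both rows have a gap ('-') counts as a match.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("aligned sequences must not be empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def effective_sequences(msa: list[str], threshold: float = 0.8) -> float:
    """Sum of per-sequence weights 1 / (number of rows within `threshold` identity).

    Every row counts itself as a neighbour, so each weight is at most 1.
    """
    total = 0.0
    for seq in msa:
        neighbours = sum(
            fraction_identical(seq, other) >= threshold for other in msa
        )
        total += 1.0 / neighbours
    return total


if __name__ == "__main__":
    # Three near-duplicates and one unrelated row (toy data).
    toy_msa = ["ACDE", "ACDE", "ACDF", "WWWW"]
    print(effective_sequences(toy_msa, threshold=0.75))
```

A deep alignment made of near-duplicates therefore contributes little, which is why raw MSA depth alone is a poor predictor of whether couplings can be derived.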

COGlory · on July 28, 2022

>Domains with novel folds usually have a low number of remotely homologous sequences and for that reason the method will fail, not just because they are novel.

How can you say this but not believe it's doing homology modeling?

flobosg · on July 28, 2022

Because homology search is not homology modelling. And a multiple sequence alignment is not a structural (i.e., with three-dimensional coordinates) template.

ssivark · on July 28, 2022

For someone who knows very little about this field, could you elaborate on what specific aspect of “homology modeling” AF violates/circumvents which makes you call it “homology search” instead?

flobosg · on July 28, 2022

Homology search is a method to find homologous sequences, that is, evolutionarily related sequences that share a common ancestor. This was usually done based on how identical the sequences were, but newer algorithms can find remote homologs even when the identity between the sequences is very low. The first step in AlphaFold is to retrieve as many remotely homologous sequences as possible to build a multiple sequence alignment (MSA), which is then used to generate the embedding.

On the other hand, homology (or comparative) modelling is a method that generates a structural model of a query sequence based on one or more experimentally solved structures of close protein homologs. The details of model generation depend on the specific protocol but, broadly speaking, spatial restraints are extracted from the template structures and mapped onto the query sequence to be modelled.
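
To make "extracted from the template and mapped onto the query" concrete, here is a minimal sketch, assuming a pairwise query/template alignment and one C-alpha coordinate per template residue. The function names, the 10 Å cutoff and the toy coordinates are illustrative assumptions; real protocols use richer restraints than plain distances.

```python
import math


def map_alignment(query_aln: str, template_aln: str) -> list[tuple[int, int]]:
    """Return (query_index, template_index) for columns where both rows have a residue."""
    pairs = []
    qi = ti = 0
    for q, t in zip(query_aln, template_aln):
        if q != "-" and t != "-":
            pairs.append((qi, ti))
        if q != "-":
            qi += 1  # advance only past real query residues
        if t != "-":
            ti += 1  # advance only past real template residues
    return pairs


def distance_restraints(query_aln, template_aln, template_ca, max_dist=10.0):
    """Copy template C-alpha distances onto the aligned query residue pairs.

    Returns {(query_i, query_j): distance} for pairs closer than `max_dist`.
    """
    pairs = map_alignment(query_aln, template_aln)
    restraints = {}
    for a, (qi, ti) in enumerate(pairs):
        for qj, tj in pairs[a + 1:]:
            d = math.dist(template_ca[ti], template_ca[tj])
            if d <= max_dist:
                restraints[(qi, qj)] = d
    return restraints


if __name__ == "__main__":
    # Toy template: four residues spaced 3.8 Å apart along the x axis.
    ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    print(distance_restraints("AC-DE", "A-GDE", ca))
```

Query residues that fall opposite a gap in the template get no restraint at all, which is one reason this approach needs a close, well-aligned template.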

Note that AlphaFold also uses a type of geometrical restraint (pairwise residue distances) in its modelling, although these are derived not from protein structures but from the MSA embeddings. The two kinds of restraint are related but not exactly the same.

One difference between AlphaFold and homology modelling is that the latter requires templates with a certain sequence identity to the query sequence (≥30% is the rule of thumb), while the former can include in its MSA remotely homologous sequences well below any discernible identity.
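
As a small sketch of that rule of thumb, the snippet below computes percent identity over aligned, gap-free columns and splits hits into those that clear a template cutoff and those that would only be useful as MSA rows. The function names and toy sequences are assumptions for illustration, and the identity denominator used here (columns without gaps) is only one of several conventions.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent of gap-free alignment columns where both residues are identical."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either row
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def split_hits(query_aln: str, hits: dict[str, str], template_cutoff: float = 30.0):
    """Split hits into (template candidates, MSA-only rows) by identity to the query.

    Simplification: every hit is assumed to be aligned against the same query row.
    """
    templates, msa_only = [], []
    for name, aln in hits.items():
        if percent_identity(query_aln, aln) >= template_cutoff:
            templates.append(name)
        else:
            msa_only.append(name)
    return templates, msa_only


if __name__ == "__main__":
    query = "ACDEFGHIKL"
    hits = {"close": "ACDEFGHIKV", "remote": "AWWWWGWWWW"}
    print(split_hits(query, hits))
```

The "remote" hit here would be useless as a homology modelling template, yet a sequence like it can still sit in the MSA and contribute to the coupling signal.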