Let me take a stab at this (I'll maybe take it halfway there). First of all we want to know what kind of matrix we are talking about.

Imagine that you have a whole bunch of generative models (it's best if you imagine a fully connected Boltzmann machine in particular, whose states you can think of as binary vectors consisting only of zeros and ones) that have the same form but different random realizations of their parameters. This is a typical example of what a toy model of a so-called "spin glass" looks like in statistical physics (the spins are either up or down, usually represented as +1/-1). Each of these models, having been initialized randomly, will have its own particular frequency with which a given location (also called a site) of the boolean vector is a one or a zero.
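To make the setup concrete, here is a minimal sketch of one such model in the +1/-1 representation. The Gaussian couplings, the 1/sqrt(N) scaling, the `strength` knob, and the function names are all my own assumptions for illustration, not something fixed by the description above; the sampler is a plain single-site Gibbs sampler.

```python
import numpy as np

def random_couplings(n, strength, rng):
    """Symmetric Gaussian couplings with zero diagonal.

    The 1/sqrt(n) scaling is an assumed, Sherrington-Kirkpatrick-style choice.
    """
    a = rng.normal(size=(n, n))
    j = (a + a.T) / np.sqrt(2.0 * n)
    np.fill_diagonal(j, 0.0)
    return strength * j

def gibbs_samples(j, n_sweeps, rng):
    """Single-site Gibbs sampling of P(s) ~ exp(sum_{i<k} J_ik s_i s_k).

    Returns one +/-1 state per full sweep over the sites.
    """
    n = j.shape[0]
    s = rng.choice([-1, 1], size=n)
    out = np.empty((n_sweeps, n), dtype=int)
    for t in range(n_sweeps):
        for i in range(n):
            field = j[i] @ s  # local field on site i (diagonal of j is zero)
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field))
            s[i] = 1 if rng.random() < p_up else -1
        out[t] = s
    return out

rng = np.random.default_rng(0)
j = random_couplings(n=20, strength=0.5, rng=rng)
samples = gibbs_samples(j, n_sweeps=500, rng=rng)

# Per-site frequency of being "up" (the "one" in the 0/1 picture).
site_freq = (samples == 1).mean(axis=0)
```

Running this with different seeds gives different random realizations of the same model form, which is the "whole bunch of models" in the picture above.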

If the tendency of a site to be either a one or a zero were independent of every other site, the analysis of such a model would be pretty straightforward: every model would just have a vector of N frequencies, and we could compare how close the statistical behavior of two models was by comparing how closely their N site frequencies matched. But in the general case there will be some interaction or correlation between sites in a given model. If the interaction strength is strong enough, this can result in a model tending to generate groups of patterns in its sequence of zeros and ones that are close to one another. Furthermore, if we compare the overlap of the apparent patterns between two such models, each with its own random parameters, we will find that some pairs overlap more than others.

What we can then do is ask how much, on average, the patterns of these random models overlap with one another across the full set of all models. This leads us to the concept of an "overlap matrix". This matrix will have one set of values along the diagonal (corresponding to how much a model's patterns tend to overlap with themselves) and off-diagonal values capturing the overlap between different models. You can find through simulation, or with some carefully constructed calculations, that when the interaction strength between sites is small the off-diagonal elements don't tend to zero, but rather to a single number different from the diagonal value. This is perhaps intuitive: these models were randomly initialized, but they are going to overlap in their behavior in some places.
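As a rough sketch of what such a matrix looks like in practice, the usual definition of the overlap between two +1/-1 configurations a and b is q_ab = (1/N) * sum_i s_i^a s_i^b. The snippet below uses a single configuration per model and random stand-in states, both of which are simplifying assumptions on my part; in a real calculation you would average over many samples.

```python
import numpy as np

def overlap_matrix(states):
    """Overlap q_ab = (1/N) * sum_i s_i^a * s_i^b for +/-1 states.

    `states` has shape (n_models, n_sites), one configuration per model.
    """
    states = np.asarray(states, dtype=float)
    n_sites = states.shape[1]
    return states @ states.T / n_sites

rng = np.random.default_rng(1)
# Stand-in data: 6 "models", 200 sites each, drawn uniformly at random.
states = rng.choice([-1, 1], size=(6, 200))
q = overlap_matrix(states)
```

With +1/-1 spins the diagonal is exactly 1, and every off-diagonal entry lies between -1 and 1.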

Where things get interesting, though, is when you increase the interaction strength: you find that the overlap matrix starts to take on a block diagonal form, wherein clusters of models overlap with one another at a certain level and at a lower but constant level with out-of-cluster models. This is called one-step replica symmetry breaking (1RSB). These different clusters of models can be thought of as having learned different overall patterns, with the similarity quantified by their overlap. If you keep increasing the interaction strength you will find that this happens again and again, giving a k-step replica symmetry breaking (kRSB) with a sort of self-similar block structure emerging in the overlap matrix (a picture is worth a thousand words [1]).
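If you don't have the picture handy, the 1RSB block structure is easy to build by hand. This is just an idealized illustration: the function name and the particular values of `q_diag`, `q_in`, and `q_out` are made up for the example, not derived from any model.

```python
import numpy as np

def one_rsb_matrix(n_models, block_size, q_diag, q_in, q_out):
    """Idealized 1RSB overlap matrix.

    q_diag on the diagonal, q_in inside each cluster (block),
    q_out between models in different clusters.
    """
    if n_models % block_size != 0:
        raise ValueError("block_size must divide n_models")
    n_blocks = n_models // block_size
    # Boolean mask that is True where two models share a cluster.
    same_block = np.kron(
        np.eye(n_blocks), np.ones((block_size, block_size))
    ).astype(bool)
    q = np.where(same_block, q_in, q_out).astype(float)
    np.fill_diagonal(q, q_diag)
    return q

q_1rsb = one_rsb_matrix(n_models=6, block_size=3, q_diag=1.0, q_in=0.6, q_out=0.2)
print(q_1rsb)
```

The kRSB case repeats the same idea inside each block, with smaller sub-blocks at a higher overlap value, which is where the self-similar look comes from.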

Now the really wild part that Parisi figured out is what happens when you take this process to the regime of full replica symmetry breaking. You can't really do this with simulations, and the calculations are very tricky (you have a bunch of terms going either to infinity or to zero that need to balance out correctly), but Parisi ended up coming up with an expression for the distribution of overlaps for the infinitely sized matrix with full interaction strength in play. The expression is actually a partial differential equation that itself needs to be solved (I told you the calculations were tricky, right?), but amazingly, it seems to capture the behavior of these kinds of models correctly.

Whereas mathematicians have a pretty good idea of how to understand the 1RSB process rigorously, the Parisi full replica symmetry breaking scheme is very much not understood and remains of interest both to complex systems researchers trying to understand their models and to applied mathematicians (probability people in particular) trying to lay the foundations needed to make rigorous the ideas being explored by theorists.

Hope that helps a bit!

[1] https://www.semanticscholar.org/paper/Spin-Glasses%2C-Boolea...