
Great summary. Thank you for posting it here! Your comment, along with Fripplebuddy's response to it, deserves to be at the top of the page.

I have little to add, except for a question:

Isn't the number of distinct regions of interest bounded above by (and in the extreme, equal to) the Vapnik-Chervonenkis dimension[a] of the data?

As I write this, there is no mention of VC-dimension in the OP.
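For concreteness, here is a rough sketch of how the region count itself could be estimated, assuming "regions" means the distinct ReLU activation patterns of a network (my reading, not necessarily the OP's); the widths, sampling range, and random weights are arbitrary, and a finite sample only gives a lower bound:

  import numpy as np

  rng = np.random.default_rng(0)

  def count_regions(widths, n_samples=20000, dim=2):
      """Lower-bound the number of linear regions of a random ReLU net
      by counting distinct activation patterns over random inputs."""
      layers = []
      fan_in = dim
      for w in widths:
          # fixed random weights and biases for this layer
          layers.append((rng.standard_normal((fan_in, w)), rng.standard_normal(w)))
          fan_in = w
      X = rng.uniform(-1, 1, size=(n_samples, dim))
      patterns = np.zeros((n_samples, 0), dtype=bool)
      H = X
      for W, b in layers:
          pre = H @ W + b
          patterns = np.hstack([patterns, pre > 0])  # which units fire
          H = np.maximum(pre, 0)
      return len({p.tobytes() for p in patterns})

  for widths in [[4], [8], [16], [8, 8]]:
      print(f"hidden widths={widths}  sampled regions >= {count_regions(widths)}")

Whether that count is in turn bounded by the VC dimension is exactly the question.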

---

[a] https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_di...

---

EDITS: After a bit of thought, I shortened the comment and turned it into a question.




Since with LLMs what matters is the result, I wonder whether all of these dimensions depend on accuracy: the dimension can be low if you accept low accuracy, but you need a high dimension (a large number of parameters) to increase accuracy. If this intuition is right, then dimension is not the key concept; the key is how the minimal dimension at a required accuracy scales with that accuracy. A metaphor is the way humans structure knowledge: we don't learn by heart, we learn by considering local and global relations with other areas in order to construct global knowledge. So the curve that reflects the best tradeoff of dimension versus accuracy is an important curve that merits study. In general, to learn well you need to separate the main parts clearly, so the regions should be structured in such a way that they provide rich and independent information; simply using the number of regions doesn't seem enough to me, since it can contain a lot of noise or randomness.
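A rough sketch of how that tradeoff curve could be traced, sweeping hidden width as a crude proxy for dimension; the dataset, the widths, and the use of scikit-learn's MLPClassifier are arbitrary choices for illustration:

  from sklearn.datasets import make_moons
  from sklearn.model_selection import train_test_split
  from sklearn.neural_network import MLPClassifier

  X, y = make_moons(n_samples=2000, noise=0.25, random_state=0)
  X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

  for width in [1, 2, 4, 8, 16, 32, 64]:
      # one hidden layer; width stands in for "dimension" / parameter count
      clf = MLPClassifier(hidden_layer_sizes=(width,), max_iter=2000, random_state=0)
      clf.fit(X_tr, y_tr)
      print(f"width={width:3d}  test accuracy={clf.score(X_te, y_te):.3f}")

Reading off the smallest width that reaches each target accuracy gives the "minimal dimension at required accuracy" curve.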

Another point about the number of regions: if the number of regions plays a role similar to the number of clusters in a clustering algorithm, then it is not a key factor, since very different numbers of clusters can give similar performance, and looking for a minimal number could limit the generalization capabilities of the model.
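A quick way to check that: cluster with very different k and score each cluster by its majority label. The dataset (digits) and the range of k are arbitrary illustration choices:

  import numpy as np
  from sklearn.cluster import KMeans
  from sklearn.datasets import load_digits

  X, y = load_digits(return_X_y=True)

  for k in [10, 20, 40, 80, 160]:
      labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
      pred = np.empty_like(y)
      for c in range(k):
          members = labels == c
          # assign each cluster the most common true label among its members
          pred[members] = np.bincount(y[members]).argmax()
      print(f"k={k:3d}  majority-label accuracy={(pred == y).mean():.3f}")

Widely different cluster counts tend to land in a similar accuracy band, which is the sense in which the raw count is not the key factor.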

In support vector machines there is the concept of a margin between regions. If we require regions to be separated by at least a fixed margin, the number of regions becomes less noisy, since redundant and low-information regions are eliminated. So fixing a minimum margin or threshold seems to be the first step before studying the relation between the number of parameters, the number of regions, and the performance of the model.
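A minimal sketch of that filtering step, assuming a linear SVM: measure how far each cluster centroid sits from the decision boundary and keep only the regions that clear a minimum margin. The synthetic data, the cluster count, and the margin thresholds are placeholder choices:

  import numpy as np
  from sklearn.cluster import KMeans
  from sklearn.datasets import make_classification
  from sklearn.svm import LinearSVC

  X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                             n_redundant=0, n_clusters_per_class=2, random_state=0)

  svm = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
  w_norm = np.linalg.norm(svm.coef_)
  centroids = KMeans(n_clusters=30, n_init=10, random_state=0).fit(X).cluster_centers_

  # geometric distance of each centroid to the separating hyperplane
  dist = np.abs(svm.decision_function(centroids)) / w_norm

  for margin in [0.0, 0.25, 0.5, 1.0]:
      kept = int((dist >= margin).sum())
      print(f"minimum margin={margin:.2f}  regions kept={kept} / {len(centroids)}")

Raising the required margin discards regions that hug the boundary, leaving a smaller, more stable region count to relate to parameter count and performance.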

PS: edited several times.



