
PCA, for example. It is essentially SVD. And what's funny is that, as far as I can tell, there is no learning in PCA (maybe it depends on the implementation, but I believe the popular one uses heuristics), yet it is still categorized as an ML technique all the time.
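A minimal sketch of that equivalence (NumPy only; the toy data is made up for illustration): center the data, take the thin SVD, and the rows of Vt are the principal directions.

    import numpy as np

    def pca_via_svd(X, n_components):
        # PCA is the SVD of the mean-centered data matrix.
        Xc = X - X.mean(axis=0)
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        components = Vt[:n_components]               # principal directions
        scores = Xc @ components.T                   # data projected onto them
        explained_variance = S[:n_components] ** 2 / (len(X) - 1)
        return components, scores, explained_variance

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    components, scores, var = pca_via_svd(X, n_components=2)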



PCA can "learn" in the sense that we can apply it to more and more data, and that can affect the directions of the components. For example, say we take measurements one at a time and apply PCA to the whole dataset after each measurement: if the first 100 measurements follow some distribution A, PCA will give the components of A; if the next 1000 measurements follow some different distribution B, PCA will 'learn' to give the components of B higher priority than those of A, i.e. it adapts to the data.
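A toy version of that setup (the Gaussian "distributions" A and B below are assumptions for illustration): re-run PCA on the growing dataset and watch the leading direction shift once B dominates.

    import numpy as np

    def leading_direction(X):
        # First principal direction = first right-singular vector of the centered data.
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Vt[0]

    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 2)) * [5.0, 1.0]   # variance mostly along x
    B = rng.normal(size=(1000, 2)) * [1.0, 5.0]  # variance mostly along y

    print("first 100 points:", leading_direction(A))                 # roughly [+-1, 0]
    print("after 1000 more:", leading_direction(np.vstack([A, B])))  # roughly [0, +-1]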

In that hypothetical setup, we end up storing every raw datapoint. If we think of those datapoints as the parameters, then PCA is a non-parametric model, although a maximally inefficient one in terms of information/compression.


Could the learning aspect be how many principal components to use?


That is a hyperparameter of the model. It is not learned.
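Right, although in practice it is often selected from the data, e.g. keeping the smallest k that retains some target fraction of the variance; that's model selection on a hyperparameter rather than learning in the fitted-weights sense. A sketch with scikit-learn (the 0.95 threshold and the toy data are assumptions for illustration):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))  # correlated toy data

    pca = PCA().fit(X)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    k = int(np.searchsorted(cumulative, 0.95)) + 1
    print(f"components needed for 95% of the variance: {k}")

    # scikit-learn can also do this selection internally via PCA(n_components=0.95).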



