I guess this kind of tutorial in Python is more popular since the R / Matlab people come with a statistics background and don't really need tutorials on PCA.
PCA is basically a decomposition of data into a low-rank Euclidean space, where the first principal axis accounts for the bulk of the variation in the data and each successive axis is orthogonal to the previous ones and accounts for as much of the remaining variation as possible. Notably, though, most problems do not lie in a Euclidean vector space; they often exist on a lower-dimensional manifold embedded in such a space. More recent work considers the problem of performing low-rank decompositions when you assume constraints on the factors, i.e., general priors, and may be of interest to someone here. You can essentially perform PCA with constraints using a message passing algorithm (originally derived from physics applications):
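For anyone who wants to see the plain, unconstrained version of that concretely, here is a minimal sketch of PCA as a low-rank decomposition via SVD. The data and variable names are made up for illustration, and the constrained / message-passing variant is well beyond this sketch:

```python
import numpy as np

# Plain, unconstrained PCA as a low-rank decomposition via SVD
# (toy data; the constrained / message-passing variant is beyond this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Xc = X - X.mean(axis=0)                 # center the data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
X_rank_k = (U[:, :k] * s[:k]) @ Vt[:k]  # best rank-k approximation in Frobenius norm

# The principal axes are orthonormal, and s**2 / (n - 1) gives the
# variance accounted for by each successive axis.
print(np.allclose(Vt @ Vt.T, np.eye(Xc.shape[1])))
print(s**2 / (Xc.shape[0] - 1))
print(np.linalg.norm(Xc - X_rank_k))    # reconstruction error of the rank-k model
```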
I recently implemented PCA by computing the covariance matrix and using power iteration to obtain the eigenvectors. I found these pages useful for understanding how PCA works:
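For reference, here is a minimal sketch of that covariance-matrix + power-iteration approach (my own toy version with deflation, not the parent's actual code):

```python
import numpy as np

# Covariance matrix + power iteration, with deflation to get successive eigenvectors.
def pca_power_iteration(X, n_components, n_iter=500):
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (Xc.shape[0] - 1)        # sample covariance matrix
    rng = np.random.default_rng(0)
    components = []
    for _ in range(n_components):
        v = rng.normal(size=C.shape[0])
        for _ in range(n_iter):               # power iteration
            v = C @ v
            v /= np.linalg.norm(v)
        eigval = v @ C @ v                    # Rayleigh quotient ~ eigenvalue
        components.append(v)
        C = C - eigval * np.outer(v, v)       # deflate, then repeat for the next axis
    return np.array(components)

X = np.random.default_rng(1).normal(size=(300, 4))
print(pca_power_iteration(X, 2))
```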
Mildly curious question: I've read a few papers recently which talk about 'Projection to Latent Structures' (PLS). I know PLS and PCA are related, but I'm struggling to understand the differences and when you should use one or the other. Are there any good references people could recommend?
I believe PLS is closely related to PLS regression (PLSR); the difference from PCA is that instead of maximizing the variance of the predictors alone, you are trying to maximize the covariance between the predictor variables and the response variables.
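In case it helps, a rough sketch of the difference on made-up data: the first PCA axis only looks at the variance of X, while the first PLS weight vector is the leading left singular vector of X^T Y, i.e., it maximizes covariance with the responses.

```python
import numpy as np

# Toy comparison (invented data, just to show the two objectives differ).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                       # predictors
Y = X @ rng.normal(size=(5, 2)) + 0.1 * rng.normal(size=(100, 2))   # responses
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# First PCA axis: leading right singular vector of X (max variance of X alone).
pca_dir = np.linalg.svd(X, full_matrices=False)[2][0]

# First PLS weight vector: leading left singular vector of X^T Y
# (max covariance between a projection of X and a projection of Y).
pls_dir = np.linalg.svd(X.T @ Y, full_matrices=False)[0][:, 0]

print("first PCA axis:", np.round(pca_dir, 3))
print("first PLS weights:", np.round(pls_dir, 3))
```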
Agreed, but that sword cuts both ways. The thing you do with your data that starts out useful can and will easily erode into infrastructural debt if it isn't built on sound foundations, and those foundations cannot simply be found off the shelf in standard scientific computing libraries that have been around for decades.
That difference approaches zero as programming becomes more symbolic and as implementations become more available.
Libraries like TensorFlow, Theano, etc. are great because then I can focus on what I want (math) instead of spending my time telling a computer how it should multiply things together in an imperative sense.
I agree. I'm surprised by the number of situations where people use PCA when ICA really would be more appropriate. I suppose that's the difference between knowing the tools and knowing the math behind the tools, though.
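For what it's worth, a quick sketch of where the two diverge, using scikit-learn on a toy blind-source-separation problem (the data here is invented for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Two independent, non-Gaussian sources mixed linearly.
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sign(np.sin(3 * t))           # square wave
s2 = rng.laplace(size=t.size)         # heavy-tailed source
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5], [0.5, 1.0]])  # mixing matrix
X = S @ A.T                             # observed mixtures

# PCA returns orthogonal directions of maximal variance, which in general
# do NOT line up with the original sources...
X_pca = PCA(n_components=2).fit_transform(X)

# ...whereas ICA looks for statistically independent components, which is
# what you want when the goal is to recover the underlying sources.
X_ica = FastICA(n_components=2, random_state=0).fit_transform(X)
```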
"Principal Component Analysis is a dimensionally invalid method that gives people a delusion that they are doing something useful with their data. If you change the units that one of the variables is measured in, it will change all the "principal components"! It's for that reason that I made no mention of PCA in my book. I am not a slavish conformist, regurgitating whatever other people think should be taught. I think before I teach."
MacKay did foundational research in machine learning; was Chief Scientific Advisor to the UK government's Department of Energy and Climate Change; created eye-tracking user interface software for paralysed people; and wrote an excellent textbook on information theory. I.e., a proper Olympian. So when you read a statement like that, you should take it seriously and possibly realign your world-view.
I'm sure David MacKay is well aware of the need for normalization — his point is that people use PCA without understanding what it means and how to perform it properly. Memorizing a technique without having any idea of why the technique works often leads to meaningless (or worse, incorrect) interpretations of data.
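To make that concrete, here is a toy illustration (my own numbers, not MacKay's): changing the units of one column changes the leading principal component, while standardizing the columns first, i.e., doing PCA on the correlation matrix, removes that dependence.

```python
import numpy as np

# Two correlated variables; one is then re-expressed in different units.
rng = np.random.default_rng(1)
height_m = rng.normal(1.7, 0.1, size=500)
weight_kg = 40 + 30 * height_m + rng.normal(0, 5, size=500)

def leading_pc(X):
    Xc = X - X.mean(axis=0)
    # principal axes are the right singular vectors of the centered data
    return np.linalg.svd(Xc, full_matrices=False)[2][0]

X_meters = np.c_[height_m, weight_kg]
X_millimeters = np.c_[height_m * 1000, weight_kg]   # same data, new units

print(leading_pc(X_meters))         # dominated by the weight column
print(leading_pc(X_millimeters))    # now dominated by the height column

# Standardizing each column (equivalently, PCA on the correlation matrix)
# makes the result independent of the units.
def standardize(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)

print(leading_pc(standardize(X_meters)))
print(leading_pc(standardize(X_millimeters)))   # same direction up to sign
```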
http://michaeljflynn.net/2017/02/06/a-tutorial-on-principal-...