A Beginner’s Guide to Eigenvectors, Covariance, PCA and Entropy (deeplearning4j.org)
199 points by vonnik on Aug 18, 2015 | 23 comments



"square matrices have as many eigenvectors as they have linearly independent dimensions; i.e. a 2 x 2 matrix would have two eigenvectors, a 3 x 3 matrix three, and an n x n matrix would have n eigenvectors, each one representing its line of action in one dimension."

This is not quite right. First of all, if v is an eigenvector of A, then a*v for any non-zero a is also an eigenvector of A, so it makes no sense to say "there are n eigenvectors". Perhaps you meant to say n linearly independent eigenvectors?

But this is also not correct. An n x n matrix does not necessarily have n linearly independent eigenvectors; consider for instance A = [1, 1; 0, 1], which has dim N(A - I) = 1 (hence one linearly independent eigenvector). Basically, not every matrix is diagonalizable. This example also shows that rank(A) is not necessarily equal to the number of linearly independent eigenvectors (is that what you meant by linearly independent dimensions?).

A more correct statement would be that every real symmetric n x n matrix has n linearly independent eigenvectors (and its eigenvectors can be chosen to form an orthonormal basis for R^n).
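To make the defective-matrix example concrete, here is a minimal sketch (NumPy is used purely for illustration; it isn't part of the discussion above): A = [1, 1; 0, 1] has the repeated eigenvalue 1, but its eigenvectors span only a one-dimensional subspace.

    import numpy as np

    # A is defective: eigenvalue 1 has algebraic multiplicity 2
    # but geometric multiplicity 1, so no basis of eigenvectors exists.
    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)                            # both eigenvalues are 1

    # The two returned eigenvector columns are numerically parallel,
    # so the eigenvector matrix should be reported as rank 1, not 2.
    print(np.linalg.matrix_rank(eigenvectors))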


Thanks for making that point. The statement "square matrices have as many eigenvectors as they have linearly independent dimensions" is meant to inform the statements that follow.


This statement makes no sense. Independence is not a property of dimensions. A set of vectors is said to be independent if the only linear combination of them that is zero is the trivial linear combination. Otherwise, they are dependent.

Or do you have a definition of independence for dimensions? That would first require a different definition of dimension. Ordinarily, dimension is just the cardinality of an independent, spanning set (we call this a basis). It is a single number. A linear space does not have a plurality of dimensions, nor do matrices have an associated plurality of dimensions.

Please don't mix technical words without defining them. In other areas of life, humans get along fine by using very imprecise, poetic, ambiguous, and metaphoric language. This is not how mathematics works. Everything you say must have a precise definition. :-(


Hi Jordi - I'm sure you're right. You have an amazing resume. Please help us make ND4J/ND4S better. https://github.com/deeplearning4j/nd4j

NASA JPL is using the framework for climate modeling, so maybe we can help save the planet together. :)

One response I would have to both your comment and @alephone's rewording, which is perfectly correct, is that they are incomprehensible to most people. So the question is: How do you bridge the gap to make non-mathematicians understand?

The other would be: It's hard to talk about math. There's always slippage when you try to speak in an everyday way about technical things. I'm not sure how to get around that. Too much precision, in this case, undoes the goal of reaching another audience.


> One response I would have to both your comment and @alephone's rewording, which is perfectly correct, is that they are incomprehensible to most people.

They are incomprehensible to those who do not understand the mathematics being discussed.

> So the question is: How do you bridge the gap to make non-mathematicians understand?

By teaching them mathematics.

> Too much precision, in this case, undoes the goal of reaching another audience.

No. In this case what failed was the author's grasp of when a matrix is diagonalizable; being normal or self-adjoint (depending on the underlying field) is the relevant sufficient condition. It isn't hard to add a caveat saying "in some cases operators may not have a full set of eigenvectors, so it is only appropriate to say a matrix can have at most as many linearly independent eigenvectors as its order", provided the author realizes that this is indeed the case.

Precision is important because there are phenomenally important cases where operators have no eigenvectors at all! Consider a rotation of R^2 by an angle that is not a multiple of pi (fun fact: in odd dimensions every real operator has a real eigenvalue; why?); clearly, such a rotation sends no line to itself.
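A quick numerical illustration of that no-eigenvector case (again a NumPy sketch, not from the article): a 90-degree rotation of the plane has only complex eigenvalues, hence no real eigenvectors.

    import numpy as np

    # Rotation of R^2 by 90 degrees: no line is mapped to itself,
    # so there are no real eigenvectors.
    theta = np.pi / 2
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    eigenvalues, _ = np.linalg.eig(R)
    print(eigenvalues)    # approximately [0+1j, 0-1j]: purely imaginary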


I've added a caveat along the lines of your suggestion.

> By teaching them mathematics.

The terse complacence of this statement, and your easy use of technical terms, make me doubt whether you know what it means to teach mathematics, to empathize with those who lack knowledge, and ultimately to help math exit the ghetto of jargon.


It wasn't meant to be complacently terse; it was meant to represent the basic tautology. Mathematics is a language as well as a way of thinking, and one needs both, since things are truly complicated under the hood. The language is actually our friend here: it allows people to speak clearly and have near-universal understanding of what is meant. With this in mind, the correct answer to 'how do we communicate maths to non-mathematicians' (as opposed merely to teaching its importance, which is advocacy) is to teach them the maths. That's how I see it anyway.


I understand the tautology. It's inherent in every field that requires specialization. And the point I'm trying to make is that the best teaching relies on ideas or analogies already familiar to the student. It's not enough to say math is a language. So is French. But to learn French, we first learn to map our native language to the new one. The analogy isn't perfect, but math often describes things in the world, and we can use those things to provide windows to the math itself.

While the language of math is a useful tool for the initiated, like all linguistic barriers it keeps out newcomers. So yes, we must teach people math, but we must do it at first, as much as possible, in a language that allows them to enter into a new way of thinking. It's about unfurling a series of ideas one at a time instead of all at once. (I don't pretend to do that perfectly, but that was the intent. My understanding is as defective as some matrices. ;)


I see you've reworded "independent dimensions" to "independent variables". This indicates to me a lacuna in your understanding of eigenvectors, possibly because symmetric matrices are the only matrices you are intimately familiar with.

You should consider what happens with rotation matrices (no real eigenvectors), defective matrices (no complete set of eigenvectors), and the Jordan form. Wikipedia has adequate articles on these topics.

Also, if you've only ever worked with matrices over floats rather than in exact arithmetic, it's understandable that the Jordan form is unfamiliar: it's numerically useless because it's numerically unstable. The tiniest perturbation will change a defective matrix into a diagonalisable one.
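A rough sketch of that instability (NumPy again, purely illustrative): perturbing one entry of the defective matrix from earlier by 1e-12 already splits the repeated eigenvalue and produces two independent eigenvectors.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])          # defective: one independent eigenvector

    eps = 1e-12
    B = np.array([[1.0, 1.0],
                  [eps, 1.0]])          # tiny perturbation of A

    # B has two distinct eigenvalues, 1 +/- sqrt(eps), and hence two
    # linearly independent eigenvectors -- B is diagonalisable.
    eigenvalues, eigenvectors = np.linalg.eig(B)
    print(eigenvalues)                            # roughly 1 +/- 1e-6
    print(np.linalg.matrix_rank(eigenvectors))    # should now be 2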


Good points. Honestly, I feel like a discussion of PCA using the SVD as a background is much more intuitive, and it would fit with some of what the author has claimed about eigenvectors.


You linked it at the bottom of the page, but how do you think the article compares with http://www.cs.otago.ac.nz/cosc453/student_tutorials/principa... ?

I'm planning to read through it a little more this weekend, but I remember using the PDF and a whiteboard back in the day to understand PCA.


The linked PDF is excellent. Two thumbs up.


I understand the concept of eigenvectors/eigenvalues, and I understand the concept of the covariance matrix, but I'm having a hard time understanding why the eigenvectors of the covariance matrix represent the principal components. Does anyone have an intuitive explanation they would like to share? Maybe I don't understand eigenvectors as well as I thought I did...


The eigenvectors of any diagonalizable matrix (which any covariance matrix is; see elsewhere on this page for a discussion of why) form a basis for the space in which the matrix operates. The eigenvectors corresponding to the "largest" eigenvalues are called principal components because they are the directions in which the matrix principally acts.

Consider a matrix A which has two eigenvectors u, v with eigenvalues a, b, and a >> b. Then A(u + v) = au + bv. But since a >> b, the operator stretches u much more than v, so we think of it as acting 'principally' in the direction of u rather than v; hence such eigenvectors are considered the 'principal components'.

I exaggerated the case by letting a >> b, but the same holds when a is similar in magnitude to b: by considering repeated applications of the matrix, A^n (u + v), you see the same effect.

Covariance matrices are nothing special in this regard.
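A small sketch of that 'A^n stretches along the dominant eigenvector' intuition (a NumPy illustration with a made-up symmetric matrix): repeatedly applying the matrix to a generic vector and renormalizing lines it up with the eigenvector of the largest eigenvalue.

    import numpy as np

    # A symmetric matrix with eigenvalues 3 and 1, so one clearly dominant direction.
    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    eigenvalues, eigenvectors = np.linalg.eigh(A)       # eigh: symmetric eigendecomposition
    dominant = eigenvectors[:, np.argmax(eigenvalues)]

    # Power iteration: apply A repeatedly and renormalize.
    x = np.array([1.0, 0.0])
    for _ in range(50):
        x = A @ x
        x /= np.linalg.norm(x)

    print(dominant)    # +/- [0.707, 0.707]
    print(x)           # converges to the same direction (up to sign)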


Here is a totally intuitive and probably wrong way to think about it, but it's what I have in my head as my own mental model. Imagine that a matrix transformation is like a wind blowing in a particular direction. If you throw a ball, it gets blown by the wind, and its direction gets changed. If you throw it obliquely it ends up a certain distance from you. Which direction would you throw it in to make it go as far as possible? You'd line up the throw with the direction of the wind. Effectively that's finding your largest eigenvalue / eigenvector.

Now, for finding the first principal component we do something special. We create a wind that blows in the direction of variation of the data. We do it by computing the covariances between all our variables and putting them into the positions of a matrix, so that they make up vectors in which each dimension is weighted according to the size of the variation in that direction. This makes a "wind" in the direction of the variation of the data. The first principal component is the vector that most closely aligns with this covariance "wind".

I am sure this is technically wrong in essential ways and I'd love to hear it be corrected!
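To ground the analogy a little (a minimal NumPy sketch with made-up data, not taken from the article): build correlated 2-D points, form their covariance matrix, and take the eigenvector with the largest eigenvalue as the first principal component; it points along the direction in which the data varies most.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic 2-D data that varies mostly along the (1, 1) direction.
    n = 1000
    t = rng.normal(size=n)                     # strong common factor
    noise = 0.1 * rng.normal(size=(n, 2))      # small isotropic noise
    X = np.column_stack([t, t]) + noise

    # Covariance matrix of the data (np.cov centers it internally).
    C = np.cov(X, rowvar=False)

    # The eigenvector with the largest eigenvalue is the first principal component.
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    pc1 = eigenvectors[:, np.argmax(eigenvalues)]

    print(pc1)    # approximately +/- [0.707, 0.707], i.e. the (1, 1) direction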


Personally, I think an explanation using the SVD would probably be more intuitive:

http://math.stackexchange.com/questions/3869/what-is-the-int...
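For completeness, a rough sketch of the SVD route (NumPy, illustrative only): the right singular vectors of the centered data matrix are the same principal directions you get from the covariance eigendecomposition, up to sign, and the squared singular values divided by n - 1 equal the covariance eigenvalues.

    import numpy as np

    rng = np.random.default_rng(1)
    # Made-up data: 500 samples in 3 dimensions with very different spreads.
    X = rng.normal(size=(500, 3)) @ np.array([[3.0, 1.0, 0.0],
                                              [0.0, 1.0, 0.5],
                                              [0.0, 0.0, 0.2]])

    Xc = X - X.mean(axis=0)                    # center the data

    # Route 1: eigendecomposition of the covariance matrix.
    C = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)       # ascending order

    # Route 2: SVD of the centered data matrix.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Principal directions agree up to sign; s**2 / (n - 1) matches eigvals.
    print(eigvecs[:, ::-1])                    # covariance eigenvectors, largest first
    print(Vt.T)                                # SVD right singular vectors
    print(eigvals[::-1])
    print(s**2 / (Xc.shape[0] - 1))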


DL4J co-creator here: we did a tutorial on restricted Boltzmann machines last week that people might find helpful as well: http://deeplearning4j.org/restrictedboltzmannmachine.html


Thank you for posting this. It was fortuitously timed: I am reading through a thesis at the moment in which the author used PCA along with something called PLS (Projection to Latent Structures) to perform a multivariate analysis, and I was hoping I could use this technique on my own dataset.

I was in the process of googling the acronyms when I stumbled on this link in HN.


This is extraordinarily well written.


Nice. I was futzing with covariance matrices in Scala today at work and realized I really need a refresher on some of this stuff (college was over a decade ago).

Fortunately, we have a couple of scientists on staff I can usually bug for assistance when I get confused.


Very simple to understand and straightforward. Thanks for writing this up.


This "entropy" part is missing - no pictures, no formulae... Delete it or elaborate carefully. (Other parts are great!)


Very well written! Thank you!



