I don't think "arbitrary linear algebra operations" is a valid critique. If you ...

murbard2 · on Aug 23, 2018

If you think of the covariance matrix, entry i,j for i ≠ j will be

   floor(n / (p[i]*p[j])) / n - floor(n / p[i]) * floor(n/p[j]) / n^2

and the ith diagonal entry will be

   floor(n / p[i]) / n - ( floor(n / p[i]) / n )^2

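For n large, floor(n / m) / n approaches 1/m, so the off-diagonal entries tend to 1/(p[i]*p[j]) - (1/p[i])*(1/p[j]) = 0, and you approximately get a diagonal matrix with diagonal entries / eigenvalues 1/p[i] - 1/p[i]^2.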
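
A quick numerical check of the formulas above, as a rough sketch rather than anything from the OP. It assumes each integer k in 1..n is encoded as a 0/1 vector of "p[i] divides k" indicators, and that the covariance is normalized by n (not n - 1). The function names and the choice of primes are made up for illustration.

```python
import numpy as np

def empirical_covariance(n, primes):
    """Covariance of the 0/1 indicators [p divides k] over k = 1..n (assumed encoding)."""
    ks = np.arange(1, n + 1)
    X = (ks[:, None] % np.asarray(primes)[None, :] == 0).astype(float)
    # bias=True normalizes by n, matching the floor(...) / n expressions above
    return np.cov(X, rowvar=False, bias=True)

def closed_form_covariance(n, primes):
    """The floor-based expressions from the comment, for distinct primes."""
    p = np.asarray(primes)
    single = (n // p) / n                   # floor(n / p[i]) / n
    joint = (n // np.outer(p, p)) / n       # floor(n / (p[i]*p[j])) / n
    cov = joint - np.outer(single, single)  # off-diagonal entries
    np.fill_diagonal(cov, single - single**2)  # diagonal entries
    return cov

primes = [2, 3, 5, 7]
n = 100_000
emp = empirical_covariance(n, primes)
cf = closed_form_covariance(n, primes)
off_diag = emp - np.diag(np.diag(emp))

print("max |empirical - closed form|:", np.max(np.abs(emp - cf)))
print("max |off-diagonal entry|:", np.max(np.abs(off_diag)))
print("diagonal:", np.diag(emp))
```

The first number should be at floating-point noise level, and the off-diagonal entries should shrink roughly like 1/n as n grows.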

mturmon · on Aug 23, 2018

Smart observation. Another way to say it is that, for distinct primes p1 and p2, and an integer k drawn uniformly from 1..n, the events “p1 divides k” and “p2 divides k” are approximately statistically independent. So you get a near-diagonal covariance with entries as you wrote.

jacobolus · on Aug 22, 2018

> If you understand PCA as "take the SVD of the data", then the operations seem arbitrary. But if you understand it as, "construct a low-rank approximation in the L2 sense to the data, or its covariance", then it's not.

Those are the same thing. Is your first case just a reader who has no idea what the SVD is?

mturmon · on Aug 22, 2018

Yes, they are both descriptions of the same thing. I'm trying to say that PCA does have a justification. It's not just an "arbitrary linear algebra operation", although the application of the SVD algorithm to perform PCA can be presented that way.
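
To make the "same thing" point concrete, here is a small sketch on synthetic data (the random matrix and all variable names are assumptions for illustration only). It compares the SVD of the centered data with the eigendecomposition of its covariance, and checks that truncating the SVD leaves a squared Frobenius error equal to the sum of the discarded squared singular values, which is the low-rank L2 approximation property.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated data: 200 samples, 5 features (purely illustrative)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)  # PCA works on centered data

# View 1: SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# View 2: eigendecomposition of the covariance (normalized by the sample count)
cov = Xc.T @ Xc / len(Xc)
evals, evecs = np.linalg.eigh(cov)
evals, evecs = evals[::-1], evecs[:, ::-1]  # eigh returns ascending order

# Squared singular values / N should equal the covariance eigenvalues
eigenvalue_gap = np.max(np.abs(s**2 / len(Xc) - evals))

# Right singular vectors match covariance eigenvectors up to sign
alignment = np.abs(np.sum(Vt.T * evecs, axis=0))

# Rank-k truncation: squared error equals the discarded squared singular values
k = 2
X_k = (U[:, :k] * s[:k]) @ Vt[:k]
truncation_error = np.sum((Xc - X_k) ** 2)
discarded_energy = np.sum(s[k:] ** 2)

print(eigenvalue_gap, alignment, truncation_error, discarded_energy)
```

So "take the SVD" is just one way to compute the principal directions; the justification is the approximation property in the last few lines.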

bhl · on Aug 22, 2018

Are you saying the kth principal component is equal to the kth prime number?

mturmon · on Aug 22, 2018

Yes. See the plot: for example, PC #1 is essentially a 0/1 vector with all its weight (1 in this case) placed on the "2", which is the representation of the prime number 2 in the scheme used in the OP.
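
A hedged way to reproduce that without the plot: the sketch below assumes the OP's scheme is a 0/1 "p divides k" indicator per prime, and it uses only the first six primes and n = 100000 (both arbitrary choices of mine). Each principal component is an eigenvector of the covariance, sorted by decreasing eigenvalue, and we look at which prime carries the largest weight in each.

```python
import numpy as np

primes = np.array([2, 3, 5, 7, 11, 13])  # illustrative subset of primes
n = 100_000                               # arbitrary upper bound
ks = np.arange(1, n + 1)

# Assumed encoding: row k has a 1 in column i when primes[i] divides k
X = (ks[:, None] % primes[None, :] == 0).astype(float)

cov = np.cov(X, rowvar=False, bias=True)
evals, evecs = np.linalg.eigh(cov)

# Sort components by decreasing variance (eigh returns ascending order)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# For each principal component, the prime with the largest absolute loading
dominant_prime = primes[np.argmax(np.abs(evecs), axis=0)]
top_loading = np.max(np.abs(evecs), axis=0)

for k, (p, w, v) in enumerate(zip(dominant_prime, top_loading, evals), start=1):
    print(f"PC #{k}: dominant prime {p}, |loading| {w:.4f}, variance {v:.4f}")
```

Since 1/p - 1/p^2 decreases as p grows past 2, the variance ordering follows the ordering of the primes, which is why PC #k lines up with the kth prime.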