Interesting stuff! In the world of equipment for making this kind of music, there’s a device called Mutable Instruments Grids [1] whose maker did this same sort of analysis just for drum patterns. She designed a “topographic drum sequencer” with 3 knobs that smoothly transforms across the “eigenbeats” of a training set of electronic music rhythms.
One of the issues when using PCA (or POD, Karhunen-Loève expansion... whatever your field calls it) on a dataset is that it only looks for the best possible orthonormal basis for the reconstruction and puts no constraint on the spectral content of each mode, so you can get very different frequencies in a single mode. Should you want to restrict this, you could use DMD (Dynamic Mode Decomposition, a single frequency per mode) or mPOD (Multi-Scale Proper Orthogonal Decomposition, a restricted set of frequencies per mode).
Also, some of the lopsidedness of the frequency spectrum could be explained by the properties of our ears, to which modern music is acutely tuned. Look up "A-weighting" as an example of a "loudness" scale, which differs from straight amplitude/SPL because the average person is much less sensitive to bass than to, say, 2.5 kHz tones.
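To put rough numbers on the A-weighting point, here is a minimal sketch of the standard analog A-weighting curve as given in IEC 61672. The function name `a_weighting_db` is just my own label, and this only evaluates the curve at a frequency; it is not a full loudness model.

```python
import math

def a_weighting_db(f_hz: float) -> float:
    """Approximate A-weighting gain in dB at frequency f_hz (IEC 61672 analog formula)."""
    f2 = f_hz ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to about 0 dB at 1 kHz.
    return 20.0 * math.log10(num / den) + 2.00

# Print the weighting at a few frequencies to compare bass against the midrange.
for f in (50, 100, 1000, 2500):
    print(f"{f:>5} Hz: {a_weighting_db(f):+.1f} dB")
```

The curve comes out at roughly -19 dB at 100 Hz and slightly above 0 dB around 2.5 kHz, so a bass-heavy raw spectrum looks a lot less lopsided once it is weighted this way.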
"I downloaded 10,000 electronic music tracks that were licensed with the Creative Commons License."
I'm honestly surprised there is this much music under this license. Being able to download 10k tracks means there's an even larger pool of music available. I know royalty-free music is becoming more widespread because of all those YouTubers, but I would not have thought it was this big.
So even after equalizing, all the top components are similar variants of kick drums. I think this is a problem - PCA should really identify maximally different things for the top components. If they're not perceptually different, then the PCA is not doing the best job at capturing perceptually relevant variance.
I presume what is happening is that the drums all have slight variations in timing, and since PCA on raw samples is sensitive to phase, it sees them as totally different.
I wonder if you'd get better results if you downsampled even more aggressively, or did some more sophisticated feature transformation.
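A quick way to see the timing problem is the sketch below. It uses a made-up decaying 60 Hz "kick" rather than anything from the article's dataset, and the sample rate and shift are arbitrary. It compares the same hit with an 8 ms delay, first as raw samples and then as magnitude spectra, which is one example of a feature transformation that throws away phase.

```python
import numpy as np

sr = 8000                                   # sample rate in Hz (arbitrary choice)
t = np.arange(2048) / sr
# Toy "kick": a 60 Hz sine with an exponential decay (purely illustrative).
kick = np.exp(-20.0 * t) * np.sin(2 * np.pi * 60.0 * t)

shift = round(sr / 120)                     # half a period of 60 Hz, about 8 ms
late_kick = np.roll(kick, shift)            # same kick, slightly later (circular shift)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity of the raw waveforms: sensitive to the timing offset.
raw_sim = cosine(kick, late_kick)
# Similarity of the magnitude spectra: a circular shift only changes the phase.
spec_sim = cosine(np.abs(np.fft.rfft(kick)), np.abs(np.fft.rfft(late_kick)))

print(f"raw waveform similarity:       {raw_sim:+.3f}")
print(f"magnitude spectrum similarity: {spec_sim:+.3f}")
```

In sample space the two hits look strongly dissimilar, so PCA would spend separate components on them, while their magnitude spectra are essentially identical. The catch is that dropping phase makes it harder to turn components back into audio.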
Have you considered trying more sophisticated dimensionality reduction algorithms? I'm especially interested in seeing the results of UMAP on this dataset.
Also, what about doing inverse transforms to try to hallucinate music?
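For the inverse-transform idea, something like the following would be a starting point. This is only a sketch under my own assumptions: `X` is random placeholder data standing in for whatever feature matrix the article used, and `k = 8` is an arbitrary number of components. The idea is to draw random coefficients with the same spread as the real ones and map them back through the PCA basis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data: 200 "clips" with 64 features each (not the article's dataset).
X = rng.normal(size=(200, 64))

# PCA via SVD of the mean-centered data.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

k = 8                                   # number of components to keep (arbitrary)
components = Vt[:k]                     # orthonormal rows, shape (k, 64)
scales = S[:k] / np.sqrt(len(X) - 1)    # std dev of the data along each component

# Sample random coefficients with matching spread, then apply the inverse transform.
z = rng.normal(size=(5, k)) * scales
generated = mean + z @ components       # shape (5, 64): five new "clips"

print(generated.shape)
```

Since PCA is linear, anything generated this way is just a weighted blend of the components plus the mean, so I would expect it to sound like overlapping average kicks rather than actual new music.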
Neat. Waiting for “the drop” or changes in the music that “tickle my senses” makes me wonder if an emergent property (the pleasurable sound from the composition of bass, kick, snares, “woop” in this case) can be captured with PCA.
Something that is more than the sum of its parts (emergent) would have to be reduced to its principal components.
Slightly off topic, but I think some version of this idea appears in Jonathan Franzen's `The Corrections` - it's the startup that makes Brian's millions and allows him to fund The Generator?
[1] https://mutable-instruments.net/modules/grids/