I've always wondered how services like Shazam work. I'm amazed that they can run this kind of perceptual hash against ANY 10-second portion of a song. How do they search against something like that when they don't know the start or end time of the input segment?
I do research in music information retrieval. See the ISMIR 2003 paper linked below. In short, it finds landmarks (peaks) in the spectrogram, hashes those landmarks, then looks the hashes up in a database and checks that the matches are consistent in time, i.e. that they all point to the same offset within one track. http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
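To make the "temporal continuity" part concrete, here's a rough sketch in Python. It is not the paper's exact scheme: it skips spectrogram peak picking entirely and starts from a list of (time_frame, freq_bin) peaks, and the function names, `fan_out`, `max_dt`, and the peak values are all made up for illustration. The idea it shows is that each hash encodes only a pair of frequencies and the time gap between them, so it doesn't depend on where the clip starts; a true match shows up as many hashes agreeing on one (track, offset).

```python
from collections import Counter, defaultdict

def landmark_hashes(peaks, fan_out=3, max_dt=50):
    """Yield ((f1, f2, dt), t1) for pairs of nearby peaks.

    peaks: iterable of (time_frame, freq_bin). fan_out and max_dt are
    illustrative knobs, not values taken from the paper.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break          # later peaks are even further away
            if dt <= 0:
                continue       # skip peaks in the same frame
            yield (f1, f2, dt), t1
            paired += 1
            if paired >= fan_out:
                break

def build_index(tracks):
    """Map each hash key to the (track_id, anchor_time) pairs that produced it."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for key, t in landmark_hashes(peaks):
            index[key].append((track_id, t))
    return index

def best_match(index, query_peaks):
    """Vote on (track_id, time offset); return (track_id, offset, votes) or None."""
    votes = Counter()
    for key, t_query in landmark_hashes(query_peaks):
        for track_id, t_track in index.get(key, ()):
            votes[(track_id, t_track - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count

# Invented peak lists standing in for two tracks.
tracks = {
    "song_a": [(0, 10), (5, 40), (9, 22), (14, 31), (20, 10), (26, 45), (31, 18), (37, 27)],
    "song_b": [(0, 12), (4, 33), (11, 20), (15, 44), (22, 15), (27, 38), (33, 25), (39, 30)],
}
index = build_index(tracks)

# A clip cut from the middle of song_a (frames 9..31), re-timed to start at 0.
clip = [(0, 22), (5, 31), (11, 10), (17, 45), (22, 18)]
print(best_match(index, clip))  # ('song_a', 9, 9)
```

The clip never says where it came from, but all nine of its hashes vote for song_a at offset 9, which is how the unknown start time falls out of the lookup. A real system would also have to cope with noise and spurious hash collisions, which this toy ignores.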
I meant to suggest nearly the opposite: with pop/rock/rap, a 10s chunk is enough to get a signature for the whole piece, while with classical, the music changes enough that one 10s chunk may not adequately represent a different 10s chunk.