
I've always wondered how services like Shazam work. I'm amazed that they can do this kind of perceptual hash against ANY 10 second portion of a song. How do they search against something like that when they don't know the start or end time of the segment that is being input?



I do research in music information retrieval. See the ISMIR 2003 paper below. In short, it searches for landmarks in the spectrogram, hashes those landmarks, then compares those hashes against database hashes for temporal continuity. http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
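The landmark idea in that paper can be sketched in a few lines. This is a toy illustration, not Shazam's actual parameters: pick spectrogram peaks that dominate their local neighbourhood, then hash pairs of nearby peaks as (anchor frequency, target frequency, time delta). All constants below (fan-out, thresholds, window sizes) are made up for the sketch.

```python
import numpy as np

def landmark_hashes(signal, n_fft=512, hop=256, fan_out=3):
    """Toy landmark fingerprinter in the spirit of Wang's paper.

    Returns (hash, anchor_frame) pairs; each hash packs
    (anchor_freq_bin, target_freq_bin, frame_delta).
    """
    # Magnitude spectrogram from overlapping Hann-windowed frames.
    win = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * win
              for i in range(0, len(signal) - n_fft, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), axis=1))

    # Landmarks: time-frequency bins that dominate a 3x3 neighbourhood.
    peaks = []
    for t in range(1, spec.shape[0] - 1):
        for f in range(1, spec.shape[1] - 1):
            patch = spec[t - 1:t + 2, f - 1:f + 2]
            if spec[t, f] == patch.max() and spec[t, f] > 2 * patch.mean():
                peaks.append((t, f))

    # Pair each anchor peak with a few later peaks; hash the pair geometry.
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt < 64:
                hashes.append(((f1 << 16) | (f2 << 6) | dt, t1))
    return hashes
```

Because each hash encodes relative geometry between two peaks, the same pair of landmarks produces the same hash no matter where in the song the query starts, which is what makes matching an arbitrary 10-second excerpt possible.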

A seminal paper on audio fingerprinting is the one by Haitsma and Kalker. http://ismir2002.ismir.net/proceedings/02-fp04-2.pdf
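The Haitsma-Kalker scheme works differently: it emits one 32-bit "sub-fingerprint" per frame, where each bit is the sign of an energy difference across adjacent frequency bands and adjacent frames. A rough sketch (band edges here are linear for brevity; the paper uses log-spaced bands roughly between 300 Hz and 2000 Hz, and different frame parameters):

```python
import numpy as np

def sub_fingerprints(signal, n_fft=2048, hop=64, n_bands=33):
    """Sketch of the Haitsma-Kalker fingerprint: one 32-bit value per
    frame, bit m set iff the band-energy difference
    (E(n,m) - E(n,m+1)) - (E(n-1,m) - E(n-1,m+1)) is positive.
    """
    win = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * win
              for i in range(0, len(signal) - n_fft, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2

    # Sum spectral energy into n_bands bands per frame.
    edges = np.linspace(0, spec.shape[1], n_bands + 1).astype(int)
    energy = np.array([[spec[t, edges[b]:edges[b + 1]].sum()
                        for b in range(n_bands)]
                       for t in range(spec.shape[0])])

    d = energy[:, :-1] - energy[:, 1:]   # differences between adjacent bands
    bits = (d[1:] - d[:-1]) > 0          # differences between adjacent frames
    return [int("".join("1" if b else "0" for b in row), 2) for row in bits]
```

The sign-of-difference trick makes the bits robust to overall loudness and mild filtering, and matching reduces to Hamming distance between bit strings.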


There's a GPL implementation of an audio information retrieval approach here: http://code.google.com/p/audioscout/


Thanks!


A Matlab implementation and tutorial to give you an idea can be found here: http://labrosa.ee.columbia.edu/~dpwe/resources/matlab/finger...


Yes, the delta part is clever. Shazam is well described here: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
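The "delta" is what makes the lookup work without knowing where the excerpt starts: every matching hash votes for a (song, db_time - query_time) pair, and a true match piles votes into a single offset bin. A minimal sketch, with hypothetical helper names (build_index, best_match) chosen for illustration:

```python
from collections import Counter, defaultdict

def build_index(songs):
    """songs: {song_id: [(hash, time), ...]}.
    Returns an inverted index hash -> [(song_id, time), ...]."""
    index = defaultdict(list)
    for song_id, hashes in songs.items():
        for h, t in hashes:
            index[h].append((song_id, t))
    return index

def best_match(index, query_hashes):
    """Vote on (song, time offset) pairs. Hashes from the true source
    song agree on one offset, so that bin's count is the number of
    temporally consistent hits."""
    votes = Counter()
    for h, t_query in query_hashes:
        for song_id, t_db in index.get(h, []):
            votes[(song_id, t_db - t_query)] += 1
    if not votes:
        return None
    (song_id, offset), score = votes.most_common(1)[0]
    return song_id, offset, score
```

Spurious hash collisions scatter across many offset bins, while the real song's hashes all land in one, so a clear winner in the histogram doubles as a confidence measure.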


For pop/rock/rap music it probably doesn't matter. For classical, I'm not sure.


I think you just suggested that classical music is all the same.


I meant to suggest nearly the opposite: with pop/rock/rap, a 10s chunk is enough to get a signature for the whole piece, while with classical, the music changes enough that any given 10s chunk may not adequately represent a different 10s chunk.


There was a cool HN article a while back on building your own Shazam clone in Java. I think the author received a C&D.

Found it! http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/



