I'm running FFTs at regular intervals using the Accelerate framework, then doing a lot of rolling-window comparisons of the resulting frequency data to try to determine which fingerprints those frequencies match most closely.
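
As a rough illustration of that pipeline, here is a minimal sketch in plain Python rather than Accelerate. Everything in it is an assumption made for the example: a naive DFT stands in for a real FFT, each frame is reduced to its single strongest bin, and "closeness" is just the fraction of positions where the bins agree. The names `magnitude_spectrum`, `peak_bins`, `best_match` and `tone` are hypothetical, and a real matcher would likely use a more tolerant similarity measure.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (a slow stand-in for a real FFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def peak_bins(samples, frame_size, hop):
    """Strongest non-DC frequency bin of each frame, taken every `hop` samples."""
    peaks = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        mags = magnitude_spectrum(samples[start:start + frame_size])
        peaks.append(max(range(1, len(mags)), key=mags.__getitem__))
    return peaks


def best_match(peaks, fingerprints):
    """Slide every fingerprint over `peaks`; return (name, offset, score) of the best fit."""
    best = (None, -1, -1.0)
    for name, fingerprint in fingerprints.items():
        for offset in range(len(peaks) - len(fingerprint) + 1):
            window = peaks[offset:offset + len(fingerprint)]
            # Assumed similarity measure: fraction of frames whose peak bin agrees.
            score = sum(a == b for a, b in zip(window, fingerprint)) / len(fingerprint)
            if score > best[2]:
                best = (name, offset, score)
    return best


def tone(bin_index, frame_size, frames):
    """Test signal: a sine that falls exactly on `bin_index` for `frames` frames."""
    return [
        math.sin(2 * math.pi * bin_index * t / frame_size)
        for t in range(frame_size * frames)
    ]


# Synthetic input: three tones in sequence, two frames each.
signal = tone(3, 32, 2) + tone(5, 32, 2) + tone(7, 32, 2)
peaks = peak_bins(signal, frame_size=32, hop=32)
fingerprints = {"rising": [5, 5, 7], "falling": [7, 5, 3]}
match = best_match(peaks, fingerprints)
```

With `hop` equal to `frame_size` the frames do not overlap; a smaller `hop` gives overlapping frames and finer time resolution at the cost of more transforms.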
I imagine this approach is similar to what Shazam does. I read the original algorithm paper years ago; it's very interesting and written clearly enough to be understood by people with no background in the subject: