Shazam is one of the very few apps in the past 20 years that STILL "wows" me. I have no idea how the tech works, and I even sort of like not knowing, to be honest. It's one of the very few apps out there that still exists in a "magical" way to me. I am constantly impressed with how fast and easily it works, even with very obscure music. What an amazing app.
Fun quick related story: about 10 or more years ago there was a background track on a TV show (Scrubs) that I really liked that was only in the Netflix version. It was just an instrumental song with some French-sounding spoken words in it, so there was no easy way to search for it. However, it was distinct enough that it didn't seem like something made just for the show. It was also pretty quiet and under some talking in the scene. I had posted on reddit asking if anyone knew it, and never got any responses. I searched all over the web, but no source had the track details. It drove me crazy every time I would hear the song re-watching the show, and I still could not track it down every few years when I tried again. Back then, Shazam hadn't catalogued it, so it wasn't in there yet either. However, when re-watching it a few years back, I tried Shazam again and to my surprise it finally worked. I was blown away that Shazam was finally able to solve this 10+ year mystery. It was one of the coolest feelings ever to scratch that itch, finding this rare French song and hearing it in full. It was truly magical.
EDIT: Oh sorry, I didn't think anyone would actually care about the song itself lol. It was "Sans Hésitation" by the French-Canadian band Chapeaumelon. https://www.youtube.com/watch?v=Ju4d3YQhByU - It's also interesting because the song now shows up in the episode's listings on TV music database sites. Very cool.
Shazam used to wow me, but then, as others mentioned in the replies, it's essentially matching the signature of the sound to the sounds in the database. If it's one of the songs, it gets matched fairly quickly.
It works so well even with my shitty humming - even my girlfriend can't recognize what the song is but Google can. It doesn't even have the same signature as the original audio file, just similar hums in a noisy environment and it still works. Black magic fuckery.
> it's essentially matching the signature of the sound to the sounds in the database.
You aren't giving it enough credit. The algorithm uses just a few seconds from any part of the song, and has to deal with phone audio quality and often background noise. I mean, you can be in a bar with all that jabber, hold up the phone, and it can pick out the song. The app on the phone does the preprocessing of the audio before it is sent to the server that does the matching ... using the comparatively miserable power of a 2001-era cell phone.
Oh that wasn't my intention - Shazam was and is groundbreaking, they did it when no one else could. All I meant was that it seems more "doable and I probably understand how it works" when compared to how Google assistant recognizes songs from my humming.
What is a signature? How is a signature computed from a noisy audio stream, over a mall speaker? How is a signature computed from an arbitrary starting point?
IIRC, it uses a Fast Fourier Transform and the time delays between peak notes in the song to generate a series of "hashes" that are stored in a db. Those ids can be calculated locally on the phone and then it's a simple db lookup to retrieve potential hits. When Shazam adds a song to the db, they compute a series of "hashes" so you can identify it at any point in the tune.
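For the curious, here's a minimal sketch of that idea in Python - not Shazam's actual code; the window sizes, peak picking, and fan-out are all illustrative guesses:

    import numpy as np
    from collections import defaultdict

    def spectral_peaks(samples, win=4096, hop=2048, per_frame=5):
        """Return (frame_index, freq_bin) for the loudest bins in each FFT frame."""
        peaks = []
        for i, start in enumerate(range(0, len(samples) - win, hop)):
            spectrum = np.abs(np.fft.rfft(samples[start:start + win] * np.hanning(win)))
            for f in np.argsort(spectrum)[-per_frame:]:  # keep the strongest bins
                peaks.append((i, int(f)))
        return peaks

    def peak_pair_hashes(peaks, fan_out=10):
        """Pair each anchor peak with a few later peaks: key = (f1, f2, dt)."""
        for j, (t1, f1) in enumerate(peaks):
            for t2, f2 in peaks[j + 1:j + 1 + fan_out]:
                yield (f1, f2, t2 - t1), t1  # hash key, anchor time

    # Inverted index: hash key -> list of (track_id, anchor_time)
    index = defaultdict(list)

    def add_track(track_id, samples):
        for key, t in peak_pair_hashes(spectral_peaks(samples)):
            index[key].append((track_id, t))

Because every (f1, f2, dt) triple from any point in the song is indexed, a snippet recorded mid-track still hits the same keys.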
Wow, that's fascinating! I just ended up down the rabbit hole reading Avery Wang's "An Industrial-Strength Audio Search Algorithm" (linked in this thread) - it's such a cool way of "fingerprinting" pieces of music data.
My original comment was from memory of reading a post about how it worked a few years ago. Looking at what you read, I think the gist of what I said is right, though it seems they use a different algorithm than an FFT.
Totally agree though. It is something that opened my mind to thinking of a way to solve that problem in a way that actually works. Shazam definitely looked like magic the first time I saw it work.
TL;DR (from skimming thru the paper): he figured that a song's spectrogram looks like a starry sky, so matching a song is like finding a constellation in the sky. How do you do it efficiently? By searching for simple features of your constellation, such as pairs or triples of bright stars - those can be pre-hashed to find matches instantaneously. Once a possible match is found, you compare the rest of the constellation. Nothing breathtaking, in other words. However, among all the men who talked, he was the one who both talked and did, and that's his achievement.
Brilliant stuff is easy to understand, a lot harder to come up with. I could do that! (With a little help from wikipedia, audio processing libraries, the answer sheet, and the knowledge that it's possible in the first place)
To me, this highlights how hashing is the closest thing programmers have to magic.
Create a compound signature. You don't just take one measurement but many measurements and then assess the probabilities. You may have people talking in a mall, but they will be in a narrow frequency band. Similarly you can analyze the repeating elements. Keep iterating and adding stuff until f(signal) performs well
> What blew my mind was when Google introduced 'hum and we'll recognize the song for you' in Google assistant
Their announcement actually made me roll my eyes a bit, as Soundhound had that functionality nearly a decade before. I had both SH and Shazam installed on my old phone for these use cases - now Shazam is baked into Siri so I don’t even have the app itself installed.
I haven’t tried humming with Shazam recently, but I don’t think it worked well back when I did have the actual app. It works very well for music though. I used it around five times, just this Wednesday night at a concert, and it got every track for me.
Soundhound is what had humming “support” explicitly in its product description, and it worked pretty well from what I remember. It’s been long enough though that I may only be remembering the times it worked.
If you prefer to access it via your iPhone's control center, you can configure it that way in the control center settings. It is called "Music Recognition" there.
Nice I will have to check this out, control center is definitely a great little overlay but I haven't reliably figured out how to add things to it. I will investigate further.
For general information have a look here [0]. Also be aware that elements in the control center might even offer additional functionality, e.g. like setting the brightness of the flashlight. In this case instead of just switching the flashlight on by a tap on the button, keep the flashlight button pressed to bring up a slider to set the brightness. Just play around with the other control center elements to find out what is possible.
> Essentially matching the signature of the sound to the sounds in the database.
And Dall-E 2 is just doing fuzzy hashing of images with text keys.
Shazam continues to amaze me because it "just works", and still feels more magical to me than most of the AI out there, since it directly solves a major problem I didn't even think was solvable: "what is this song!!?"
I enjoy salsa dancing, but I don't know any Spanish, so I use that built-in Google functionality to hum various songs all the time to figure out what they're called.
As an almost teenager at the time, that (Shazam over the phone with an answer texted back - which I used on a Nokia 3310) was the one thing that convinced me we would soon have pocket devices that really could do anything.
And while it took a few iterations (for me, from palm pilot to blackberry as a teenager, then eventually moving to iPhone after a few too many painful Blackberry upgrades - still missing that unified inbox though, as is everyone else I know who had a BB of that era... and frankly missing a great physical keyboard on a phone, too) I still am impressed on a daily basis that I do indeed have the device in my pocket that 12 year old me dreamed of.
I didn't know it ever worked that way, that's incredible. Reminds me of ChaCha, the texting service where you texted questions and a human would quickly look up the answer and text it back. It's a very cool idea that was quickly outmoded by smart phones and is kind of lost to history now.
I remember Sony Ericsson handhelds all came with TrackID back in the day (2007/2008) and I used it to name music I heard in public. It was the same idea. I think it charged £1-2 per track!
I'd take a landline over the inconsistency of cell calls any day. Maybe peak/maximum quality is better with today's tech, but reliability definitely isn't. I would bet average quality of calls has dropped too. HD voice is a carrier specific thing, no? Most of the time I can barely even hear the other person, much less have HD anything.
I think you have been misinformed. Voice calls over plain old telephone service were band-limited to <4khz decades before cell phones became popular. This was necessary in order to cram more active calls into our existing telephone infrastructure.
Today's g.722 (HD Voice) is better, but GSM codecs are also 8kHz sampled, then lossily encoded. If the compression is appropriate, gsm is 13-bit per sample vs 8-bit per sample commonly used for POTS (minus robbed bit signalling in the US), but if the compression isn't appropriate, you get some pretty nasty artifacts. Encode/decode delay can be significant in some applications, but since GSM is TDMA anyway, you're going to have buffering and may as well use that to encode; a T1 PRI multiplexes one sample at a time, so a lot less delay there.
I’m sure that’s all true. I’m also pretty sure I wasn’t regularly shouting into my AT&T landline “I can’t hear you!” Obviously we’ve gained a lot with cell phones and portable music playing but it’s been mostly at the cost of consistent quality.
You can't just post a story like that and not link the song!
Personally, my main usage of Shazam is for identifying vaporwave samples. Often all you have to do is throw the song in Audacity, tweak the speed a bit, and Shazam it.
There are entire albums on Spotify which are full songs of 80s pop classics, played at a slower speed, then uploaded as a new album from another artist.
Interesting, have any of them been hit with copyright violations?
I ask because I like to create bootlegs (basically homebrew remixes, these are substantial re-imaginings of the original track) and would like to put them on Spotify, but am worried about copyright issues and how that might affect posting original music.
The first time I heard of Shazam was on a road trip with a friend of mine who had minimal tech skills at best. I was already 10 years into my career as an engineer, and when he told me about it, I honestly didn't believe him; I was positive he was mistaken, and speculated it was a service similar to Aardvark[1], which was a peer-to-peer information engine.
I was wrong, of course; Shazam really did live up to its hype. I think it's interesting that the more someone knows about how a technology works, the more sceptical they are of what it is capable of.
I don't know about Shazam's current algorithm specifically, but years ago I worked at a place with a mathematician who had worked on Gracenote's algorithms, and I asked him for the basics of how it works.
Basically, it records audio, chopping it up into small segments and running them through an FFT. Then, treating the data like a greyscale spectrograph image, it runs it through a quantization filter that helps reject some noise, then converts that to locality-sensitive hashes that are sent to the server. So basically: FFT, filter, hash, lookup.
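Here's a rough, hypothetical sketch of that FFT -> filter -> hash pipeline. The window size, percentile threshold, and band hashing are my stand-ins for illustration, not Gracenote's actual parameters:

    import numpy as np

    def binary_spectrogram(samples, win=2048, hop=1024):
        """FFT each window, then quantize: keep only bins above the frame's
        90th-percentile magnitude. The result is a sparse binary 'image'
        that rejects a lot of low-level noise."""
        frames = []
        for start in range(0, len(samples) - win, hop):
            mag = np.abs(np.fft.rfft(samples[start:start + win] * np.hanning(win)))
            frames.append(mag > np.percentile(mag, 90))
        return np.array(frames)

    def frame_hashes(bitmap, band=32):
        """Collapse each frame's bits into coarse band-occupancy integers -
        a crude stand-in for a real locality-sensitive hash: similar frames
        map to the same (or nearby) integers."""
        for t, frame in enumerate(bitmap):
            bands = [int(frame[i:i + band].any()) for i in range(0, len(frame), band)]
            yield t, int("".join(map(str, bands)), 2)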
Don't want to spoil it for you if you really don't want to know, but I want to share for others in case they do, because I found it so interesting when I first learned!
It's short but very cool. I read it a while ago and honestly can't pretend I fully grokked everything, but my understanding was that you can't just use a Fourier transformation alone. Noise would basically make this impossible.
So what I'd consider the key insight is that they compressed songs down to "fingerprints". IIRC they noticed that songs, even in noisy environments, preserved certain bits of information. Particularly, they could look at the spectrogram and see peaks of amplitude in the tapestry. They essentially set some radius and scanned the spectrogram. In a given radius, only the largest amplitude value in time and frequency would be preserved. So you've reduced a 3MB song to a relative handful of bits.
This would be good enough for small databases (I think). But it's intractable for anything practical. So they built hashes out of these fingerprints using pairs of the preserved peak bits. They would choose a certain peak (called the anchor point), record its time offset from the start of the song, and then form pairs with other nearby peaks, saving the pairs of frequencies (but discarding, e.g., their amplitudes). So for each of these anchor points, you would get a 64-bit value: 32 bits for the time offset and track ID and 32 bits of frequency-pairs.
When you wanted to look up a song, they would fingerprint your snippet into multiple 32-bit hashes and compare them against the frequency-pair hashes in the database. If a song was a good match, then you would see that your snippet matched against multiple hashes from that song, and specifically that they matched linearly over time (I'm struggling to explain this bit but it's visually obvious if you look at Figure 3 in the paper).
I probably got some of this wrong, but I hope it's a helpful summary of the paper. I remember struggling to understand parts of it, so please let me know if anything I said is egregiously wrong!
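A minimal sketch of that last matching step may help. This is hypothetical code, assuming an inverted index of hash -> (track_id, anchor_time) entries like the one sketched earlier in the thread. For each candidate track, you histogram the difference between the track's anchor time and the snippet's time; a true match piles up at one consistent offset, which is the "linear over time" effect in Figure 3:

    from collections import Counter, defaultdict

    def best_match(snippet_hashes, index):
        """snippet_hashes: iterable of (hash_key, snippet_time).
        index: hash_key -> list of (track_id, track_time), built at ingest.
        A real match shows many hits at one (track_time - snippet_time)."""
        offsets = defaultdict(Counter)
        for h, t_snip in snippet_hashes:
            for track_id, t_track in index.get(h, ()):
                offsets[track_id][t_track - t_snip] += 1
        # Score each track by its tallest offset bin, not its raw hit count,
        # so scattered coincidental hits don't beat a time-aligned match.
        scores = {tid: c.most_common(1)[0][1] for tid, c in offsets.items()}
        return max(scores, key=scores.get) if scores else None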
I had a similar experience looking for a background track in an episode of This American Life. I couldn't remember which episode it was, and none of the lyrics were in English. Pretty sure I went backwards through the episodes and listened to all the credited songs to find it. The song was 69 Police by David Holmes, which still feels perfect to me. https://www.youtube.com/watch?v=IWissIWxqKk
On the topic of background music, tons of original background music copies/imitates famous stuff. Sometimes it's "I wanted the sound of X but couldn't afford it", but there are some in-jokes in there too. Wish I could remember some examples.
I think all you have to do (so simple!) is parse all the tracks ever made, and generate a sequence of snapshots of what the tune sounds like plus the deltas between them.
e.g. if it was notes (for simplicity) E,D,C,D,E,E,E,D,D,D,E,G,G is the start of "Mary Had a Little Lamb"
Millions of tracks contain the note E. Many hundreds of thousands probably have the note D next - and as you work through the sequence, you're pruning down that list until you know what it is.
The bit that makes my mind hurt, though, is the data structure you put those sequences into to make it quickly searchable (one possible answer is sketched below).
Users can start recording at any point in the song - so you can't just prune a tree down from a known starting point.
There's also going to be background noise - so when you run out of exact matches, you need some way of, I presume, sticking wild-cards into the previous decisions to see if you end up back on a known track.
Yeah - I think it's magic as well.
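On the data-structure question above: one plausible answer (not necessarily what Shazam does - the n-gram length and the note alphabet here are made up for illustration) is an inverted index keyed on short subsequences, which sidesteps the unknown starting point entirely:

    from collections import defaultdict

    N = 4  # n-gram length: short enough to survive noise, long enough to prune

    def index_tracks(tracks):
        """tracks: dict of name -> note sequence. Index every N-gram with its
        position, so a snippet starting anywhere still hits the index."""
        idx = defaultdict(list)
        for name, notes in tracks.items():
            for i in range(len(notes) - N + 1):
                idx[tuple(notes[i:i + N])].append((name, i))
        return idx

    tracks = {"Mary Had a Little Lamb": list("EDCDEEEDDDEGG")}
    idx = index_tracks(tracks)
    print(idx[tuple("DEEE")])  # a snippet from the middle still matches

The wild-card idea then becomes: when an n-gram misses, probe the index again with variants that differ in one position.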
Other thoughts:
I used it back in the UK when it launched, and the first track I ever used it on - dialling 2580 (the numbers down the middle of your keypad) - was also a French track (MC Solaar – La Vie Est Belle)
I always felt they missed a trick, just identifying music (and then trying to sell you stuff).
Surely they could have used the same tech to seamlessly mix all music together.
(i.e. take the sequences within tracks they find hard to differentiate, and then use these points to allow two tracks to be mixed together).
What's the minimum number of tracks it would, say, take to seamlessly mix from Megadeth to Mozart?
The "doesn't work with live performances" bit is borne out by my consistent experience failing to identify some songs at live performances, but with the "DJ Set" form of live performance, tempo shifting music without pitch shifting it still appears to get the goods more often than not.
We use AcoustID in MusicBox[0] to identify and deduplicate content, and it works great for us.
What we do is calculate the acoustic fingerprint of every piece of uploaded content and compare/check for duplicates (only authorized staff can upload, but this still helps a bunch with user errors and in cases where you need to reupload a track). Then we compare the fingerprints using this[1] approach, so we can fine-tune the similarity based on our needs.
In our case it's been very effective. Yes, live versions are treated as different ones (which is exactly what we need in our case, so it's a feature for us), but mechanical differences between tracks (volume, slight distortions from codec, different compression levels or remasters, or track being cut differently) are just ignored.
If you ever want/need audio fingerprinting, I can warmly recommend it.
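If you want to experiment with that comparison approach, here's a minimal sketch, assuming you've already extracted raw fingerprints with Chromaprint's fpcalc -raw (each fingerprint is a list of 32-bit ints; similarity is just the fraction of matching bits):

    def similarity(fp_a, fp_b):
        """Bit-error similarity of two raw Chromaprint fingerprints
        (lists of 32-bit ints). ~1.0 = same audio, ~0.5 = unrelated
        (random bits agree about half the time)."""
        n = min(len(fp_a), len(fp_b))
        if n == 0:
            return 0.0
        diff_bits = sum(bin((a ^ b) & 0xFFFFFFFF).count("1")
                        for a, b in zip(fp_a[:n], fp_b[:n]))
        return 1.0 - diff_bits / (32.0 * n)

    # e.g. treat anything above ~0.9 as a duplicate; the threshold is the
    # knob you tune depending on whether remasters should count as dupes.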
I think you're talking about a live recording vs a studio recording? But what I think zelos was talking about was "someone is currently playing music live, what is it?", which is a lot harder because you need to recognize the essence of a song and not the essence of a recording of a song.
My instinct is that it probably isn't as simple as you describe, because not only are there multiple notes at a time in a given track (i.e. chords), but there are also several tracks playing at once! It's possible that they're literally generating data like {guitar 1: C chord, guitar 2: single note E, bass: single note E} for every point in time, but even then each instrument isn't playing the exact same rhythm most of the time, so the notes won't exactly line up. I guess I don't think it's completely computationally infeasible to do it this way, but it seems more likely that they're just trying to separate the music from the background noise and then find the closest match to the music audio as a whole, rather than trying to separate it into components.
Sorry - I wasn't clear.
I don't mean they're listening for notes. They're just analyzing the wave-form/fingerprint/whatever-you-want-to-call-it that's being generated at one moment, and then the one from the next moment, then the next.
One of these might match random points in many songs, but a far smaller subset of these will have the same three in the same sequence.
> Surely they could have used the same tech to seamlessly mix all music together. (i.e. take the sequences within tracks they find hard to differentiate, and then use these points to allow two tracks to be mixed together). What's the minimum number of tracks it would, say, take to seamlessly mix from Megadeth to Mozart?
I noodled around with this idea in my free time a few years ago, got absolutely nowhere really usable with it (I probably put in a couple hundred hours).
I knew I was limited by my dataset (small), code quality (terrible) and understanding of musical theory (virtually nil).
Maybe I'll pick up that idea again - even doing beat matching would be kind of neat.
Shazam as a product feels a bit odd. Almost as if they’ve never quite outgrown their slightly sketchy “advertised on MTV2 alongside the Crazy Frog” origins.
They must have loads of data on songs people actually want to know about, yet they never really managed to turn themselves into anything more sophisticated.
It does a Fourier analysis of sections of the song, and puts the results in a database. A Fourier analysis yields what frequencies make up a waveform along with their amplitudes, so it is very compact.
Taking the DTFT of a signal yields exactly the same amount of information, so it's not really more compact. Shazam used a spectrogram (which is more information than the original signal) and searched for peaks to create a fingerprint.
It's not the analysis that is compact, but the fingerprint derived from it.
I know it contains the same information, but it makes it easy to discard the low amplitude frequencies, and the frequencies that are not heard by the ears, or are not particularly important to our ears.
Shazam is great but a similar app that really "wowed" me around 2007 was Midomi - it could recognise humming with good results, even though I'm really bad at hitting the right notes and key. It still exists but is not really talked about anymore; Shazam seems to have dominated that market.
Shazam is not particularly complex, however it is a very clever solution and a great example of applying a simple engineering concept broadly. I still hold it as one of the best examples of clever engineering in the app world
I have a relevant story here. The inventor of the Shazam algorithm, Avery Wang, gave me a demo of it within a couple weeks of it being created. Here is the backstory (partly from personal knowledge, partly relayed by Avery).
Avery had gotten his PhD from CCRMA at Stanford under Julius O Smith. His PhD had been on the topic of automatic ("blind") recovery of individual vocal/instrument tracks from a final mix. From there he joined a startup, Chromatic Research, where I was also at. He created lots of code and patented some algorithms related to resampling and MIDI synthesis, stuff like that. Avery was (and is) a super nice guy, humble but incredibly smart -- he could work not only with the high level mathematics but was also equally excited to fine tune assembly code.
After Chromatic folded, Avery had been struggling to get his own startup off the ground. About the same time, the Shazam guys had the idea for the product but didn't know how to create the algorithm. They approached Smith at CCRMA looking for someone capable of creating something that worked. Smith suggested they try Avery Wang.
At first Avery said, "Hmm, that seems difficult, but let me think about it." Within a week he had a demo running on a few thousand songs he had gathered from CDs. I'm sure a lot of refinements went into it after that, but the core idea took him a weekend.
[ All factual mistakes above are due to me and 20 year old memories -- if I misrepresented something it certainly wasn't due to Avery telling me something that wasn't true ]
> August 2002: Shazam launches as a text message service based in the UK. At the time, users could identify songs by dialing “2580” on their phone and holding it up as a song played. They were then sent an SMS message telling them the song title and the name of the artist.
Incredible! Curious to know what exactly happened on the backend after it listened to the audio, and what hardware it ran on.
Just as amazing to me is that the algorithms could identify a song through the extremely limited bandwidth and spectrum of an early-2000s CDMA stream and a cheap Kyocera microphone.
Low bandwidth is perfectly suitable for low frequency data (ie melody). You lose some of the high frequency details (ie timbre), but it’s still very easy to recognize songs.
It’s the same as recognizing objects in a 256x256px image.
Try resampling a song from 44kHz to 4kHz and you’ll still have no trouble recognizing it.
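It's easy to try; here's a quick sketch with scipy (the file name is a placeholder):

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import resample_poly

    rate, audio = wavfile.read("song.wav")           # e.g. 44100 Hz source
    if audio.ndim > 1:
        audio = audio.mean(axis=1)                   # mix down to mono
    low = resample_poly(audio, up=4000, down=rate)   # crush it to 4 kHz
    wavfile.write("song_4khz.wav", 4000, low.astype(np.int16))
    # The melody survives; mostly the timbre ("brightness") is gone.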
To put some numbers on it a piano goes from A0 (27.5 Hz) to C8 (4186 Hz). Most vocalists and most instruments you are likely to hear will be somewhere in that range.
A 6 string guitar for instance goes from E2 (82 Hz) to E6 (1318 Hz) for 24 fret electric (classical guitars typically have 19 frets and go to B5 (988 Hz) and acoustic guitars have 20 or 21 frets so go to somewhere in between).
Popular singers with high notes are Mariah Carey, who goes up to G7 (3136 Hz), Christina Aguilera, who reaches C#7 (2217 Hz), and Prince, who could hit B6 (1975 Hz) [1].
Plain old analog telephones and, I believe, early cell phones had a voice band of 300-3300 Hz.
They would have no trouble with most of the notes in the upper parts of the aforementioned ranges, except for the top 5 notes of a piano. As you note you'd change the timbre, but you'd still have the right notes.
Low notes might be a problem though. If you lose everything below 300 Hz that would cut out most of the left hand on a large majority of piano parts. On guitar it would cut all the notes that cannot be played on the first strings except for one.
That would change the notes. You'd lose the fundamental of a lot of notes just leaving the overtones, so it would look like the musicians played a higher note.
My guess is that when they were processing the song database to generate the hashes, they put the songs through a bandpass filter that was narrower than the frequency range of the most limited device they supported listening on. Then, when listening on any other device, they could filter it down to that so those hashes would work.
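That pre-filtering guess is easy to sketch; here's a hypothetical telephone-band filter with scipy (the 300-3300 Hz band is taken from the comment above, and the filter order is arbitrary):

    from scipy.signal import butter, sosfiltfilt

    def telephone_band(audio, rate, lo=300.0, hi=3300.0, order=6):
        """Band-limit audio to the POTS voice band before fingerprinting,
        so hashes from hi-fi masters match what a phone mic can deliver."""
        sos = butter(order, [lo, hi], btype="bandpass", fs=rate, output="sos")
        return sosfiltfilt(sos, audio)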
The UK was one of the first countries to introduce GSM-EFR which used the ACELP codec at 12.2 kbit/s for phone calls. The quality was actually pretty good.
I don't really understand why phone call fidelity hasn't improved since then. Sometimes it seems like it's even worse!
It's obviously a cracking algorithm, but what made Shazam doubly remarkable was how efficiently they turned it into a working product.
It wasn't just a case of developing an algorithm that could in theory be used to match an audio signal against all the world's pop songs. They presumably also had to get hold of a substantial number of those songs, fingerprint them, and roll out the search robustly against generally very poor audio hardware using simple telephony services at (for the time) quite considerable scale. They did it very quickly, it worked super well from launch, and it's been running continuously ever since.
I've read the paper about the method, but I would love to know more about the original development and deployment.
Or just collect mp3s? I'm guessing that at least happened while prototyping. And it seems like the most efficient way to make the service work in that era so it wouldn't surprise me if that data got used in production initially while handling licensing concerns separately.
I think it was a premium-rate number you had to call. When it became an app, it eventually had an ad-supported version with a paid, premium version "Shazam Encore" that was ad-free.
Yep, 50p a go from memory. Charged to your phone bill.
I used Shazam in London when it launched. I left the UK in 2003 so I reckon it was literally just as it launched. Presumably I saw billboards or a Wired article or something?
I'm just imagining the founders as every stoned business school student at the time like "Bro, wouldn't it be cool if there was an app where you just press a button and it tells you what song is playing" and they actually managed to find the guy capable of building it.
To this day, Shazam still has that aesthetic of "press big button, do magic".
Actually, the first version in 2002 worked by listening to a phone call. There were no apps really at the time, at least not in the iPhone sense (because it didn’t exist).
Maybe in a parallel dimension where the internet took a few more decades to be created, we'd call a number to listen to our Twitter feed narrated by some robotic voice, and we'd get a bulletin/form every week where we write down who we want to follow or unfollow, and of course a field in which to write our own tweets!
When I first used Twitter it wasn’t that far off. You could text your tweet to 40404 and it would post it. If I recall correctly it would also send you a text for any @‘s too
Shazam always blows my mind. It doesn't work 100% of the time, but when it does it feels like magic.
On top of that they introduced (I don't know exactly when) the feature to see lyrics for the song which are automatically synched with the music. This is also mind-blowing.
Only Google has managed to top Shazam in blowing my mind, and only ~recently, by making this whole process happen completely offline and continuously in the background on a phone. It's not as broad but still incredible. Google's paper: https://arxiv.org/abs/1711.10958
Sometimes, I like to stop and think about all the amazing things that we can do with our phones and that we take for granted.
What I do is to imagine myself finding a smartphone in elementary school (90s kid). These are a few things that would blow my mind:
- Having a digital global map, with multitouch, that can show me where I am in that map. I can search anything and find reviews from virtually anywhere in the world. I can zoom and see my actual house. I can use street view.
- I have access to any song I want.
- The phone can listen to a song and it can tell me the name of it (then I can listen to it again)
- I can play video games with much better graphics than my N64
- I can watch movies and TV in there.
- I can video call
- I have a digital assistant
- I can find any answer online
- I can buy anything online
- In the future all this technology is not just for the rich, virtually anyone can buy a smartphone.
In 2004 I took my grandpa for a ride in my car, which had GPS. He couldn't stop staring at it. I asked him what's wrong, and he said, "When I was an insurance salesman driving door to door in LA, I had an entire file folder of maps I kept on the passenger seat to find my way. If I had this gadget back then, I could have sold twice as much insurance".
Shazam loads so freaking fast and is ready to listen on my iPhone; I really want to read an article on how they did it. It loads as fast as an empty hello-world app, but the button is ready to press and listen!
Sadly, that is not my experience with the app. I would equate the time my Shazam loads and is ready to listen as roughly the same amount of forever that it takes to launch the native camera app and being able to take a picture.
Tangent, and this may be one of those cases where I learned about this on HN and am now passing it back to HN so pardon if I'm just late to the party, but if you're a regular Shazam user on iPhone you don't need to open the app anymore. You can just add a widget to control center. https://support.apple.com/en-us/HT210331
No, it definitely is not. Besides on many occasions I would not even want to do that.
Fastest is simply turning on “Shazam on app start” in the settings and then starting the app. However, if your phone is locked then indeed what the parent commenter said is quickest.
Way before apps were even a thing, around 2002-2004, I lived in the UK and Shazam worked as a calling service: you'd call 2580 (top to bottom on the center column of a phone numpad), it would listen for 30s, then it would hang up and send you an sms with the name of the song. IIRC it would charge you something like 50p if it found a result.
I turned them down after an interview in around 2000/2001 (they started out in London). The reason was that the idea seemed completely useless to me and I thought they'd never make a business out of it :-)
That's interesting - I had a vague recollection of having heard of them before launch - I guess they were hiring from the pool of developers being laid off from the dotcom bust?
I have an image in my mind of my boss at the time going around the office asking if anyone was interested in talking to this thing called Shazam. I've long wondered if I imagined it. I certainly didn't act on it.
I remember (not much later than this) interviewing at a place where the product was intended to be "an automated assistant that listens to your phone call and pipes supporting information to your computer as you speak". Obviously I gave them a wide berth. It's funny to think about the "gap" in magic - Shazam seems magical but totally worked, this other idea seemed magical and, at the time, totally was.
I checked my email and the interview was actually in mid March 2002, not 2000/2001. I think still just before they did the initial launch of the premium phone service. Here's the job spec:
> Role: Senior software engineer - Low Level Device, Distributed Communications
>
> Role mission: To ensure that Shazam's subsystems are integrated and interface effectively and efficiently with external partners' systems/hosting environments, yielding available, robust and scalable full offerings.
>
> Key Performance Areas: 1. Design real time software using standard techniques and protocols, to be scalable, maintainable and robust. 2. Manage & collaborate within and between team(s). 3. Implement quality software solutions within budget. 4. Ensure that design and implementation of software is of high quality. 5. Ensure that all deliverables are documented.
>
> Required Skills/Capabilities:
> - Knowledge of interfacing peripherals and devices to Linux
> - Knowledge of Linux device drivers a plus
> - Distributed messaging techniques and protocols, e.g. PVM, MPI
> - Ability to grasp and work with abstract concepts
> - Familiar with current software engineering methodologies, e.g. RUP, XP
> - Understands and is able to manage quality assurance, e.g. module tests, code review
>
> Required Knowledge/Previous Key Experience:
> - At least 4 years of full-time software engineering within a team of at least 3 software engineers
> - Must have been involved in all phases of the software cycle from requirements engineering to launch
> - Must have developed low level device or communications software
> - Experience with computer telephony a big plus
> - Experience with a high-growth startup environment a plus
>
> Ideal Qualifications: Ideally a University degree in Computer Science (alternatively at least 4 years of proven software engineering experience).
>
> Please forward your CV/resume, with cover e-mail, including full details of your earnings expectations, to recruit <at> shazamteam.com
Just read this paper, it is brilliant. And TIL about Shazam and the idea still sounds useless to me. Seems like I have never had this problem in my life.
I use it several times a week. Driving with classical station on. Is this Mozart? That cadence sounded like Mozart, but I'm not sure. <Shazam and wait 10 seconds> Oh, it's Brahms! And now I have it auto-saved to listen to when I get home.
I used to work at Shazam as a front-end web developer and they really supported me. Going to conferences, hackdays (20% time), being exposed to real production-grade systems, it was great. Very grateful to have worked there.
Not Shazam but I remember a website back in the day called ‘The Song Tapper’ where you could press your space bar to the rhythm of a song in your head and it would suggest which song(s) it might be. Teenage me thought that was very cool.
In the mid-nineties, around the time I had just become a teenager, I remember walking down the back corridor of a mall where my parents were leasing a space for their business and hearing a song playing overhead on the mall speakers that really caught my attention. I had no idea what the song was called or who made it, but I really liked it. I remember wishing I had some way to quickly find out, before the song ended, the name of the song and the artist. I remember thinking, "wouldn't it be great if this cell phone in my pocket could somehow tell me the name of this song?"
A decade later I discovered Shazam, and even today, more than a decade after that, Shazam still has a place on my home screen, quickly within reach, helping me discover hundreds of great artists and songs overheard from as many different places. The magic of the experience, and the appreciation for the technology, stem from the memory of that moment in the mid-nineties when I stood under a speaker listening to a song that I might never hear again.
If I remember correctly, the main way Shazam makes money is by selling statistics to record companies and concert planners. If they were to break it, they would lose this information in certain areas of the world where iPhones are not that common.
It might have something to do with the big "Play Full Song" button that opens Apple Music. Since they already have access to their music catalog for fingerprinting and the app is mature, maybe it pays for itself in subscriptions?
That did cross my mind and probably explains it. I do wonder how much Apple Music revenue really comes from Android users via Shazam, but perhaps it's significant enough that they don't want to ruin it.
I have to believe that much of the reason for Apple buying Shazam was to know what songs piqued people’s interest. That type of data has to be valuable.
I too have always been blown away by Shazam and have pondered how it could possibly index so much content for fast lookups. A few years ago this article was super helpful in helping me understand and learn a lot. Fun read, which required a lot of side googling for me:
http://coding-geek.com/how-shazam-works/
Reminds me of a fun IRC moment 20 years ago or so. A buddy had a song stuck in his head, but he couldn't recall the name of it.
I asked how it went, and he typed something like "du du duu duu du du duu du, du du duu duu du du duu du" and within 10 seconds I replied "oh, Tom's diner by Suzanne Vega?" After a few moments he replied "yes! how the hell?!"
Anyway, Shazam is great when out and about and I hear something I like. Clubs and other loud venues provide a challenge, but covering the mic usually does the trick.
I'd love to read some more details about how such fingerprinting works. I'm sure there are lots of interesting details on how it deals with recording noise and such.
> I'm sure there are lots of interesting details on how it deals with recording noise and such.
There's more to Shazam than that, but Fourier transforms get rid of the noise. I ported an FFT to Java back in the day and it was, IIRC, not even 100 lines of code. Amazing algorithm. I used it to record engine noise under acceleration and then derive the power/torque curve of my car (it took into account the number of cylinders): drive the car several times, both ways, on a street, recording the noise. Apply the FFT. Input the rim size / gear ratio etc. And I'd end up with about the exact same plot as the official one from the car manufacturer.
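For fun, a minimal sketch of that trick - assuming a 4-stroke engine, where the firing frequency is rpm/60 * cylinders/2, and using a deliberately naive peak picker:

    import numpy as np

    def rpm_from_noise(samples, rate, cylinders=4):
        """Estimate engine RPM from a recording: the dominant low-frequency
        peak is the firing frequency; rpm = f * 60 * 2 / cylinders (4-stroke)."""
        spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
        band = (freqs > 20) & (freqs < 500)   # plausible firing-frequency band
        f_fire = freqs[band][np.argmax(spectrum[band])]
        return f_fire * 60.0 * 2.0 / cylinders

    # Track rpm over successive windows of the recording, pair it with gear
    # ratio / wheel size to get speed, differentiate for acceleration, and
    # you can back out a rough power curve.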
+1 and also the fact that Google Pixel's "Now Playing" feature is such an amazing application of the same idea. Though I wonder how different are their implementations
Google's now playing feature is somehow always offline (to relieve privacy concerns) and is somehow still incredible at recognizing even obscure songs. Really impressive.
I also love that it just shows up on my lock screen.
Supposedly, while building the backend, they realised the actual summary data for a reasonable breadth of tracks (say, anything you'd likely hear on the radio or on a jukebox) was tiny, and so: why build a service at all when you can just ship the data to phones?
Recently, for whatever reason, I was listening to the twist/cover "Stacy's Dad" and Now Playing recognised it as the rather more famous original, Fountains of Wayne's "Stacy's Mom". So yeah, it doesn't know everything. It also doesn't recognise lots of obscure stuff I own, like B-sides or special editions that never saw radio play, or bands that my friends were in (but everybody I know has read both Steve Albini's "Some of your friends are probably already this fucked" and the KLF's "The Manual", and so none of them signed a recording contract and thus you've never heard of them). But I've never had a situation where I heard something I liked at, like, a bar or somewhere and Now Playing didn't know what it was.
Yeah, packaging the data and updating it async does make a lot more sense. Also, I guess it's fine that it doesn't know it all but covers a good percentage of requests.
Heck yeah! If you’ve got an iPhone you literally just have to say “Hey Siri, what’s this song” and it’ll start listening and give you the Apple Music link. The only indication it’s Shazam is a little understated badge at the bottom.
Shouldn't their patent expire soon? I remember them actively issuing cease and desist letters to anyone who made code tutorials about how their algo works.
Shazam belongs to that class of iPhone apps that when they were released I was like "wow, the future is here" -- this alongside the first accelerometer and AR ones
I've always wondered why there is only one Shazam. I was under the impression the algorithm/patent was just licensed from another company. Why haven't people cloned this app? Also, why aren't the Shazam geo charts more detailed, i.e. by city/neighborhood or location? Would love to see what songs DJs are playing at a location even if I'm not there. Right now the geo charts are very high level (only major cities).
The actual audio fingerprinting/music identification project associated with MusicBrainz is AcoustID (https://acoustid.org/). It's a separate project that happens to be well integrated with MusicBrainz.
For those of you who like a minimalist approach to apps, if you have an iPhone you don’t need any app to use the algorithm. Just say “what song is this” and Siri will employ the Shazam algorithm.
Even more obscure: a list of all songs you ID is maintained. Launch the “iTunes Store” app (why is this still even a thing), then tap the upper-right hamburger menu. Then select the middle “Siri” tab. It’s kind of neat. I use this feature when I’m out and about in restaurants, stores, etc.
I’m confused - I love Shazam as much as the next person - but why is an app’s anniversary pushed to the top of Apple’s newsroom feed when there is no native write-up of comparable reach & understandability for their critical, already-exploited security update spanning the entire product line?
I use Huawei Music instead to identify songs. Shazam broke my heart when they announced they would sell their souls to Apple, and I was like f the american empire.
A lot of people don't realise that Shazam is built into iOS. You don't actually need the app. Just ask Siri "What's playing?" and it will start to listen.
Shazam was amazing when it came out, but for some reason I ended up switching to Soundhound many years ago and haven't looked back. Maybe Shazam has this feature too, but Soundhound would not only ID the song; if it's a popular enough song, it will show the lyrics AND they'll be synced to the music.
I was interested to hear (elsewhere) that the service gets spammed, perhaps by people trying to boost ratings. But you can tell the caller is a bot because it tends to start from the beginning of the track every time, and/or play exactly N seconds. Humans don’t do those things.
The Shazam creators commissioned a movie to be made about their company - they considered it to be like "The Social Network" except them as the heroes.
Source: I watched the Shazam founder awkwardly pitch Danny Boyle at a director meet-and-greet while Danny tried his best to avoid them.
acoustid is an open-source database with an API; here's a list of already existing applications. beets is definitely CLI, not sure about some of the others:
I agree with the other folks here, Shazam is one of those things that still works just fine and I have no fucking clue how they do it. What do they compare the recorded audio with?
They "fingerprint" the audio and then compare the fingerprint to a large corpus. The hard part is fingerprinting that is resilient to noise (and position within a track!) and a fast way to search the corpus of known fingerprints.
It is already folded into Siri. That's what happens if you ask "What song is this?" It just shows you the answer with the Shazam branding.
edit to add: I only know this so well because, when I switched to iPhone in 2016, I was REALLY confused about how you Shazam something. I couldn't find it in the App Store. I tried to use the Google app, since that's how I did the equivalent thing on Android, but it didn't support it. Finally I figured out that it's just built into Siri.
Oh I know it's included, by folded in I mean it's going to be completely removed as an additional app and the branding will disappear. It's just going to be a feature of Siri and nothing more at some point.
Apple acquired Siri from another company and kept the name. I doubt that they will kill the 20 year old Shazam brand. In fact on the control center, the Shazam icon is very prominent and it’s available as a complication on the watch.
If I ask Siri what song is playing, it doesn't say "thanks to Shazam" or something. It's already folded in to Siri to quite the degree. As the personal assistant war heats up, the drive to make Shazam disappear and force users to use Siri will become too strong to ignore.
ERGO I am surprised that they put out a PR about it.
Does it strike anyone else as odd that Apple is noting this in their “Newsroom” press release? Don’t get me wrong, great for Shazam and all … but why is Apple - the company that bought them just 4 years ago - making a to-do about it?
I don't understand why you're so confused. Shazam is now part of Apple, they are not going to do a press release as "Shazam". The "Shazam" entity does not exist.
I think it's great. Too often, when a larger company acquires a smaller one, they try to erase all history and culture of the smaller company. Obviously, there is still a self-congratulatory tone to this press release, but I think it's nice that they're recognizing Shazam's past.
Google doesn't celebrate the history of Writely or Android-pre-Google that much and Microsoft don't promote the history of Excel-pre-Microsoft that much either.