If you've never seen what audio restoration people do with the spectrogram view in iZotope RX, you should check it out. They clean up audio using a process somewhat akin to photoshopping images. The ears are still involved, but it's a very visual process.
I suspect doing it a lot leads to some purposeful synesthesia where you can hear things and know what they would look like in a spectrogram.
The guy who solved voice recognition at Google years ago told me that after looking at enough spectrograms of human speech, he could tell exactly what was being said. "Oh, that shape, it's an 'ooo' noise."
Armed with that knowledge, I started looking at more spectrograms of birds (I record birds around my house and pass the recordings through BirdNET's model). Amusingly, I kind of learned everything in this article just by looking at lots of spectrograms. They are far, far easier for me to parse than any other musical representation.
I just found 'demucs' and am really enjoying doing source separation and then looking at the resulting spectrograms, then back at the original full musical spectrogram- it helps me parse out the instruments a lot better when they overlap.
There is also Celemony's Melodyne, whose 'DNA' (Direct Note Access) feature can do some editing in spectrogram space to remove individual notes.
Spectrograms are used in speech therapy to help with pronunciation. I remember using one such program to correct my own speech in elementary school in the 90s. It was even a game: I moved a cursor through a maze and it would only move in the correct direction when the sound I made was correct. The game had a spectrogram for additional visual feedback -- my speech therapist could point to an area on it and tell me that I was placing my tongue incorrectly or whatever. It's a really cool example of the power of biofeedback.
Vietnamese and Cantonese are tonal languages with six phonetic tones each [1]. The linear time-frequency spectrogram is very useful for this type of human language analysis, not only for birds. It would be interesting to apply non-linear time-frequency analysis to this domain as well.
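For anyone curious what "linear time-frequency" means in practice, here is a minimal sketch, not a real analysis tool. It builds a plain short-time Fourier spectrogram with only the Python standard library and follows the strongest frequency bin over time, which is roughly how a rising tone contour would show up. The test signal (a synthetic 200 to 400 Hz glide), the frame sizes, and the function names `spectrogram` and `peak_track` are all my own assumptions for illustration; real pitch tracking on speech would use a proper FFT library and a more robust estimator.

```python
import cmath
import math


def spectrogram(samples, frame_size=256, hop=128):
    """Magnitude spectrogram: one list of bin magnitudes per frame.

    Uses a Hann window and a naive DFT (slow, but dependency-free).
    Bins are linearly spaced from 0 Hz to half the sample rate.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    # Precomputed complex exponentials shared by every frame and bin.
    twiddle = [cmath.exp(-2j * math.pi * k / frame_size)
               for k in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * twiddle[(k * n) % frame_size]
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def peak_track(frames, sample_rate, frame_size=256):
    """Frequency in Hz of the strongest bin in each frame."""
    bin_hz = sample_rate / frame_size
    return [max(range(len(m)), key=m.__getitem__) * bin_hz for m in frames]


# Hypothetical test signal: a tone gliding from 200 Hz up to 400 Hz,
# standing in for a rising tone contour.
sample_rate = 8000
n_samples = 2000  # 0.25 seconds
samples = []
phase = 0.0
for i in range(n_samples):
    freq = 200.0 + 200.0 * i / n_samples
    phase += 2 * math.pi * freq / sample_rate
    samples.append(math.sin(phase))

frames = spectrogram(samples)
track = peak_track(frames, sample_rate)
print([round(f) for f in track])
```

The printed list is the "shape" you would see as a bright ridge in the spectrogram image, just reduced to one number per time slice. With these settings each bin is about 31 Hz wide, so the ridge moves in coarse steps; a longer frame gives finer frequency steps at the cost of blurrier timing.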
Heck, if you don't even know that modern editors come with a "why would I look at an amplitude when the information we care about is in the spectrogram?" view, you're going to learn something pretty special.
The audio is a landscape, and anything that isn't a "global" operation (like normalizing, noise reduction, cross-fades), ranging from just touching it up a bit to literal sound-from-nothing, is about sculpting that landscape.
Rather than Photoshop, it's more like 3D modeling for your audio (while using a top-down view in RX, although some tools will let you toggle between 2D and 3D because seeing the landscape is quite valuable).
Would love a podcast with Lex Fridman, Brian May and DeGrasse Tyson. (Michio can sit this one out, as he is transparent. I see plans within plans. I see yc, venture backing behind it.)
Pieplow's books on bird song are fantastic. Each page has a tiiiiny picture of the bird up in the corner, then a big spread of spectrograms and a decent chunk of text to give the spectrograms context: eg, what are the 'typical' parts of it, vs what varies, how likely it is that subsequent vocalizations are the same/different, what to expect in variations by sex and age, and so on.
But my favorite part of the book is the introduction. In addition to just a great intro to reading spectrograms and what's up with bird vocalization physically (they have a syrinx, not a larynx), there's a lengthy section with a kind of 'phonetics' of bird song. It's great.
There are quite a few bird species that are so similar that the only reliable way to tell them apart is their vocalizations. A good example is red-eyed vireos and black-whiskered vireos.
I am terrible with calls, so this article will be very helpful for the fall migration. Thanks, OP.
This even extends to insects - the common green lacewing Chrysoperla carnea was found to be multiple morphologically identical species which can only be distinguished by their mating songs:
After listening to some of the samples and thinking how nice they were, I've just also noticed the morning bird calls outside my home office window, which my mind usually filters out and ignores...
I suspect doing it a lot leads to some purposeful synesthesia where you can hear things and know what they would look like in a spectrogram.
Here's the first example I could find: https://www.youtube.com/watch?v=XKNYYR-uUEo