Hacker News new | past | comments | ask | show | jobs | submit login

Hello HN! Author here. I was thinking to call the post "The underappreciated complexity of musical sounds" but decided to stick with the DFT one as it would probably get more attention. This is a small discovery I came across this weekend. FFT-based spectrograms of musical instruments isn't a novel thing do, but I thought what if I do a super highres spectrogram with a continuum of freqencies, instead of the N fixed ones FFT gives. Turns out, FFT "supports" such frequency shifting by multiplying the input by a specially constructed complex exponent. As a result, I've found out that musical instruments produce sophisticated ornaments in between the harmonic levels.



Did I understand this correctly, what you are doing is essentially:

X[n] = F[x[k]][n/2] if (n even) else F[x'[k]][(n+1)/2]

With F[x[k]] the DFT of the time-domain signal x[k], x'[k] = x[k]·exp(2·pi·i·k·alpha) and this alpha some constant which yields a frequency-domain shift by 25Hz.

If so: How does this method compare to zero-padding the time-domain signal (i.e. sinc-interpolating the frequency domain)? It is an interesting concept, but alas it's not immediately clear to me how to analyze this...


This sounds about right. I assume your (n+1)/2 is really n+1/2. The idea, like you've said, is to get Y[k+1/2] values where Y = FFT[X].

Whether this is mathematically sound is another question. I presume that it is, for two reasons. First, FFT essentially convolves X with a bunch of sinusoids with frequencies from a fixed set: 0 Hz, 50 Hz, 100 Hz and so on. There's nothing wrong with manually convolving X with a 57.3 Hz sinusoid, it's just FFT isn't designed for this (it's designed for rapid computation). The other reason is that combining such shifted FFTs we get what looks almost exactly like a CWT (i.e. wavelet transform).

As for sinc-interpolation, I think it's mathematically equivalent. Say we shift the input X with Z[k] = exp(ik/N...) and get XZ. Then we transform it to FFT[XZ] = FFT[X] conv FFT[Z], so it's convolving FFT[X] with FFT[Z] where FFT[Z] is probably that sinc kernel. I certainly know from experiments that FFT of exp(2·pi·i·k·alpha) where alpha doesn't precisely align with the 1024 grid produces a fuzzy function with a max around alpha and a bell-shaped curved around it, the width of the curve depends on how precisely alpha fits into one of the 1024 grid points.


Instead of combining 2 FFTs of 1024 bins (one shifted + one non-shifted), could you not just calculate 1 FFT of 2048 bins? Isn't it the same result?


Larger FFT window has undesired side effects because the estimated frequencies are averaged over the entire window. Moreover, the FFT output always spans from 0 Hz to 24 kHz (with 48 kHz sample rate), so to zoom into the 0..6 kHz region we'll need a window with 8192 bins or about 150 ms. With such window it would be impossible to capture rapid volume oscillations.


My mind is kind of blown that birdsong virtually does not include higher harmonics. I didn’t even think that was possible for a physical resonator. Great post


I think the mystery has a simple explanation: when a bird sings at 7 kHz and the mp3 file captures only first 20 kHz, there isn't much room for harmonics. Maybe birds do have interesting harmonics at 56 kHz, we just don't know.


maybe they were not captured by the bandiwth-limited microphone ?


Thanks for writing this up, I'm always on the look out for alternative methods for DFTs and the like, currently concentrating on interpolation of low frequencies (after DC, but still within the first 5% of wave numbers) . I'll see if this fits my use case soon, hopefully today.


What is an ornament?




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: