Hello HN! Author here. I was thinking to call the post "The underappreciated com...

cviilgan · on May 3, 2021

Did I understand this correctly, what you are doing is essentially:

X[n] = F[x[k]][n/2] if (n even) else F[x'[k]][(n+1)/2]

With F[x[k]] the DFT of the time-domain signal x[k], x'[k] = x[k]·exp(2·pi·i·k·alpha) and this alpha some constant which yields a frequency-domain shift by 25Hz.

If so: How does this method compare to zero-padding the time-domain signal (i.e. sinc-interpolating the frequency domain)? It is an interesting concept, but alas it's not immediately clear to me how to analyze this...

ssgh · on May 3, 2021

This sounds about right. I assume your (n+1)/2 is really n+1/2. The idea, like you've said, is to get Y[k+1/2] values where Y = FFT[X].

Whether this is mathematically sound is another question. I presume that it is, for two reasons. First, FFT essentially convolves X with a bunch of sinusoids with frequencies from a fixed set: 0 Hz, 50 Hz, 100 Hz and so on. There's nothing wrong with manually convolving X with a 57.3 Hz sinusoid, it's just FFT isn't designed for this (it's designed for rapid computation). The other reason is that combining such shifted FFTs we get what looks almost exactly like a CWT (i.e. wavelet transform).

As for sinc-interpolation, I think it's mathematically equivalent. Say we shift the input X with Z[k] = exp(ik/N...) and get XZ. Then we transform it to FFT[XZ] = FFT[X] conv FFT[Z], so it's convolving FFT[X] with FFT[Z] where FFT[Z] is probably that sinc kernel. I certainly know from experiments that FFT of exp(2·pi·i·k·alpha) where alpha doesn't precisely align with the 1024 grid produces a fuzzy function with a max around alpha and a bell-shaped curved around it, the width of the curve depends on how precisely alpha fits into one of the 1024 grid points.

ggerganov · on May 4, 2021

Instead of combining 2 FFTs of 1024 bins (one shifted + one non-shifted), could you not just calculate 1 FFT of 2048 bins? Isn't it the same result?

ssgh · on May 4, 2021

Larger FFT window has undesired side effects because the estimated frequencies are averaged over the entire window. Moreover, the FFT output always spans from 0 Hz to 24 kHz (with 48 kHz sample rate), so to zoom into the 0..6 kHz region we'll need a window with 8192 bins or about 150 ms. With such window it would be impossible to capture rapid volume oscillations.

_dh54 · on May 3, 2021

My mind is kind of blown that birdsong virtually does not include higher harmonics. I didn’t even think that was possible for a physical resonator. Great post

akomtu · on May 3, 2021

I think the mystery has a simple explanation: when a bird sings at 7 kHz and the mp3 file captures only first 20 kHz, there isn't much room for harmonics. Maybe birds do have interesting harmonics at 56 kHz, we just don't know.

ttoinou · on May 3, 2021

maybe they were not captured by the bandiwth-limited microphone ?

Prcmaker · on May 4, 2021

Thanks for writing this up, I'm always on the look out for alternative methods for DFTs and the like, currently concentrating on interpolation of low frequencies (after DC, but still within the first 5% of wave numbers) . I'll see if this fits my use case soon, hopefully today.

stainforth · on May 3, 2021

What is an ornament?