
Being able to run an FFT on an audio signal in real time in a background thread seems really cool. Not sure how feasible it is, but it could be the basis for a speech recognition engine running in the browser (I'm no expert, but I know speech recognition models are trained on FFT-derived features) - if you can pipe the FFT output to an ML endpoint, you could potentially have speech-to-text running in the browser, or something like automatic closed-caption generation from an incoming audio signal.
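As a rough sketch of the per-frame FFT analysis being imagined here (in NumPy rather than a browser worker; the sample rate and the 440 Hz test tone are made-up stand-ins for a microphone frame):

```python
import numpy as np

sr = 16000                                 # assumed sample rate (Hz)
t = np.arange(sr) / sr                     # one second of samples
signal = np.sin(2 * np.pi * 440 * t)       # stand-in for a mic frame

# Magnitude spectrum of the frame - the kind of feature you could
# stream to an ML endpoint frame by frame.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)  # → 440.0
```

In a browser you'd do the same thing per audio frame inside a Worker (or an AudioWorklet) and post the spectra to the model.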



Definitely possible to run high-quality speech-to-text in realtime in the browser: https://whisper.ggerganov.com/

Made by the same guy who created the popular llama.cpp LLM library. The model uses log-mel spectrograms as input.
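A hedged sketch of what a log-mel spectrogram is, using only NumPy. The parameters (16 kHz audio, 400-sample window, 160-sample hop, 80 mel bands) match Whisper's commonly documented setup, but this is illustrative, not the project's actual preprocessing code:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            if c > l:
                fb[i - 1, k] = (k - l) / (c - l)   # rising edge
        for k in range(c, r):
            if r > c:
                fb[i - 1, k] = (r - k) / (r - c)   # falling edge
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=400, hop=160, n_mels=80):
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)  # power spectrum
    power = np.array(frames).T                    # (n_fft//2+1, n_frames)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power
    return np.log10(np.maximum(mel, 1e-10))

# One second of a 440 Hz tone as a stand-in for speech audio.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
S = log_mel_spectrogram(tone)                     # shape (80, 98)
```

The model then consumes `S` (mel bands × time frames) instead of the raw waveform.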

Using modern algorithms, the FFT is actually really fast to compute. Its cost is definitely dwarfed by the evaluation of the model itself, even when using many threads and Wasm SIMD.
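The "modern algorithms" here trace back to the classic radix-2 Cooley-Tukey divide-and-conquer, which brings the cost down from O(N²) to O(N log N). An illustrative (not production-grade) version, requiring N to be a power of two:

```python
import numpy as np

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return x.astype(complex)
    even = fft(x[0::2])                # FFT of even-indexed samples
    odd = fft(x[1::2])                 # FFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    # Butterfly: combine the two half-size transforms.
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])
```

Production libraries (FFTW, pffft, KissFFT, the browser's own DSP code) use iterative, cache-friendly variants of the same idea.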


Isn't that already offered out of the box via WebAudio's `AnalyserNode`? https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNod...


Yep, looks like it - I wasn't expecting browsers to have it baked in. I'm not all that familiar with what's available in the world of audio.

I've done some rudimentary signal analysis on other types of sensors - an accelerometer (for detecting vibrations at a specific frequency) and millivolt probes on a hardware instrument - using tools like Matlab or Python for the signal analysis.
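A hedged sketch of that kind of analysis: decide whether a vibration near a target frequency is present in accelerometer samples by comparing spectral energy in a narrow band against the rest of the spectrum. The sample rate, bandwidth, and SNR threshold are illustrative assumptions, not anyone's actual setup:

```python
import numpy as np

def vibration_present(samples, sr, target_hz, band_hz=2.0, snr_db=10.0):
    """True if spectral energy near target_hz stands out from the rest."""
    # Hann window reduces spectral leakage from frame edges.
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples)))) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1 / sr)
    band = (freqs > target_hz - band_hz) & (freqs < target_hz + band_hz)
    band_power = spectrum[band].mean()
    noise_power = spectrum[~band].mean()
    return 10 * np.log10(band_power / max(noise_power, 1e-12)) > snr_db
```

The same pattern (FFT, then compare in-band vs out-of-band energy) works for any narrowband event detection, whether the samples come from Matlab, NumPy, or a browser `AnalyserNode`.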



