
Being able to run an FFT on an audio signal in real time in a background thread seems really cool. Not sure how feasible it is, but it could be the basis for a speech recognition engine running in the browser (I'm no expert, but I know speech recognition models are trained on FFT-derived features) - if you can pipe the FFT output to an ML endpoint, you could potentially have speech-to-text running in the browser, or something like automatic closed-caption generation from an incoming audio signal.
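As a rough sketch of the per-frame FFT analysis being imagined here (in NumPy rather than a browser worker; the sample rate and the 440 Hz test tone are made-up stand-ins for a microphone frame):

```python
import numpy as np

sr = 16000                                 # assumed sample rate (Hz)
t = np.arange(sr) / sr                     # one second of samples
signal = np.sin(2 * np.pi * 440 * t)       # stand-in for a mic frame

# Magnitude spectrum of the frame - the kind of feature you could
# stream to an ML endpoint frame by frame.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)  # → 440.0
```

In a browser you'd do the same thing per audio frame inside a Worker (or an AudioWorklet) and post the spectra to the model.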



Definitely possible to run high-quality speech-to-text in realtime in the browser: https://whisper.ggerganov.com/

Made by the same guy who created the popular llama.cpp LLM library. The model uses log-mel spectrograms as input.
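A hedged sketch of what a log-mel spectrogram is, using only NumPy. The parameters (16 kHz audio, 400-sample window, 160-sample hop, 80 mel bands) match Whisper's commonly documented setup, but this is illustrative, not the project's actual preprocessing code:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            if c > l:
                fb[i - 1, k] = (k - l) / (c - l)   # rising edge
        for k in range(c, r):
            if r > c:
                fb[i - 1, k] = (r - k) / (r - c)   # falling edge
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=400, hop=160, n_mels=80):
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)  # power spectrum
    power = np.array(frames).T                    # (n_fft//2+1, n_frames)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power
    return np.log10(np.maximum(mel, 1e-10))

# One second of a 440 Hz tone as a stand-in for speech audio.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
S = log_mel_spectrogram(tone)                     # shape (80, 98)
```

The model then consumes `S` (mel bands × time frames) instead of the raw waveform.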

Using modern algorithms, the FFT is actually really fast to compute. Its cost is definitely dwarfed by the evaluation of the model itself, even when using many threads and Wasm SIMD.
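The "modern algorithms" here trace back to the classic radix-2 Cooley-Tukey divide-and-conquer, which brings the cost down from O(N²) to O(N log N). An illustrative (not production-grade) version, requiring N to be a power of two:

```python
import numpy as np

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return x.astype(complex)
    even = fft(x[0::2])                # FFT of even-indexed samples
    odd = fft(x[1::2])                 # FFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    # Butterfly: combine the two half-size transforms.
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])
```

Production libraries (FFTW, pffft, KissFFT, the browser's own DSP code) use iterative, cache-friendly variants of the same idea.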


Isn't that already offered out of the box via WebAudio's `AnalyserNode`? https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNod...


Yep, looks like it - I wasn't expecting browsers to have it baked in. I'm not all that familiar with what's available in the world of audio.

I've done some rudimentary signal analysis on other types of sensors - an accelerometer (for detecting vibrations at a specific frequency) and millivolt probes on a hardware instrument - using tools like Matlab or Python for the signal analysis.
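A hedged sketch of that kind of analysis: decide whether a vibration near a target frequency is present in accelerometer samples by comparing spectral energy in a narrow band against the rest of the spectrum. The sample rate, bandwidth, and SNR threshold are illustrative assumptions, not anyone's actual setup:

```python
import numpy as np

def vibration_present(samples, sr, target_hz, band_hz=2.0, snr_db=10.0):
    """True if spectral energy near target_hz stands out from the rest."""
    # Hann window reduces spectral leakage from frame edges.
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples)))) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1 / sr)
    band = (freqs > target_hz - band_hz) & (freqs < target_hz + band_hz)
    band_power = spectrum[band].mean()
    noise_power = spectrum[~band].mean()
    return 10 * np.log10(band_power / max(noise_power, 1e-12)) > snr_db
```

The same pattern (FFT, then compare in-band vs out-of-band energy) works for any narrowband event detection, whether the samples come from Matlab, NumPy, or a browser `AnalyserNode`.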



