My guess is that Apple invested so heavily in this [0] because they plan to train a big transformer in their datacenter and then run it as an RNN on your phone.
Superficially, I think this will work very well, but slightly worse than Whisper (with the advantage ofc being that it's better at real-time transcription).
[0]https://machinelearning.apple.com/research/attention-free-tr...
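To make the "transformer in the datacenter, RNN on the phone" idea concrete, here is a minimal sketch. It assumes the causal "AFT-simple" variant (no learned position biases), where each output is sigmoid(Q_t) times an exp(K)-weighted average of past values. Because that average is a running sum, the same layer can be evaluated over a whole sequence at once (training) or one token at a time with constant-size state (inference). The function and class names are hypothetical, and this is not Apple's code.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def aft_simple_parallel(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Causal AFT-simple over a full sequence: step t mixes values from steps 0..t.

    Hypothetical helper. Each feature j is handled independently (no dot-product attention).
    """
    out = []
    for t in range(len(q)):
        y = []
        for j in range(len(q[t])):
            num = sum(math.exp(k[s][j]) * v[s][j] for s in range(t + 1))
            den = sum(math.exp(k[s][j]) for s in range(t + 1))
            y.append(sigmoid(q[t][j]) * num / den)
        out.append(y)
    return out


class AFTSimpleRNN:
    """Same computation as aft_simple_parallel, one token at a time.

    Hypothetical class. The state is just two length-d running sums, so memory does
    not grow with sequence length. A real implementation would also track a running
    max of k to avoid exp() overflow; that is omitted here for clarity.
    """

    def __init__(self, d: int) -> None:
        self.num = [0.0] * d  # running sum of exp(k) * v per feature
        self.den = [0.0] * d  # running sum of exp(k) per feature

    def step(self, q_t: Vector, k_t: Vector, v_t: Vector) -> Vector:
        y = []
        for j in range(len(q_t)):
            w = math.exp(k_t[j])
            self.num[j] += w * v_t[j]
            self.den[j] += w
            y.append(sigmoid(q_t[j]) * self.num[j] / self.den[j])
        return y


if __name__ == "__main__":
    q = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
    k = [[0.2, 0.1], [-0.3, 0.5], [0.7, -0.4]]
    v = [[1.0, 2.0], [-1.0, 0.5], [0.25, -0.75]]
    rnn = AFTSimpleRNN(d=2)
    streamed = [rnn.step(q[t], k[t], v[t]) for t in range(len(q))]
    print(aft_simple_parallel(q, k, v))
    print(streamed)  # matches the parallel output up to float rounding
```

The design point is that the recurrent form only needs the two running sums, which is what makes streaming (real-time) use cheap. Variants with learned pairwise position biases do not reduce to a fixed-size state in general.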