rdamico's favorites | Hacker News

Also interesting/related:

In speech recognition, we usually use log Mel features or MFCC features, both of which are computed from a Fourier transform of the raw audio frames.
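For anyone who hasn't seen that pipeline, here is a minimal sketch of log Mel features for a single frame, using only the standard library. The function names, the Hann window, the 10 filters, and the common 2595 * log10(1 + f/700) Mel formula are my own choices for illustration, not taken from any particular toolkit; real front ends differ in details such as pre-emphasis, filter normalization, and the number of filters.

```python
import cmath
import math

def hz_to_mel(f):
    # One common Mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hann-window the frame, then take |DFT|^2 for bins 0..n/2."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mels = [low + (high - low) * i / (num_filters + 1)
            for i in range(num_filters + 2)]
    # Filter edges as (fractional) DFT bin positions.
    edges = [mel_to_hz(m) * n_fft / sample_rate for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if left < k <= center:
                row.append((k - left) / (center - left))
            elif center < k < right:
                row.append((right - k) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel(frame, sample_rate, num_filters=10):
    """Log of the Mel-filtered power spectrum of one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for empty bands.
    return [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
            for row in bank]

# Usage: one 256-sample frame of a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]
features = log_mel(frame, sample_rate)
print(features)
```

MFCCs would add one more step on top of this (a discrete cosine transform of the log Mel values), which I left out to keep the sketch short.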

You can also train neural networks directly on the raw audio samples. When you do so, you can inspect the weights of the first layer (a convolutional layer), and you see that it has pretty much learned the short-time Fourier transform.

https://www-i6.informatik.rwth-aachen.de/publications/downlo...
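To make it plausible that a first convolutional layer can end up there: the short-time Fourier transform is itself a strided 1-D convolution with fixed windowed cosine and sine filters. The sketch below is not the method from the linked paper, and nothing here is learned; the helper names, the Hann window, and the sizes are hypothetical choices of mine. It only checks that a "conv layer" with hand-set Fourier weights gives the same numbers as a frame-by-frame DFT.

```python
import cmath
import math

def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def conv1d(signal, kernel, stride):
    """Strided 1-D cross-correlation, i.e. what a conv layer without padding computes."""
    width = len(kernel)
    return [sum(signal[start + i] * kernel[i] for i in range(width))
            for start in range(0, len(signal) - width + 1, stride)]

def fourier_kernels(width):
    """Hand-set 'weights': a windowed cosine and a windowed negative sine per frequency bin."""
    window = hann(width)
    kernels = []
    for k in range(width // 2 + 1):
        cos_k = [window[i] * math.cos(2 * math.pi * k * i / width)
                 for i in range(width)]
        sin_k = [-window[i] * math.sin(2 * math.pi * k * i / width)
                 for i in range(width)]
        kernels.append((cos_k, sin_k))
    return kernels

def stft_via_conv(signal, width, stride):
    """STFT as two conv channels (real, imaginary) per frequency bin."""
    out = []
    for cos_k, sin_k in fourier_kernels(width):
        real = conv1d(signal, cos_k, stride)
        imag = conv1d(signal, sin_k, stride)
        out.append([complex(r, im) for r, im in zip(real, imag)])
    return out

def stft_direct(signal, width, stride):
    """Reference: window each frame, then take its DFT for bins 0..width/2."""
    window = hann(width)
    out = [[] for _ in range(width // 2 + 1)]
    for start in range(0, len(signal) - width + 1, stride):
        frame = [signal[start + i] * window[i] for i in range(width)]
        for k in range(width // 2 + 1):
            out[k].append(sum(x * cmath.exp(-2j * math.pi * k * i / width)
                              for i, x in enumerate(frame)))
    return out

# Usage: compare both versions on a short two-tone test signal.
signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(64)]
a = stft_via_conv(signal, width=16, stride=8)
b = stft_direct(signal, width=16, stride=8)
max_diff = max(abs(x - y) for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))
print(max_diff)  # difference is at floating-point rounding level
```

A trained network has no reason to land on exactly these filters, but sinusoid-like kernels of this kind are the sort of thing you would look for when inspecting the first-layer weights.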