I have some IPython notebooks for speech analysis using a Chinese corpus. I made them for a tutorial on machine learning with Python; unfortunately they are still a bit incomplete, but maybe you'll find them useful nevertheless (no deep learning involved, though). In the tutorial I start from a WAV file and go through all the steps required to analyze the data with a "traditional" approach, i.e. generate the Mel-cepstrum coefficients of the segmented audio data and then train a model to distinguish individual words. Word segmentation is another topic I touch on briefly, and one where machine learning can also improve the results.
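To give a rough idea of the MFCC step mentioned above, here is a minimal sketch of Mel-cepstrum extraction in plain NumPy/SciPy (not the notebook's code; frame size, hop, and filter counts are typical values I picked for illustration):

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    # 1. Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # 2. Power spectrum of each frame.
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 3. Triangular filterbank spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 4. Log mel energies, then a DCT to decorrelate -> cepstral coefficients.
    feat = np.log(spec @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

The resulting per-frame coefficient vectors are what you'd feed to a classifier to distinguish individual words (or, for the tonal-syllable data, different tones).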
Here's a version with very simple speech training data (basically just different syllables with different tones):
https://github.com/adewes/machine-learning-chinese/blob/mast...
More complex speech training data (from a real-world Chinese speech corpus [not included but downloadable]):
https://github.com/adewes/machine-learning-chinese/blob/mast...
There are other parts of the tutorial that deal with Chinese text and character recognition as well, if you're interested:
https://github.com/adewes/machine-learning-chinese
For part 2 I also train a simple neural network with Lasagne (a Python library for deep learning), and I plan to add more deep-learning content and do a clean write-up of the whole thing as soon as I have more time.
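For readers who haven't used Lasagne: the "simple neural network" amounts to a small dense net trained with gradient descent. As a library-free illustration of the same idea (this is NOT the notebook's Lasagne code; the layer size, learning rate, and epoch count are made up for the example), here is a two-layer softmax classifier in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, y, n_hidden=16, n_classes=2, lr=0.1, epochs=500):
    """Dense -> tanh -> dense -> softmax, trained by full-batch gradient descent."""
    n_in = X.shape[1]
    W1 = rng.normal(0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, n_classes)); b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        # forward pass
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        # backprop of the cross-entropy loss
        dlogits = (P - Y) / len(X)
        dW2 = H.T @ dlogits; db2 = dlogits.sum(0)
        dH = dlogits @ W2.T * (1 - H ** 2)  # tanh derivative
        dW1 = X.T @ dH; db1 = dH.sum(0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```

In the actual tutorial the inputs would be MFCC feature vectors rather than raw points, and Lasagne/Theano handles the gradient computation for you.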