If you use the CPU that we used for testing, yes. It's an Intel CPU; the details are in the README. You can do 30 streams per core, so with 2 cores you can do 60.
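A rough sketch of that arithmetic, assuming throughput scales linearly with core count as the reply suggests. `max_streams` is a hypothetical helper for illustration, not part of the project, and the 30-streams figure applies only to the Intel test CPU described in the README:

```python
def max_streams(cores: int, streams_per_core: int = 30) -> int:
    """Estimate concurrent streams, assuming linear scaling across cores.

    The default of 30 streams per core is the figure quoted for the
    test CPU; other CPUs will likely give a different number.
    """
    if cores < 0:
        raise ValueError("cores must be non-negative")
    return cores * streams_per_core


print(max_streams(2))  # 60
```

Real-world scaling may fall short of linear once memory bandwidth or cache contention comes into play, so treat the result as an upper-bound estimate.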
With such amazing efficiency, I would suggest forking off a second branch that massively increases complexity in exchange for gains in your WER. CPUs are just going to keep getting cheaper and faster, and being able to leverage the extra cycles on platforms that have them would allow you to dominate from embedded up to the Xeon space.