It is practical, but the core of your code must be well optimised. In particular, you will need a very tight multiply-accumulate (MAC) loop, of the kind typically used in an FIR filter.
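For reference, the kind of loop in question looks roughly like the sketch below in plain C. This is only an illustration: the function name `fir_mac`, the 16-bit sample type and the 64-bit accumulator are assumptions made for the example, not taken from any particular codebase.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: compute one FIR output as the dot product of
 * n coefficients with n samples. Names and types are assumptions. */
int64_t fir_mac(const int16_t *coeffs, const int16_t *samples, size_t n)
{
    int64_t acc = 0; /* wide accumulator so the running sum cannot overflow */

    for (size_t i = 0; i < n; i++) {
        /* One multiply-accumulate per tap: widen to 32 bits, multiply, add. */
        acc += (int32_t)coeffs[i] * (int32_t)samples[i];
    }
    return acc;
}
```

Whether a given compiler maps this loop onto the DSP's hardware MAC instruction is not guaranteed, so it is worth checking the generated assembly for the hot loop.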
By way of a real-world example, a 9600 bps P25 transceiver (with FEC) can be implemented on a 200 MHz C5500 DSP at about 10-20% CPU utilisation. As a further illustration, a vocoder that could not run in real time was brought down to 5% CPU usage when a correlator (a single MAC loop written in C) was replaced with equivalent assembly that used the MAC instruction. Before the change, 97% of the CPU time was being spent in that loop.