We already know how to stream raw, uncompressed 16-bit samples at 32 kHz [1], so we already have audio that is just shy of CD quality (CD audio is 16-bit at 44.1 kHz). There's really no way to make things sound better than they do now.
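The catch is storage. As a rough illustration of what raw samples cost, the sketch below computes the size of uncompressed PCM as rate × bytes per sample × channels. The function names are hypothetical, and the channel count is an assumption: stereo is used in the examples below, but the formula works for mono too.

```c
#include <stdint.h>

/* Hypothetical helper: bytes per second of uncompressed PCM,
   computed as rate * (bits / 8) * channels. */
uint32_t pcm_bytes_per_second(uint32_t rate_hz, uint32_t bits, uint32_t channels) {
    return rate_hz * (bits / 8u) * channels;
}

/* Hypothetical helper: total bytes needed to store `seconds` of
   uncompressed PCM. Uses 64-bit math so long durations don't overflow. */
uint64_t pcm_bytes_for_duration(uint32_t rate_hz, uint32_t bits,
                                uint32_t channels, uint32_t seconds) {
    return (uint64_t)pcm_bytes_per_second(rate_hz, bits, channels) * seconds;
}
```

Assuming stereo, `pcm_bytes_per_second(32000, 16, 2)` gives 128,000 bytes per second (CD audio at 44.1 kHz needs 176,400), and one minute at 32 kHz comes to 7,680,000 bytes, or about 7.3 MiB.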
The remaining enhancements are all optimizations: producing better sound while using less ROM space and less CPU power, while still being able to run as many simultaneous sound effects as possible.
Emulating the SNES DSP took a long time to perfect because it's nuanced, but it's not a very complex chip: emulating its 32-stage state machine takes around 20KB of C code. The only part of the DSP that we don't fully emulate right now is the slight decay rate (in the nanosecond range) that occurs when the MUTE flag is enabled on a channel. That's mostly because it's an analog component, and we don't have the right people to analyze that sort of thing properly.
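To illustrate what a 32-stage state machine means in practice, here is a minimal sketch that advances a stage counter once per clock and produces one output sample each time the counter wraps around. It assumes the DSP is stepped at 1.024 MHz (a figure not stated above); under that assumption, 32 stages per sample works out to exactly the 32 kHz output rate. The names `dsp_state`, `dsp_step`, and `dsp_run` are hypothetical, and the real per-stage work (pitch, sample decoding, envelopes, echo and so on) is left as a placeholder. This is not the actual emulator code.

```c
#include <stdint.h>

#define DSP_STAGES 32u  /* one output sample per 32 stages */

/* Hypothetical DSP core state: only the stage counter is modeled here. */
typedef struct {
    uint32_t stage;        /* current stage, 0..31 */
    uint64_t samples_out;  /* output samples produced so far */
} dsp_state;

/* Advance the state machine by one stage. Returns 1 when a full
   32-stage cycle has just completed (one output sample is ready). */
int dsp_step(dsp_state *dsp) {
    /* Placeholder: a real emulator would dispatch this stage's work
       here, e.g. switch (dsp->stage) { case 0: ...; ... } */
    dsp->stage = (dsp->stage + 1u) % DSP_STAGES;
    if (dsp->stage == 0u) {
        dsp->samples_out++;
        return 1;
    }
    return 0;
}

/* Run the machine for `clocks` steps and return how many samples came out. */
uint64_t dsp_run(dsp_state *dsp, uint64_t clocks) {
    uint64_t produced = 0;
    for (uint64_t i = 0; i < clocks; i++)
        produced += (uint64_t)dsp_step(dsp);
    return produced;
}
```

Starting from stage 0, running 1,024,000 clocks (one second at the assumed clock rate) yields 32,000 samples and leaves the counter back at stage 0.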
[1] https://www.youtube.com/watch?v=p_60V8UdYEY