If Discord is basically proxying the raw packets from one client to the others, isn't that wasted bandwidth (for Discord, not the clients)? I understand from the post that the goal is to mask users' IPs, both to protect user privacy and to close off the DDoS vector. Kudos on silence detection to save overhead.
So video w/audio broadcasting has to be compressed client side, then proxied through Discord's media servers to the end users. That's pretty smart...I just wish I could send my raw stream to a LAN host so I could offload the compression, and allow my LAN host to provide delivery (I'm a Nitro user).
Would rather waste bandwidth than CPU cycles in this case. It would take way too much CPU time to mux audio streams together server-side and then recompress. (That means we'd have to buffer data for each sender, deal with silence, deal with retransmits and packet drops, have a jitter buffer, etc.) No way we'd be able to hit the # of clients we want per core with that overhead. Our SFUs are intentionally very dumb for this reason.
Also, muxing server side means we can't do things like per-peer volume and muting, without having to individually mux and re-encode for each user in the channel depending on who they have muted and the volumes they have set per peer (which would explode CPU complexity even further).
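To illustrate the per-peer volume point: when each speaker's stream arrives separately, the listener's client can apply its own gains and mutes while mixing. The sketch below is only an illustration, not Discord's client code. It assumes already-decoded PCM samples as floats in [-1.0, 1.0], and `mix_for_listener`, `frames`, `gains`, and `muted` are hypothetical names.

```python
def mix_for_listener(frames, gains, muted=frozenset()):
    """Mix one frame per peer into a single output frame for one listener.

    frames: dict of peer id -> list of float samples (assumed decoded PCM)
    gains:  dict of peer id -> volume multiplier (defaults to 1.0 if absent)
    muted:  set of peer ids this listener has muted
    """
    length = max((len(samples) for samples in frames.values()), default=0)
    mixed = [0.0] * length
    for peer, samples in frames.items():
        if peer in muted:
            continue  # muted peers contribute nothing to this listener's mix
        gain = gains.get(peer, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    # Clamp to the valid sample range to avoid overflow when peers overlap.
    return [max(-1.0, min(1.0, s)) for s in mixed]


if __name__ == "__main__":
    frames = {"alice": [0.5, 0.5], "bob": [0.25, -0.25]}
    print(mix_for_listener(frames, gains={"alice": 0.5}))
    print(mix_for_listener(frames, gains={"alice": 0.5}, muted={"bob"}))
```

Doing the same thing on the server would mean running this loop, plus an encode, once per listener, since every listener can have a different `gains` and `muted` combination.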
So, in this case, bandwidth is cheap: let's use (and waste) some in an effort to simplify the SFU and make it more CPU efficient. The default audio stream is 64 kbps (or 8 KB/sec) per speaking user.
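For a rough sense of scale, here is a back-of-the-envelope sketch of what forwarding costs the server per channel. It assumes only speaking users send (per the silence detection mentioned above), that each speaker's stream is forwarded to every other user, and it ignores packet header overhead; `sfu_bandwidth_kbps` is a hypothetical helper, not anything from Discord.

```python
STREAM_KBPS = 64  # default audio bitrate quoted above


def sfu_bandwidth_kbps(users, speakers, stream_kbps=STREAM_KBPS):
    """Return (ingress, egress) in kbps for one channel on a forward-only SFU."""
    if not 0 <= speakers <= users:
        raise ValueError("speakers must be between 0 and users")
    ingress = speakers * stream_kbps
    # Each speaker's stream is copied to every other user in the channel.
    egress = speakers * (users - 1) * stream_kbps
    return ingress, egress


def kbps_to_kilobytes_per_sec(kbps):
    """Convert kilobits per second to kilobytes per second (8 bits per byte)."""
    return kbps / 8


if __name__ == "__main__":
    ingress, egress = sfu_bandwidth_kbps(users=10, speakers=2)
    print(ingress, egress)
    print(kbps_to_kilobytes_per_sec(STREAM_KBPS))
```

Under those assumptions, a 10-user channel with 2 people talking costs the server 128 kbps in and 1152 kbps out, with no decode, mix, or encode work on the server.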