I should note that we don't care about throughput, for the most part. Our constraint is purely the memory use of holding open the connections. The aim is to hold as many connections as possible within 10-20% of the machine's RAM, and not exceed it. As such, we need to be careful about resource usage and spikes.
Goroutines feel cheap, but if you're holding 140k connections, and just 20k of them do something that spins up a goroutine each... you can easily exceed the memory constraint. As such, we had to put goroutine pools in place, wrap the sends from connections into them in careful select statements so we didn't overwhelm external resources, etc. It was a huge pain. It has been drastically easier to control resource usage with these constraints under python/twisted.
YMMV, of course, this is just our experience. Part of the reason for putting it out there is that there are already many people who have talked/blogged about going from Python -> Go. I thought maybe the world could handle just one story about going the other direction.
Typically, if you wish to limit the number of goroutines, you spawn N workers and have them read from a single channel. If 20k of your incoming connections want to do something, they send on the channel without spawning a goroutine themselves.
Yep, this is what I meant by 'goroutine pools'. The select statements were on the sending side, to ensure that if the feed channel was full we wouldn't retain too much additional state. It works, but at that point it's starting to look like an async event loop with a thread pool....
Not exactly related to Go/PyPy, but I'm curious whether you can say something about how you handle memory and bandwidth constraints?
E.g. what do you do if you want to send notifications to lots of clients but for some of them the connection is very slow (you would probably need to buffer the data)? Do you have a hard limit on how much data you'll buffer before you close the connection? End-to-end backpressure (for which channels are quite good) doesn't seem like the best option for 1:N broadcasts, because then the slowest receiver slows down all the others.
And what do you do with connections that are sending you lots of (probably unexpected) data? Stop reading from that socket?
We're using Twisted, but I believe Python 3's asyncio has a similar feature for non-blocking sockets: you can add a hook that gets triggered when too much data accumulates in user space (i.e. it can't be flushed to the kernel's TCP buffer).
In our case, when notifications buffer up for a slow client, this hook gets triggered and we mark the client connection as 'paused'. Until that state is cleared by the pending data making it to the client, notifications go to the database instead, with just a flag on the client connection telling it to check the DB once the pending data has been delivered.
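To make that a bit more concrete, here's a rough sketch of the shape this takes with Twisted's push-producer API. The class name, helpers, and DB stubs are all made up for illustration; this isn't our actual code, just the general pattern:

    from zope.interface import implementer
    from twisted.internet import interfaces, protocol


    @implementer(interfaces.IPushProducer)
    class PushClient(protocol.Protocol):
        """Illustrative client connection, not the real server code."""

        def connectionMade(self):
            self.paused = False
            # Register ourselves as a streaming (push) producer so the
            # transport calls pauseProducing()/resumeProducing() on us as
            # its user-space write buffer fills up and drains again.
            self.transport.registerProducer(self, True)

        def pauseProducing(self):
            # Too much outgoing data is stuck in user space (can't be
            # flushed to the kernel's TCP buffer): mark the client paused.
            self.paused = True

        def resumeProducing(self):
            # Buffer drained; clear the flag and pick up whatever was
            # parked in the database while we were paused.
            self.paused = False
            self.flush_pending_from_db()

        def stopProducing(self):
            self.paused = True

        def send_notification(self, payload):
            if self.paused:
                # Slow client: park the notification in the DB instead of
                # buffering more bytes in memory for this connection.
                self.store_notification_in_db(payload)
            else:
                self.transport.write(payload)

        # Stubs standing in for the real storage layer.
        def store_notification_in_db(self, payload):
            pass

        def flush_pending_from_db(self):
            pass

The nice part is that the pause/resume callbacks come from the transport itself, so while a client is slow the only extra per-connection state is that flag.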
We do a similar thing on the receiving end to pause reading off the socket if we're already doing more work on behalf of the client at once than desired.
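The read side is roughly the mirror image: Twisted's TCP transport is itself a producer of incoming bytes, so calling transport.pauseProducing() stops reading from the socket and resumeProducing() starts it again. Something along these lines (MAX_IN_FLIGHT and the bookkeeping methods are invented for the sketch) would sit on the same protocol class as above:

        MAX_IN_FLIGHT = 10  # hypothetical cap on concurrent work per client

        def work_started(self):
            # Called (hypothetically) whenever we begin work for this client.
            # Assumes self.in_flight was initialised to 0 in connectionMade.
            self.in_flight += 1
            if self.in_flight >= self.MAX_IN_FLIGHT:
                # Stop reading further requests off this client's socket.
                self.transport.pauseProducing()

        def work_finished(self):
            self.in_flight -= 1
            if self.in_flight < self.MAX_IN_FLIGHT:
                self.transport.resumeProducing()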