I appreciate these war stories more than the "look at this great new thing that will take over the world" posts (those have a place as well). We need more war stories in this industry, because everything has pros and cons, and our job as software engineers is to make decisions based on limited information. Case studies are a great way to glean real-world experience from others without having to implement every new technology yourself just to form high-level judgments about it.
This article shouldn't say to you: "See, Go is BAD, Python is GOOD!" It should say, "That's an interesting case study. If I'm working on a project that involves lots of sockets and concurrency, I'll want to take what they said into account when I'm making technology decisions."
I should reach out to our team that took a Python/Twisted service dealing with sockets and lots of concurrency and ported it to Go, and see if they would put together a similar presentation. Our case is a bit different, but we saw over a 130x improvement in throughput going to Go. While they were in there, they also improved monitoring, stability, and maintainability. More case studies to help others make informed choices. Sending that email now :)
I should note, we don't care about throughput for the most part. Our constraint is purely the memory used by holding the connections open. The aim is to hold as many connections as possible within 10-20% of the machine's RAM, and not exceed it. As such, we need to be careful about resource usage and spikes.
Goroutines feel cheap, but if you're holding 140k connections and just 20k of them do something that spins up a goroutine each... you can easily exceed the memory constraint. As such, we had to put goroutine pools in place, with careful select statements on the sending side so that connections couldn't overwhelm external resources, etc. It was a huge pain. It has been drastically easier to control resource usage under these constraints with python/twisted.
YMMV, of course; this is just our experience. Part of the reason for putting it out there is that there are already many people who have talked/blogged about going from Python -> Go. I thought maybe the world could handle just one story about going the other direction.
Typically, if you wish to limit the number of goroutines, you spawn N workers and have them read from a single channel. If 20k of your incoming connections want to do something, they send on the channel instead of spawning a goroutine themselves.
Yep, this is what I meant by 'goroutine pools'. The select statements were on the sending side, to ensure that if the feed channel was full we wouldn't retain too much additional state. It works, but at that point it's starting to look like an async event-loop with a thread-pool...
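(To make that comparison concrete, here's a minimal sketch of the same bounded-pool shape in Python. I'm using asyncio rather than twisted for brevity, and handle() is just a stand-in for the per-job work; the bounded queue plays the role of the Go select-with-default on a full channel.)

    import asyncio

    async def worker(queue: asyncio.Queue) -> None:
        # N of these run forever; total concurrency is capped at N.
        while True:
            job = await queue.get()
            try:
                await handle(job)
            finally:
                queue.task_done()

    async def handle(job) -> None:
        # Stand-in for the real per-connection work.
        await asyncio.sleep(0)

    def submit(queue: asyncio.Queue, job) -> bool:
        # A full queue means "shed load" rather than buffering more
        # state -- the moral equivalent of a Go select with a default
        # branch on a full channel.
        try:
            queue.put_nowait(job)
            return True
        except asyncio.QueueFull:
            return False

    async def main() -> None:
        queue: asyncio.Queue = asyncio.Queue(maxsize=100)
        workers = [asyncio.create_task(worker(queue)) for _ in range(10)]
        accepted = sum(submit(queue, job) for job in range(1000))
        await queue.join()  # wait for the jobs that were accepted
        for w in workers:
            w.cancel()
        print(f"processed {accepted} of 1000 jobs")

    asyncio.run(main())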
Not exactly related to Go/PyPy, but I'm curious whether you can say something about how you handle memory and bandwidth constraints?
E.g. what do you do if you want to send notifications to lots of clients but some of the connections are very slow (you would probably need to buffer the data)? Do you have hard limits on how much data can be buffered before you close the connection? End-to-end backpressure (for which channels are quite good) doesn't seem like the best option for 1:N broadcasts, because then the slowest receiver slows down all the others.
And what do you do with connections that are sending you lots of (probably unexpected) data? Stop reading from that socket?
We're using twisted, but I believe Python 3's asyncio has a similar feature for non-blocking sockets: you can add a hook that gets triggered when too much data accumulates in user space (i.e., it can't be flushed to the kernel's TCP buffer).
In our case, when notifications buffer up for a slow client, this API gets triggered and we mark the client connection as 'paused'. Until more data gets through to the client and clears that state, notifications go to the database instead, and the client connection just carries a flag telling it to check the db once the pending data has been retrieved.
We do a similar thing on the receiving end, pausing reads off the socket if we're already doing more concurrent work on behalf of the client than desired.
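(For the curious, here's roughly what that hook looks like in twisted. This is a minimal sketch, not our actual code; the class name is made up, and the database-fallback logic described above is reduced to flipping a flag. Registering the protocol as a push producer makes the transport call pauseProducing() once its write buffer passes the high-water mark, and resumeProducing() once it drains.)

    from twisted.internet.interfaces import IPushProducer
    from twisted.internet.protocol import Protocol
    from zope.interface import implementer

    @implementer(IPushProducer)
    class NotificationProtocol(Protocol):
        def connectionMade(self):
            self.paused = False
            # Register as a streaming (push) producer so the transport
            # calls pauseProducing()/resumeProducing() on buffer pressure.
            self.transport.registerProducer(self, True)

        def pauseProducing(self):
            # Too much unflushed data buffered for this client; new
            # notifications should be diverted (e.g. to the database).
            self.paused = True

        def resumeProducing(self):
            # The buffer drained; deliver notifications directly again.
            self.paused = False

        def stopProducing(self):
            self.paused = True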
This post reminds me of another post I recently saw on HN, in which the author (someone with an Erlang background) lays out all sorts of reasons why he chose Ruby for a highly concurrent application that launches lots of (heavyweight) threads. Upon seeing the link on HN, my first thought was, Ruby!!?? But then I read the post and the reasons were all very sensible and practical-minded, so in that case Ruby was arguably a much better choice than Erlang, Go, Scala, Rust, etc. for a highly concurrent application.
Just wanted to share my own very small case study. I had a homework assignment to build a polite crawler. I initially built it in Python, and it was awfully slow. I rewrote the same thing in Go, and it turned out to be very fast (at least 10x, IIRC). I liked how quickly I was able to write something (with a not-so-shabby design) in Go despite having much less experience with it. Go is definitely awesome for writing concurrent code quickly. It's not a big industry story, but as a busy student I still feel great about using Go, because we had to use the same crawler for doing other stuff, for which a fast crawler was really handy and saved me hours.
The problems I observed with Go were that its regex engine seemed slower than Python's, and memory usage was way higher. I had to add some explicit GC requests.
I thought it would be IO-bound (that's why I started with Python in the first place), but since I was extracting links as well and doing a bit of work on the graph, it turned out to be more CPU-intensive. Then again, maybe I could have written better code, used better libraries, or tried multiprocessing (though that would have been painful). I admit I didn't look much into how I could improve it within Python. I just went with Go because it was quicker that way for me.
Well... extracting links etc. is super fast with lxml's xpath.
It's written in C, and I don't think writing your own parser would be faster.
For example, to extract links from the Hacker News homepage, you would just do:

    xpath('//tr/td[@class="title"]/a/@href')
This will be really fast. You can do it even faster with a more specific xpath. I extracted about 10k links a second from documents this way and was still network bound. Usually you are primarily limited by websites throttling you.
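(A minimal, self-contained version of that, in case anyone wants to try it; note the xpath assumes HN's markup at the time of writing.)

    from urllib.request import urlopen

    import lxml.html

    # lxml parses with a C-backed parser, so this is fast even for
    # large documents.
    html = urlopen('https://news.ycombinator.com/').read()
    doc = lxml.html.fromstring(html)
    for href in doc.xpath('//tr/td[@class="title"]/a/@href'):
        print(href)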
I was using beautifulsoup with the lxml backend, I believe. I should have mentioned that earlier. There was some other graph-manipulation stuff too, like favoring links with more inlinks and keeping the crawler polite but still busy by looking at other domains. That's more expensive than just extracting links, I guess. I had a submission deadline, and whatever I tried in that time with Python didn't work. It was just easier to write faster code in Go (except maybe where regexes are involved; now I remember I used some Go markup parser instead, which is now in their library).