From Python to Go and Back Again (docs.google.com)
366 points by azth on Oct 16, 2015 | 165 comments



I appreciate these war stories more than the "look at this great new thing that will take over the world" posts (those have a place as well). We need more war stories in this industry because everything has pros and cons, and our job as software engineers is to be able to make decisions based on limited information. Case studies are a great way to glean real-world experience from others without having to implement every new technology yourself in order to form high-level assessments of that technology.

This article shouldn't say to you: "See, Go is BAD, Python is GOOD!" It should say, "That's an interesting case study. If I'm working on a project that involves lots of sockets and concurrency, I'll want to take what they said into account when I'm making technology decisions."


I should reach out to our team that took a python/twisted system dealing with sockets and lots of concurrency and ported it to Go, and see if they would put together a similar presentation. Our case is a bit different, but we saw over a 130x improvement in throughput going to Go. While they were in there, they increased monitoring, stability, and maintainability. More case studies to help others make informed choices. Sending that email now :)

[edit: grammar]


I should note, we don't care about throughput for the most part. Our constraint is purely the memory use of holding open the connections. The aim is to hold as many connections as possible within 10-20% of the machine's RAM, and not exceed it. As such, we need to be careful about resource usage and spikes.

Goroutines feel cheap, but if you're holding 140k connections, and just 20k of them do something that spins up a goroutine each... you can easily exceed the memory constraint. As such, we had to put goroutine pools in place, careful select statements on sends from connections to ensure we didn't overwhelm external resources, etc. It was a huge pain. It has been drastically easier to control resource usage with these constraints under python/twisted.

YMMV, of course, this is just our experience. Part of the reason for putting it out there is that there are already many people who have talked/blogged about going from Python -> Go. I thought maybe the world could handle just one story about going the other direction.


I miss the time when you bragged about increasing performance without having to switch frameworks or languages.


So do I! Eventually you 'top-out' in a language/framework though... and then it's all tears.


Typically if you wish to limit the number of goroutines you would spawn N workers and have them read from a single channel. If 20k of your incoming connections want to do something they send on the channel, without spawning a goroutine themselves.

Did you try something like that?
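
For reference, a minimal sketch of that pattern (assumed, not from the post; names are illustrative):

    package main

    import "sync"

    func main() {
        jobs := make(chan func(), 1024) // buffered feed channel
        var wg sync.WaitGroup

        const numWorkers = 64 // hard ceiling on concurrent work
        for i := 0; i < numWorkers; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for job := range jobs {
                    job() // do the work on behalf of a connection
                }
            }()
        }

        // Connection handlers send work instead of spawning goroutines:
        jobs <- func() { /* handle one notification */ }

        close(jobs)
        wg.Wait()
    }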


Yep, this is what I meant by 'goroutine pools'. The select statements were on the sending side, to ensure that if the feed channel was full we wouldn't retain too much additional state. It works, but at that point it's starting to look like an async event-loop with a thread-pool....
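
Roughly, the sending side looks like this (a minimal sketch, assuming a jobs channel feeding the pool; names are illustrative):

    package pool

    // trySubmit is a sketch of the sender-side select described above:
    // if the feed channel is full, shed load instead of retaining state.
    func trySubmit(jobs chan<- func(), job func(), shedLoad func()) {
        select {
        case jobs <- job:
            // queued; a worker will pick this up
        default:
            shedLoad() // e.g. drop the work, or park it in the database
        }
    }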


How do these 20k connections feed the channel without themselves being managed by goroutines?

One thing I wish was possible in go is being able to use the `select` keyword with both channels and IO.
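
The usual workaround is to bridge the blocking Read into a channel yourself, at the cost of one goroutine per reader (a sketch, not from the post):

    package connio

    import "io"

    // readChan wraps a blocking Reader in a goroutine that feeds a
    // channel, so the data can then take part in a select.
    func readChan(r io.Reader) <-chan []byte {
        ch := make(chan []byte)
        go func() {
            defer close(ch)
            buf := make([]byte, 4096)
            for {
                n, err := r.Read(buf)
                if n > 0 {
                    ch <- append([]byte(nil), buf[:n]...)
                }
                if err != nil {
                    return
                }
            }
        }()
        return ch
    }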


Not exactly related to Go/PyPy, but I'm curious whether you can say something about how you handle memory and bandwidth constraints?

E.g. what do you do if you want to send notifications to lots of clients but for some the connection is very slow (you would probably need to buffer the data)? Do you have hard limits of maximum buffered data until you close the connection? End to end backpressure (for which channels are quite good) doesn't seem like the best option for 1:N broadcasts, because then the slowest receiver slows down all others.

And what do you do with connections which are sending you lots of (probably unexpected) data? Stop reading from that socket?


We're using twisted, but I believe Python 3's asyncio has a similar feature for non-blocking sockets: you can add a hook that's triggered when too much data accumulates in user space (i.e., can't be flushed to the kernel's TCP buffer).

In our case, when notifications buffer for a slow client, this API gets triggered and we mark the client connection as 'paused'. Until that state is cleared by the buffered data draining to the client, notifications go to the database instead, with a flag on the client connection telling it to check the db once the pending data has been delivered.

We do a similar thing on the receiving end to pause reading off the socket if we're already doing more work on behalf of the client at once than desired.

twisted documents this as producer/consumer: http://twisted.readthedocs.org/en/twisted-15.4.0/core/howto/...


Why 10-20% of RAM? How much RAM does each machine have? What else are they doing? Are they virtualized?


He said "within 10-20% of the machines RAM", i.e., utilizing 80-90% of the machine's RAM, without exceeding it.


Amen.

This post reminds me of another post I recently saw on HN, in which the author (someone with an Erlang background) lays out all sorts of reasons why he chose Ruby for a highly concurrent application that launches lots of (heavyweight) threads. Upon seeing the link on HN, my first thought was, Ruby!!?? But then I read the post and the reasons were all very sensible and practical-minded, so in that case Ruby was arguably a much better choice than Erlang, Go, Scala, Rust, etc. for a highly concurrent application.

Edit: here's the post I mentioned about Ruby being used for a highly concurrent application: https://news.ycombinator.com/item?id=10394450


Just wanted to share my own very small case study. I had a homework assignment to build a polite crawler. I initially built it in Python, and it was awfully slow. I rewrote the same thing in Go, and it turned out to be very fast (at least 10x, IIRC). I liked how quickly I was able to write something (with a not-so-shabby design) in Go despite having so much less experience in it. Go is definitely awesome for writing concurrent code quickly. It's not a big industry story, but as a busy student I still feel great about using Go. Reason being, we had to use the same crawler for doing other stuff, for which a fast crawler was really handy and saved me hours.

The problems I observed with Go were that its regex engine seemed to be slower than Python's, and memory usage was way higher. I explicitly added some GC requests.


Did you try pypy before rewriting?


Didn't solve my problem. PyPy still retains the multi-threading and GIL stuff, if I remember correctly.


Did you try requests_futures lib? It's all about async and network speed with crawling. Not so much cpu.


I thought it would be IO bound (that's why I started with Python in the first place), but since I was extracting links as well and working a bit on the graph, it turned out to be more CPU intensive. But well, maybe I could have written better code, used better libraries, maybe multiprocessing (that would have been painful, though). I do admit, I didn't look much into how I could improve it within Python. I just went with Go because it was quicker that way for me.


well.. extracting links etc is super fast with lxml's xpath. It is written in C, and I don't think it would be faster if you write your own parser.

For example, to extract links from hacker news homepage, you would just do

    from lxml import html
    html.parse('https://news.ycombinator.com/').xpath('//tr/td[@class="title"]/a/@href')
This will be really fast. You can do it even faster with a more specific xpath. I extracted about 10k links a second from documents this way and was still network bound. Usually you are primarily limited by websites throttling you.


I was using beautifulsoup with the lxml backend, I believe. I should have mentioned earlier: there was some other graph manipulation stuff too, like favoring links with more inlinks, and keeping the crawler polite but still busy by looking at other domains. This is more expensive than extracting links, I guess. I had a submission deadline, but whatever I tried in that time with Python didn't work. It was just easier to write faster code in Go (except maybe where regexes are involved; now I remember I used some Go markup parser instead that is now in their library).


Most people who rewrote their apps from X to Go and saw improved performance and readability benefited much more from the rewrite than they did from Go. At best, the fact that Go has a relatively weak ecosystem means that they had to write from scratch a lot of things they were getting for free in X. But, because in X it was a library, they only used 5% of the features, but paid a high performance cost and had a complex API to work with.

Go's a good language for some things. But it does nothing special or significant to close the massive productivity gap between dynamic and static languages. Yes, it's terse compared to many other static languages and it has stuff like implicit interfaces, but those are superficial (but nice) things when it comes to what and how you do things in dynamic land. Go might actually be a step back due to its poor type system and poor reflection capabilities.

I think Go's great for a seemingly new breed of "infrastructure" systems which is becoming more important due to how systems are starting to be designed (services hosted in the cloud). It's great for building CLIs which don't require your customers to install anything else. And it's good for services / apps that need to share memory between threads (which, in my experience, is where dynamic languages really start to fail).

But for a traditional web app / service? It's horrible. At least as horrible as most static languages. It sucks for talking to databases (more than almost any other language I've used). It sucks for dealing with user-input. Like most static languages, the stuff you need to do to handle a request, which has essentially 1ms of life, is cumbersome, error prone, slow, inflexible and difficult to test.


This is patently absurd. It's mind boggling that people still hop on here and declare how something is "amazing" or "horrible" for a particular problem-set as fact.

You want my anecdote? Go is brilliant for web services. We've decreased server costs significantly while decreasing response times by orders of magnitude for write-heavy APIs. Concurrency primitives that do bleed into parallelism have made a mockery of interpreted dynamic languages.

But don't believe me. Ask Cloudflare, or Google, or Dropbox, or any other number of companies how horrible Go has been.

But just for shits and grins, I'll bet it'd take me moments to find people in situations where Go didn't meet their needs or domain requirements.

Please stop with the absolutes. They're absolutely ridiculous.


What is the point of your comment? You're doing the same thing you're telling people not to do. You're not even disagreeing with the previous post, which essentially states the same things you do. The only thing your comment tells me is that the developers you work with are far more important than the language itself.


> Most people who rewrote their apps from X to Go and saw improved performance and readability benefited much more from the rewrite than they did from Go

Precisely, and isn't this presentation the perfect example of this phenomenon?

* Initially we had an implementation in language X

* We then rewrote it in language Y - the lessons learned by making the system anew (this time knowing the exact problematic spots, what really to optimize for, etc) - we got a better system. Long live language Y

* We then rewrote it in language X - the lessons learned [...] - we got a better system. Long live language X

Good programmers can productively write good and fast code in C, Python, Java, Go, or whatever. The skill of the developer and the understanding of the problem matter much more than the programming language.


Regardless of how good you are, it's easier to write code that has a predictable runtime memory footprint if you can actually predict how things will behave. Predicting an M:N scheduler is.... not easy.

This is part of why Rust removed the M:N scheduler and light-weight threads before Rust 1.0. It's hard to predict your memory use if the run-time is going to be creating/destroying OS threads, and juggling your lightweight threads (goroutines/etc) between real OS threads.

I agree entirely that the next iteration you write to solve the same problem is going to be better than the one before. The problem-scope is well defined, and you're already familiar with it and where the prior implementation was lacking. In this case the extra predictability of knowing what was going to be occurring at once did help.


Possibly the better understanding of the problem domain would make a rewrite from scratch better than the existing stuff, so that the improvements have little to do with the language in which the newly conceived solution was implemented.


I mentioned during the talk that the reason for giving it a spin in Python in the first place was that we had to fill in gaps in a Go library. While there are lots of Go libraries, many of them are immature and have chunks of functionality missing.

So while I was blocked on a coworker filling in one of the gaps we both needed, I was able to rewrite it all in Python.

I'll probably give this talk again at a venue where it'll be recorded, which should add a lot of missing context to these slides.


It is unusual to claim that programming in a statically typed language is more error-prone than in a dynamically typed language, even if only when dealing with HTTP requests specifically. Could you elaborate? It sounds like there might be a story behind this.


Thanks for calling me out on that, I might have a hard time justifying the claim. It was a reference to the fact that, at your system boundaries, the benefits of a strongly-typed language are offset not only by the lack of flexibility but also, in the case of Go, by its weak reflection capabilities and type system.

A web app has 4 (often more, rarely fewer) such boundaries:

- Getting input from users

- Querying a database

- Getting results from a database

- Outputting results to the user (html, json, ....)

Within these boundaries, yes, static languages are less error prone. But you get no compile time checks AT the boundaries. You'll need integration tests (and it's easier to write tests in a dynamic language, where IoC is a language feature, than in static languages).

You deal with these boundaries via automated mapping (with annotations, or external files (like in Hibernate)) or manual mapping. Automated mapping might not be much more error prone, but it's certainly much more cumbersome (especially with weak reflection). Manual mapping is also much more cumbersome. Does this cumbersomeness make it more error prone? I don't think it helps.


>- Getting input from users

That data is always a string (given the nature of HTTP requests). So the only issue there is converting the string to an integer when necessary. But since you should cleanse any data that arrives via HTTP request, you'd need to validate that your "integers" are actually purely numeric even in dynamically typed languages. So there's really no extra work there between dynamic and static languages.
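
In Go, for instance, the validation and the conversion are the same call (a minimal sketch; the parameter name is illustrative):

    package main

    import (
        "fmt"
        "net/url"
        "strconv"
    )

    func main() {
        form, _ := url.ParseQuery("age=42") // stand-in for request parameters
        age, err := strconv.Atoi(form.Get("age"))
        if err != nil {
            fmt.Println("reject: age must be numeric")
            return
        }
        fmt.Println("age:", age)
    }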

>- Querying a database

You'd be querying an SQL database with either parametrised queries or ORMs, both of which are data-type agnostic (ie you wouldn't need to convert integers into strings to embed into SQL strings).

As for No-SQL databases, there might be an issue with some and statically typed languages. But that's not an issue I've run into with the languages and APIs against the (admittedly limited) range of no-SQL databases I've used.

>- Getting results from a database

This is where your argument is the strongest. Sometimes there can be an issue if you don't know what return values you're expecting from the database. But that's easily overcome if you actually chat to your database architects beforehand. And in all honesty, I'd be disappointed in any web developer who wasn't the least bit interested in the datatypes of the records he's querying or the structure of the database he's effectively writing a frontend for.

>- Outputting results to the user (html, json, ....)

All HTML output is strings, so that's a moot point. JSON, XML, etc. are more a data structure problem than a data type problem. In fact I sometimes argue that JSON is statically typed, since it has strings (in quotations), numbers (no quotations), booleans (true / false), arrays and hashes / maps. So the real problem with exporting formats like JSON and XML is really a question of how good a language's API is. Take C# for example: there are several different APIs available for encoding XML, some are appallingly bad and need about a page of boilerplate code, others are ridiculously simple. To go back on topic with Go, I've only ever worked with JSON, but outputting that in Go is very easy, as Go's JSON encoder basically just takes whatever your data structure is and returns its JSON encoded string counterpart (much like how Perl and Javascript work with JSON).
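
For example (a minimal sketch; the struct is illustrative):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    type Item struct {
        Title string `json:"title"`
        Score int    `json:"score"`
    }

    func main() {
        out, err := json.Marshal(Item{Title: "From Python to Go and Back Again", Score: 366})
        if err != nil {
            panic(err)
        }
        fmt.Println(string(out)) // {"title":"From Python to Go and Back Again","score":366}
    }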

I do get the points you're making, and you're right that sometimes statically typed languages do make you jump through a few additional hoops. But most of the time these issues only arise if you're a careless programmer - in which case you're going to run into all sorts of dumb issues even with dynamically typed languages (eg if you don't validate your input data then you're going to write less secure web applications - regardless of your language of choice. That's why I sometimes look at statically typed languages as just another layer of data validation with regards to web development).


While there is a nugget of truth in your statements, the generalisations are so excessive that the overall point is completely wrong.

Let's take a few statements:

> Most people who rewrote their apps from X to Go and saw improved performance and readability benefited much more from the rewrite than they did from Go.

Yes, the rewrite would certainly have helped. However, some compilers and runtimes are just faster than others. You wouldn't say that rewriting C code in Perl would result in faster code, would you? Of course not. But that's the implication of your post. Go does outperform some languages. Granted, it's still slower than some others out there, but not all languages are equal in terms of performance, so it's pretty naive to imply they are.

> At best, the fact that Go has a relatively weak ecosystem, means that they had to write from scratch a lot of things they were getting for free in X. But, because in X is was a library, they only used 5% of the features, but paid a high performance cost and had a complex API to work with.

That's complete rubbish. Go is a young language, that much is true. But its ecosystem is actually very impressive given its age. Since you're talking about web development, let's look at all the libraries you might want:

  x SQL / no-SQL databases:    check;
  x compression:               check;
  x hashing / encryption:      check;
  x image manipulation:        check;
  x monitoring (eg New Relic): check;
  x httpd frameworks:          check;
  x html templating:           check;
  x smtp (sending e-mails):    check;
  x JSON / XML:                check;
  x web sockets:               check;
I've no doubt missed some stuff that you probably use, but that's not a reflection of stuff missing from Go either. The fact is, in all practicality, there isn't much you need to rewrite in Go aside from the web application itself.

> Go's a good language for some things. But it does nothing special or significant to close the massive productivity gap between dynamic and static language. Yes, it's terse compared to many other static language and it has stuff like implicit interfaces, but those are superficial (but nice) things when it comes to what and how you do things in dynamic land.

This sounds more like a rant about how much you hate statically typed languages than about how poor Go is compared to [insert preferred language]. For what it's worth, statically typed languages do also offer some productivity bonuses over dynamically typed languages: a big one being that the fussier compiler / runtime checking can pick up subtle bugs (eg 0 / "0" / false) that might otherwise take a little while to trace. And I am aware that some dynamic languages have typed checking operators, eg ===, but you can't always guarantee what your libraries are going to handle / return, so you're still sometimes left tracing values up the code base to find the origin of the problem rather than having the compiler explicitly tell you at the first point where your uncleansed data arrives.

So there are also advantages with going down the statically typed route.

> But for a traditional web app / service? It's horrible. At least as horrible as most static languages.

Sorry, but now you're just descending into unashamed language bigotry. A large proportion of the world's cloud services are supported by statically typed platforms such as ASP.NET, Java, and Go. This forum, HN, is written in Haskell, which is also statically typed. Saying "most static languages [are] cumbersome, error prone, slow, inflexible and difficult to test" is so far off the mark that it's just plain ignorant.

Which is a real pity as there would have been a few good points raised in your rant if you hadn't have jumped off in the deep end with your ridiculous generalisations.


> This forum, HN, is written in Haskell, which is also statically typed.

No, it's written in Arc, which is a variant of Lisp and is not statically typed.


Weird, I could have sworn I read that HN was written in Haskell. That error aside, my points are still valid. Most of latch's post was an exaggerated and largely inaccurate generalisation. Personal opinions of Go aside, it's daft to argue "most static languages [are] cumbersome, error prone, slow, inflexible and difficult to test."

I know I've been heavily downvoted in my previous comment, but I've developed in well over a dozen different languages of different paradigms over the last 3 decades - so I have quite a broad range of experience as well as being language agnostic - ie I'm not just some angry fanboy :p


I wasn't going to reply, but your misquote is dishonest. I specifically said that it was those things with respect to dealing with web requests, within the further scope of talking to the database and dealing with user input. I also pointed out cases where Go's either "great" or "good", with one of those being some types of web services (thus further scoping my "generalisation").


I did get your scope, but I don't agree with it. I've written web applications (with database hooks, obviously) in both statically typed and dynamically typed languages. Defining the return types from your DB lookups takes only a few seconds more mental overhead than dynamic languages do. But that cost does have additional benefits that can save time debugging. So in the grand scheme of things, there really isn't much difference in productivity between dynamic and typed languages (assuming someone of equal proficiency in both paradigms).

I will grant you that Go does get a little more awkward if you're dealing with null data types in the database, as you then need to start casting interfaces, which gets to be a pain real quick. But it's rare that you actually need null types in the database - usually that requirement can be circumvented at the database design level (eg using default values in the table design or defining flag fields).
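
For what it's worth, database/sql's Null types are one escape hatch from the interface casting (a sketch; the schema is illustrative):

    package nullable

    import (
        "database/sql"
        "fmt"
    )

    // printNickname scans a possibly-NULL column without interface{} casts.
    func printNickname(db *sql.DB, id int) error {
        var nick sql.NullString
        err := db.QueryRow("SELECT nickname FROM users WHERE id = ?", id).Scan(&nick)
        if err != nil {
            return err
        }
        if nick.Valid {
            fmt.Println("nickname:", nick.String)
        } else {
            fmt.Println("no nickname set")
        }
        return nil
    }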

Sometimes a different language will require you to architect your platform a little differently, but that's kind of the point of having different tools.


> I specifically said that it was those thing with respect to dealing with web requests within the further scope of talking to the database and dealing with user input.

Although I corrected your interlocutor on a factual inaccuracy, I actually fully agree with him/her on the principle, and as the author of a strongly-typed database access library (Opaleye for Haskell) I'm in a very good position to!


There are people who have been working in Go for years, successfully, but who don't post comments with the same frequency and dogged determination as the middlebrow dismissers.

Both Python and Go are fine. They both have their strengths and weaknesses. I personally wouldn't write a web app in Go (at least, anything beyond the most basic admin interface). I also personally wouldn't write a very large and complex Python system given the huge unit testing burden necessary to ensure safe refactoring down the road.

The biggest reason I like Go is because it makes it really hard for engineers to create huge, complex abstractions. Engineers (and especially less experienced engineers) just love them some abstractions. In my experience, most abstractions aren't justifiable. The net effect is that it usually makes their software harder to learn, harder to debug, inflexible (ironically), and late for whatever deadline they were supposed to hit. You can't write Java enterprise software in Go, and I really appreciate that.


    > the middlebrow dismissers
The totally misplaced condescension is one of many reasons the Go community appears to be a net negative.


Come visit, we're quite friendly!

You know what really grinds our gears, though? People who don't read documentation.

If you don't read documentation, you'll get a negative vibe. Because your question is literally sitting at the top of the FAQ. It's been asked 2^1024 times before. WHY WON'T YOU READ THE DOCS?


This is just bizarre.


I don't even know what "your question" you're referring to is, much less where to find it in the Go FAQ.


> You can't write Java enterprise software in Go, and I really appreciate that.

You can, and I've seen it, unfortunately.


"Anybody can fuck stuff up beyond recognition with any language" -anonymous


It's certainly much easier in some languages than it is in others.


Do you have any increased optimism about the refactorability of python given the recent addition of type annotations?


Definitely, but given the relative recency, my comment still stands. Ultimately, all libraries in use in a given application must support this as well for it to be truly comprehensive.

As a side note, sometimes you do need that performance gain, without wanting to resort to C or C++. I hope to see Python make some gains there with the addition of the type information.


I would really like to see a type check package that you can run alongside your unit tests to verify type correctness. MyPy is nice, but I don't want to do type checks at runtime.


"huge unit testing burden necessary"

Why do you consider unit testing a burden? I find unit tests the best way to formalize specifications before even starting to write code.


Unit testing is a burden because (a) a programmer needs to think of the cases that must be tested, (b) a programmer needs to actually write them, (c) they can only act as a safety net. When possible, it's very advantageous to encode your invariants into the type system. Yaron Minsky gives a good example in his Effective ML talk[1] about how you can take a common data structure and refactor it in such a way that the type system prevents you from ever creating an illegal state. (Note: this doesn't absolve statically typed languages from having suites of tests, it just allows them to remove large swaths of those tests if they can be enforced by the type system.)

[1] https://youtu.be/DM2hEBwEWPc?t=1085
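
Go can't express Minsky's variant-type trick directly, but a rough analogue uses unexported fields plus a validating constructor, so a whole class of "illegal state" tests disappears (a sketch; all names are illustrative):

    package conninfo

    import "errors"

    type State int

    const (
        Disconnected State = iota
        Connected
    )

    // Info can only be built through New, so "Connected with no address"
    // is impossible to construct from outside this package.
    type Info struct {
        state State
        addr  string // non-empty iff state == Connected
    }

    func New(state State, addr string) (Info, error) {
        if state == Connected && addr == "" {
            return Info{}, errors.New("connected state requires an address")
        }
        if state == Disconnected && addr != "" {
            return Info{}, errors.New("disconnected state cannot carry an address")
        }
        return Info{state: state, addr: addr}, nil
    }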


Type systems are a burden because (a) a programmer needs to think of the types he or she should design, (b) a programmer needs to actually utilize those types to write his or her program, (c) they can only act as a safety net.


As always, there is a tradeoff. Like the gp mentioned, type systems allow you to eliminate much of the unit test code that would otherwise be required.

Personally, I do love me a type system. Even if you have to think harder about how to architect your code, I think this kind of thinking is required for software to be good.


You can write Fortran in any language.


I have to note this, because I think it deserves quite a bit more attention.

SSL is extremely expensive on RAM (perhaps the implementations have optimized for throughput over RAM). I have yet to benchmark any SSL implementation in any language, with any binding, that can use less than 20kb per SSL connection. I mentioned in my talk here that SSL is very expensive; here's my benchmark suite that others may add to: https://github.com/bbangert/ssl-ram-testing/

I have implementations in several languages; so far both Go and Python 3.4 can get as low as the 20kb cited. If you can get your per-connection state below 20kb, then merely adding SSL means doubling (or worse) your RAM requirements, which is huge.

I appreciate that everyone loves obsessing on the language wars, but the SSL RAM overhead affects us regardless of language. I covered that in one of the slides near the end, it'd be great to see some movement on reducing the RAM footprint here.


Doesn't a TLS terminator proxy solve this? E.g. I usually put my application services behind HTTPS-enabled nginx and it works wonderfully.


Nope, so, the goal here is to reduce how many machines (each with their own RAM limits) are used. The task is holding open bidirectional SSL wrapped long-lived websocket connections. They're held open for hours at a time, since we need to send notifications when we get them.

Every connection has a base cost of the TCP kernel send/recv buffer, which in our case we dropped a bit to 4kb each. So that's still 8kb per connection right there. If we terminate the SSL on a separate machine from where we handle the connection, then it means we'll be using 8kb more memory per connection. Probably even greater because nginx has its own send/recv buffers for data.

I'm sure our use-case is a unique one; most people care about raw throughput, so the majority of SSL optimization has focused on lowering CPU use under high load rather than memory use under massive amounts of connections.


What is your (unique) use-case about? What service do you provide to your users?


Do you have any insight about the mention in the slides of Google having a 10kb solution?


Do you have results for the different tests? I did not see any in the repo.


This echoes some of my own experiences. Python (even CPython) waits on the network just as fast as Go (or anything else).

I've had good luck writing applications in Python, then profiling them and implementing critical sections that are CPU bound in C as modules. Any sections that are memory hogs can be converted to stream processors.

Lately I've started implementing the modules with Rust and the results are promising. It seems like a nice balance of developer productivity and application performance.


The claim that pypy uses less memory than Go seems...rather extraordinary.

I worked with a fairly complex http api app that ran as a rather svelte wsgi framework under gunicorn, and we saw at least a 10x increase in memory usage over cpython when we switched to pypy, once the jit was fully warmed up (memory usage seemed to hit steady state after about an hour). pypy has also historically (in my experience) been a bit more "lazy" (deferring GC of individual objects longer) than cpython when an object falls out of scope.

My general rule of thumb for pypy has historically been that you trade memory (requires more of it) for speed (faster) when compared against cpython.

Maybe the reimplementation itself was just far more efficient? Hopefully the talk itself clarified that point. I would be very interested in hearing more about that particular aspect.


This was before Go 1.4, which dropped memory use per goroutine from 8kb -> 2kb. They're fairly close now (minus my leaking goroutines). ;)


PyPy's JIT requires quite a bit of memory. CPython doesn't have a JIT so it doesn't share that problem. However PyPy also has optimizations specifically to keep memory usage down. Especially if you have a lot of objects, it's certainly possible to consume less memory than CPython.

PyPy using less memory than Go does seem weird but depending on how much the GCs differ and how they are configured it could simply be that Go's GC doesn't give up memory as freely to the OS.


I don't know the details, but the memory difference was almost certainly due to the different approaches to concurrency. Python coroutines just need to save their exact stack frame while a Go goroutine will spawn a "massive" stack that likely has much more space than needed. The first case might only need tens or hundreds of bytes for a dozen local variables, while the latter case is a fixed overhead of several KB (8 previously, but 2 with Go 1.4 according to the slides).

The JIT memory is constant at runtime (proportional(-ish) to the amount of code, which is fixed) while it is desirable to have the number of coroutines be as large as possible.


The memory used by the JIT will at some point remain constant but before that happens, it will grow as the JIT considers more and more traces to be worthy of compilation.

It can take a very long time until the memory consumed by the JIT actually remains constant. If you do continuous integration and deploy several times per day, your application might never reach that point.


The fact it may be less than the long term result isn't so relevant: there's an upper bound so the memory use of the JIT is O(1) with respect to the number of coroutines.


Your theory regarding PyPy's JIT is disproved rather easily. Python's grammar has been shown to not be context free. This implies that in the best case, the parser is supralinear in both time and space. Just the parser. Now add the rest of the JIT.


My hand-waving is "disproved" even more easily than that, by passing a dynamically generated string to `eval`. However, that's missing my point (and my "-ish"): the JIT is a constant overhead. Assuming they're being careful to not dynamically generate code for handling each coroutine (seems like an obvious thing to avoid, and a reasonable assumption), the JIT'd code is shared across all coroutines, so as you accept more and more connections, requiring less memory for each coroutine will eventually outweigh the memory cost of the JIT.


i've seen a lot of these posts ending along the lines of "it's time for rust". two languages that are always conspicuous by their absence are D and ocaml.

D in particular seems like it would be the logical upgrade path from python or ruby. it has a comfortably familiar C lineage, supports a variety of programming paradigms, and has good concurrency support. i wonder why people don't at least give it a look. (personal experience - i tried to use it twice, several years ago, and gave up because the tooling was bad, but from what i've heard that's very much improved today)


D has replaced Python for my regular programming.

D is the sweet spot for Python programmers to upgrade to without going backwards to Go (programming language design wise) or being weighed down by all the new (and very good) stuff in Rust.

D has everything from a nice IDE (Xamarin Studio) to a debugger, package management (Dub), statically compiled binaries, and a pretty decent std lib (not as good as python's or Go's, but very good nonetheless).

I still write Python if it's a "script" that has to run on a $work server, where it is safe to assume that Python would be available and sufficient for most tasks.


People saying "backwards to Go" instantly reminds me of the following quote:

Are you quite sure that all those bells and whistles, all those wonderful facilities of your so called powerful programming languages, belong to the solution set rather than the problem set?

        — Edsger W. Dijkstra


Interestingly, ALGOL failed because it was too complicated to implement, and Dijkstra played a huge role in the formulation of that language.

Why aren't you using ALGOL?


ALGOL-60 saw fine use in its day; Dijkstra criticized ALGOL-68, which indeed failed because it was too complicated to implement.


Hmmm... It is interesting how many years urban myths can persist.... 45 years and counting....

"In December 1968 the report on the Algorithmic language ALGOL 68 was published. On 20–24 July 1970 a working conference was arranged by the IFIP to discuss the problems of implementation of the language,[1] a small team from the Royal Radar Establishment attended to present their compiler, written by I.F. Currie, Susan G. Bond[2] and J.D. Morrison. In the face of estimates of up to 100 man-years to implement the language, using up to 7 pass compilers they described how they had already implemented a one-pass compiler which was in production use in engineering and scientific applications."

cf. https://en.wikipedia.org/wiki/ALGOL_68-R


Too complicated for its time, or in absolute terms? It surely did a lot.


Another language that is missing from such posts is Nim. It is a far more logical upgrade from Python than any of the other languages that you mention (or ones that others mention in reply to you) in my opinion. Especially so for the use cases explained in this post, for example Nim supports async very well with a syntax that is very similar to C#'s async await.


I've tried a couple of times to play with OCaml. It seems like something that should be great, but it just falls short. Some of that is tooling that isn't fully baked. Another issue is the syntax. The weird split between the interface description file and the code file. The lack of a unified DB interface. The lack of proper Windows support.


i enjoy using ocaml a lot, but the lack of a good orm and the bad windows support are indeed very painful. the tooling used to be bad, but at least under linux i'm pretty happy with it these days for my small hobby projects. it's one of those languages that i'd love to be able to use at work; i miss the type system when i'm doing c++.


Having worked with and without ORMs, I'm not sure what a "good ORM" is. On the other hand, a somewhat type-safe generic SQL query builder would be very nice to have.


the other end of the process is pretty useful too - unmarshalling raw sql results into ocaml objects or records, converting foreign keys into references, etc.


That weird split is how most languages with modules work.


In languages that don't fully parse the dependencies (which Go does), that split uncouples "the implementation needs recompiling" from "all the dependencies need recompiling."


Not all languages are C and C++.

That split doesn't forbid reading the module metadata.


Or Elixir, Clojure, Erlang, F# depending on context they could be a match as well.


Go is one of my working languages. Like with every other language (Python, OCaml, C#, some Java, Swift) i have a love-hate relationship with go.

What i agree on:

Concurrency using goroutines and channels is f... hard beyond very primitive scenarios. Even fork-join isn't that easy. The lack of expressiveness also hurts: it's nearly impossible to build higher-level abstractions above the channels/goroutines. You always have to do the bookkeeping of your goroutines.

I also agree on the error-handling problems: it's often hard to locate errors. It requires a great deal of discipline from programmers to achieve some ability to locate errors. No, i don't want the Java/C# 'i throw exceptions everywhere' style back, but Go is the other extreme. A more lightweight panic wouldn't be bad.

What i can't agree on:

That you cannot mock without interfaces in Go is typically not a problem: there is no real encapsulation (_, ., no constructors), so in many cases you can just instantiate your structs as you need them. The classic mocking target - time - is problematic in every language. IO is typically behind the various io.Reader/Writer... interfaces: no problems there.

The criticism about memory consumption i don't get: every system i saw ported to go from Java, Ruby or Python had a much lower memory footprint than before. And typically go allows you to optimize allocations quite well when needed.


I agree that error handling in Go is a headache. But since we are forced to handle every single error where it occurs, we can at least make the best of it and add context information before returning it.

I'm using a small utility library to wrap the original error and add the function name, line, file name and optionally a descriptive message that explains what failed.
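
Something along these lines (a sketch of the idea, not the actual library):

    package errutil

    import (
        "fmt"
        "runtime"
    )

    // Wrap annotates err with the caller's function, file and line,
    // plus an optional human-readable message.
    func Wrap(err error, msg string) error {
        if err == nil {
            return nil
        }
        pc, file, line, _ := runtime.Caller(1)
        fn := "?"
        if f := runtime.FuncForPC(pc); f != nil {
            fn = f.Name()
        }
        return fmt.Errorf("%s (%s:%d %s): %v", msg, file, line, fn, err)
    }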


You are not forced to handle errors. Or do you check for the errors returned by fmt.Print? I guess not.

This was my point: in many areas go forces the developer to do the right thing (no unnecessary imports, gofmt...) and does not rely on developer discipline. But when it comes to error handling, it does.

What i would wish for is some extended error handling supported by the compiler. I don't want a stack trace, but the compiler could easily produce, for example, a line number where the error was returned.


What's unusual is that this project uses PyPy in production. PyPy has been a long time coming. Until recently, Python was defined by CPython, which is a naive interpreter. As this article points out, performance is roughly an order of magnitude better with PyPy. Now PyPy has to be taken as seriously as CPython, if not more seriously. In a few years we may look at CPython as the "toy" implementation.


I'll be interested in coming back to Python when it isn't a headache to deploy into production. I'm tired of an install requiring a GCC compiler on the target node. I'm also tired of having to work around the language and ecosystem to avoid dependency hell.


The way I deploy Python apps at $EMPLOYER:

- CI system detects a commit and checks out the latest code

- CI system makes a virtualenv and sets up the project and its dependencies into it with "pip install --editable path/to/checkout"

- CI system runs tests, computes coverage, etc.

- CI system makes an output directory and populates it with "pip wheel --wheel-dir path/to/output path/to/checkout"

- Deployment system downloads wheels to a temporary location

- Deployment system makes a virtualenv in the right location

- Deployment system populates virtualenv with "pip install --no-deps path/to/temp/location/*.whl"

The target node only needs a compatible build of python and the virtualenv package installed; it doesn't need a compiler and only needs a network connection if you want to transfer wheel files that way.


You really should look at Armin's platter tool (same guy who wrote Flask and Jinja2). It takes a few of the steps out, and coming from an almost identical workflow to yours, we are switching to it.

http://platter.pocoo.org/dev/

Really nice stuff


Actually, that's where I got the idea... but when I last looked at Platter it was covered in "experimental only, do not use in production" warnings.

Considering I'd have to write a build script to use Platter, it didn't seem like it would be a lot of work to write a few extra lines and not require an additional dependency.


The way I deploy Go apps at $EMPLOYER2:

- go get

- go test

- go build

- copy to target

It's possible with Python, it's easier with Go. It's a place where we could use a lot of progress.


Presumably, though, what both of you actually do is:

build.sh

Once you'd done the up-front work of figuring out how to do deployment sanely, it became equally easy for both of you.


It seems weird how you can't easily package python into an executable without Docker.


You can with various levels of success with a few "freeze" programs. They basically bundle up the entire environment into an executable, so the executables are stupidly large (more-or-less the size of your /usr/lib/python directory plus the python binaries), but they mostly work.


I've done it before, but it was kind of a pain and I got the impression nobody else used that stuff. I wonder why it's not more popular/easy.



FWIW we deploy python code as debian packages that we build with dh-virtualenv.

This bakes a whole virtualenv with all python dependencies (including compiled C libraries) into a .deb package. The packages tend to be big-ish (3MB to 15MB), but the target system only needs the right python version, nothing else.


This was posted here, and not a bad idea:

https://nylas.com/blog/packaging-deploying-python


docker?


Doesn't solve the issue of needing a C compiler for third party extensions, and definitely qualifies as a work-around for the existing toolset.

Yes, it helps. But you can use Docker with Go programs as well (and drop a lot more of the base image in the process).


The way we do this is to have a base image that has already yum-installed or pip-installed all the modules (the non-trivial ones, anyway) that our package needs. Then the docker image that needs to be rebuilt (which depends on the first one) is just a minimal pip install away.


Actually, nice thing about docker is you can build a compilation container (pre-built with all your C/C++ apps ready to go and shared amongst your coworkers), compile your extensions using that, and then only install them into your target container (sans compilation tools). It's a little more grunt work that way, but you get better control and reproducibility without the explosion in image sizes.


This page displays exactly nothing with JavaScript disabled. I realize that no js makes me something of a Luddite, but there are solid reasons for turning it off, particularly on mobile. Is it really too much to ask of google that the text of a presentation in some way live inside HTML? Novel concept, I know.


Javascript is like Flash, it's great for the author of a page to show off, or force an ad on you, but how does it actually benefit the end user? Not at all, never has.


Really now. JS can potentially improve UI and UX, providing users with a smoother experience.


I notice you are careful to say "potentially" because it never actually happens.


It can yes, but like anything else it often provides the opposite, even setting aside the download time.


Huge interfaces are not how you do it. Little ones, each expressing a conceptual unit of the functionality. Loggable. MessageConsumer. Stuff like that, even if it only wraps one method. Make each major subsystem into an object, connect to it through the mini-interfaces it provides, and test its consumers by stubbing those mini-interfaces, not the whole object.
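
A sketch of that style (Loggable and MessageConsumer are the example names above; the rest is illustrative):

    package pipeline

    type Loggable interface {
        Log(msg string)
    }

    type MessageConsumer interface {
        Consume(msg []byte) error
    }

    // Drain only depends on the slices of behaviour it needs, so tests
    // can stub these mini-interfaces in a few lines instead of mocking
    // a whole subsystem.
    func Drain(src [][]byte, sink MessageConsumer, log Loggable) error {
        for _, m := range src {
            if err := sink.Consume(m); err != nil {
                log.Log("consume failed: " + err.Error())
                return err
            }
        }
        return nil
    }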


I've been porting a project from Ruby to Go in search of making it a bit lighter. The project is about 1000 lines now, so I'm not allowed to criticize Go yet, but so far it's very nice. I picked the language up in just a few hours as I went, and the whole project took just about a week and a half.

So as someone who is clearly in no position to be criticizing other projects yet, isn't Heka exactly the sort of project you shouldn't do in Go? I say that because I have the feeling you should use Go only for very concrete cases, given its lack of proper abstractions.

I.e. writing a tool that receives log lines over HTTP, extracts metrics and forwards those to StatsD? Perfect use case for Go. But writing a tool that lets you plug in arbitrary frontends to forward to arbitrary backends? Perhaps they got it to work nicely, but that sounds more like a case for a more general language.


> Perfect use case for Go

Go shouldn't have "use cases". One should be able to do almost everything with ease with a language built in 2008/9.

And unfortunately that is not the case. Go has excellent concurrency features, but is limited by dumb language design decisions which make it painful to test and to write good reusable and composable libraries for.

I'd love to replace my entire stack with Go but I can't. Something I would write in Ruby in 10 days takes 2 months in Go. And worse, I can't write the code the exact way I want, which does piss me off. Give me some choice, not random constraints. Aside from concurrency, I shouldn't have to ask myself how I write something in a specific language. This is the goal of refactoring, and it comes later.

I will personally invest in Crystal and dump Go as soon as it runs on Windows. It has channels, and that's all I need.


> Something I would write in Ruby in 10 days takes 2 months in Go.

Have you looked into Elixir? It has Ruby-like syntax, but uses an actor model for concurrency (it runs on the Erlang VM). For handling concurrent tasks it tends to benchmark around Go's speed, but is much nicer to do things in. While it is admittedly immature, the ecosystem still has a decent amount of packages and the tooling isn't bad.


Being a Ruby to Go to Elixir convert myself, I can only second this. Elixir is a really great language.


> Go shouldn't have "use cases". One should be able to do almost everything with ease with a language built in 2008/9.

Assuming that Go is a general-purpose high-level language, yes, but is it? It was created to replace C++ in critical infrastructure, not Python/Ruby as the end-all be-all of default platforms for every situation. Its syntax simplicity and speed absolutely makes it attractive to a wider audience, sure, but if it does the job it was designed to do very well, should we be angry that it doesn't do all jobs well?

Developer time and execution time are both important metrics when considering a language, and Go is very well situated when the major developer time gains offset the minor execution time losses vs C/C++. That it's less well situated when the developer time losses vs Python/Ruby are incurred on a project when the execution time gains are irrelevant isn't a failing of the language, it's a trade-off.


For heka, one of the things we wanted besides performance was a small, easy-to-distribute binary you could drop on a system and 'just run'. Go is fantastic for distribution thanks to the small statically linked binaries you can get out of it. To accommodate the pluggable filters and such later, a Lua sandbox system was added... and oddly some of the other Go pieces ended up being faster in Lua, which is why Lua decoders/inputs are now an option for heka as well.

Note: I gave this talk.


Not advocating anything, but could C++ have been a good choice? You'd get top performance, small memory footprint, opt-in static linking, at the price of a less "fun" language and verbosity. Regarding memory leaks, it's easy to write a leak-free C++ program these days with smart pointers. However, it doesn't come "batteries included" for net-related stuff, and ASIO can give a couple of headaches.


It could've been. Though now that so much of heka is involved in moving data to and from the Lua layer, turning it into plain C is a better option. trink, one of the heka authors, has a project called hindsight that does just that.


I don't have much to add, other than I went to the talk this past Wednesday (Python meetup in SF) and really enjoyed it. You did a great job. I agree with the currently-most upvoted comment here that I really value these kinds of war stories. Making technology choices is a huge part of what I have to do on a week to week basis and it was great to hear your experience.


Does anybody have some comments on the closing statements on Google's 10KB secret?



Is it possible that it's just a typo, though?


Not sure what you mean, but the site linked above says they do it <10KB and then a few paragraphs down says how they patched it to go down to a bit more than 5KB. So, I don't think it's a typo.


That would be pretty hilarious.


I use and like both python and go. The presentation mirrors my own experiences, though I haven't come back again yet.


I use and like both too; webapps in python (django/flask) and microservices in go. My systems also use some large programs written in go but more like black boxes (nsq, heka, influxdb).


what is strange is that everyone seems to be using Python 2.7 in some form. There is all this new work being done on Python 3 asyncio, but none of it is being used... and this is from a guy who is pretty much a core developer in several python projects.

I look at Ruby or Go... or even Java, and every new language feature has a much more rapid adoption curve.

is this a pretty solid statement that the entire Python 3 + asyncio path is a dead end?


The difference is that Python 3 is far more backwards incompatible than an update to Ruby, Go or Java. If there's any component of your Python project that was written for Python 2 and it hasn't been ported, your project as a whole is probably staying in Python 2.

I certainly wouldn't call the Python 3 path "dead", as starting a new project you'd be silly not to use Python 3. It's just a very slow process.

Also, porting from 2 to 3 isn't even that hard (there's a script that does 95% of it for you).


I understand this in its abstract sense - but I think (IMHO) the fundamental problem is that every framework out there is using Python 2.7.... which means nobody is using Python 3.

I'm not sure if you have worked in the Ruby or Go ecosystem... but if you are starting a new project, you are using latest Ruby + latest Rails. This is not the case in Python. I asked a question here and on the /r/python forums: what framework should one use to build an API using asyncio+postgresql? Nobody seems to be doing it.

the answers I got mentioned using Tornado with Python 3 code on top of it.

People have compared asyncio with goroutines - but even for a project that is a migration from Go -> Python by a fairly advanced developer ... Py 2.7 is being used.


> every framework out there is using Python 2.7....

That's just FUD, it might have been true many years ago but not any longer. http://python3wos.appspot.com/, there are a few red holes, but from what I could tell some of those have substitutes that in many cases are even better. It might also be true for in-house libraries, but at least there you have the chance to upgrade yourself.


I know that. But I'm hard pressed to find anyone who is using this stuff in production.

the ecosystem is still running 2.7. I can't wait to switch to Py3.


> the ecosystem is still running 2.7. I cant wait to switch to Py3.

Which one? Pyramid and SQLAlchemy work like a charm in Py3. Until you really name what your need is, that's just fud.


We use asyncio and the async/await coroutines of Python 3.5 in each of our packages.

I hope that PyPy will have a Py3.5 compatible interpreter someday, but CPython is good enough for now.


We used Python 2.7 for pypy compatibility. I really dig the new async/await primitives added to Python 3.5 and think the asyncio path is the best path forward. I hope the work to get all the latest asyncio stuff working on PyPy picks up some steam.


The main issue with asyncio is it is incompatible with some of the best packages in the ecosystem atm, e.g. sqlalchemy.


hey - thanks for replying.

if Pypy + asyncio was available, would you have built everything using that stack? There have been all these benchmarks showing that asyncio is so much slower than threads [1]. How would you compare that with Go?

[1] http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an...


For certain things that they probably shouldn't be used for, not across the board.


Yes, it is time for Rust.


Yeah, Rust sounds like a great Python replacement. /s

I do like Rust, but must we clutter every Go-related thread with comments about it? The zealousness is annoying.


Well, the last slide said "time for Rust" and the presentation was from Mozilla.


I meant that partly as a joke, I do happen to like Rust though. And in the context of this technical problem, its memory control would make it an attractive possibility (as would C/C++/etc). Lots of context missing as these are merely slides, and I said quite a bit more than what is presented here.


Well, Python-related threads have been cluttered with comments about Go for a few years now...


If your problem is that you're memory bound, using a language that allows precise control over memory allocation and layout is certainly a good idea.

The constant conflict might be annoying but using Rust or even C++ would certainly be a very reasonable choice.


Sure, Rust is the way to go.


I have my own Go Heka story. I attempted to switch from Apache Flume to Heka, mainly because of Flume taking a vast amount of memory. I was hoping Heka would work, but I think there must be some problems with the Golang AMQP drivers, as memory usage would just continue to grow. This might have been my fault, as I had to alter the Heka AMQP drivers to do things with the AMQP message headers.

The problem was pretty simple: pull event messages from AMQP and then shove them into elastic search and the file system. Heka and Flume were both sort of overkill, so I decided to write it in Rust. I got extremely far, but alas there were some issues with the Elastic Search Rust library that I'm still resolving. Surprisingly, the AMQP library worked pretty well.

I will vouch for the OP's point on error handling as Rust has a similar issue to Golang but not as bad because of the awesome type system (still I hate to admit but I really miss exceptions at times).

Anyway, to relate again to the OP, I went back to what I know best.. boring ass Java and wrote the app in an hour or so. It took about the same memory as Heka (surprising since it's Java) and appeared to be slightly faster than Heka (elastic search indexing became the bottleneck for both, so take that with a grain of salt).

Long story short.. I think the drivers and libraries really are the deal breakers and not so much the languages themselves (with some minor exceptions like the GIL).


> Go is generally 50-100x faster than CPython

(slide 20)

I find that a weird sentence in an otherwise carefully written article. The author is talking about writing software which does a lot of socket IO. So I would expect the performance discussion to make some reference to this; I assume what he's talking about in the quote is the behavior of pure CPU-bound code but he doesn't discuss to what extent this is really relevant to his project.


This is interesting. There are places I would still choose Python over Go, but usually these are front-ends, due to the richness of the l10n, i18n and template options that Python has. On the backend I exclusively use Go and have not seen goroutine leak issues or some of the other things. The only thing that I do feel is that some Go code is harder to mock and fully test effectively.


My take on the points mentioned:

1. Goroutine memory use - The post happens to be about Go 1.2 and 1.4, and I started with 1.5.

2. Debugging - yes, handling errors is a bit tedious. I have not written much boilerplate to handle errors; I just copy error handlers a lot. Maybe that's why error strings have a lot in common.

3. Goroutine leaks - This is scary. I have used goroutines with channels, but properly so far. Yes, you can write code that leaks (see the sketch after this list). This is something you will have to check yourself.

4. Testing - not done much.
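
A classic example of the kind of leak point 3 warns about (a sketch, not from the post): the sender blocks forever on an unbuffered channel once the receiver has given up.

    package main

    import "time"

    func query() chan int {
        ch := make(chan int) // unbuffered
        go func() {
            time.Sleep(2 * time.Second) // slow backend
            ch <- 42                    // blocks forever if nobody receives
        }()
        return ch
    }

    func main() {
        select {
        case v := <-query():
            _ = v
        case <-time.After(time.Second):
            // timed out: main moves on, but the sender goroutine leaks
        }
    }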

Overall - I feel the author learned of some negative aspects of Go and turned away. Some of them, like goroutine memory footprint, will improve with time (e.g. 1.5 earned a lot of praise for GC improvements). In a lot of places the author mentions possible improvements, e.g. godebug or the latest Go with SSL, but did not try them as much, so the post may not be as relevant to new Go adopters.


"Go experts still can't write leak-free code"

Big statement; can anyone confirm this?


Now I know to ask about goroutine leaks next time my colleague who loves Go talks about how bad space leaks are in Haskell :P


The next two slides have screenshots of issue tracker searches for "goroutine leak".


yeah I saw those, what about examples from independent people reading the article on HN?


I am surprised the GIL hasn't been mentioned once in the presentation or any of the comments.


Why would it? Using twisted (or py3's asyncio), the event loop is spinning, there is no thread for Python to be switching to. Except that we're using a thread-pool of course (to make external blocking network calls), and those threads are.... making network requests, which is I/O, and the GIL releases during that.

On a network bound daemon, which is not CPU-bound, the GIL is really not an issue, so it never came up.


"On a network bound daemon, which is not CPU-bound" - I think that's a fairly unique constraint which makes your case special. For me, the GIL is unfortunately where Python is unable to compete with Go.


node.js has a GIL and nobody talks about that either. The GIL is not relevant because Python is so slow that running it in two threads is hardly an improvement.


This is the weirdest and most nonsensical description of the GIL I've ever seen, and assuredly one coming from a Python/JS non-programmer. In both languages the serialization of runtime is a feature explicitly designed to simplify the programming model and its implementation.

There is only one case where the serialized runtime presents a problem, and that is in CPU-bound 90s-style shared memory parallel computation that the industry as a whole has been trying to escape for the past 20 years, because thinking about individual threads and the lifetimes of shared memory allocations turned out to be an incredibly shitty abstraction.

Even if you want to shoot yourself in the foot, both Python and Node.js provide facilities to allow e.g. concurrent array access (in Python via the multiprocessing package). The reason those approaches aren't more popular in those languages is exactly because the model itself is defective. Anyone worth their salt working in a computation-heavy domain stopped writing explicit threading code a long time ago.


But don't you need to run a bunch of node.js instances in production on a multi-cpu system behind nginx or haproxy? Would it be necessary had it not been for the GIL?


Most node.js code spends barely any time on the v8 thread. The node code merely invokes routines in libuv, which has a thread pool for anything computationally intensive.


I get linear scaling with PyParallel: http://pyparallel.org/


If I can get a nuanced presentation out of a slide deck with no audio, you've built your slide deck incorrectly.

If I can't get a nuanced presentation out of a slide deck, then maybe an article or corresponding speech is required!


Well, I've read that it takes ten years to build great software. So golang has a while to go. Python, while not perfect has the infrastructure already built.


btw, python with pypy is gonna have a bright future.


What color theme and IDE is the author using :D?


Glad that some people see the light.


"drank koolaid, used tool without knowing what its good at, lost 2 years and probably a million or two in engineer resources"

Gratz. Next time, use C. PyPy won't solve your problem. You're writing a low latency, high performance, large throughput, super-optimized message router. Even the libs you like in Cpython are.. freaking C. Do you know why? Do you understand what happens underneath the language? Do you know why PyPy is faster?

Don't get me wrong - and in fact, maybe you get me right:

Go rocks. Python rocks. It's just not the tool for that very job. Today, it's still freaking C.



