Open Sourcing Our Go Libraries (2014) (dropbox.com)
176 points by joss82 on Jan 6, 2015 | hide | past | favorite | 87 comments



Using Go (or Scala or Java or whatever) doesn't magically scale a large system.

Scaling problems in large systems are rarely solved at the micro level, i.e. you don't "scale" by simply gaining the ability to run more operations in a single thread. This is always the problem with "language X is more scalable than Y" debates. In my experience, scale has little to do with the language and everything to do with how you use it.

This post brings to mind another written by Alex Payne a few years back about scaling in the large vs. scaling in the small:

https://al3x.net/2010/07/27/node.html

The danger with making hand-wavy claims with very few technical details is that it perpetuates the notion that there are certain magic bullets out there that will magically make your system "scale" if you use them. In the past few years we've seen quite a few large companies in SV and SF make the switch to some shiny new language (Scala, anyone?) only to find several million dollars and many months later that they're still having conversations about how to make stuff scale. Only now they're talking about making it happen in a language that is fairly new to their organization and with which they have only a few real experts.


Your comment reads to me as if you think the author implied that they think Go "magically" provides scalability. I inferred nothing of the sort. Rather, I assumed they wanted to use a language which provides concurrency as a primitive, to make it easier for them to write concurrent code.

This switch (Python to Go) for this motivation (better concurrency support) seems reasonable to me, particularly since Python does not have a good concurrency story. (I love Python. But I would not choose it if I wanted high concurrency.)


Also you should consider that the more you scale, the more "scaling in the small" saves you money/machines/operators/...

Why? Less resource-hungry code requires fewer resources and doesn't cause problems quite so quickly (though it tends to cause harder problems). 8 years ago I switched the directory on a site that required >30 servers to run to tmpfs, and did some serious SQL optimization in the PHP code. The month after that I turned down 20 of them (actual servers plus caching and load-balancing servers that weren't necessary anymore).


Everything you said is true, of course, but I think it applies more generally to inexperienced startups with first-time technical founders, or larger corporations with a "B" technical staff.

In this case, you're talking about Dropbox, which has already achieved the scale, in both the large and small senses, that requires a deeper level of problem solving than simply switching languages would provide.

Even if I personally find Go an odd choice for this project (why not C++, which already has the library support they need, and much more?), their present-day successes demonstrate that they didn't expect a scalability panacea from Go, just a better runtime for a subset of their critical path code.


Nothing is magic but in this case moving from a language with a GIL to a language built for (single machine) concurrency probably scales a lot.

The OP says as much in the comments: "For us, one of the biggest latency wins comes from the fact that go can truly execute sql statements in parallel (whereas python's GIL serialized these parallelizable operations). In general, single-threaded go is at least 5x faster than pure python (without c-module)."
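To make the quoted claim concrete, here is a minimal sketch of issuing independent blocking calls in parallel from Go. It is not Dropbox's code: `runQuery` is a hypothetical stand-in that just sleeps, and a real program would presumably call a database driver there instead.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runQuery is a hypothetical stand-in for a blocking database call.
func runQuery(q string) string {
	time.Sleep(50 * time.Millisecond) // simulate waiting on the backend
	return "result of " + q
}

// runAll issues every query in its own goroutine and returns the
// results in the same order as the input.
func runAll(queries []string) []string {
	results := make([]string, len(queries))
	var wg sync.WaitGroup
	for i, q := range queries {
		wg.Add(1)
		go func(i int, q string) {
			defer wg.Done()
			results[i] = runQuery(q) // each goroutine writes only its own slot
		}(i, q)
	}
	wg.Wait() // block until every query has finished
	return results
}

func main() {
	start := time.Now()
	out := runAll([]string{"SELECT 1", "SELECT 2", "SELECT 3"})
	fmt.Println(out, time.Since(start))
}
```

With the three simulated queries overlapping, the elapsed time should be close to one query's latency rather than the sum of all three.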


Go over Python may not be a magic bullet, but it's a damn useful tool nevertheless. While Go does not magically scale to datacenter-level systems, Python cannot, without significant work, use even the resources on a laptop. There's a huge range of problems that are larger than 1 core and smaller than 48 cores (or 72 cores, or however many are in your largest server). And as the sibling comments have mentioned, straight-line single-thread performance is not irrelevant. Starting with a language that's slower than Perl isn't a good beginning.


You're right regarding larger system architectures where it's more about the services and less about what language each is written in.

1) Regarding concurrency, Go really pushes you towards writing things in a scalable way with the goroutines and channels, while still giving you mutexes for when those are the best fit.

2) Regarding single-threaded performance, this does indeed start mattering once you un-bottleneck-yourself on the architecture and concurrency fronts. 250 boxes are cheaper than 500.


You're right, but consider this: they managed to rid themselves of Python's GIL by not using Python. If they can actually reap the benefits of parallelism, it means a lot for scalability.


I find the comments most interesting. An example:

For us, one of the biggest latency wins comes from the fact that go can truly execute sql statements in parallel (whereas python's GIL serialized these parallelizable operations). In general, single-threaded go is at least 5x faster than pure python (without c-module).


It's a slightly odd comment: SQL queries performed by a C driver are one of the times that you actually get to release the GIL and let the thread do its work without the Python interpreter. Depending on the SQL engine, you're OK waiting for the SQL backend in parallel, but you may suffer from single-threaded Python object creation while processing the result set.


They are not executed in parallel though, only asynchronously.


How/why are they not executed in parallel ?


They MAY be executed in parallel. Concurrency != parallelism[0] BY DEFAULT.

That said, if you're running Go against n number of CPUs, then yes, the concurrency may in fact happen in parallel.

[0] http://blog.golang.org/concurrency-is-not-parallelism


That is interesting. Doesn't gevent (or Twisted) give you this in Python?


It does. I execute many db and filesystem queries in parallel with single threaded Python using gevent.

I'm not quite sure whether dropbox didn't or couldn't get gevent or a similar system to work - it's fairly straightforward.


I'm a huge fan of Python and gevent, and not the biggest fan of Go, but there's something to be said about built-in concurrency constructs like goroutines compared to something quite hacky like gevent (gevent.monkey.patch_all() everywhere). gevent is my favorite async networking library for Python, and I use it in almost all of my projects, but one can't deny it's basically one giant hack over CPython.


I agree that gevent is a hack. However, goroutines correspond to greenlets rather than to gevent. You could write clean asynchronous I/O libraries in Python using just greenlets, and then it's not a hack.

I wonder if Dropbox evaluated this option or just rejected Python outright.


True, but the API for pure Greenlets is a bit ugly.

In Go, the API for goroutines and channels is superb, concise, and clear.

For me, the negative traits of Go outweigh the positive, but I really wish other languages would adopt some of its concurrency constructs as native language features.
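For reference, a minimal sketch of the goroutine-and-channel style being praised here. The names `squareWorker` and `sumSquares` are invented for illustration, and the sketch assumes `workers` is at least 1.

```go
package main

import "fmt"

// squareWorker reads numbers from in and sends their squares on out
// until in is closed.
func squareWorker(in <-chan int, out chan<- int) {
	for n := range in {
		out <- n * n
	}
}

// sumSquares fans the input out to several workers and adds up
// whatever comes back, in whatever order it arrives.
func sumSquares(nums []int, workers int) int {
	in := make(chan int)
	out := make(chan int)
	for w := 0; w < workers; w++ {
		go squareWorker(in, out)
	}
	go func() {
		for _, n := range nums {
			in <- n
		}
		close(in) // lets the workers' range loops terminate
	}()
	total := 0
	for range nums {
		total += <-out // exactly one result per input
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3}, 2)) // 1 + 4 + 9 = 14
}
```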


"switch from dropbox formatting to std formatting"

https://github.com/dropbox/godropbox/commit/5ed34e410e1c9fe8...


The first thing I usually do in a new language is find out how people use it. With go and go fmt, I don't have to worry about most of it. (golint and go vet cover my other mistakes most of the time)


Did they just run fmt on it?


yes


I wonder what Dropbox's reasoning was for not doing that in the first place.


Probably because Go indentation and spacing can be a little annoying at first when you're used to Python (or simply because they followed some kind of generic coding standard).

It took me a while to "give up" and find fmt-massaged code natural.


Serious question, not having looked at their caching lib just yet, are they going to be able to beat groupcache, written by golang upstream?


Groupcache addresses a very narrow use case of caching: immutable data. Keys in groupcache cannot be changed or deleted, which is what allows for some of the cool things it does, like distributing keys to multiple nodes automatically and preventing stampedes. It's useful for things like caching lots of small static files (which I believe is what Google uses it for), but it's not useful as a DB cache where things are constantly changing.

Just glancing at Dropbox's caching package, it looks like a much more general cache, with deletes and sets and all of that. So the two aren't really comparable.
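As an illustration of the stampede prevention mentioned above, here is a rough sketch of the general idea: concurrent callers asking for the same key share one load instead of each hitting the backend. This is not groupcache's actual code; `Group`, `call`, and `Do` are names chosen for this example, and the sketch does no caching of its own.

```go
package main

import (
	"fmt"
	"sync"
)

// call tracks one in-flight load for a key.
type call struct {
	wg  sync.WaitGroup
	val string
}

// Group suppresses duplicate concurrent loads of the same key.
type Group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs load for key, unless a load for that key is already in
// flight, in which case it waits and returns that load's result.
func (g *Group) Do(key string, load func() string) string {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // someone else is loading; share their result
		return c.val
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val = load() // only one caller per key gets here at a time
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key) // later calls trigger a fresh load
	g.mu.Unlock()
	return c.val
}

func main() {
	var g Group
	v := g.Do("user:42", func() string { return "loaded from backend" })
	fmt.Println(v)
}
```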


Thanks for the comparison, that makes a ton of sense. As someone else also pointed out, since a lot of Dropbox's infra is python, it does make sense for them to have a drop in memcached replacement. That means groupcache is effectively out.


Interesting to note they released these back in July of last year.


I'm a little surprised they went with memcache instead of groupcache (unless they are also using memcache on their Python processes). Would love to know more about that choice.


Probably because of other services--not necessarily written in Go--which already use memcache. Don't fix things that aren't broken, right?


Awesome work but I imagine at some point it really is going to make sense to split those projects into their own repos.


when/where is Go faster than Python?


1. Go, being a statically compiled language with a closer-to-the-metal memory model, generally has higher single-threaded performance.

2. Go allows true concurrency in many situations Python cannot (due to GIL etc); Go also supports lightweight threads that are multiplexed over actual OS threads.

3. Go makes it easier to control and reason about heap allocations.

4. Go is even easier than Python to integrate with C/ASM code.


> 2. Go allows true concurrency in many situations Python cannot (due to GIL etc);

It would be more accurate to say that Go allows parallelism in situations where Python (in the standard implementation, at least) does not. Calling parallelism (what the GIL prevents) "true concurrency" isn't particularly helpful.


True, I was imprecise. Thanks.


Since they mention "scalability", I think they're more concerned with the fact that Go has concurrency baked into the language. Python does not.


It's a statically typed, compiled language; by most accounts you can expect it to be faster in a lot of places.

This has more to do with the interpreter, of course; you can compete using PyPy.


Go has a much better runtime system than Python (can spawn lots of green threads in parallel using goroutines), but it's also broken in so many ways (e.g.: type system).


I hope Dropbox rewrites the Python client agent in Go. That would hopefully cut down the memory consumption.


So I guess that Python JIT compiler (https://tech.dropbox.com/2014/04/introducing-pyston-an-upcom...) isn't working out so well then?


We still have tons of Python code, and likely always will. Therefore, Pyston is still really interesting to us.

Only specific, common, core services (storage etc.), are being written in golang; the "long tail" of application code on the backend is (and will remain) in Python.

- Dropbox infrastructure engineer


Seems to still be actively developed: https://github.com/dropbox/pyston/commits/master


Snark aside, I don't see what one has to do with the other. It's perfectly sensible for them to keep pushing pyston to have an alternate solution.


I wasn't snarking, merely asking a question. It's valid to ask whether the Python compiler is working out since the stated goal is "to produce a high-performance Python implementation that can push Python into domains dominated by traditional systems languages like C++."

Since Dropbox has rewritten a chunk of systems in Go it suggests that Pyston isn't working out.


If you check out the announcement timings, it was far too soon for Pyston to have an effect. You can't churn out PyPy-grade JITs like they were burgers...


What version of Python does Dropbox use? 2 or 3?


This is one of the cases where I wouldn't mind reverting to the original title of what OP linked and letting readers draw their own conclusions.


[flagged]


This is probably not a great thread in which to have that conversation. Java isn't even implicated in the story, and this is a language-war topic.

We'd all be better off if this thread were actually a discussion of the libraries Dropbox released, and if subthreads like this dropped to the bottom of the page.

I downvoted you for that reason; not because it's a dumb question, but because it's in a very unfortunate spot on the thread.


There are several reasons we've been using go for web backends where I work:

- Very easy deployment (we can just copy the same binaries between our 64 bit linux boxes and run them)

- Go's libraries for developing minimal web interfaces are excellent

- Being strongly encouraged to do all the explicit error checking made our Go code much more reliable and easy to debug than our previous python versions.

Of course, all of this depends on how big the project is. We just have several components of a project that require a simple webpage frontend with a small REST backend to interact with Redis. Our entire server for both the frontend and backend is ~1200 lines of Go and easy to understand. If your project is much bigger, you may end up needing features that aren't present in Go (as was the case with Dropbox), which may slow you down some.


Java might be one of the few languages for which there isn't an immediately compelling sell.

For scripting languages like Python or Ruby you can sell out-of-the-box performance and concurrency benefits.

For lower-level compiled languages you can sell the ease of cross-platform binary compilation and direct compatibility with C.

The only thing I can think of that might be interesting to someone with a monolithic Java codebase is the syntactical simplicity and speed of testing / deploying. But if you have such a codebase that's a major prospect for potentially minimal benefit. Not to mention you have a much larger set of developers to choose from.

edit: weirdest downvotes ever


The one major benefit is much lower memory overhead due to having real value types and not needing to box simple types.
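A small sketch of what "real value types" means in practice. `Point` and `pointSize` are hypothetical names used only for illustration. A slice of structs in Go is one contiguous block holding the values themselves, whereas a Java array of objects holds references to separately allocated objects, each with its own header.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Point is a hypothetical small record used only for illustration.
type Point struct {
	X, Y int32
}

// pointSize reports how many bytes one Point occupies inside a slice.
func pointSize() uintptr {
	return unsafe.Sizeof(Point{})
}

func main() {
	// One allocation holding 1000 Points back to back; there is no
	// per-element pointer or object header.
	pts := make([]Point, 1000)
	fmt.Println(pointSize(), len(pts))
}
```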


Can someone explain why this was being downvoted?


Would that be over Java the Language? Java the Virtual Machine? Java the Ecosystem (Libraries, alternative languages)?

Java the Language:

1. Go is more concise. You might not care, but I do.

2. Go uses explicit error return codes.

My experience is that, in practice and counterintuitively, explicit error return codes (that have to be explicitly checked for errors) result in more robust code than exceptions (checked or not). YMMV.

Java the Virtual Machine:

1. Go produces single file native binaries. You may not care, but if you do, that's a win.

Java the Ecosystem:

1. Go interfaces to C more easily;

2. Go has a more vibrant, more enthusiastic community right now.

3. Go advances more quickly.

(Of course, I can think of a lot more reasons that the Java ecosystem wins - but you asked to sell you on Go...)

But if you're looking for a new language, rather than Go specifically, I urge you to have a look at Nim (née Nimrod). It's very impressive. I think it's what Go wanted to be.


> 2. Go uses explicit error return codes

Small nitpick, but the default error value returned is a string.


Even more nitpicky, it is an interface. It is where you get into problems with nil interfaces and not returning an explicit nil value for the error.
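The nil-interface problem is easy to show in a few lines. `MyErr`, `buggy`, and `fixed` are names made up for this sketch. An interface value is only nil when both its type and its value are nil, so returning a nil pointer of a concrete type yields a non-nil `error`.

```go
package main

import "fmt"

// MyErr is a hypothetical custom error type.
type MyErr struct{ msg string }

func (e *MyErr) Error() string { return e.msg }

// buggy returns a nil *MyErr wrapped in the error interface. The
// interface then carries a type (*MyErr), so it compares != nil.
func buggy() error {
	var e *MyErr // nil pointer
	return e
}

// fixed returns an explicit nil, so the interface itself is nil.
func fixed() error {
	return nil
}

func main() {
	fmt.Println(buggy() != nil) // true, even though nothing went wrong
	fmt.Println(fixed() != nil) // false
}
```

The usual way to avoid this is to declare the function's local variable as `error` rather than as the concrete pointer type, or to return a literal `nil` on the success path.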


>Java the Virtual Machine:

> 1. Go produces single file native binaries. You may not care, but if you do, that's a win.

So does Java. It is only a matter of picking the right implementation.

There are certified implementations with AOT compilers.


Go compiles to native code and uses less memory than Java does. Go is slower than Java right now (for most things), but is less optimized, so there is hope that it catches up and maybe one day is faster. Both languages have GC. Go has OOP, but not with classes. Go does not have generics; see the 50,000 blog posts for details. Go has less boilerplate than Java but still some (mainly it is explicit in checking for errors). Go has a built-in HTTP server, and you don't have to deal with containers, application servers, etc. Go has some pretty incredible tools that come for free (race detector, gofmt, etc.) but is lacking in some tools that Java excels with (IDEs and debuggers).

If you're already writing Java, Go isn't a huge win. If you're using Ruby/Python/JS and want more performance Go is a choice instead of the JVM.


The difference that seems to matter to most people is that Go has a standard library that's (currently) free of cruft, and free of "historical reasons".

Java's standard library is bytecode-compatible with Java 1 programs, which will run fine on today's JVMs. This results in several very dumb and ugly inconveniences.

For example, take a look at the ArrayList javadoc. Notice the Object-returning methods (clone() and toArray()), which are there because return-type polymorphism doesn't work (you cannot call a different method based on what you assign the function result to). So it's not possible to provide a valid clone() method that returns something other than Object without breaking Java 1 programs that use it. (Go has compiler hacks for core datatypes for some things, which of course means it's a matter of time until things break, but for now ...)

Of course if this was the only instance of this sort of problem, nobody would care, but you can find examples like this in most core classes.

TLDR: Go is in its honeymoon period.

More detailed explanation:

http://stackoverflow.com/questions/17509659/why-standard-jav...


Very different use cases. If you can simply choose between the two, then your situation is quite unique. It means that you don't need to rely on the wealth of Java libs already available; hiring is not really an issue; and you're probably starting a new project and either don't have any kind of time pressure or don't know either of the languages. :)

There isn't anything that you can write in Go but you cannot in Java. (Probably it's true the other way around as well.)


Java dev who loves Go here.

Go takes what I consider 'best practices' in Java, things like delegation-rather-than-inheritance, and manage-concurrency-through-BlockingQueues-and-ExecutorServices, and makes them the 'idiomatic' way to do things.

It's not so much huge wins as a bunch of small wins, a much tighter stdlib, and making it much harder to do some common java antipatterns like implementation inheritance and overdone design patterns. Oh, and the C interop is amazing.

The error handling, having to explicitly check return codes, seems like a huge burden at first but pays off later when you realize you can't have random exceptions bubbling all the way up the stack from anyplace in the code.
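A minimal sketch of the explicit-check style described above, assuming a hypothetical helper `parsePort`. Every failure path is visible at the call site; nothing propagates up the stack unless the caller passes it along.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort converts s to a TCP port number. Failures come back as
// an ordinary return value rather than as an exception.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parse port %q: %v", s, err)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("port %d out of range", n)
	}
	return n, nil
}

func main() {
	for _, s := range []string{"8080", "abc", "70000"} {
		port, err := parsePort(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("port:", port)
	}
}
```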


> Can someone sell me on Go over Java?

Do you need generics? then Go isn't for you.


>Do you _think_ you need generics? then Go isn't for you.

FTFU ;-)


In my case I frequently need to link to C libraries, so Java is a non-starter. (I realize many others have the opposite problem.) Go allows me to easily build wrappers for the libraries I need, then get to business.

From there it's the concurrency primitives, goroutines and channels, that make Go a true winner. It's easy to build network services that can move work through various stages quickly and in parallel. SEDA-style workloads are a pleasure to implement in Go.
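To show what a staged workload looks like, here is a small sketch in which each stage is a goroutine connected to the next by a channel. `stage` and `runPipeline` are names invented for this example, and the two stages (trim, then upper-case) are placeholders for real work.

```go
package main

import (
	"fmt"
	"strings"
)

// stage starts a goroutine that applies f to every item from in and
// returns the channel carrying the results.
func stage(in <-chan string, f func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out) // signal the next stage when input is exhausted
		for s := range in {
			out <- f(s)
		}
	}()
	return out
}

// runPipeline pushes inputs through two stages and collects the output.
// With one goroutine per stage, input order is preserved.
func runPipeline(inputs []string) []string {
	src := make(chan string)
	go func() {
		defer close(src)
		for _, s := range inputs {
			src <- s
		}
	}()
	trimmed := stage(src, strings.TrimSpace)
	upper := stage(trimmed, strings.ToUpper)

	var results []string
	for s := range upper {
		results = append(results, s)
	}
	return results
}

func main() {
	fmt.Println(runPipeline([]string{" get ", "put ", " delete"}))
}
```

While one item is being upper-cased, the previous stage can already be trimming the next one, which is the property that makes this shape a good fit for network services.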


> I know there is a lot of hate for Java on HN

I can't speak to the technical merits of either language, but something can be said for choosing a language with a lot of hype as opposed to one that's hated in the community. From a recruitment perspective, choosing Go may be more favorable in a lot of developers' eyes.


Java is hated in 'the community'? Well, Java is not hated at all in the Java community, it just happens that HN is a place where people are hyping the latest & greatest stuff - until it turns out that it isn't _that_ better than the old stuff. Go does not represent a fundamental paradigm shift, it's just a language with slightly different compromises to Java. We have a _lot_ of experience with Java in the industry so we know the advantages and weaknesses -- not so much with Go.

"From a recruitment perspective, choosing Go may be more favorable in a lot of developers eyes." -- from a recruitment perspective it's extremely easy to find (experienced) Java devs, and not so easy to find (experienced) Go devs.


> it just happens that HN is a place where people are hyping the latest & greatest stuff - until it turns out that it isn't _that_ better than the old stuff.

That brought a smile to my face. I've been saying this about Java since 1994.

> from a recruitment perspective it's extremely easy to find (experienced) Java devs,

From my experience, that is true in the literal sense of "experienced = having experience". But not so true in the sense of "experienced = competent and knows their stuff", which is often implied. One of the things about new and fringe languages is that they are effectively a competence filter:

It is much more likely that e.g. a Lisp programmer, a Haskell programmer, an OCaml programmer or a $NEW_HOTNESS programmer for most values of $NEW_HOTNESS is a good one. I suspect Go is beyond NEW_HOTNESS status to work as a filter, though.


One nice thing about go is that you don't have to hire good "Go devs", you can just hire good devs and they can learn Go in a week. We do that on Juju. 2/3rds of our devs wrote their first line of go on the job, and were still productive right away.


> I can't speak to the technical merits of either language, but something can be said about choosing a language with a lot of hype as opposed to one that's hated in the community. From a recruitment perspective, choosing Go may be more favorable in a lot of developers eyes.

The problem is that the 'HN community' has IMO a pretty small overlap with the overall software development community.

Hype in the former does not equate to hype in the latter.


I'm detecting a bit of a trend here.

* Start with Python

* Find out it's a little horrible.

* Hire Guido

* Python still a little horrible.

* Switch to Go


I expect your detection is a little broken :)

Much more along the lines of:

* Start with Python

* Have a great time and build successful apps with it

* Become global-scale infrastructure provider

* Hit problems that only occur at that scale and switch tools

The take-away is, you're almost certainly not going to end up at the size of Dropbox. If you're spending ages as a startup futzing around using bleeding-edge technology just because it's cool, you might well be wasting time – or at least prematurely optimising.

This is the perfect example – Dropbox has the engineering talent to write a set of libraries to implement basic features. Earlier-stage companies don't have that luxury.


Nice comment. I strongly believe that the fast iteration Python allows is why companies are so successful. The argument that language X does not scale ignores the fact that they got to the point where they have enough users to make scaling an issue. Go is wonderful and I love programming in Go, but I would choose either Python or Node for web apps that require fast iteration. I believe that when offering services over the cloud, using multiple languages for well-built individual components is a great approach.


Dropbox is not dropping Python: "Dropbox will continue to develop majority of its features in Python. We have only migrated performance critical components to Go."


Hey now! No actual reading of the source material please. /s


It's more or less a matter of time until everything is migrated to Go. Since it has better performance and works well for them, there's no reason not to do it.


Not necessarily. First easy reason for not migrating: the cost to do so outweighs the benefits. Why port working code to a new language when all you're doing is maintenance?

I don't know about you, but it's hard to sell upper managers on complete rewrites of things when the end result is: no real change, but it might/should/could run faster. Unless performance is a concern to be addressed, the risk of changing technology stacks doesn't seem like a great idea.


You don't have to rewrite anything; the old language can still die. If you're writing all your new code in X and your codebase growth is accelerating, the existing code in Y will look less and less relevant over time.


Perhaps, but it doesn't sound at all like all of their new code is in Go.


I'm detecting a different trend:

> HN post about any language > Snarky comment about any language > Language war

It's depressing that your comment has percolated to the top of the thread; it has nothing whatsoever to do with the story.


I flagged it.


I don't think they found out Python is horrible - the first line is praising the language, it just didn't suit their needs anymore.


>the first line is praising the language, it just didn't suit their needs anymore.

Well, praise for a language is telling people how it suits your needs, so...


Apparently it suits their needs quite well:

>For clarification, Dropbox will continue to develop majority of its features in Python.


I use C++ because it meets several key needs that other languages don't. That really isn't praise. I like C++, but it's not wonderful and perfect. It's just the right tool for the job.


It's certainly possible to praise a language without it suiting your needs.


Except they've managed pretty well with Python for everything for several years. And we're talking about Dropbox here, not some small startup. Maybe it's not "a little horrible".

Besides that, they're migrating only some parts of the backend. They're not "switching". The rest is still Python.



