Go Performance Tales (jmoiron.net)
296 points by signa11 on April 17, 2014 | hide | past | favorite | 41 comments



I found the bit about Go using native AES instructions to accelerate map key hashing most interesting. This accounted for a >50% throughput increase when he landed on AWS c1.xlarge instances that had AES-NI, compared to those that didn't.¹

This is the kind of detail most developers would not be aware of and, to be fair, even now that I know it exists, the only reference I can google up at golang.org is the C source of runtime/alg.c, where you will see…

    if(use_aeshash) {
        runtime·aeshash(h, s, a);
        return;
    }
… no hint that it might reduce your hosting costs by 33% or account for some huge variation in performance between one test machine and the next, or even individual runs if you are spinning up cloud instances to do your testing.

¹ Does your CPU have the AES, SSE3 and SSE4.1 cpu capability bits all turned on? If so, you will hash mightily! Do you even know where to look to check?


I had a similar realization recently, where I would occasionally get an ec2 instance that would cause my server to core dump.

Turns out that a library I was using set `gcc -march=native` by default, which was emitting some instructions that weren't valid everywhere. I was requesting the same virtual hardware, but the physical variations caught me off guard.


/proc/cpuinfo?


I find it interesting how insightful, technical articles like this receive hardly any comments while the usual "softer" articles that tend to dominate the Hacker News frontpage these days receive dozens if not hundreds of comments. I wonder what this says about us, HN readers.


http://en.wikipedia.org/wiki/Parkinson's_law_of_triviality (whence the "bikeshedding" expression comes, AFAICT)


There's a difference between comments and upvotes. With articles like this (which I really liked), I certainly upvoted it but because it's so technically detailed, I don't have anything useful to add.


Same here. I really enjoyed reading it, but I don't understand a lot of what it is talking about. I also bookmarked it. When the post had 0 comments I was tempted to post something to the effect of what I just said, but I figured it didn't really add any value.


That the audience is diverse, but common “softer” articles appeal to a wider demographic.


While I'd love to see more of that kind of thing, I don't really think of HN as a technical venue. More like a virtual pub frequented by a variety of folks who participate in the VC ecosystem in one way or another.


Is there a better technical venue?

Hint: it probably isn't /r/programming :)


I'm not aware of one that's broadly-based though I'd love to find it. Slashdot used to be, although I haven't frequented it in many years.


https://lobste.rs has some good content, although the bulk is already on HN.


This is common on every aggregator: most content is not selected for depth or insightfulness, but rather ease of consumption. The extreme of this is Reddit's proliferation of images and memes.

To be fair, HN doesn't feel like it has a hacker spirit to it. It's too preoccupied with business crap and self-promotion.


Maybe that's a good thing.


The softer the topic, the more room for subjectivity. The more room for subjectivity, the more room for controversy. Mile-long threads tend to be highly emotional, opinionated bickering that doesn't go anywhere, because the topic is so subjective that people can easily stick to their own biased worldview, reiterating the same old talking points.


case in point: this branch of comment tree, and your post in particular.


oh, you.


Just a note on the zlib optimization patches. The blog post is linking to an old version, there's a newer one from a month ago. Also, the patch still appears to be a bit buggy (e.g. corrupt output being generated by the new deflate strategy), so don't plan on actually deploying it.


If you want more details on Go profiling, this Go Lang blog post is a great place to look

http://blog.golang.org/profiling-go-programs


I've been looking into false sharing and other pitfalls of parallelism on multi-core systems.

http://en.wikipedia.org/wiki/False_sharing

It seems like our multi-core hardware isn't that well thought out yet. In particular, locks can cause false sharing. Even go channels are based on locks!

I would love to see some kind of annotation in golang for padding certain structs to cache-line boundaries. This value can be read from most processors, so it could be done in a cross-platform fashion. The gringo ring-channel library has to do its own padding to avoid false sharing.

https://github.com/textnode/gringo/blob/master/gringo.go

    type Gringo struct {
        padding1 [8]uint64
        lastCommittedIndex uint64
        padding2 [8]uint64
        nextFreeIndex uint64
        padding3 [8]uint64
        readerIndex uint64
        padding4 [8]uint64
        contents [queueSize]Payload
        padding5 [8]uint64
    }
There was a similar technique for padding Java objects, but it now turns out that the JVM is smart enough to optimize the padding out!


Interesting point. I doubt Go would let something as low-level as struct padding bubble up into the language, though. Perhaps there could be some kind of library-level "Unsafe" field annotation, though I'd prefer the compiler to optimize this automatically.


My first thought is that it's going to be pretty challenging for a compiler to optimize this without runtime information.


Here's a good article on solving false sharing in Java with cache-line padding:

http://mechanical-sympathy.blogspot.com/2011/07/false-sharin...


The biggest performance issues people seem to run into with Go involve reflection, which is slow. Something that does a lot of JSON parsing, for example, may be much slower in Go than in Java, Python, or JavaScript. I don't have any data, but I've known people who work with Go to complain about it. I wonder if a JIT or AOT compiler might help.


There are libraries that generate specialized marshallers to fix precisely this issue: https://journal.paul.querna.org/articles/2014/03/31/ffjson-f...


That is an interesting project but won't yet help to speed up the (very slow) JSON unmarshalling.


Indeed, that's on the TODO list at the bottom.


I have also played around trying to achieve a high performance trie in Go.[1] My approach is to use the Ternary Search Trie structure. Unfortunately, I have not yet approached either the performance of the native hash table or a hash table in Go (although Sedgewick tells us you should be able to beat a hash table). My TST does not yet have the Patricia Trie optimization (of collapsing long internal runs). Perhaps with that addition it will get closer to hash table performance.

Also everything he said about channels also holds true in my experience. I haven't tried writing a C library for Go yet but his discovery is pretty interesting for when I dive into that.

[1] : https://github.com/timtadh/data-structures/blob/master/trie/...


Good article. It closely reflects the experience we've had at SoundCloud in our more heavily-stressed services.


do you still use ruby and how?


Sure, the monolithic app at the center is still Rails. We're working to deprecate it and go all SOA/microservice, but many of those services are JRuby.

tl;dr still very polyglot here.


> All of our existing indexes were based on string data which had associated integer IDs in our backend

You already have a perfect hash function :).


The IDs might be sparse though.


Sparse is fine if they are more-or-less randomly distributed.


I was mostly aiming for unique -> no hash collisions.


This is probably a stupid question, but I wonder if the author could have used slices instead of maps with integer keys. It would have used more memory, but it would probably also have been significantly faster. A significant proportion of the performance issues I see raised on the Go mailing list seem to involve maps.


Probably not. The author says he uses IDs as keys, and I'd assume those are 32 or 64 bits long. Even at 32 bits, you would need a slice on the order of 4GB to hold any potential key. The alternative is a dynamic array (well, slice) that grows on demand, which adds even more overhead than a map (and closely resembles what a map does internally anyway). So this is only a viable approach if those keys are really small.


If 'select min(keycolumn) from footable' is relatively stable and keys are created ascending (databases do this, right?), a resizing slice (with 0-min() elided from the bottom) seems like a good alternative to a hashing structure.

If the key-space is relatively sparse, this will waste more memory than the hash, of course.


Do they have to use zlib? For on the wire compression LZ4 is better, for archiving lzma2 or zpaq is better.


> Using a map[int]* Metric instead of a map[string]struct{} would give us that integer key we knew would be faster while keeping access to the strings we needed for the indexes. Indeed, it was much faster: the overall throughput doubled.

I'm a little sceptical of this -- type assertions are fast but it's an extra step to initializing a struct. It would have been nice to see tests done comparing map[string]struct{} to map[int]struct{} and comparing map[string]* Metric to map[int]* Metric.

Also, there is no way to escape an asterisk, so I apologize for the awkward space after each one.


My understanding of this section was that the * Metric keeps the original string ID alongside the rest of the struct. If they went straight to map[int]struct{}, they would lose the string ID.



