Hacker News new | past | comments | ask | show | jobs | submit login
Golang data structures (github.com/workiva)
147 points by laex on July 4, 2015 | hide | past | favorite | 58 comments



This point about the B-tree is salient:

  Unfortunately, to make the B-tree generic we require an
  interface and the most expensive operation in CPU
  profiling is the interface method which in turn calls
  into runtime.assertI2T. We need generics.
One of the most frustrating aspects of go is having to resort to interface{} every time you want a generic implementation of a data structure.


This resonates strongly with me right now. Benchmarking simple code that happens to contain interface{} casting always leaves me... feeling a little down.

I benchmarked some go code this week between use of an `atomic.Value` and a `sync.Mutex`. Use is vastly biased towards reads (almost no writes, and thus no need for exclusive locking), so I expected the atomic.Value, being a teeny tiny wrapper around CAS primitives, would be faster.

And it's not. Single-goroutine use of atomic.Value is actually slower than use of the exclusive sync.Mutex in my benchmarks. Why? Compare-and-swap should be just about the fastest synchronization primitive possible. So, what else is wasting time?

As far as I can tell, it's the interface conversions that atomic.Value (and I, as the consumer) have to do. The sync.Mutex sits next to the var instead, so the var still has its own type info that way, and so though the lock is heavier, there's no casting. (Mind, this is not a huge speed difference -- this is faffing around at the 10s/100s of nanosecs here. I'm far enough inside a hot loop it matters; many applications won't be. But still: it just startles me atomic.Value is slower here. I'm going to go out on an opinionated limb here, but: It shouldn't be.)

I'm still going to use the atomic.Value, because it's algorithmically the right choice: as the structure it's covering grows, the amount of cost the exclusive locking would cause will increase, and with atomic.Value it won't. But it's... it just feels Odd to look at that benchmark.


I've come to the conclusion that if you actually look at it, the universal structure in Go is not interface{}, it's []byte. Anybody can read from and write to a []byte. Everything else is just a crutch to make things more bearable, but in the end, you will deal with []byte. In this sense Go kind of feels sad because []byte is far from having all the information you need, and if you want a "generic" B-tree that only handles []byte, you will waste a lot of time (un)marshalling.

Perhaps one of the design decisions was to make this cost clear to the developer. It's also a sign that Go doesn't want to be so high-level as to make low-level stuff secondary.


I don't follow. []byte comes up anywhere I marshal and unmarshal types for transmission and storage, but nowhere else. In fact, compared to Ruby, I think Golang is somewhat hostile to []byte and string.


It's just not worth using interface. And, I think, by the way, that has always been the intent from a syntax point of view.

The fact that it's slow on the CPU side is not great, but I don't think it's a major problem in the eyes of the core dev team: as far as I can tell, they do not believe that generics support as commonly envisioned is in line with go's mission.

Go is the anti-lisp / anti-unix of languages; designed to not give a team of junior devs enough rope to hang themselves. If we'd like a generics proposal that would get buy-in, I think we need to adhere to that dream while we imagine working generics and propose something; I bet it would be accepted.


sigh. Then you're back at using a more complex solution by copying and pasting generic structures and algorithms.

Moreover, the idea that generics allows junior devs enough rope to "hang themselves" is preposterous. There are more than one type of generic system out there, we don't need to replicate C++ templates here.

Let's not fool ourselves here. There's a difference between "we just haven't figured it out yet" and "generics doesn't fit into the vision of Go". The former attitude is perfectly acceptable. The latter is really annoying and misleading because Go does use generics -- internally.


> Let's not fool ourselves here. There's a difference between "we just haven't figured it out yet" and "generics doesn't fit into the vision of Go". The former attitude is perfectly acceptable. The latter is really annoying and misleading because Go does use generics -- internally.

I think there's some misconception here, Russ Cox (one of the core Go team) posted this on HN a few weeks ago https://news.ycombinator.com/item?id=9622417


I wonder if they've spoken to Phil Wadler. Having done Haskell and Java maybe he could get it right for Go?


Isn't that sort of like saying it's not worth using data structures other than maps and slices? In go, if you want a complex data structure you either use interfaces -- as the authors of this library did -- or you rely on some sort of code generation approach.

I've followed the go team's discussion of generics for a long time now. They've always claimed they're open to proposals that fit the philosophy of the language, but they're seemingly not in any hurry to address the issue. And that's fine. But I won't pretend it isn't frustrating every time I want to build something with a non-trivial data structure. Sometimes a map or a slice just doesn't cut it.


Never used Go, but I've written a lot of C in the past and I think the point you're missing is that they probably intend for you to use a special purpose data structure when a map or slice doesn't fit your needs. When it's special purpose, there's no need to use generics.

When I needed a generic data structure in C, I would do code generation (but only after I'd have a use for it in more than one place). The number of times I ever needed to do this for anything other than an array or hashtable is (not really surprisingly) vanishingly small.

So, coming from the perspective of Go is a better C (that avoids the mistakes of C++), this choice makes sense to me. I don't necessarily think it's the right choice -- I think generally generics are useful, but tend to use generic functions in my C++ code more than generic types or containers.

Might be worth noting that I work in game development, where using a library data structure isn't really considered for most things, because a special purpose one is likely able to be higher performance, due to assumptions it's able to make, operations it doesn't need to implement, etc. I guess I can see the situation where not being able to find an off the shelf generic data structure is frustrating, but I think it might be a bit overblown.


You don't ever need a concurrent linked list or hash map?

Lock-free concurrent data structures always struck me as the killer use case for generics as far as data structures are concerned: reasoning about atomics (especially when you have to annotate with relaxed/acquire/release/sequentially consistent ordering) is really tricky and one small mistake will lead to incredibly difficult-to-reproduce bugs. You really don't want to rewrite those over and over. Even something like a j.u.c-style concurrent hash map with striped locking needs careful thought around growth and reallocation lest you deadlock; it's so much easier to just have someone else write the data structure.


Writing bug free data structure implementations is hard. Doing it in a way that is highly performant is extremely hard.

The idea of cutting/pasting code seems pretty backwards to me compared to just using one of these libraries:

http://java-performance.info/large-hashmap-overview-jdk-fast...


Thing is, in Go that library cannot be written generically.


> or you rely on some sort of code generation approach.

Exactly.


Here's an idea: How about a code generator that interprets special syntax we put into source files? The code generator would run before every compile and its output would be fed straight into the compiler. An apt name might be CPP -- compile preprocessor.

It might be a bit difficult to predict what the input to the compiler actually looks like after CPP has done its job. So we could restrict the kinds of things that CPP can do to make it more predictable. Let's say CPP can only substitute symbols in places where types are asked for...


So do it. If it's a great success, it might end up in the language. (Seriously.)


There are languages that have what you want. Why not use them?


As an outsider who has only tinkered with Go and watched most of Rob Pike's talks, my impression is "C is to Go as Java is to JavaScript" in the sense that both have a thin veneer of a familiar language on top of a fundamentally different, esoteric language. For JavaScript it's Self (prototype inheritance) and for Go it's NewSqueak (CSP).

I don't think Go is really about catering to hordes of cheap junior devs or rebelling against Lisp or Unix, etc. It's (in my interpretation) a way to use channels and goroutines without overwhelming people with academic syntax. As a tool, "the right job" is small and focused on throughput and concurrency - not wiring up webs of generic abstractions. Obviously some people are also finding the language general enough to do the latter, but not without a bit of frustration.

If my understanding is reasonable, people shouldn't be asking "why doesn't Go have generics," but rather "should I be using Go for this? Does this program even have one 'chan' in it?"


> I don't think Go is really about catering to hordes of cheap junior devs

Rob Pike disagrees with you.


> .. anti-lisp ..

So is Scala. Meaningless.

> .. anti-unix

Actually, the issue is precisely that Rob and team are entirely too comfortable on _nix, so there is a (typical for Go) duality where (L) language is syntactically light and semantically small, (R) *nix is your IDE and piping source through source level transformers are assumed to be 2nd nature to the user.

[p.s.]: (L) is motivated by making another work-day tool for the broadest range of developers and experience levels, and (R) possibly emerged as a de-facto approach given the zippy compiler and source level parsers built into the Std. lib, and comfort at that level is a virtue of a much smaller set of developers that map into (L).


> Go is the anti-lisp / anti-unix of languages

Strange, I thought Go was all about the unix philosophy ?

> It's possible to use these ideas to construct something analogous to type-safe Unix pipes.

https://golang.org/doc/faq

> designed to not give a team of junior devs enough rope to hang themselves

yeah, I thought Google only hires PhDs ... if a noob from a code bootcamp can understand generics within 6 weeks, certainly a guy from a prestigious university hand picked by google can ... please stop being insulting.


> designed to not give a team of junior devs enough rope to hang themselves

I find this to be an utterly silly goal.


Not really. Go doesn't fit Google hiring process "we only want CS Phds".

It is also not one language that one can use on the interview exercises.

So given Rob Pike's statement about an easy language for newly hires, one has to wonder where they are coming from.


> they do not believe that generics support as commonly envisioned is in line with go's mission.

More like they can't add generics without breaking the backward compatibility of their mediocre type system.


More like they can't add generics without breaking the backward compatibility of their mediocre type system.

Reference? Proof?

(Genuinely interested.)



I don't see anything in Russ Cox's answer that proves your statement. If there is any conclusion to that post, it is 'We can probably do it. But we are not doing it now because we are too afraid that we do it wrong, like Java.'

My problem with the Go team's line of reasoning is that they always cite Java (erasure gives problems) and C++ (too long compile times), but there are many languages that have implemented parametric polymorphism nicely ages ago (C#, ML, Haskell).

Sometimes I wonder if it is the lack of a person like Philip Wadler on the Go team who can hash out how to best fit generics into Go's type system.


> Go is the anti-lisp / anti-unix of languages; designed to not give a team of junior devs enough rope to hang themselves.

Sounding more like a patronising, holier than thou language to me.

And given that Java implemented generics (sort of) and the world didn't collapse indicates to me junior developers could handle it.


Go is such a pleasant language to use, it's a shame that it has a couple of overly-opinionated warts (from annoying missing imports to hair-pulling lack of generics) that keep me from fully 100% evangelizing it.


Missing imports can be solved in Vim using the vim-go plugin and GoImports, I'm not sure what the state of other editors is like.


Ditto with SublimeText and Atom. Installing the popular Go plugins (GoSublime and go-plus respectively) makes the import thing a non-issue. The appropriate packages are imported and unused imports are removed every time the file is saved.

I suspect that the OP might not have used Go for very long if he didn't discover any of these solutions.


I'm aware of them, and grateful for the handful of people that created those solutions and wrote them up, to help the many thousands of others that run into the import annoyance when they're learning Go.

I personally don't use them because the emacs version never worked quite right for me and at that point I'd rather just deal with Go directly and not have an additional annoyance.


Just a suggestion, but if you plan to execute `go build` at the terminal, is it possible to execute `go-imports` just before? I think using this and gofmt (again from the command line) makes developing in Go very smooth.

I agree that it is a shame that these options aren't built-in and many newbies would have to discover them on their own.


I should drop that into my (small) build script, yeah. thanks.


Imports with emacs is a pretty simple add: http://tleyden.github.io/blog/2014/05/27/configure-emacs-as-...


The lack of Generics is getting embarassing (and the suggestions, such as "source file generation" even more so).

With C, which is custom-data-structure land (and besides pointer use is encouraged) you can get by.

But Go purports to be safe and modern and uses a GC.


The runtime architecture of Go is really nice (goroutines, concurrent GC). But, indeed, the generics are missing.

Therefore, I think they should make Go suitable to be used as an intermediate language (the compiler would become a back-end). Having Go as a compiler back-end could spawn a lot of interesting new languages.


Someone tried to get Go running on the JVM: https://code.google.com/p/jgo/

The JVM can support goroutines (similar to Quasar Fibers) and the various GC implementations are significantly more advanced than what Go has. But I suspect Go is popular mostly for its simplicity than features.


But I suspect Go is popular mostly for its simplicity than features.

Definitely. I think there are a couple of reasons:

* It's familiar to anyone who knows C-like languages.

* The tooling is really good for such a young language (go get, go fmt, goimports, pprof, etc.).

* Java's tooling is still lightyears ahead, but a significant chunk of programmers dislikes the JVM for various reasons.

* It integrates well with the UNIX/C environment.

* Simplicity (someone mentioned in a thread a couple of days ago: Go seems to look what C++ does, and then does exactly the opposite).


sounds like a cool idea. somewhat like wasm (WebAssembly) ? Are there plans to fork go and build generics on top of it ?


Also, C has macros, which makes it easier (but ugly) to make 'generic' data structures.


But you never want a generic implementation of a data structure. You want concrete implementations of data structures. I have written Go for over 6 years now and written many hundred thousand lines of code. The time I'd save avoiding writing concrete data structures would probably not even be measurable. There are simply other areas of programming that dominate time.

Having written so much Go code, there were situations where generics would have been useful, but they were of a very different kind. Generics and typed unions would be very helpful for writing compilers (what I do). But even then it's a matter of making code clearer and and more type safe, rather than saving time/not writing data structures.

Other than compilers I have never felt the need of generics in Go, and I have never felt the need to write data structures containing interface{}. interface{} is very useful as a parameter type, to reflect on it, but it's almost always a mistake to use it inside a data structures (there are exceptions, but people usually abuse it).

There's no question that some form of generics would be useful to some people in some context, but its necessity is overblown.


> But you never want a generic implementation of a data structure. You want concrete implementations of data structures.

Yes you want a generic implementation of a data structure. Because it's dry. Now personally I stopped arguing with Gophers about that. This kind of arrogant speech is exactly why I dropped Go. Let me decide what I want and what I need , and how to implement it . If the language doesn't allow that then I definitely dont need that language.


Something I've always wondered - in the futures packages (https://github.com/Workiva/go-datastructures/blob/master/fut...):

The author has a variable `triggered` to check the state of whether or not the future is complete. Now to protect that variable from data races, the author uses a standard lock. Now since you need to protect a simple value what I would do in this situation is use an atomic variable w/ atomic.StoreInt32 (you could also probably not even have to use an atomic load in GetResult).

Of course the best option would be to benchmark (and I'm assuming the author did, given that the structures are performant), but I'm wondering if my "gut" is right and under what situations others have seen that would necessitate a lock vs. an atomic instruction (high contention on readers, few writers)?


I think you are wrong in thinking he is protecting the `triggered` variable with the lock. He's using a Mutex to synchronize access on the entire struct.

snippet from https://github.com/Workiva/go-datastructures/blob/master/fut...

    f.lock.Lock()
    f.triggered = true
    f.item = item
    f.err = err
    f.lock.Unlock()
    f.wg.Done()


I'm confident the following is identical (if nothing else about the code changes). Notice how access to `f.item` and `f.err` aren't read protected in `GetResult`.

    func (f *Future) setItem(item interface{}, err error) {
	f.item = item
	f.err = err
        f.lock.Lock()
        f.triggered = true
	f.lock.Unlock()
	f.wg.Done()
    }
Any reader of `f.item` or `f.err` (in `GetResult`) is essentially waiting for triggered to be true before reading the value of those respective fields. If after an atomic load or synchronized access of triggered returns `true`, both versions of `setItem` should guarantee that a subsequent read of `f.item` will return the "promised" value.

In any case, because the reads (in `GetResult`) aren't protected as well, and the call to `setItem` which writes to `item` and `err` only happens on a single thread, the only thing the code perfectly protects now are reads of triggered.


Now someone will hopefully create "type writers" for gen [1] for compile-time type checking and convenience.

[1] http://clipperhouse.github.io/gen/typewriters/


For an alternative to Futures, check out tv42's "topic":

http://github.com/tv42/topic

I used it here in my multiplexing serial-to-TCP adapter: https://github.com/chrissnell/tnc-server/blob/master/tnc-ser...


This is much closer to what I'd expect to have available after using clojure's core.async. pub/sub on top of channels feels pretty good without needing tons of extra complexity.


This probably solves an exact need of mine in the near future - thanks!


Some of these sound really great. A major pain point with go concurrency is the absence of a fast lockfree map implementation that doesn't destroy your GC times. The 'ctrie' sounds very appealing; I'm curious to try it.


Has anyone used Ctries in any real scenario (has anyone implemented them in a language like C)? How do they fare as opposed to other data structures? Is the latency of CAS instructions insignificant?

I'm thinking of using it in our codebase, in theory it should speed things up quite a bit.


Our Matchbox[0] library uses a copy of the ctrie in this repo

[0] https://github.com/Workiva/matchbox


Nice work. It is a pity that Go has no generics, though. With them, data structures would be much more efficient and easier to work with.


Whoah.


Not sure why you got downvoted. I'd speculate its because your usual contributions to HN are usually much higher quality.


Man...


Go on.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: