This mirrors my own experience using Go to write software used in biological research. Performance is good enough, and the quicker dev cycle (less verbosity, excellent tooling) makes implementing variations on algorithms easy.
E.g., I wrote a random forest implementation [1], and I've been able to quickly add in a bunch of different published and experimental variations on the algorithm as needed for the data sets I work on.
Eh. (blog author here). It was exceptionally annoying, but in the end, the paper got in. You have to develop a thick skin, but I've found that in the long run, the process works pretty well. The feedback we got from our first submission to OSDI was awesome, and helped seriously improve the quality of the research, not just the paper. (It led us, for example, to adding the formal proof and the machine-checkable proof.) The NSDI feedback was crappy. I'm disappointed, because I like NSDI a lot. I think people were just having a bad-Paxos day. The final SOSP feedback was again really good.
I'm someone who's happy you put the proofs in there.
As soon as I read "has no designated leader" I stopped reading and skimmed through to make sure safety and liveness proofs were in there somewhere before I spent any significant time reading the rest.
This feels like it's probably a nice advancement over S-Paxos and I'm looking forward to getting through it -- thanks for making it accessible.
Also, this is awesome: "Copyright is held by the Owner/Author(s)"
That last awesomeness is SIGOPS: they bought out the copyright for all papers at SOSP'13 on behalf of the authors. Three cheers. All of the SOSP papers are available -- already -- open-access: http://sigops.org/sosp/sosp13/program.html
(For the non-academics: SIGOPS is the ACM's special interest group for operating systems. It runs SOSP and similar things.)
Great work. Do you have a link to the machine-checked proof? In the technical report there is only a formal specification and a detailed but not machine-checkable proof.
It would be really great to see systems researchers write machine-checked proofs :)
The specification can be model checked, but we don't have a machine-checkable proof. I agree that that would be very useful, and we'll think about writing one at some point in the future.
Whoops, Iulian's correct - I was being imprecise earlier. We have a formal specification in TLA+ that we put through the TLC model checker for several weeks of runtime on the biggest machine we could find, not a machine-checkable proof.
But to the original questioner, yes, we should put the TLA+ spec online instead of requiring someone to copy/paste from the tech report... please hold. :)
I think it's a fair weakness. Since Go has a mark-and-sweep GC, which runs at unpredictable moments for unpredictable durations, the performance and responsiveness of programs can be significantly affected by unpredictable factors, like small code changes, which may or may not trigger the GC. That is not to say that GC is inherently slower than C++ malloc/free or reference counting, only that small changes in code can lead to large, unpredictable changes in performance.
> Getting the performance variation down was a little tricky. In several spots, we had to think carefully (and experiment with some dead-ends) about how to reduce the amount of garbage we were generating.
They do say it was worth it in the end, and that other languages would have similar drawbacks.
From a systems perspective: Dealing with lots of communication and having lots of concurrency _where the concurrent actors are doing a lot of different stuff_ is easier in Go.
Achieving high performance on a single machine and building reusable data structure libraries is easier in C++. Things like vectorization, cache-optimized layouts, etc., are all easier there (or in plain C).
> building reusable data structure libraries
> vectorization, cache-optimized layouts
Could you explain a bit more about these features of C++? I believe you're referring to templates for the first point, but I'm unsure about the rest - SIMD? stack-allocation/memory-pool-allocation?
1) Templates
2) SIMD, but also auto-vectorization. gcc/g++ do an OK job of using SSE instructions when they can. If you have a loop like:
    for (int i = 0; i < 16; i++) {
        vec[i] *= 16;
    }
gcc can sometimes emit SIMD code for you, without having to think about it. If you need to go farther, it's easy to use something like ispc to write vectorized code and link it into your program. It's more work to take advantage of hyper-optimized C from Go.
Cache-optimized layouts: things like forcing a structure to be aligned on a 64-byte (cache line) or 4096-byte (page) boundary in order to, e.g., eliminate false sharing between threads. It's harder to micromanage the memory layout in Go while retaining its safety benefits and native datatypes.
Finally, gcc has more CPU-specific builtins such as __builtin_prefetch(), which let you have even more control over what data goes into what level of the cache (or doesn't).
I'm interested in compiler development, so this kind of information is really valuable for me... Do you think something similar would be achievable in Go if you had things like vectors (i.e. fixed-length numerical arrays that would have math-like semantics, and which would compile to SIMD), prefetch attributes/instructions for heap-allocated variables, and memory-alignment attributes for structs or even for specific heap allocations?
Good question. I'm not sure I have a great mental list that differs from the "go community conventional wisdom" (whatever that is), but it was basically:
- Avoid creating garbage unnecessarily, but don't go too crazy. It's often enough to, e.g., statically size those things you know are static. We tried adding a structure cache to our RPC/marshaler to further reduce GC, but it didn't prove helpful and just complicated our interface.
- Pre-size data structures when you can (maps, arrays): Reduces performance variation as they dynamically resize.
- Be careful of using introspection for things like marshaling. It's easy to create a bottleneck.
- Batching is just as important as it is in C++, if not more so.
- When you're doing complex stuff, be deliberate about how you prioritize request handling (*).
More on (*): We found it incredibly important for the performance of our system to prioritize processing of existing requests over processing new request arrivals - up to a point. Making progress on an existing request helps reduce the number of goroutines/objects/etc. in flight, so that everything runs faster. But you don't want to go too far, or you starve for work. The key here was to think about it instead of letting your design happen to you.
The most important is already known: profile, don't guess. The "Profiling Go Programs" article is a good start.
[1] https://github.com/ryanbressler/CloudForest