
In my current job, I was given a task to write a service which should eventually handle billions of events in the fastest possible way. The language choice was left for me to decide, and I was thinking that maybe I'd do it with C++14, Scala, Go or Rust. C++ has its quirks and I'm not really enjoying its build tool of choice, CMake. Scala I can write fast, but scaling the app in Mesos would consume lots of memory; every task would take a huge slice of RAM because of the additional JVM. Go I just don't like as a language (personal taste), and I think its GC adds a bit too much pausing for this app, so I gave Rust a shot.

The first week was madness. I'm a fairly experienced developer, and the Rust compiler kept rapping me on the fingers constantly. Usually the only way out was to write the whole part again with a different architecture. By the second week I was done with my tasks, very comfortable with the language and just enjoying the tooling. I was also relying on tests way less because of the compiler, even less than with Scala. If it compiles, it has a big chance of working. Cargo is awesome. Let me repeat: Cargo is awesome. I also like how I can write code in Vim again, and even though for some things I need to read the Rust source, it is pretty easy to get answers if you're in trouble. In the end I wrote some integration tests in Python and I'm quite happy with them.

Now I want to write more Rust.




On the other hand, as your system gets more complex, you will start missing tried and tested libraries from some of the more mature languages you have mentioned. Maybe the extra time you have to spend writing that code will outweigh the extra memory you would have spent because of the JVM. Not taking anything away from Rust - I'm just giving an alternative perspective.


That is a risk we've analyzed and are willing to take. Scala can stay on the data side of our platform, but the real-time part will be either Rust or C++14 (and Python where we don't need raw speed and can handle everything with one thread). For the libraries the situation is not that bad. Something I was missing from the Scala side was Kamon. It is a really awesome library I've been using (and contributing to) for some time. For Rust I needed to do my own solution with threads and a bit of macro love. It works, it uses a constant amount of RAM, and when we get our next Rust project running, I might open source that part for everybody to use...
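
Not our actual code, but roughly the shape of it: an atomic counter on the hot path and a background thread that drains it once a second and ships the value off. The names and the sink here are just placeholders, a minimal sketch only:

    use std::sync::Arc;
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::thread;
    use std::time::Duration;

    fn main() {
        // One shared counter; the hot path only pays for an atomic increment.
        let events = Arc::new(AtomicUsize::new(0));

        // Reporter thread: once a second, drain the counter and ship the value
        // to whatever sink you use (InfluxDB line protocol, stdout, ...).
        {
            let events = events.clone();
            thread::spawn(move || loop {
                thread::sleep(Duration::from_secs(1));
                let count = events.swap(0, Ordering::Relaxed);
                println!("events_per_second value={}", count);
            });
        }

        // Hot path elsewhere in the program:
        for _ in 0..100 {
            events.fetch_add(1, Ordering::Relaxed);
        }
        thread::sleep(Duration::from_secs(2));
    }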


As far as I can tell, Kamon is an application monitoring system - what do you monitor? I am curious as I have never felt the need to monitor apps this way (but I have never worked on HFT apps obviously).


Also, one thing I would love to see is the syntactic sugar for binding, like Haskell's `do` and Scala's `for`. Would be super useful for lots of things, e.g. validating requests in a web server.
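
For now the closest thing seems to be chaining combinators on `Option`/`Result`, or early returns with `try!`/`?`. A rough sketch of the request-validation example (the request/field names are made up):

    #[derive(Debug)]
    struct Request { user: Option<String>, age: Option<u32> }

    #[derive(Debug)]
    struct Valid { user: String, age: u32 }

    // Each step short-circuits on the first error with `?`, which covers a lot
    // of what a do-block would give you for Result.
    fn validate(req: Request) -> Result<Valid, String> {
        let user = req.user.ok_or("missing user".to_string())?;
        let age = req.age.ok_or("missing age".to_string())?;
        if age < 18 {
            return Err("too young".to_string());
        }
        Ok(Valid { user, age })
    }

    fn main() {
        let req = Request { user: Some("alice".into()), age: Some(30) };
        println!("{:?}", validate(req));
    }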


Counters, timers and such. These are to be sent to a service like InfluxDB and monitored with a tool such as Grafana.


I bet a fair number of those tried and tested libraries expose a C interface. As far as I know, Rust has an FFI. Those libraries are only a set of bindings away (see the sketch below). So I think we can do away with Java libraries, as well as the occasional header-only C++ template madness¹.

1: I mean, it's madness if you want to talk to it from the outside. If you stick with C++, this is "merely" flamewar material (take the Boost devs and the C++ committee on one side, and game devs like Mike Acton, Jonathan Blow, or Casey Muratori on the other side).
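
To give an idea of how small a binding can be, here's a minimal sketch against zlib's crc32, assuming a system libz is available to link against:

    use std::os::raw::{c_uchar, c_uint, c_ulong};

    // Declarations mirroring zlib's C prototypes; link against the system libz.
    #[link(name = "z")]
    extern "C" {
        fn crc32(crc: c_ulong, buf: *const c_uchar, len: c_uint) -> c_ulong;
    }

    fn main() {
        let data = b"hello world";
        // Unsafe because the compiler can't check the foreign code;
        // the Rust side just hands over a pointer and a length.
        let checksum = unsafe { crc32(0, data.as_ptr(), data.len() as c_uint) };
        println!("crc32 = {:#x}", checksum);
    }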


> I bet a fair number of those tried and tested libraries expose a C interface.

Some of them do, or have C equivalents, but many of them don't. Once you get into "there's only one library that does it and it's one guy's graduate thesis" territory, often the only option is on the JVM.


> "there's only one library that does it and it's one guy's graduate thesis"

Those libraries aren't tried and tested. They can be reimplemented. And in some contexts, they shouldn't be trusted as a black box.


The last time I was in that situation, the only option was lisp. I'm still supporting a lisp environment for that one piece of code.


Have you looked at languages like Nim [1]? Garbage collectors don't have to be of the stop-the-world variety.

[1] http://nim-lang.org/ http://nim-lang.org/docs/gc.html


That does look like a stop-the-world garbage collector, just with a tunable maximum runtime.


> Go I just don't like as a language (personal taste) and I think the GC adds a bit too much of pausing for this app

Not saying you have to like Go, we all have different preferences. But FYI, the GC has gotten much better in the latest releases. I don't think you would be bothered by GC pauses. (My understanding is that it's both faster and more "spread out".)


Personally, I don't care about the pauses, I care about having a complex runtime.

Consider a widely deployed library written in C -- say, zlib or libjpeg or SQLite or openssl. Could you rewrite those in Go or Haskell or Scala? No. Because nobody wants a big surprise when linking a harmless C library all of a sudden brings in an entire GC that's starting extra threads, etc.

In other words, Rust is the first practical language in a long time that could be used for a new library that might be used in millions of different applications written in dozens of languages.
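
For the curious, exporting a C ABI from Rust is roughly this: build the crate as a `cdylib`/`staticlib` and mark functions `extern "C"`. A toy sketch, the function itself is made up:

    // In a library crate built with crate-type = ["cdylib"] in Cargo.toml.
    // No runtime, no GC, no background threads come along for the ride;
    // the caller just sees an ordinary C symbol.
    #[no_mangle]
    pub extern "C" fn add_checked(a: i32, b: i32, out: *mut i32) -> i32 {
        if out.is_null() {
            return 0; // failure
        }
        match a.checked_add(b) {
            Some(sum) => {
                // We checked for null above and trust the caller to pass a
                // valid, writable pointer, as C APIs usually do.
                unsafe { *out = sum };
                1 // success
            }
            None => 0, // overflow
        }
    }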


We're doing real-time stuff where these pauses matter. It's just that we don't really need GC for anything, and adding it to certain services causes more harm than good.


If the execution is measured in nanoseconds, the GC pauses are measured in milliseconds, sometimes tens or hundreds...

Also, RAM has a cost: for a garbage-collected service you need to reserve some extra RAM. Scaling these processes adds an extra memory overhead we're not so eager to take.


Oki, got it. Can't decide if that sounds like horrible conditions or a fun challenge =)

For reference, the Go 1.5 GC propaganda:

> Go 1.5, the first glimpse of this future, achieves GC latencies well below the 10 millisecond goal we set a year ago.

Blog post: https://blog.golang.org/go15gc

Slide: https://talks.golang.org/2015/go-gc.pdf

Talk: https://www.youtube.com/watch?v=aiv1JOfMjm0

Edit: Meanwhile, over in Java land they measure stop the world pauses in seconds, not milliseconds:

http://stackoverflow.com/questions/15696585/long-gc-pauses-i...

https://blogs.oracle.com/poonam/entry/troubleshooting_long_g...


10ms is also about two thirds of a frame at 60 FPS, which is going to guarantee dropped frames if you're doing anything soft real-time.

Also, most GC'd languages don't give you strong control over data locality (by nature of everything being a reference), so you pay in cache misses, which are not cheap.


Go gives you decent control over layout, much better than most scripting languages. It's not quite as good as C, mostly due to not having a wide variety of very complicated data structures to choose from; you've got structs, arrays, and maps. But the first two in particular let you do many things to control locality.

It's also worth pointing out that 10ms is the maximum, not the minimum. There are plenty of workloads where you're not going to see pauses anywhere near that long. It's certainly not impossible to write a 3D game in Go and get 60 fps with high reliability, especially with the amount of stuff nowadays that actually runs on the GPU. You're not going to be able to do this for your next AAA game, but I wouldn't see a huge reason an "indie" game couldn't use Go for it. (It's probably not my first choice; all else being equal with library support, I'd personally suggest Rust for this for a lot of other reasons (the kind of concurrency you often get in games is going to be well supported by Rust's borrow checker), but contrary to some discussion I would say it is still on the table.)


It's not worth it, because as soon as you go down this route you're going to be constantly thinking about the GC.

If I add a cyclic reference here will that make GCs longer? Maybe I should have a free list and reuse these objects (after all they're responsible for most of my allocations)?

As soon as you're thinking like that as you write each line of code, you've lost all of the benefits of the language being high-level, and you'd be better off controlling memory manually.
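
To be concrete, the free-list/reuse pattern in question looks something like this (sketched in Rust since that's what the thread is about, with made-up names; the shape is the same in any language):

    // Sketch of the free-list pattern: keep spent objects around and hand them
    // back out instead of allocating (and later collecting) fresh ones.
    struct Bullet { x: f32, y: f32, alive: bool }

    struct Pool { free: Vec<Bullet> }

    impl Pool {
        fn acquire(&mut self) -> Bullet {
            // Reuse a dead object if one is available, otherwise allocate.
            let mut b = self.free.pop()
                .unwrap_or(Bullet { x: 0.0, y: 0.0, alive: false });
            b.alive = true;
            b
        }

        fn release(&mut self, mut b: Bullet) {
            b.alive = false;
            self.free.push(b);
        }
    }

    fn main() {
        let mut pool = Pool { free: Vec::new() };
        let b = pool.acquire();
        println!("bullet at ({}, {})", b.x, b.y);
        pool.release(b);
        println!("objects waiting for reuse: {}", pool.free.len());
    }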


> It's certainly not impossible to write a 3D game in Go and get 60 fps with high reliability, especially with the amount of stuff nowadays that actually runs on the GPU.

Maybe not impossible, but highly, highly improbable. If you say that, then you have no idea what it takes to run at 60fps constantly. It is HARD, VERY HARD.


> It's certainly not impossible to write a 3D game in Go and get 60 fps with high reliability...

But it's hard to write a 3D game in Go and get 60 fps with certainty.


But so what? It's impossible to write a 3D game in anything and get 60 fps with certainty. If the McAfee Disk Bandwidth Consumer decides to kick in, you may take a penalty on disk access. If the system specs are slightly lower than you planned for, you don't get your guaranteed 60 fps. If the GPU overheats and underclocks, you don't get 60fps.

It's not that hard to write something in Go where you pay two milliseconds every 5 minutes or something, or less than that. Again, let me reiterate, 10ms per sweep is the max, not the min. Plus it sounds like people think that Go somehow guarantees that you're going to pay this on every frame or something, rather than potentially separated by seconds or minutes.

As I said, Go probably isn't my first choice for a game anyhow, but people are grossly overstating how bad it is. Games may be high performance, but they also do things like run huge swathes of the game on a relatively slow scripting language like Lua or some Lisp variant or something. It's not like the network servers that are Go's core design motivation are completely insensitive to performance or latency either.

(Plus to be honest I reject the premise that all games must be 60 fps on the grounds that they already aren't. I already disclaimed AAA vs. Indie. But it's still not a bad discussion.)


GC pauses are not the real reason people hate GCs, even when they say otherwise. It's mostly about the cognitive overhead a GC imposes on you and the compromises it forces you to make. If it's there, you are always aware of it, you cannot ignore it, and you know it's unpredictable while better, predictable choices exist, which makes it very hard to feel good about the quality of the software you write. It's like it forces you to accept mediocrity.


Lua, slow? I don't even know where to begin with that one.

Like I said in another part of the thread, different tools for different domains. You certainly can get a consistent 60 FPS on consoles, which are also much less forgiving about techniques that would be perfectly fine on a PC.


For another comparison 10 ms is nearly your entire frame at 90hz for current VR headsets.


I encourage you to read the links in the comment you responded to. If you had, you would have found that as of Go 1.5 the pauses are 2ms or less for a heap size < 1GB. In Go 1.6 this was reduced further. The 10ms pause you're thinking of is the upper limit, not what actually happens, even at heap sizes of 100GB+.


2ms is still more than the entire budget we'd dedicate to animation on a lot of scenes. You want predictable performance in these cases and tracing GCs are fundamentally opposed to this.

Generally you're much better off using a scripting language like Lua for the places where you want the advantages GC brings but scoping it so you can limit the damage it does to your frame time.


What about microseconds using the category of collectors designed for your application area (real-time or minimal delay)?

http://www.cs.technion.ac.il/~erez/Papers/real-time-pldi.pdf


CAS on every property write? That adds up really quickly, which is why Rust has Rc vs Arc.

The issue with GC is that it's not deterministic. There's a whole other aspect that has to do with free heap space. Quite a few GCs need some multiple of the working set free to do the compaction phase. If they don't have it, GC times start to spiral out of control.

On an embedded device (or a modern game console) you'll be lucky to have 5-10MB free. On the PSP we used to have only 8MB, since 24 of the 32MB went to video+audio. We still ran Lua because we could constrain it to a 400KB block and we knew it would never outgrow it.

Just like everything in software, there are different tools for different domains. Otherwise someone would just write one piece of software that fits every problem and we'd all be out of a job.


I don't have the working-set numbers on that example. I just had latency, which is what you were discussing, and it maxed out at 145 microseconds per pause on the highest-stress test. Usually lower. As far as working set goes, there's a whole subfield dedicated to embedded systems. One early one from IBM that I found benchmarks for on a microcontroller had 85% of peak performance with a 5-30% working-set overhead for GC. They basically trade in one direction or another.

The more common strategy I see in CompSci for your use-case is to have a mix of memory pools, GC, and safe manual. Ada had memory pools and I'm sure you get the concept. Safe manual is when static analysis shows a delete can happen without safety consequences. So, that can be unmanaged. Then, what's left is handled by the concurrent, real-time GC. In typical applications, that's a small fraction of memory.


Yup, and in games that's exactly the space that Lua/UnrealScript/etc. fit neatly into.

The issue is with using a GC-based language for areas where you need high throughput and low latency (there's the whole cache-miss thing, which GC exacerbates).


10ms latency is crap if you're doing something like real-time ad exchange bidding, where you have <60ms bid times. You really do not want that hiccup.


Curious, how long can a pause be before it's a problem? (I don't write these kinds of services.)


"It depends". Some services are insensitive to pauses of several minutes (email processing; you may not want it there all the time but spikes to that level are often acceptable). Some services are sensitive to pauses in the single-digit milliseconds (for instance trying to maintain a high frame rate).


Two years ago I was working on a system that cared about single digit microseconds, luckily with low throughput.


Lots of people have to worry about single digit microseconds with high throughput.


I don't think I implied otherwise.


How do you handle the reference counting pauses in Rust then; rely on them being deterministic? Or do you completely avoid the reference counting boxes?


How does reference counting pause? There's no stop-the-world GC action, as the garbage collection is amortized over every object deallocating itself.


That's not entirely true: if your decrement results in a cascading free of an entire tree of objects, you will pay the deallocation time for that entire tree -- decrement, free, decrement children, free, etc.

And unless your RC algorithm is more clever than most that's going to be stop-the-world.
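
Rust's `Box`/`Rc` behave the same way, for what it's worth: dropping the last handle does all of the freeing right there. A small sketch that times the drop of a long chain (node layout made up; a much deeper chain would also need a hand-written iterative Drop to avoid blowing the stack):

    use std::time::Instant;

    // A linked chain of boxed nodes: dropping the head frees every node in the
    // chain, one deallocation after another, right at that point in the program.
    struct Node {
        _payload: [u8; 32],
        next: Option<Box<Node>>,
    }

    fn main() {
        let mut head: Option<Box<Node>> = None;
        for _ in 0..10_000 {
            head = Some(Box::new(Node { _payload: [0; 32], next: head.take() }));
        }

        let start = Instant::now();
        drop(head); // the cascading free: 10,000 deallocations happen here
        println!("dropped the chain in {:?}", start.elapsed());
    }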


Rc is scoped to a single thread, so it'll be, at worst, stop-the-thread.


Not the OP, but in my experience it's fairly rare to encounter someone using the `Rc` type in Rust. It's nowhere near as prevalent as `shared_ptr` seems to be in C++, for example.


There is also the `Arc` type, which is an atomic reference counter. You still need those, especially if you need to share stuff between multiple threads.
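
The split is visible right in the types: `Rc` uses plain counts and the compiler won't let it cross a thread boundary, while `Arc` pays for atomic counts and will. Roughly (a sketch only):

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        // Rc: non-atomic reference count, cheap, but not Send -- the compiler
        // refuses to let it cross a thread boundary.
        let local = Rc::new(vec![1, 2, 3]);
        let local2 = Rc::clone(&local);
        println!("rc count = {}", Rc::strong_count(&local2));

        // Arc: atomic reference count, slightly more expensive to clone,
        // but fine to hand to another thread.
        let shared = Arc::new(vec![1, 2, 3]);
        let shared2 = Arc::clone(&shared);
        let handle = thread::spawn(move || shared2.iter().sum::<i32>());
        println!("sum from thread = {}", handle.join().unwrap());

        // This would not compile:
        // thread::spawn(move || println!("{:?}", local));
    }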


You can often get away with using scoped threads.
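
i.e. threads that are guaranteed to be joined before the borrowed data goes away, so plain references work instead of `Arc`. Crossbeam has them, and newer Rust also has `std::thread::scope` in the standard library; a sketch using the std version:

    use std::thread;

    fn main() {
        let data = vec![1, 2, 3, 4, 5, 6, 7, 8];
        let (left, right) = data.split_at(4);

        // Both closures borrow slices of `data` directly -- no Arc, no clone --
        // because the scope guarantees the threads finish before `data` is dropped.
        let (a, b) = thread::scope(|s| {
            let h1 = s.spawn(|| left.iter().sum::<i32>());
            let h2 = s.spawn(|| right.iter().sum::<i32>());
            (h1.join().unwrap(), h2.join().unwrap())
        });

        println!("sums: {} + {} = {}", a, b, a + b);
    }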


This is a good article about the different wrapper types in Rust: http://manishearth.github.io/blog/2015/05/27/wrapper-types-i...


Yeah, a lot of that has to do with the borrow checker guiding you towards single-ownership designs, which is a good thing(tm).


I've been playing around with Go a bit lately, and ported part of an older application for which I already have C++, C#, Java and other implementations and benchmarks to compare against.

My current result is that Go performance can be really good (C++ level) if I try to avoid allocations as much as possible. With a sloppier implementation that e.g. did not reuse buffer objects ([]byte) between multiple network packets, the performance dropped significantly to about 1/3 of the C++ implementation, and the GC time dominated.

Fortunately it's quite easy to avoid garbage in Go, as we have value types there, and the synchronous programming style means lots of stuff can be put on the stack and there's not so much garbage due to captured objects in closures and similar things. All in all I'm quite confident that with a moderate amount of optimization Go can be suitable for lots of application types. Although I would not necessarily try to use it for low-latency audio processing if something else (that produces garbage) is running in the same process.
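
(For comparison, the same buffer-reuse trick in Rust looks about like this; a sketch only, with a made-up reader:)

    use std::io::Read;

    // Reuse one buffer across reads instead of allocating per packet --
    // the same trick as reusing a []byte in Go, just with an explicit &mut.
    fn drain<R: Read>(mut src: R) -> std::io::Result<u64> {
        let mut buf = [0u8; 4096]; // allocated once, on the stack
        let mut total = 0u64;
        loop {
            let n = src.read(&mut buf)?;
            if n == 0 {
                break;
            }
            total += n as u64; // process buf[..n] here
        }
        Ok(total)
    }

    fn main() -> std::io::Result<()> {
        let data: &[u8] = &[42u8; 10_000];
        println!("read {} bytes", drain(data)?);
        Ok(())
    }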


If you're going to work at that low a level, what are the advantages of golang over rust?


Is the JVM overhead really significant these days? You're talking about wasting what, maybe 32MB for each instance? (which can have multiple threads, you don't need one per core). Is your RAM really so limited that you'd run out of RAM before maxing out all your CPUs or I/O in a Scala version?

(I am kind of interested in Rust, but not going to touch it until it gets proper HKT support - you have to duplicate so much code otherwise)


It is a lot when we need to scale by adding more Mesos tasks. Every task launches a new JVM instance, which adds to the total memory usage. Also, by using a GC and not wanting to collect all the time, you need to add more memory headroom, which is not required with Rust.

A comparison: I have one app written in Scala consuming 350 megabytes per Mesos task. The similar-sized Rust app uses 8-10 megabytes of RAM. When you have 1000 instances running during a peak, this escalates fast.


A bit odd to be running a VM on a VM. Did you consider using a Java "application server" or similar that can keep a VM running and run tasks within that?


So, for the JVM you create fewer tasks but with more RAM, e.g. 100 tasks with 4GB RAM each, and the problem is solved?


Having more of the smaller tasks spreads out the load more evenly, and when Rust apps start in less than a second, it's easy to fine-tune and reason about having more tasks running. With the JVM, we just waste memory because of the GC overhead, and I don't feel comfortable with the idea.


Why would you need to start up an additional JVM for every task? Can't one JVM run multiple tasks?


The mention of Mesos may imply that each instance would be a separate container.


Link to the source code? I want to see what a two-week-old Rust programmer writes and how feasible it is to train our team.


Nothing open source yet. I might take the metrics system out and publish it on GitHub when I've used it in another service and it is a bit more generalized.



