
In my current job, I was given a task to write a service which should eventually handle billions of events in the fastest possible way. The language choice was left for me to decide, and I was thinking that maybe I'd do it with C++14, Scala, Go or Rust. C++ has its quirks and I'm not really enjoying its build tool of choice, CMake. Scala I can write fast, but scaling the app in Mesos would consume lots of memory; every task would take a huge slice of RAM because of the additional JVM. Go I just don't like as a language (personal taste), and I think its GC adds a bit too much pausing for this app, so I gave Rust a shot.

The first week was madness. I'm a fairly experienced developer, and the Rust compiler kept rapping me on the fingers constantly. Usually the only way out was to write the whole part again with a different architecture. By the second week I was done with my tasks, very comfortable with the language and just enjoying the tooling. I was also relying on tests way less because of the compiler, even less than with Scala. If it compiles, it has a big chance of working. Cargo is awesome. Let me repeat: Cargo is awesome. I also like how I can write code in Vim again, and even though for some things I need to read the Rust source, it is pretty easy to get answers if you're in trouble. In the end I wrote some integration tests in Python and I'm quite happy with them.

Now I want to write more Rust.




On the other hand, as your system gets more complex, you will start missing tried and tested libraries from some of the more mature languages you have mentioned. Maybe the extra time you have to spend writing that code will outweigh the extra memory you would have spent because of the JVM. Not taking anything away from Rust - I'm just giving an alternative perspective.


That is a risk we've analyzed and are willing to take. Scala can stay on the data side of our platform, but the real-time part will be either Rust or C++14 (and Python where we don't need raw speed and can handle everything with one thread). For the libraries the situation is not that bad. Something I was missing from the Scala side was Kamon. It is a really awesome library I've been using (and contributing to) for some time. For Rust I needed to do my own solution with threads and a bit of macro love. It works, it uses a constant amount of RAM, and when we get our next Rust project running, I might open source that part for everybody to use...
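
Not our actual code, but roughly the shape of it: an atomic counter on the hot path and a background thread that drains it once a second and ships the value off. The names and the sink here are just placeholders, a minimal sketch only:

    use std::sync::Arc;
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::thread;
    use std::time::Duration;

    fn main() {
        // One shared counter; the hot path only pays for an atomic increment.
        let events = Arc::new(AtomicUsize::new(0));

        // Reporter thread: once a second, drain the counter and ship the value
        // to whatever sink you use (InfluxDB line protocol, stdout, ...).
        {
            let events = events.clone();
            thread::spawn(move || loop {
                thread::sleep(Duration::from_secs(1));
                let count = events.swap(0, Ordering::Relaxed);
                println!("events_per_second value={}", count);
            });
        }

        // Hot path elsewhere in the program:
        for _ in 0..100 {
            events.fetch_add(1, Ordering::Relaxed);
        }
        thread::sleep(Duration::from_secs(2));
    }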


As far as I can tell, Kamon is an application monitoring system - what do you monitor? I am curious as I have never felt the need to monitor apps this way (but I have never worked on HFT apps obviously).


Also, one thing I would love to see is the syntactic sugar for binding, like Haskell's `do` and Scala's `for`. Would be super useful for lots of things, e.g. validating requests in a web server.
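
For now the closest thing seems to be chaining combinators on `Option`/`Result`, or early returns with `try!`/`?`. A rough sketch of the request-validation example (the request/field names are made up):

    #[derive(Debug)]
    struct Request { user: Option<String>, age: Option<u32> }

    #[derive(Debug)]
    struct Valid { user: String, age: u32 }

    // Each step short-circuits on the first error with `?`, which covers a lot
    // of what a do-block would give you for Result.
    fn validate(req: Request) -> Result<Valid, String> {
        let user = req.user.ok_or("missing user".to_string())?;
        let age = req.age.ok_or("missing age".to_string())?;
        if age < 18 {
            return Err("too young".to_string());
        }
        Ok(Valid { user, age })
    }

    fn main() {
        let req = Request { user: Some("alice".into()), age: Some(30) };
        println!("{:?}", validate(req));
    }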


Counters, timers and such. These are to be sent to a service like InfluxDB and monitored with a tool such as Grafana.


I bet a fair number of those tried and tested libraries expose a C interface. As far as I know, Rust has an FFI. Those libraries are only a set of bindings away (see the sketch below). So I think we can do away with Java libraries, as well as the occasional header-only C++ template madness¹.

1: I mean, it's madness if you want to talk to it from the outside. If you stick with C++, this is "merely" flamewar material (take the Boost devs and the C++ committee on one side, and game devs like Mike Acton, Jonathan Blow, or Casey Muratori on the other side).
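
To give an idea of how small a binding can be, here's a minimal sketch against zlib's crc32, assuming a system libz is available to link against:

    use std::os::raw::{c_uchar, c_uint, c_ulong};

    // Declarations mirroring zlib's C prototypes; link against the system libz.
    #[link(name = "z")]
    extern "C" {
        fn crc32(crc: c_ulong, buf: *const c_uchar, len: c_uint) -> c_ulong;
    }

    fn main() {
        let data = b"hello world";
        // Unsafe because the compiler can't check the foreign code;
        // the Rust side just hands over a pointer and a length.
        let checksum = unsafe { crc32(0, data.as_ptr(), data.len() as c_uint) };
        println!("crc32 = {:#x}", checksum);
    }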


> I bet a fair number of those tried and tested libraries expose a C interface.

Some of them do, or have C equivalents, but many of them don't. Once you get into "there's only one library that does it and it's one guy's graduate thesis" territory, often the only option is on the JVM.


> "there's only one library that does it and it's one guy's graduate thesis"

Those libraries aren't tried and tested. They can be reimplemented. And in some contexts, they shouldn't be trusted as a black box.


The last time I was in that situation, the only option was lisp. I'm still supporting a lisp environment for that one piece of code.


Have you looked at languages like Nim [1]? Garbage collectors don't have to be of the stop-the-world variety.

[1] http://nim-lang.org/ http://nim-lang.org/docs/gc.html


That does look like a stop-the-world garbage collector, just with a tunable maximum runtime.


> Go I just don't like as a language (personal taste) and I think the GC adds a bit too much of pausing for this app

Not saying you have to like Go, we all have different preferences. But FYI, the GC has gotten much better in the latest releases. I don't think you would be bothered by GC pauses. (My understanding is that it's both faster and more "spread out".)


Personally, I don't care about the pauses, I care about having a complex runtime.

Consider a widely deployed library written in C -- say, zlib or libjpeg or SQLite or openssl. Could you rewrite those in Go or Haskell or Scala? No. Because nobody wants a big surprise when linking a harmless C library all of a sudden brings in an entire GC that's starting extra threads, etc.

In other words, Rust is the first practical language in a long time that could be used for a new library that might be used in millions of different applications written in dozens of languages.
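
For the curious, exporting a C ABI from Rust is roughly this: build the crate as a `cdylib`/`staticlib` and mark functions `extern "C"`. A toy sketch, the function itself is made up:

    // In a library crate built with crate-type = ["cdylib"] in Cargo.toml.
    // No runtime, no GC, no background threads come along for the ride;
    // the caller just sees an ordinary C symbol.
    #[no_mangle]
    pub extern "C" fn add_checked(a: i32, b: i32, out: *mut i32) -> i32 {
        if out.is_null() {
            return 0; // failure
        }
        match a.checked_add(b) {
            Some(sum) => {
                // We checked for null above and trust the caller to pass a
                // valid, writable pointer, as C APIs usually do.
                unsafe { *out = sum };
                1 // success
            }
            None => 0, // overflow
        }
    }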


We're doing real-time stuff where these pauses matter. It's just that we don't really need GC for anything, and adding it to certain services causes more harm than good.


If the execution is measured in nanoseconds, the GC pauses are measured in milliseconds, sometimes tens or hundreds...

Also, RAM has a cost: for a garbage-collected service you need to reserve some extra RAM. Scaling these processes adds an extra memory overhead we're not so eager to take.


Oki, got it. Can't decide if that sounds like horrible conditions or a fun challenge =)

For reference, the Go 1.5 GC propaganda:

> Go 1.5, the first glimpse of this future, achieves GC latencies well below the 10 millisecond goal we set a year ago.

Blog post: https://blog.golang.org/go15gc

Slide: https://talks.golang.org/2015/go-gc.pdf

Talk: https://www.youtube.com/watch?v=aiv1JOfMjm0

Edit: Meanwhile, over in Java land they measure stop the world pauses in seconds, not milliseconds:

http://stackoverflow.com/questions/15696585/long-gc-pauses-i...

https://blogs.oracle.com/poonam/entry/troubleshooting_long_g...


10ms is also about two thirds of a frame at 60 FPS, which is going to guarantee dropped frames if you're doing anything soft real-time.

Also, most GC'd languages don't give you strong control over data locality (by nature of everything being a reference), so you pay in cache misses, which are not cheap.


Go gives you decent control over layout, much better than most scripting languages. It's not quite as good as C, mostly due to not having a wide variety of very complicated data structures to choose from; you've got structs, arrays, and maps. But the first two in particular let you do many things to control locality.

It's also worth pointing out that 10ms is the maximum, not the minimum. There are plenty of workloads where you're not going to see pauses anywhere near that long. It's certainly not impossible to write a 3D game in Go and get 60 fps with high reliability, especially with the amount of stuff nowadays that actually runs on the GPU. You're not going to be able to do this for your next AAA game, but I wouldn't see a huge reason an "indie" game couldn't use Go for it. (It's probably not my first choice; all else being equal with library support, I'd personally suggest Rust for this for a lot of other reasons (the kind of concurrency you often get in games is going to be well supported by Rust's borrow checker), but contrary to some discussion I would say it is still on the table.)


It's not worth it, because as soon as you go down this route you're going to be constantly thinking about the GC.

If I add a cyclic reference here will that make GCs longer? Maybe I should have a free list and reuse these objects (after all they're responsible for most of my allocations)?

As soon as you're thinking like that as you write each line of code, you've lost all of the benefits of the language being high-level, and you'd be better off controlling memory manually.
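
To be concrete, the free-list/reuse pattern in question looks something like this (sketched in Rust since that's what the thread is about, with made-up names; the shape is the same in any language):

    // Sketch of the free-list pattern: keep spent objects around and hand them
    // back out instead of allocating (and later collecting) fresh ones.
    struct Bullet { x: f32, y: f32, alive: bool }

    struct Pool { free: Vec<Bullet> }

    impl Pool {
        fn acquire(&mut self) -> Bullet {
            // Reuse a dead object if one is available, otherwise allocate.
            let mut b = self.free.pop()
                .unwrap_or(Bullet { x: 0.0, y: 0.0, alive: false });
            b.alive = true;
            b
        }

        fn release(&mut self, mut b: Bullet) {
            b.alive = false;
            self.free.push(b);
        }
    }

    fn main() {
        let mut pool = Pool { free: Vec::new() };
        let b = pool.acquire();
        println!("bullet at ({}, {})", b.x, b.y);
        pool.release(b);
        println!("objects waiting for reuse: {}", pool.free.len());
    }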


> It's certainly not impossible to write a 3D game in Go and get 60 fps with high reliability, especially with the amount of stuff nowadays that actually runs on the GPU.

Maybe not impossible, but highly, highly improbable. If you say that, then you have no idea what it takes to run at 60fps constantly. It is HARD, VERY HARD.


> It's certainly not impossible to write a 3D game in Go and get 60 fps with high reliability...

But it's hard to write a 3D game in Go and get 60 fps with certainty.


But so what? It's impossible to write a 3D game in anything and get 60 fps with certainty. If the McAfee Disk Bandwidth Consumer decides to kick in, you may take a penalty on disk access. If the system specs are slightly lower than you planned for, you don't get your guaranteed 60 fps. If the GPU overheats and underclocks, you don't get 60fps.

It's not that hard to write something in Go where you pay two milliseconds every 5 minutes or something, or less than that. Again, let me reiterate, 10ms per sweep is the max, not the min. Plus it sounds like people think that Go somehow guarantees that you're going to pay this on every frame or something, rather than potentially separated by seconds or minutes.

As I said, Go probably isn't my first choice for a game anyhow, but people are grossly overstating how bad it is. Games may be high performance, but they also do things like run huge swathes of the game on a relatively slow scripting language like Lua or some Lisp variant or something. It's not like the network servers that are Go's core design motivation are completely insensitive to performance or latency either.

(Plus to be honest I reject the premise that all games must be 60 fps on the grounds that they already aren't. I already disclaimed AAA vs. Indie. But it's still not a bad discussion.)


GC pauses are not the real reason people hate GCs, even when they say otherwise. It's mostly about the cognitive overhead a GC imposes on you and the compromises it forces you to make. If it's there, you are always aware of it, you cannot ignore it, and you know it's unpredictable while better, predictable choices exist, which makes it very hard to feel good about the quality of the software you write. It's like it forces you to accept mediocrity.


Lua, slow? I don't even know where to begin with that one.

Like I said in another part of the thread, different tools for different domains. You certainly can get a consistent 60 FPS on consoles, which are also much less forgiving about techniques that would be perfectly fine on a PC.


For another comparison 10 ms is nearly your entire frame at 90hz for current VR headsets.


I encourage you to read the links in the comment you responded to. If you had, you would have found that as of Go 1.5 the pauses are 2ms or less for a heap size < 1GB. In Go 1.6 this was reduced further. The 10ms pause you're thinking of is the upper limit, not what actually happens, even at heap sizes of 100GB+.


2ms is still more than the entire budget we'd dedicate to animation on a lot of scenes. You want predictable performance in these cases and tracing GCs are fundamentally opposed to this.

Generally you're much better off using a scripting language like Lua for the places where you want the advantages GC brings but scoping it so you can limit the damage it does to your frame time.


What about microseconds using the category of collectors designed for your application area (real-time or minimal delay)?

http://www.cs.technion.ac.il/~erez/Papers/real-time-pldi.pdf


CAS on every property write? That adds up really quickly, which is why Rust has Rc vs Arc.

The issue with GC is that it's not deterministic. There's a whole other aspect that has to do with free heap space. Quite a few GCs need some multiple of the working set free to do the compaction phase. If they don't have it, GC times start to spiral out of control.

On an embedded device (or a modern game console) you'll be lucky to have 5-10MB free. On the PSP we used to have only 8MB, since 24 of the 32MB went to video+audio. We still ran Lua because we could constrain it to a 400KB block and we knew it would never outgrow it.

Just like everything in software, there are different tools for different domains. Otherwise someone would just write one piece of software that fits every problem and we'd all be out of a job.


I don't have the working-set numbers on that example. I just had latency, which is what you were discussing, and it maxed out at 145 microseconds per pause on the highest-stress test. Usually lower. As far as working set goes, there's a whole subfield dedicated to embedded systems. One early one from IBM that I found benchmarks for on a microcontroller had 85% of peak performance with a 5-30% working-set overhead for GC. They basically trade in one direction or another.

The more common strategy I see in CompSci for your use-case is to have a mix of memory pools, GC, and safe manual. Ada had memory pools and I'm sure you get the concept. Safe manual is when static analysis shows a delete can happen without safety consequences. So, that can be unmanaged. Then, what's left is handled by the concurrent, real-time GC. In typical applications, that's a small fraction of memory.


Yup, and in games that's exactly the space that Lua/UnrealScript/etc. fit neatly into.

The issue is with using a GC-based language for areas where you need high throughput and low latency (there's the whole cache-miss thing, which GC exacerbates).


10ms latency is crap if you're doing something like real-time ad exchange bidding, where you have <60ms bid times. You really do not want that hiccup.


Curious, how long can a pause be before it's a problem? (I don't write these kinds of services.)


"It depends". Some services are insensitive to pauses of several minutes (email processing; you may not want it there all the time but spikes to that level are often acceptable). Some services are sensitive to pauses in the single-digit milliseconds (for instance trying to maintain a high frame rate).


Two years ago I was working on a system that cared about single digit microseconds, luckily with low throughput.


Lots of people have to worry about single digit microseconds with high throughput.


I don't think I implied otherwise.


How do you handle the reference counting pauses in Rust then; rely on them being deterministic? Or do you completely avoid the reference counting boxes?


How does reference counting pause? There's no stop-the-world GC action, as the garbage collection is amortized over every object deallocating itself.


That's not entirely true: if your decrement results in a cascading free of an entire tree of objects, you will pay the deallocation time for that entire tree -- decrement, free, decrement children, free, etc.

And unless your RC algorithm is more clever than most that's going to be stop-the-world.
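
Rust's `Box`/`Rc` behave the same way, for what it's worth: dropping the last handle does all of the freeing right there. A small sketch that times the drop of a long chain (node layout made up; a much deeper chain would also need a hand-written iterative Drop to avoid blowing the stack):

    use std::time::Instant;

    // A linked chain of boxed nodes: dropping the head frees every node in the
    // chain, one deallocation after another, right at that point in the program.
    struct Node {
        _payload: [u8; 32],
        next: Option<Box<Node>>,
    }

    fn main() {
        let mut head: Option<Box<Node>> = None;
        for _ in 0..10_000 {
            head = Some(Box::new(Node { _payload: [0; 32], next: head.take() }));
        }

        let start = Instant::now();
        drop(head); // the cascading free: 10,000 deallocations happen here
        println!("dropped the chain in {:?}", start.elapsed());
    }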


Rc is scoped to a single thread, so it'll be, at worst, stop-the-thread.


Not the OP, but in my experience it's fairly rare to encounter someone using the `Rc` type in Rust. It's nowhere near as prevalent as `shared_ptr` seems to be in C++, for example.


There is also the `Arc` type, which is an atomic reference counter. You still need those, especially if you need to share stuff between multiple threads.
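
The split is visible right in the types: `Rc` uses plain counts and the compiler won't let it cross a thread boundary, while `Arc` pays for atomic counts and will. Roughly (a sketch only):

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        // Rc: non-atomic reference count, cheap, but not Send -- the compiler
        // refuses to let it cross a thread boundary.
        let local = Rc::new(vec![1, 2, 3]);
        let local2 = Rc::clone(&local);
        println!("rc count = {}", Rc::strong_count(&local2));

        // Arc: atomic reference count, slightly more expensive to clone,
        // but fine to hand to another thread.
        let shared = Arc::new(vec![1, 2, 3]);
        let shared2 = Arc::clone(&shared);
        let handle = thread::spawn(move || shared2.iter().sum::<i32>());
        println!("sum from thread = {}", handle.join().unwrap());

        // This would not compile:
        // thread::spawn(move || println!("{:?}", local));
    }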


You can often get away with using scoped threads.
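
i.e. threads that are guaranteed to be joined before the borrowed data goes away, so plain references work instead of `Arc`. Crossbeam has them, and newer Rust also has `std::thread::scope` in the standard library; a sketch using the std version:

    use std::thread;

    fn main() {
        let data = vec![1, 2, 3, 4, 5, 6, 7, 8];
        let (left, right) = data.split_at(4);

        // Both closures borrow slices of `data` directly -- no Arc, no clone --
        // because the scope guarantees the threads finish before `data` is dropped.
        let (a, b) = thread::scope(|s| {
            let h1 = s.spawn(|| left.iter().sum::<i32>());
            let h2 = s.spawn(|| right.iter().sum::<i32>());
            (h1.join().unwrap(), h2.join().unwrap())
        });

        println!("sums: {} + {} = {}", a, b, a + b);
    }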


This is a good article about the different wrapper types in Rust: http://manishearth.github.io/blog/2015/05/27/wrapper-types-i...


Yeah, a lot of that has to do with the borrow checker guiding you towards single-ownership designs, which is a good thing(tm).


I've been playing around with Go a bit lately, and ported part of an older application for which I already have C++, C#, Java and other implementations and benchmarks to compare against.

My current result is that Go performance can be really good (C++ level) if I try to avoid allocations as much as possible. With a sloppier implementation that e.g. did not reuse buffer objects ([]byte) between multiple network packets, the performance dropped significantly to about 1/3 of the C++ implementation, and the GC time dominated.

Fortunately it's quite easy to avoid garbage in Go, as we have value types there, and the synchronous programming style means lots of stuff can be put on the stack and there's not so much garbage due to captured objects in closures and similar things. All in all I'm quite confident that with a moderate amount of optimization Go can be suitable for lots of application types. Although I would not necessarily try to use it for low-latency audio processing if something else (that produces garbage) is running in the same process.
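
(For comparison, the same buffer-reuse trick in Rust looks about like this; a sketch only, with a made-up reader:)

    use std::io::Read;

    // Reuse one buffer across reads instead of allocating per packet --
    // the same trick as reusing a []byte in Go, just with an explicit &mut.
    fn drain<R: Read>(mut src: R) -> std::io::Result<u64> {
        let mut buf = [0u8; 4096]; // allocated once, on the stack
        let mut total = 0u64;
        loop {
            let n = src.read(&mut buf)?;
            if n == 0 {
                break;
            }
            total += n as u64; // process buf[..n] here
        }
        Ok(total)
    }

    fn main() -> std::io::Result<()> {
        let data: &[u8] = &[42u8; 10_000];
        println!("read {} bytes", drain(data)?);
        Ok(())
    }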


If you're going to work at that low a level, what are the advantages of golang over rust?


Is the JVM overhead really significant these days? You're talking about wasting what, maybe 32MB for each instance? (which can have multiple threads, you don't need one per core). Is your RAM really so limited that you'd run out of RAM before maxing out all your CPUs or I/O in a Scala version?

(I am kind of interested in Rust, but not going to touch it until it gets proper HKT support - you have to duplicate so much code otherwise)


It is a lot when we need to scale by adding more Mesos tasks. Every task launches a new JVM instance, which adds to the total memory usage. Also, by using a GC and not wanting to collect all the time, you need to add more memory headroom, which is not required with Rust.

A comparison: I have one app written in Scala consuming 350 megabytes per Mesos task. The similar-sized Rust app uses 8-10 megabytes of RAM. When you have 1000 instances running during a peak, this escalates fast.


A bit odd to be running a VM on a VM. Did you consider using a Java "application server" or similar that can keep a VM running and run tasks within that?


So, for the JVM you create fewer tasks but with more RAM, e.g. 100 tasks with 4GB RAM each, and the problem is solved?


Having more of the smaller tasks spreads out the load more evenly, and when Rust apps start in less than a second, it's easy to fine-tune and reason about having more tasks running. With the JVM, we just waste memory because of the GC overhead, and I don't feel comfortable with the idea.


Why would you need to start up an additional JVM for every task? Can't one JVM run multiple tasks?


The mention of Mesos may imply that each instance would be a separate container.


Link to the source code? I want to see what a two-week-old Rust programmer writes and how feasible it is to train our team.


Nothing open source yet. I might take the metrics system out and publish it on GitHub when I've used it in another service and it is a bit more generalized.



