A JVM Does That? (2011) [pdf] (azulsystems.com)
224 points by bshanks on July 7, 2016 | 165 comments



If you are part of the HFT crowd that uses the JVM, then you know who the author, Cliff Click, is.

His blogs (former and current) are goldmines of high-performance JVM information.

http://www.azulsystems.com/blog/author/cliff

http://www.cliffc.org/blog/

I had one CTO of a public HFT firm tell me that, short of FPGAs, Azul's pause-less GC (https://www.azul.com/products/zing/pgc/) was the biggest performance win they'd had in the past few years (note: this was a few years ago).

EDIT: someone emailed asking how HFT firms write their Java code. I haven't written Java in 3 years, so I'm probably not the best person to author a list, but this is what I'd include:

In order of importance:

- Measure, measure, measure. Every HFT firm can tell you, to the nanosecond, as much as that's possible, what their time is from receiving a packet to replying to it. Focus on the 95th and 99th percentile times as much as, if not more than, the average time.

An increase in this time is considered a bug in the same way that an app crashing due to user input is considered a bug, which is to say that you just don't ship with this kind of bug.

- no GCs: everything else starts to become less important if the GC runs every 30 seconds. Some guys will have only one GC a day, which means you have a really large Eden space

- short call stacks, and no recursion

- one physical core per thread

- non-locking data structures, and use the single-writer principle where possible (see the sketch after this list)

- don't use the kernel if possible, e.g. Solarflare Ethernet cards that have userland drivers

- cache-friendly data structures
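
For the non-locking / single-writer bullet, here is a minimal sketch of the idea (class and field names are invented for illustration, not taken from any real trading system): one thread owns all writes into pre-allocated primitive arrays and publishes a sequence number, readers only ever read, and nothing on the hot path takes a lock or allocates.

    import java.util.concurrent.atomic.AtomicLong;

    final class TickRing {
        private static final int SIZE = 1 << 16;            // power of two for cheap masking
        private final long[] prices = new long[SIZE];        // pre-allocated, primitive, no garbage
        private final long[] sizes  = new long[SIZE];
        private final AtomicLong seq = new AtomicLong(-1);   // last published slot

        // Called by exactly ONE thread -- the single writer.
        void publish(long price, long size) {
            long next = seq.get() + 1;                       // plain read is safe: we are the only writer
            int slot = (int) (next & (SIZE - 1));
            prices[slot] = price;
            sizes[slot]  = size;
            seq.lazySet(next);                               // ordered store publishes the slot to readers
        }

        // Any number of reader threads may poll without locking.
        long latestPrice() {
            long s = seq.get();                              // volatile read pairs with lazySet above
            return s < 0 ? 0 : prices[(int) (s & (SIZE - 1))];
        }
    }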

Watch this video: https://www.youtube.com/watch?v=iINk7x44MmM


If performance is THAT critical in HFT, why do they pick Java as the starting language?

I understand that Java is very fast, faster than most people believe, but does it beat hand-optimized assembly? Fortran? C?

Why Java?


Here are my impressions from having worked as a programmer on an HFT system during 2009-2011.

When an HFT system makes money, some other person or system is losing money, i.e. it is a zero-sum game. And that person isn't going to sit idle; they are going to analyze their mistake and up their game. The industry parlance for this is, I think, that the "strategy dissipates into the market". This causes HFT systems to always be in active development.

Spotting a strategy and putting together a system to take advantage of it, and spotting where your system is losing money and fixing it - both of these need to be done with a quick turnaround time.

Most strategies can be thought of as picking up nickels from in front of an oncoming train. Your best-case "win" is a tiny amount of money, but your worst-case "loss" is much, much more money. The trick in HFT - the HF part - is to make these tiny wins over and over. But a lot of actual money is involved, with very bad consequences should the program run into inopportune bugs.

All of these - constant development, time-to-market pressure, high reliability - give the impression that Java is a better language to code these HFT systems in than C or Fortran.


You seem to be saying that C & Fortran aren't a) fast, b) suitable for developing strategies.

...Now I've never used Fortran myself.... but none of those arguments seem to stack up.


I think he's saying that reasonably fast code can be written reasonably fast in Java. Writing fast C code is perhaps a more specialised task?


But I thought HFT required unreasonably fast code?


As with most big systems there are degrees. Most of the code isn't in the fast paths so can be written in whatever language makes sense for the developers. For all the love it receives on HN, Java has plenty of practical success as a development language.

The majority of the fast paths have performance characteristics that Java is suitable for, as long as it's actually executing your program; hence the focus on GC, networking, and concurrency primitives. And mostly Java is chosen where the bulk of the rest of the codebase is Java.

Where you need ridiculously fast, then you have to drop down to what suits. Be that C, ASM or FPGAs. But these have costs wrt dev time and/or ease of establishing correctness. So they tend to be used judiciously.


But it requires it in an unreasonably short amount of time. Hence the tradeoff.


It's close to impossible to get certain kinds of code correct in C - for instance, string parsing. Also, until pretty recently, C and C++ did not have a sane memory model, so what happened when threads were involved depended on what CPU you were on, the phase of the moon, etc.

Java could very well be the first programming language designed by adults and it shows. (That is, they solve the hard problems, not just pretend they don't matter)


The amount of contortions you need to do with your "adult" language arguably negates a lot of advantages of using it. Just because you don't have pointers doesn't mean that all bugs are now eliminated.

Take Android, for instance: getting solid performance out of the JVM there is a huge bear because the language is just not suited to low-latency operations. I won't claim that C is a perfect language, but I hate it when I see people throwing around the idea that you're committing some sin with each line of C code you write.

If you want cache friendly operations and predictable performance you're not going to find it in any JVM or language that has a GC.


> If you want cache friendly operations and predictable performance you're not going to find it in any JVM or language that has a GC.

I don't know you, and it's always dangerous to argue with people on HN given who might be reading, but I'd very strongly disagree with your above quote.

Predictable performance can be obtained at very low latency; I've seen it :)

Cache friendliness can be tougher, and I won't argue the point, as I think it will end up being a "no true Scotsman" argument where I claim Java can be cache-efficient and you counter that C can do better, so therefore Java isn't truly cache-efficient.


That's true; I'm very familiar with tuning Java to be cache-aware. I've played around with FlatBuffers [1] in my spare time to good effect, and it seems to be the only sane way of doing cache-aware layout in Java.

That said, the number of restrictions you have makes it really painful (needing to use pre-allocated wrappers if you want to read more than intrinsic types, not being able to grow/shrink anything). There's sun.misc.Unsafe, but that isn't available on every JVM implementation.
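
To make the flattened style concrete, here is a rough sketch of what it tends to look like (names invented, not from any real codebase): one primitive "column" per field instead of one heap object per record, so a scan walks contiguous arrays rather than chasing pointers.

    final class OrderTable {
        private final long[] price;      // one primitive column per field
        private final long[] quantity;
        private final byte[] side;       // e.g. 0 = buy, 1 = sell
        private int count;

        OrderTable(int capacity) {
            price    = new long[capacity];
            quantity = new long[capacity];
            side     = new byte[capacity];
        }

        // Field writes only -- no per-order object is allocated on the hot path.
        void add(long p, long q, byte s) {
            price[count] = p;
            quantity[count] = q;
            side[count] = s;
            count++;
        }

        // Sequential access pattern that the hardware prefetcher likes.
        long notional() {
            long total = 0;
            for (int i = 0; i < count; i++) total += price[i] * quantity[i];
            return total;
        }
    }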

A GC is by nature going to introduce pauses, since at some point it's going to need to do work on the memory you hold. There's a set of tunable tradeoffs, but it will impact latency.

I guess my meta point is that if throughput and latency are paramount to you, then you should really consider a language suited for it. For me, today, that would be Rust or C++, depending on how risk-averse the organization you're working with is.

[1] https://google.github.io/flatbuffers/


There was an article on HN talking about the improvements Go has made to its GC over the last few releases.

Umm... here

https://blog.twitch.tv/gos-march-to-low-latency-gc-a6fa96f06...

It sort of looks like language developers are finally getting the point that while you can easily deal with a throughput problem by just throwing faster processors, more processors, and/or memory at it, working around long GC pauses is just pain all around.


Personally, I am looking at Rust or Pony, but both are too young and the tooling isn't quite there, not to mention the recruitment potential. Erlang/LFE/Elixir are great for certain distributed HFT tasks, but not for the speed- and latency-critical core.

The JVM is in HFT because of sheer inertia; it has been worked on for so many years to shoehorn it in, and some smart people have made it work. The tooling is mature, and there are alternative syntaxes close to it, like Kotlin, that remove a lot of the noise for code review and maintenance.

The JVM is the reason I stayed away from Clojure, even though I like Lisp/Scheme syntax and the semantics. Where is Clojure-clr nowadays?


>There's sun.misc.Unsafe, but that isn't available on every JVM implementation.

There's also talk about the possible removal of sun.misc.Unsafe in Java 9¹.

¹ http://blog.dripstat.com/removal-of-sun-misc-unsafe-a-disast...


Jeeezus... I've been writing Java since before many HNers were born, and somehow I never noticed that class.

The Oracle people are right about this one. Rip it out and sterilize the wound with fire.


Oracle backpedaled from that pretty quickly.


  Take Android for instance
Android <> the JVM. There was a recent, fairly well publicised legal battle around this fact.


I just think the people who make Android are high on drugs or something.

For instance the subjective experience of using Android is that you can never close any app that you've opened other than by uninstalling it. Even after you turn your machine off and turn it back on you still see windows for every f--king app.

Then they run a bunch of articles about what an idiot you are if you try to close these because it won't save your phone's battery.

Well I admit I do have some cognitive limitations and it is --hard-- to scroll through 30 apps just to switch from (say) the web browser to the PDF viewer, but I guess Google thinks it is great this way because you always have a Google Plus window open.

Then there are all the articles about the fancy power management they are going to have someday that doesn't face up to the fact that an android device may or may not charge if you plug it into a charger, might turn itself off when it is running, and that the most reliable way to turn it on is to do a hard reset... And this isn't any old piece of junk, this is a nexus device.


> For instance the subjective experience of using Android is that you can never close any app that you've opened other than by uninstalling it

Is there a reason the user should explicitly terminate an app rather than just stop using it? Why not have a GC button, too? "Closing" and "file system" are meaningless to anyone who hasn't got a grasp of implementation. Don't show that stuff to the user.


Those are just screenshots captured from the app you're seeing. If you switch to any OpenGL based app you'll see it restart.

FWIW Force Stop from the settings->app section will stop an app unless it's forced to be sticky because another service depends on it.


> Those are just screenshots captured from the app you're seeing.

That's not entirely accurate. Many of the apps can and will stay in memory (it doesn't suspend all of them). Not that that's necessarily a bad thing.


If you want to get really pedantic those are screenshots in the app drawer that are swapped with an application's surface in SurfaceFlinger just as it comes fullscreen.

But yeah, Android doesn't kill apps outright when you switch away, that would be a pretty poor experience to restart each time you opened a link in Twitter and came back from it for instance.


> if you want to get really pedantic those are screenshots in the app drawer that are swapped with an application's surface in SurfaceFlinger just as it comes fullscreen.

Oh I know, I was just being specific since you said they were "just" screenshots. Wanted to make sure it was clear some of those apps may in fact still be in memory.


I'll admit my Nexus 6P has been endlessly frustrating, but it is possible to "close" apps by swiping sideways on their tiles after pushing the square button.


This is a great comment.

You should not close apps because you may interfere with tracking your location and other data collection.

Data collection is necessary to enrich your "user experience". When we know what you want we can fulfill your every wish! Just say "OK, Google". It will be great!

The developers behind this crap do not hear from satisfied users. Because the truth is no one really cares about this stuff. They care about things like reception and battery life...

except for some nerds like the ones who comment on HN who can easily point out all the stupidity of these "business" models.

When users are like puppets on a string, helplessly dependent. When there are no alternatives, no competition. Is that a business? And I suppose shooting fish in a barrel is game of skill.

Oddly enough, Googlers read HN comments and frequently defend the company, speaking only for themselves of course. Why should they care that anyone sees Android for what it really is? Whining nerds do not count, right? So why pay attention to what they think?

If developers of Android and iOS had any respect, if they had a conscience, then they would not be usurping people's computing resources for their own ends. Sure, users will be oblivious to what is going on and they will not complain. That does not mean it's OK to do these things.


Wow, it's interesting how much HN dislikes your comment given the number of privacy and cryptography experts here.

(Edit: Probably the repeated pejorative usage of 'nerd', I guess.)


Well, it was a rant rather than information. And I disagree with it from the beginning:

> You should not close apps because you may interfere with tracking your location and other data collection.

That's a bullshit reason - you'd write a background service, not try to stop users from closing the app. For any application I'm using, I'm happy for it to be cached rather than restarted every time. This is just arguing against a cache layer, because <made up reason>.


Going offtopic from the rant to an Android development question, if I may - I thought background services were shut down brutally at random times, and hard to keep running reliably unless some app which uses them is open? I'm a newbie to Android dev, but I have an app which (attempts to) maintain an always-on connection, and I've gone with a foreground service because that doesn't get killed (as often).


That seems to go against the documentation: (https://developer.android.com/guide/components/services.html)

> A started service must manage its own lifecycle. That is, the system does not stop or destroy the service unless it must recover system memory and the service continues to run after onStartCommand() returns.

A started service shouldn't be killed randomly - after all, that's how the downloads are handled - you don't see those disappearing without a reason.


Downloads are generally handled via foreground services (one significant difference is that foreground services show up as an icon in your notification area; notice downloads or Play Store updates appear there?). Background services are more like OLE in the sense that they let you set your app up so another app can ask it for stuff.
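
A rough sketch of the foreground side, for anyone unfamiliar (the Android SDK calls are real, everything else is invented; newer Android versions additionally require a notification channel):

    public class DownloadService extends android.app.Service {
        @Override
        public int onStartCommand(android.content.Intent intent, int flags, int startId) {
            android.app.Notification n = new android.app.Notification.Builder(this)
                    .setContentTitle("Downloading...")
                    .setSmallIcon(android.R.drawable.stat_sys_download)
                    .build();
            // The status-bar icon you see for downloads is this notification; it also
            // tells the system not to reclaim the service like an ordinary background one.
            startForeground(1, n);
            // ... kick off the actual work on a worker thread ...
            return START_STICKY;   // ask to be restarted if the process is killed anyway
        }

        @Override
        public android.os.IBinder onBind(android.content.Intent intent) { return null; }
    }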


Nothing pejorative at all about the term "nerd".

The issue is not simply privacy, it's control of the hardware.


This is why I didn't get to it until the edit. If I could be sure I wouldn't offend my clients, one of the first questions I'd ask is "are you a nerd?" As it is I have to couch it as "what's your background? are you technical?" and risk getting pinned as a condescending asshole when all I'm asking is "do you want me to tell you what the opcodes are doing to the transistors or do you want to tell me how the machine is 'feeling'?"

The real question, I guess, is "do you have the knack"?


I think your comment comes off a bit harsh or crass for the HN crowd, but I'll admit much of it is accurate. It's amazing how often I've had Nexus, Motorola, and other flagship Android devices (including my Nexus 6P) simply restart out of nowhere, or had my Nexus 6P not charge (it has happened 3 times now; really screwed up my mornings).

I love android but it feels like every iteration has plenty of huge bugs never really addressed. iOS always seems more reliable but far more limited with some of the things I want to do with a phone.


> It's close to impossible to get certain kinds of code correct in C, for instance, string parsing.

Complete hyperbole. Nothing stops you from using a safe string library in C.

> Also until pretty recently, C and C++ did not have a sane memory model

Technically true, but in practice it was rarely relevant.


> Complete hyperbole. Nothing stops you from using a safe string library in C.

Nothing technically stops you, it's just really hard. When even djb has suffered a string-processing-related bug¹ you can't reasonably expect to get string processing right all the time. Everyone thinks they can safely handle strings in C. It's like programmer chunibyo² or something.

--

¹ http://article.gmane.org/gmane.network.djbdns/13864

² “8th-grader syndrome”; the sort of overly high opinions middle schoolers have of themselves


Lockless structures are one of those rare cases where it is relevant. Not that I think Java is great for these (Java's atomics support is not great compared to C/C++ or even C#), but at least it's not the free-for-all that C/C++ used to enjoy.


Complete rubbish....

Do you really want a list of string-heavy s/w that's developed in C/C++ and in very widespread use?

Should I start with:

- Most interpreted languages...

- Web servers....


Eh it's HFT, I'm not sure there is much string parsing going on. Or variable length anything.


Some exchanges use text-based, variable-length message formats. String parsing is very much going on, even though it's the bane of HFT devs.

That's going away as exchanges upgrade their gateways.


Yes, and you are going to write your string parsing routines from scratch, no matter what language you use.


That's just plain terrible. But a good reminder that all the talk about nanoseconds is just that.


The thing is, everybody has to deal with the string parsing, so speeding up your string parsing does have value even if it's slow relative to non-string-based exchanges.


Because performance isn't the only requirement and Java is fast enough.

Correctness, maintenance, extensibility, developer productivity, tooling, deployment, and recruiting are all also important.


Garbage collection.

A common pattern in Java-based systems is fan-out processing. One immutable message is created and sent to multiple processors, which then combine signals to make a decision. Tracking the memory here by hand would slow you down, but usually the JVM handles this just fine and kills those objects in the first generation.
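
A minimal sketch of that fan-out shape (all names invented): the message is immutable, every processor sees the same reference, and in steady state the message dies young.

    final class Tick {                                    // the immutable message
        final long instrument, price, size;
        Tick(long instrument, long price, long size) {
            this.instrument = instrument; this.price = price; this.size = size;
        }
    }

    interface Signal { double score(Tick t); }            // one of many processors

    final class Strategy {
        private final Signal[] signals;
        Strategy(Signal... signals) { this.signals = signals; }

        // Fan out, combine, decide. The Tick never escapes, so the collector
        // reclaims it cheaply from the young generation.
        boolean shouldTrade(Tick t) {
            double combined = 0;
            for (Signal s : signals) combined += s.score(t);
            return combined > 0;
        }
    }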

Also, Java has a fantastic standard library, probably the best concurrency system, and the JIT can actually make code faster than C++.

It doesn't work for extreme low latency people - those guys write custom code for FPGA - but there is a healthy chunk of HFT for whom Java is a great choice.


The fantastic standard library mostly goes away because it allocates. It's possible to write Java code that doesn't allocate in steady state, but the coding style becomes terrible (e.g. overuse of primitives, mutable wrappers, manually flattened data structures, etc).

There's also the issue that even without GC running you pay the cost of card marking (GC store barriers) on every reference write. There's unpredictability due to deoptimizations occurring due to type/branch profile changes, safepoints occurring due to housekeeping, etc.

It's unclear whether that style of Java coding is actually a net win over using languages with a better performance model.


Sometimes you just need to allocate, whether due to necessity or expediency.

If you make sure that "almost all" allocations are short-lived, GC is very fast. Allocation is bumping a pointer and cleanup is O(number of new, live objects). It's considerably faster than malloc/free for general-case allocation.


It all depends on what time scale we're talking about since "very fast" is relative. High performance native systems don't use (in any meaningful manner) naive malloc/free, so that comparison is somewhat moot. I hear this argument quite often when Java vs C++/C is discussed, but it's not comparing idioms/techniques in actual use.

Also don't forget that when GC runs it trashes your d/i-caches; temporaries/garbage allocs reduce your d-cache efficacy; GC must suspend and resume the Java threads, which means trips to the kernel scheduler; and there are some pathologies with Java threads reaching/detecting safepoints.

GC store barriers (aka card marking) don't have anything to do with thread contention (apart from one thing, which I'll note later). This is a commonly used technique to record old->young gen references, and serves as a way to reduce the set of roots when doing young GC only (i.e. you don't need to scan the entire heap). So this isn't about thread contention, per se -- with the exception that you can get false sharing due to an implementation detail, such as in Oracle's Hotspot.

The card table is an array of bytes. Each heap object's address can be mapped into a byte in this array. Whenever a reference is assigned, Hotspot needs to mark the byte covering the referrer as dirty. The false sharing comes about when different threads end up executing stores where the objects requiring a mark end up mapping to bytes that are on the same cacheline - fairly nasty if you hit this problem, as it's completely opaque. So Hotspot has an -XX:+UseCondCardMark flag that tries to neutralize this by first checking if the card is already dirty and, if so, skipping the mark; as you can imagine, this inserts an extra compare and branch into the existing card marking code - no free lunch.
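
A toy model of the two barrier flavours (Java purely for illustration -- the real barrier is a couple of machine instructions emitted by the JIT, and everything here is made up except the 512-byte card size):

    final class CardTableModel {
        static final int  CARD_SHIFT = 9;              // 512-byte cards
        static final byte DIRTY = 1;
        final byte[] cards = new byte[1 << 20];        // one byte per card of a pretend heap

        // Unconditional barrier: a single store per reference write. Stores from
        // different threads can land on the same card-table cache line -> false sharing.
        void barrier(long referrerAddress) {
            cards[(int) (referrerAddress >>> CARD_SHIFT)] = DIRTY;
        }

        // The -XX:+UseCondCardMark flavour: skip the store when the card is already
        // dirty, at the price of an extra load, compare and branch on every write.
        void conditionalBarrier(long referrerAddress) {
            int idx = (int) (referrerAddress >>> CARD_SHIFT);
            if (cards[idx] != DIRTY) cards[idx] = DIRTY;
        }
    }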


The idea is, there's a space between "performance doesn't matter" and "so fast it can't use malloc" in the trade-offs of software development. It turns out that space is very large.

"Performance-critical code" can even go in that space in an environment where developer cycles and program safety are things that matter, which is definitely the case in HFT.


Sure, but that space isn't just Java anymore anyway.

Also, what's an (non-toy) environment where developer productivity and safety/correctness don't matter? I always find that statement bizarre when talking about production systems.


No, the GC is a net win from the perspective of code development. The JIT is just one of the things that makes Java not as slow as you'd expect.

As I said, the JVM is an acceptable platform for the slower HFT. That's the kind where a clever predictive strategy matters (maybe with lead time of seconds) and you'll get more money from accurately predicting the future than from shaving off 250us.

Make no mistake - you'll still make money shaving off 250us, but not so much that you want to be bogged down structuring your code the C++ "if we structure it right we won't leak things" way.


You should've made it explicit then that you're referring to slow HFT -- the post I was replying to drew no such distinction apart from saying the "extreme end" uses FPGAs. Obviously if young gen GC pauses aren't an issue, then there's nothing to talk about here, but then I'd argue that's not really HFT (although I know the term is quite vague) and is no different from other types of systems. There are other issues with GC and garbage allocations, such as d-cache pollution, but I suppose there's no need to really discuss them given the type of system you're discussing.

I know you were throwing 250us out there as a pseudo example, but that's actually a very long time even outside of UHFT/MM.

Also don't forget that your trading daemons will be under a fire hose consuming marketdata, so beyond being able to tick-to-trade quickly, you need to be able to consume that stream without building up a substantial backlog (or worse, OOM or enter permanent gapping).


> Tracking the memory here would slow you down.

Meh. A lot of systems use specialized memory pools for such use cases:

- allocate memory pages from the OS

- allocate objects via this pool

- release the full pool when done (i.e. just unmap the page, which essentially has no cost at all)

If the task at hand is known to have an upper bound of same sized objects, this basically reduces management overhead to maintaining a single pointer (and you can work with guard pages to just catch the segfault when trying to access out-of-bound memory, which is not that far fetched because it is actually what some JIT'd Java code does to optimize out null checks [0]).
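
In Java terms (the point above is about native code, so this is only an approximation, with invented names), the same idea looks roughly like a bump allocator over one big chunk that is released all at once:

    import java.nio.ByteBuffer;

    final class SlabPool {
        private final ByteBuffer slab;   // one big up-front allocation, outside the Java heap
        private final int slotSize;

        SlabPool(int totalBytes, int slotSize) {
            this.slab = ByteBuffer.allocateDirect(totalBytes);
            this.slotSize = slotSize;
        }

        // Bump allocation: the only bookkeeping is the buffer's position.
        ByteBuffer allocate() {
            if (slab.remaining() < slotSize) throw new OutOfMemoryError("pool exhausted");
            ByteBuffer slot = slab.slice();
            slot.limit(slotSize);
            slab.position(slab.position() + slotSize);
            return slot;
        }

        // "Free" everything at once when the task is done.
        void reset() { slab.clear(); }
    }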

[0] http://jcdav.is/2015/10/06/SIGSEGV-as-control-flow/


> and the JIT can actually make code faster than C++.

How? Demonstrate with real examples


The JIT can optimize away virtual function calls.

    ParentClass g = ....might be a subclass...

    for (int i=0;i<100000;i++) {
      g.func(i);
    }
The JIT can optimize away the vtable lookups to find func, and sometimes inline the code. Ok, maybe you could do this in C++ too.

    ParentClass[] g = ....might be a subclass...

    for (int i=0;i<100000;i++) {
      g[i].func(i);
    }
Suppose 99% of g's are the same class. The JIT can optimize away most of the virtual function lookups (particularly if this is one of your hotspots). I.e. the code becomes:

    ParentClass[] g = ....might be a subclass...

    for (int i=0;i<100000;i++) {
      if (g[i].getClass() == COMMONCLASS) {
        inlined_func(i);
      } else {
        g[i].func(i);
      }
    }
In the common case, a simple pointer comparison + inlined code is a lot faster than an indirect call through the vtable.


Devirtualization is mostly an issue for Java since everything is virtual by default and the language doesn't have support for compile time monomorphization.

While C++ code does use virtuals, it's nowhere near the amount as Java - there are language constructs to avoid that and move the dispatch selection to compile time.


GCC in fact performs this optimization.


GCC can get runtime information at compile time? That's a truly advanced compiler.


It can in fact make predictions in a number of circumstances. The most common case is FDO (https://gcc.gnu.org/wiki/AutoFDO) - with FDO coverage GCC can easily demonstrate a particular vcall is ~always a particular type and emit exactly the code you describe. (To be clear, this isn't just in theory, but is actually happening in a wide variety of binaries I use.)

This is perhaps not fully "ahead of time", granted, but it's extremely easy to deploy and highly effective, and entirely accessible to C++.


Snark not warranted, profile guidance is a thing. But more likely many of those virtual functions will not be a virtual function in idiomatic C++ for performance oriented design. I have seen that I can get away with compile time polymorphism many a time. For the rest there is compiler devirtualization and profiles.


You could, I guess, annotate C++ child classes as likely, and then perform this optimization. That'd be an extension to the standard of course.


> the JIT can actually make code faster than C++

> How?

A JIT compiler can sometimes beat an AOT compiler because it has more information.

For example, it is entirely feasible for a JIT to heavily optimise a fast path even if the optimised code wouldn't be correct for all cases that the source could be called for. If the JIT detects an uncommon case it can just fall back to the interpreted code.

An AOT compiler will forego optimisations if it can't be sure that it will produce correct code. For example, C++ was generally considered slower than Fortran until restrict was added as the compiler had to be more conservative. However, restrict is, well, restrictive. Conceptually, a JIT could work around this if it had a function that didn't have restrict arguments but was usually called on distinct memory. It could hold an optimised path that assumed restrict but fall back if it detected otherwise.

Now, some of this benefit can be had in an AOT compiler with profile guided optimisations. But usually, the AOT compiler will still tend to the conservative to balance aggression with code bloat.


In principle yes, but in practice? You see a trend of interpreted, then JITed languages eventually getting an AOT compiler, but I can't recall any natively compiled language getting a JIT for performance reasons (LLVM was supposed to allow that, but the JIT hasn't seen much love).


Current Azul Zing is trending into a hybrid, storing previously compiled machine code to use upon restart if the bytecode matches.

I also think that the JVM is the only JIT that has had close to the same amount of resources pushed at it as some of the Fortran/C/C++ compilers. But, aiming at different markets, the JVM focused a lot less on easy-to-benchmark numeric code and actually looks at other types of code.

Then of course there is the hybrid JVM/LLVM using Graal/Sulong [1], which hopes to do better than either JIT or AOT alone.

[1] https://github.com/graalvm/sulong


This mixes two questions: can Java be compiled to faster code than C++, and can a JIT compiler outperform an AOT compiler. Both Java and C++ have problems; C++'s aliasing restrictions are nasty, and I'd be surprised if anything Java has is as bad.

The second is more dubious... sure, a JIT compiler has information if it spends RAM and cycles on collecting that, but it also has to run quickly and fit in the runtime environment, and an AOT compiler can run arbitrarily slowly, use a whole rackful of servers, and can use PGO without incurring any profile collection costs at (normal) runtime.


Tiered JITs are meant to allow slower and more aggressive optimizations to be done on truly hot code. However, you're right in that they still cannot spend as much time or resources as an AOT compiler.


The JIT must be rerun every time the process is loaded. In an image-based language like Smalltalk, the JIT state can be saved in the image along with performance optimizations, so the next time the image is reloaded the JIT state is hot.


Yes, Azul has a similar feature (ReadyNow).

This is nontrivial because lots of optimizations depend on class load ordering and runtime profile information.


The JIT has access to the actual run time class hierarchy and can devirtualize method calls. In C++, if a method is virtual, the system can never decide "oh, this method has only one implementation, so let's always call that" because it must statically assume multiple implementations to be present.



Having written plenty of embedded system code myself, I wouldn't want to implement a full-blown application for HFT in C, not to mention Fortran or assembly (maybe a network/messaging layer in C/C++, which is very common). Performance is critical, but it's not the only major factor, especially when it comes to trading, where the complexity of the application can quickly grow daunting.


Bugs in HFT code can lose a lot of money very quickly too, so I'd imagine that Java code being less bug prone than C or assembly is also a consideration.


Or crashing, for that matter. If your C HFT program crashes, someone is on the hook for tens of thousands, minimum. (Not saying Java doesn't crash, but it's a bit more rare and gives you more opportunity to not lose $$$.)


You can easily hot-swap classes in a running program as long as their interface doesn't change. I.e., you can tune your algorithm w/o restarting the server.

I went wow the first time I edited a program during a debugging session and Eclipse just recompiled the class and reloaded it into the program.

Yes, there are details to getting this right (transferring the state), but nevertheless... It'd be one heck of a job implementing this _correctly_ in an MT scenario for C and C++. The JVM just does "magic" here.
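
For the curious, one programmatic route to the same trick is java.lang.instrument (a bare-bones sketch, not how Eclipse does it internally -- the debugger drives the JVM's HotSwap capability over JDWP; the agent jar also needs Can-Redefine-Classes: true in its manifest):

    import java.lang.instrument.ClassDefinition;
    import java.lang.instrument.Instrumentation;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class HotSwapAgent {
        private static volatile Instrumentation inst;

        // Loaded via -javaagent:hotswap.jar; the JVM hands us the Instrumentation hook.
        public static void premain(String args, Instrumentation i) { inst = i; }

        // Swap in freshly compiled bytecode for an already-loaded class. The new
        // version must keep the same shape: no added or removed methods or fields.
        public static void redefine(Class<?> clazz, String newClassFile) throws Exception {
            byte[] bytecode = Files.readAllBytes(Paths.get(newClassFile));
            inst.redefineClasses(new ClassDefinition(clazz, bytecode));
        }
    }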


> It'd be one heck of a job implementing this _correctly_ in an MT scenario for C and C++.

I think every major debugger can do this for C and C++ now


Does that mean you can do it with live production code?


Doing this with dynamic libs in C/C++ is done frequently in some areas.


Yeah but your program has to be written to do it that way from the start, with a plugin boundary and so on. The JVM can do it for any code, with limitations, but there's a "DCE VM" that is a patched HotSpot. It can do far more kinds of hotswapping.


Maybe the bottleneck is I/O (memory or external), not CPU.


Because not everything needs to be ultra fast. Only the pieces that need to respond quickly to market changes. The vast majority of logic in an HFT isn't this, it's business logic around determining parameters, and so on.


Correctness can be very difficult in Assembler, Fortran, and C; and consider the complexity of the systems they're implementing. Though a lot of it probably has to do with hiring.

They could probably do a lot better with, say, OCaml or Rust. Jane Street uses OCaml for, presumably, this reason.


Jane Street isn't an HFT firm, though I do agree that OCaml would be a much better language than Java (though the lack of manual GC tuning could be a dealbreaker).


Great interview with him from way back in 2007: https://web.archive.org/web/20071020182028/http://theservers...


Azul is still state of the art, certainly for the JVM.


Thanks for the nod! Cliff


It'd be interesting to learn more about what you meant by "there are much better ways to represent Java than bytecodes".

I know you don't do JVM stuff anymore but perhaps one day you could do a talk on how you'd design the JVM differently if you could go back and do it all again.


Right when Java was taking off, there were a bunch of research proposals for better over-the-wire executable formats (mostly funded by Microsoft). Designs where you could convert the bytecodes to machine code as fast as L1 could take it, with sane 1st-level-JIT code quality. And also have safety, and enough type info to use a 2nd-level JIT for really hot code.


Ah so you meant "java bytecodes", not bytecodes in general. Perhaps like pre-calculating register allocation and such? I'd like to read about these sorts of proposals, do you recall the names of the papers by any chance?


For those looking for some open source HFT Java code, you may be interested in looking at some of the code in OpenHFT [1].

[1]: https://github.com/OpenHFT


There's also this video by Gil Tene on Low Latency Java in the Real World: LMAX Exchange:

https://www.youtube.com/watch?v=6-oS2XHSGvk


this is fascinating. anyone applying these approaches with clojure? i'm wondering what additional considerations (if any) come into scope in the case of other jvm languages


Clojure would be a bad fit because it favors immutable objects (which result in lots of GC) and functional programming, which discourages mutative, iterative programming.


Deterministic behaviour is everything....

...and Clojure is the polar opposite to this due to its structure sharing.


> I had one CTO of a public HFT firm tell me that short of FPGA's Azul's pause-less GC (https://www.azul.com/products/zing/pgc/) was the biggest performance win they'd had in the past few years, note: this was a few years ago.

Even better than using C++? Something sounds off.


I don't know where you got C++ from.

In any case, I assume they stayed with Java as their entire code base was Java, with the exception of whatever they program their FPGAs with.


FPGAs are typically used for the real-time front-end in systems like this....

...Which means that the Java side wasn't doing anything particularly quick or real-time.

The Java side was the 'easy&slow' side.... hence they could use Java.


Yeah, I concur with what the other guy says. There is no way they were using Java for the HFT side.


C++ doesn't improve on I/O (including cache misses).


HFT systems aren't normally IO bound.


Sitting at the offices of an HFT firm here.

I think there's more than one kind of fast. There's development speed, and there's execution speed. Both are important.

I don't know what qualifies as HFT these days, but a loose definition is people who trade more often than once a day. That's still a whole lot of different people.

At one end, there's textbook, pure footrace arbitrage. You see something offered at 100 on one exchange and bid at 101 on another. You rush to do them both. This is possibly the most obvious strategy ever, and the only thing that matters is how fast you can get those messages to the relevant exchanges.

On the other end, there's more intricate stuff like statistical arbitrages (a wide category), where there's more than one way to skin a cat. Some principles are known from finance 101, but your implementation will be slightly different to other people's. You still want to execute fast, because the opportunity may not be there forever, but it's not like every time you see an opportunity you know it's first come, first served.

My sense is that those closer to the former will tend to use C++ over the JVM. Most people are not going to have the time to carefully test what's faster, and C++ has the reputation as the thing that will be faster.

Similarly, JVM languages have a reputation for being faster to code in. As strategies get more complex, you need code agility. There's a lot of changing things up when you're writing strategy. Of course, results vary, but if you're building a strategy platform, you probably go for JVM.

But that's just what my gut feeling tells me. I've only met the people I've met, and it's not that easy to find public info on just what people are getting up to. I'd love to hear what kinds of strats are running on Azul.


It's been a while since I was working in HFT, but the HFT firms making the most money were all using Java predominantly. Yeah a lot of firms use C++ but I think that's more because of myth than actual benefits.

Pure simple arbitrage is a race to the bottom very quickly, and it's very difficult to remain profitable with those sort of simple strategies regardless of what language you use. Maybe FPGAs and radio networks could let you make money off of simple arbitrage (someone must be making money from it). C++ isn't going to give much advantage over Java with GC disabled.


I'm at a Denver based HFT firm, we are all Java. We do options market making, triangles on currency futures, calendar spreads, etc.


Can you give some insight into what kind of thing they're doing?


This is interesting

  – (Azul GPGC: 100's of Gig's w/10msec)
  – (Stock full GC pause: 10's of Gig's w/10sec)
  – (IBM Metronome: 100's Megs w/10microsec)
Somehow these numbers are hard to come by in Java land, where any request for hard numbers receives responses like Java GCs are 'generational', 'state of the art', 'best in industry', etc. All of these technical arguments are true, of course, but they do not give numbers.


I think when comparing numbers like that you might need throughput figures as well. Supposedly there is a tradeoff between GC pauselessness and throughput.


Azul's GC required a read-barrier (some instructions to be executed on every pointer read) - which cost something like 5% performance on an X86. In exchange max GC pause time is something in the low microsecond range (I helped implement the Azul JVM and portions of the GC relating to starting and stopping threads)


A friend of mine who worked for Azul said that older x86 processors didn't support the instructions that they needed, which is why they originally built their own hardware. I'm uncertain what those are, but my impression is you can't do what they needed with a single atomic word access.


There's a talk somewhere (by Cliff again) on how the Azul chips differed from x86. Intel seems really slow to add features useful for high level GCd languages but apparently nowadays x86 has caught up which is why they don't bother making their own chips anymore.


No change to the X86; instead a user-mode TLB handler from RedHat allows ptr-swizzling. That plus some careful planning, and the read barrier fell to 2 X86 ops - with acceptable runtime costs. Cliff


Yes, it's being discussed recently https://news.ycombinator.com/item?id=12044044


"Supposedly?" I would have thought inevitably.


Well I've never implemented a GC or built a processor specifically for implementing GC on.


I've seen stock full GC pauses take 10 minutes with 10s of gigs, so there is a lot of variability there


Might be an unpopular opinion, but I kinda like to watch how people introduce will-be-a-problem-in-the-future things (let's say GC) and then heroically solve the resulting problems (pause-less GC). It's like Don Quixotes of the modern age, except you build the windmills yourself.


Priorities. Getting rid of manual memory management can be a worthwhile goal even with slow gc, just like not having expensive runtime memory management automatisms can be a worthwhile goal. Making gc faster (or just more predictable) on top of that is not heroically solving a follow-up problem, it's an optional bonus achievement.


I might be too old-fashioned, but the only case where some sort of automatic memory management is pleasant and comfortable is constructing/modifying strings, though even here many languages separate mutable and immutable strings into different beasts.

In all other cases just allocate a virtual page from the OS (way too old school, I understand; people usually don't use this one and believe that the C runtime library with malloc is the only "approved" way to allocate memory). You get from 4kb to many megabytes of contiguous memory - do whatever you want there, and deallocate the whole virtual page later on; no leaks, no problems. Sure, I do understand that manually managing millions of objects on the heap is hard, but managing 1 to 10 virtual pages? It's simple and easy; just forget about generic allocators and use specialized ones (like a block allocator, for example).

Also it's good to know whether you even need to deallocate at all... for example, the ninja build system (https://ninja-build.org) simply doesn't deallocate memory after the build is done, just because it's too slow to do it (though they do use heap allocations); just kill the app and the OS will clean up after you.

PS. On the other hand, what I'm saying is just a personal rant. The reality is that the industry is mostly moving to tools that don't allow writing unsafe code (for example, languages without pointer arithmetic), and this can be very limiting for language expressiveness. There is another approach - statically proving that unsafe code is safe. This is what "Checked C" is all about, and this is how Windows drivers work without crashing the kernel (at least BSODs are pretty rare on Windows nowadays). But I do agree that tools for statically proving unsafe code safe are so far from casual use that only greybeards are interested in them nowadays :(


I'm probably missing something obvious here, but on page 25 it says:

  Azul Systems has been busy rewriting lots of it
  - Many major subsystems are simpler, faster, lighter
  - >100k diffs from OpenJDK
If Azul Systems' JVM is based on OpenJDK, shouldn't it be open source? OpenJDK is licensed under GPLv2.


OpenJDK is licensed under GPLv2 but it used to be available under commercial licenses as well. Kinda like how MySQL is GPL/commercial dual licensed.


If you were incentivized enough (by being an HFT firm, say), is there anything stopping you from re-implementing the Erlang memory/GC-model on the JVM? I know the JVM has green-threads (somewhere amongst a million other concurrency primitives); but could it be modified to give each green-thread its own heap, and then do background-GC passes of each green-thread heap when that green-thread isn't currently scheduled?

You'd certainly have to write your Java somewhat differently—though you wouldn't necessarily have to move to a full-fledged no-shared-memory message-passing model to see a benefit. (For example, every JVM object could be treated similarly to Erlang's large binaries, where they exist as refcounted objects in a shared heap, and then the ref-handle within the green-thread heap is itself refcounted by the green-thread, so the shared heap only needs to be synchronously updated when a green-thread heap is discovered, on GC, to have released its last ref-ref.)


Each thread does have its own local TLAB in which bump allocations are performed; but you still have the GC effect even if you're using that. Provided you stay entirely within Eden then your GC will just drop the space - but it is preferable here to not generate any garbage in these situations by eg pre-allocating the necessary objects and just doing field writes.



Does a resource like this exist for .NET?


> is there anything stopping you from re-implementing the Erlang memory/GC-model on the JVM

Apparently not, never used it, but impressive: https://github.com/trifork/erjang


I am curious if someone could shed some light on page 15 - "Illusion: Consistent Memory Models", specifically:

  X86 is very conservative, so is Sparc
  Power, MIPS less so
  IA64 & Azul very aggressive

I don't think I've heard the term "conservative memory model" before; in what sense are x86 and SPARC conservative?

Also why is Azul mentioned as an ISA?


The X86 memory model is "conservative" in the sense that it puts strong restrictions on the order of observed loads and stores between cores. Azul and IA64 did not. Hence "sloppy" code - but technically correct (at least it would compile) might run on an X86 and break on the IA64. Cliff


The usual term is strong versus weak memory models. In x86 and SPARC (TSO), the only reordering in hardware that you can do is having reads complete before pending writes--all writes happen in order, all reads happen in order, and a pending read must complete before a pending write can. Other architectures allow much more reordering at the hardware level.

If you know the C++11 memory model, x86 and SPARC (TSO) basically make all memory loads implicitly atomic release/acquire. In contrast, most machines follow something akin to release/consume (the DEC Alpha famously has a basically broken memory model that allows *p to complete before p).
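
The classic "store buffering" litmus test makes the TSO point concrete: the one reordering x86 allows is a later load overtaking an earlier, still-buffered store, which is exactly what lets both threads below observe 0. (Sketch only; in Java the JIT may reorder these plain fields as well.)

    public class StoreBuffering {
        static int x, y;                    // plain fields: nothing forbids the reordering

        public static void main(String[] args) throws InterruptedException {
            final int[] r = new int[2];
            Thread t1 = new Thread(() -> { x = 1; r[0] = y; });   // store x, then load y
            Thread t2 = new Thread(() -> { y = 1; r[1] = x; });   // store y, then load x
            t1.start(); t2.start();
            t1.join();  t2.join();
            // r[0] == 0 && r[1] == 0 is allowed on x86: each load can complete while the
            // other thread's store is still sitting in its store buffer.
            System.out.println("r1=" + r[0] + " r2=" + r[1]);
        }
    }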


Ah OK, strong vs weak I am familiar with, I think that's probably a more idiomatic way to phrase that then. Interesting bit of history on the DEC Alpha. Thanks.


Azul has their own hardware for facilitating pauseless garbage collection with their JVM.

I recall they have a pure software pauseless JVM now as well, but it's not nearly as high performing.


Azul hardware had a very large core count of low-performing cores. If you had enough parallelism then it was hard to beat - but most applications didn't have enough parallel work, so the market wasn't big enough


I… just got corrected by Cliff Click~! (You're like my language implementation hero (along with Mike Pall).)

Vega sounds like it would have a nice niche in network processing. Shame it didn't find a big enough market. It was pretty interesting from what I read.


Thanks, I had no idea they started with hardware. For anyone else not that familiar it looks like Vega is the hardware appliance and Zing is the VM version.


I have a board with 2 Vega chips in my living room - 24 cores each; each a full 64-bit RISC with 32 registers and IEEE 754 FP math, read & write barriers in hardware, inline-cache virtual calls in hardware, and transactional memory. Later chip versions had 48 and then 54 cores, and you could jam 16 chips in a cache-coherent shared memory box - with top-10 supercomputer bandwidth. But you needed 54*16 parallel working threads to use it all. :-) Cliff


So in the bit of reading I've done, it seems like read and write barriers in hardware were what allowed them to have pause-less GC, and that was the primary reason for Azul producing hardware appliances. Is that accurate? At least initially?

What did that board cost you if you don't mind my asking? What does it cost you to run it is maybe a better question.


It was cheaper to run (power, heat, space) than the equivalent pile of X86's of that era and for years to come. Each box cost between $250k and $750k depending on cores & memory. You didn't buy one unless you had a specific scaling problem that scale-out x86's weren't cutting it.


These slides are great! "Standing on the shoulders of giants..." If time-to-market is important, JVM is the way to go... Manually managed memory is for low-level/OS software, GC is for the rest (apps)... with some exceptions...


(2011)

Good slides anyway. Anyone know of a recording of the corresponding talk?



Thanks! Guess I should have searched on youtube for the title - I searched on google for the conference mentioned in the slides and didn't find it.


I'm curious about going above 32GB of heap size, heard that compressed pointers kill performance?


Going above 32G doesn't kill performance per se; it just means that there is no point in using heap sizes between 32G and 48G due to the larger pointers. If you go above that, it's all good again.
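
The usual back-of-the-envelope reason (assuming the default 8-byte object alignment) is that a compressed oop is a 32-bit offset scaled by the alignment:

  2^32 slots * 8 bytes/slot = 32 GB addressable with 32-bit compressed oops

Past that the JVM falls back to full 64-bit references, so the heap has to grow well beyond 32G (the ~48G figure above) before the extra pointer footprint is paid back.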


There's another talk that you might want to watch if you liked these slides:

https://www.youtube.com/watch?v=vzzABBxo44g Bits of advice for VM writers.


Is this assertion really true?

Class files are a lousy way to describe programs

Class files seem alright to me


The JVM's memory model was something amazing when I first encountered it.


Thank you pas for this link (via https://news.ycombinator.com/item?id=12025929 )


What does this mean?

  Large chunks of code are fragile (or very 'fluffy' per line 
  of code)


It means what it says it means? "fragile" code is code that can't be touched because touching it causes it to "break" - exhibit bugs that can't easily be fixed. "fluffy" code is bulky code: code that says in 1000 lines what could better be said in 10 lines. Cliff


What's your view on Graal? Is that making headway on de-fluffing and robustifying the code or does this not really tackle the issues you had in mind?


Thanks. "Fragile" made sense, "fluffy" was new to me.


[flagged]


We detached this subthread from https://news.ycombinator.com/item?id=12051639 and marked it off-topic.


HFT reduces the spread, which is actually great for consumers. It also pushes the frontiers on high speed scalable computing out and produces a ton of great research. Finally, it is great for the tech scene in NYC.


> Finally, it is great for the tech scene in NYC.

I dunno, the fact that I have to sell my soul to finance to find a decent tech job doesn't speak particularly well to its health.


> I dunno, the fact that I have to sell my soul to finance to find a decent tech job doesn't speak particularly well to its health.

As opposed to selling your soul to the advertising industry. At the end of the day, unless you are independently wealthy or live in a very low cost area, you're gonna have to make peace with Adam Smith's invisible hand.


As opposed to not having a tech job at all?


> HFT reduces the spread

... by chopping off the difference and putting it in the pocket of the HFT, right? I mean, without them, the earnings would go to the pockets of more loosely programmed organizations (probably our banks).


Without market makers the trade wouldn't happen at all.


What would happen if trades were discontinuous? Like, place orders and they're processed once every hour? If some issues are worth $x in your eyes, the company doesn't actually gain or lose value by the minute; it's just that the public isn't aware of corporate facts yet, and as information becomes complete on the traders' side they decide that the issues are worth a bit less or a bit more to them. So if we traded only once per hour, wouldn't the stock exchange provide the same value to the economy, while shedding a good part of the finance industry that rides on minute-to-microsecond trends?


There was trading going on before HFT was a thing.


But there were market makers, and they charged a lot more for the same service.


I find this description weird. It's not like they provide a service. They act for their own profit and the fact that they make the spread smaller is a bit of a side effect. Why call that a service? You wouldn't pay them directly to do those trades.

Sure, you'd spend more on your buy orders otherwise. So what? It just makes the stock gambling a bit different.


>They act for their own profit

Lol, as opposed to all the other actors in the economy who provide a service.

>and the fact that they make the spread smaller is a bit of a side effect

It's not 'a bit of a side effect', it is what they have to do to earn fills or they will be undercut by their competition. Narrowing the spread is literally what a market maker is paid to do. It is their reason for existence. It isn't a 'side effect'. If you make a spread narrower than your competition, you steal his customers. Otherwise you don't trade and you don't make any money and you are out of business.

> Why call that a service?

It is a service. A very useful one. They offer, for a specified price, to sell (or buy) a financial instrument to (or from) you.

No one forces you to use it. If you don't want to, and think you can do better, you can take your chances and place a limit order in the book instead.

So like all services, it is completely optional, costs you money (to cross the spread), and provides you a benefit (immediate execution and certainty of price).

I suspect what you are struggling with, is that you don't realise that risk transferral is a benefit. Take for example, the dairy farmer who sells his milk to a company (a middleman) that transports it, pasteurizes it, bottles it and on-sells it to a supermarket chain. In this case, you would probably tell me that you can see what service they provide - they add value by pasteurizing and bottling the milk, which they specialize in. Which is true. But there is a hidden value too - they take risk away from the farmer. The milk could spoil, the trucks could breakdown, the vats be contaminated, the supermarkets cancel their orders. All of these are risks that the farmer would have to bear if the farmer wanted to bottle their own milk, take it to a market and sell it. Part of the service provided is to take risk away from the farmer and be paid for it.

That is all a market maker does. They offer you the opportunity to dispose of your risk (the risk that the financial instrument you hold may move against you in the time it takes you to find a buyer or seller) instantly by paying a very (very) small fee to do so.


> Lol, as opposed to all the other actors in the economy who provide a service.

I see what you mean and there may not even be a visible difference from outside.

But I think there's a big difference between "I'm going to make money getting grocery products as cheap as possible and selling them as high as local market can handle it" and "I'm going to open a grocery store in an area which doesn't have it yet and make money providing products to people". Effectively they're the same and some people may think of it as the first case.

I just don't believe anyone doing HFT cares one bit about the rest of the market. They are in it to make money. It just happens to lower the spread. If there was a way to do it by making the spread larger and pissing everyone else off, they'd still do it.

Different example in real life in some countries - homeless people collecting glass bottles from the street so they can return them and get some change back in countries where recycling cost is added to the price by default. You're paying for it when you're buying a drink and not returning the bottle yourself. They get money for cleaning the street. But it just happens to be connected - I wouldn't say you're paying the homeless for the street cleaning service.


> I wouldn't say you're paying the homeless for the street cleaning service.

I certainly would. Are you arguing just because the government is interposed in the process that it is no longer a service you are paying for?

Ok, let's try a different angle. Imagine a world where financial markets aren't electronic and brokers don't exist. You have 5 Apple shares that have a fair value of $500. The only way to sell your shares in this fantasy world is to physically find someone who wants them. Maybe you call people on the phone, maybe you spam some e-mails. You get the full $500 if you find a buyer, but it takes labour and time. The labour is costing you, since you could be doing something productive that you are more skilled at. And the time is costing you, because you want the $2500 now to pay the rent, and if Apple shares plunge you could be in trouble!

So I, for the first time ever, get the idea to set up a lovely looking shop on the corner and offer to buy them from you for $499 each, and I'll handle the rest. You come in with your share certificate, we shake hands, make small-talk, exchange money. Surely you agree this is a service?

So I am slightly baffled. All that has changed is that the process outlined above has been made electronic, and now you claim it isn't a service? Do shops selling shit on ebay no longer qualify as services?


Oh yes. I was just pointing out that there has always been actors having that role and that HFT wasn't special in that regard. They are simply better at providing the service.


Relevant:

JavaCPP, the glue code that acts as a bridge to C++/C

https://github.com/bytedeco/javacpp

JavaCPP's creator works with Skymind, and we use it for our open-source deep learning framework, Deeplearning4j

https://github.com/deeplearning4j/deeplearning4j/

We're pushing the limits of the JVM (and we love Cliff's work!).


JavaCPP does not compile Java bytecode to C++. It is a Java to C/C++ bridge ala JNA, just better. Major difference.



