fun fact: the original LMAX was designed for and written in Java. https://martin...

sneilan1 · 2024-07-08T22:51:41 1720479101

I think it made sense at the time. From what I understand, you can make Java run as fast as C++ if you're careful with it and make good use of the JIT. However, I have never tried such a thing, and my source is hearsay from friends who have worked in financial institutions. On top of that, you get the added benefit of the Java ecosystem.

nine_k · 2024-07-08T23:00:48 1720479648

From what I've heard, you absolutely can, given two things: fewer pointer-chasing data structures and, most crucially, fewer or no allocations. Pre-allocate arrays of the things you need, and run ring buffers over them if you need a varying number of things.
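A minimal sketch of "pre-allocate arrays and run ring buffers over them", assuming a single thread and a power-of-two capacity. `OrderRing` and `Order` are hypothetical names. Every slot object is created once in the constructor, and the hot path only overwrites fields. It is loosely in the spirit of the LMAX Disruptor's pre-allocated ring entries, but it is not the Disruptor's API, and a multi-threaded version would need proper memory ordering.

```java
// Hypothetical single-threaded ring of pre-allocated, mutable slots.
// All Order objects are allocated once at startup; offer/poll never call `new`.
final class OrderRing {
    // Mutable slot: fields are overwritten in place instead of allocating new objects.
    static final class Order {
        long id;
        long priceTicks;
        int quantity;
    }

    private final Order[] slots;
    private final int mask;   // capacity is a power of two, so index = sequence & mask
    private long head = 0;    // next sequence to write
    private long tail = 0;    // next sequence to read

    OrderRing(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Order[capacityPowerOfTwo];
        for (int i = 0; i < slots.length; i++) {
            slots[i] = new Order();  // the only allocations: done up front
        }
        mask = capacityPowerOfTwo - 1;
    }

    /** Fills the next free slot; returns false when full instead of growing. */
    boolean offer(long id, long priceTicks, int quantity) {
        if (head - tail == slots.length) return false;  // full: caller must back off
        Order o = slots[(int) (head & mask)];
        o.id = id;
        o.priceTicks = priceTicks;
        o.quantity = quantity;
        head++;
        return true;
    }

    /** Returns the oldest slot, or null if empty. The slot gets overwritten after the ring wraps. */
    Order poll() {
        if (tail == head) return null;
        return slots[(int) (tail++ & mask)];
    }

    int size() { return (int) (head - tail); }
}
```

The fixed capacity is deliberate. When the ring is full, `offer` pushes back on the caller instead of allocating a bigger array in the middle of the trading day.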

Another fun but practical approach, which I've again heard of second-hand, is to just drown your code in physical RAM and switch the GC off completely. Have enough RAM to run a whole trading day, then reboot. The cost is trivial compared to spending engineering hours on other approaches.
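Whether "enough RAM to run a trading day" is realistic comes down to arithmetic: allocation rate × session length × a safety margin. The sketch below only does that calculation. The rate and session length in `main` are made-up illustrative inputs, not figures from this thread, and `NoGcHeapSizing` is a hypothetical name. On JDK 11 and later, one way to actually switch the GC off is the experimental no-op Epsilon collector (`-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC`, with `-Xmx` sized from a number like this one). Once that heap is used up, further allocations fail with `OutOfMemoryError`.

```java
// Back-of-the-envelope heap sizing for running a whole session with no GC.
// All inputs in main() are hypothetical; measure your own allocation rate.
final class NoGcHeapSizing {
    /** Heap bytes = allocation rate * session length * safety factor, rounded up to whole MiB. */
    static long requiredHeapBytes(long allocBytesPerSecond, long sessionSeconds, double safetyFactor) {
        double raw = (double) allocBytesPerSecond * sessionSeconds * safetyFactor;
        long mib = 1L << 20;
        return (long) Math.ceil(raw / mib) * mib;
    }

    public static void main(String[] args) {
        long rate = 1L << 20;                // assumed: 1 MiB/s of garbage
        long session = (long) (6.5 * 3600);  // assumed: a 6.5-hour session
        long bytes = requiredHeapBytes(rate, session, 1.5);
        System.out.printf("about %d GiB of heap%n", bytes >> 30);
    }
}
```

The result scales linearly with the allocation rate. That is why this trick is usually paired with the first one: cutting allocation is what keeps the required heap within the RAM you can actually buy.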

ramchip · 2024-07-09T00:26:35 1720484795

I worked in trading and we did the first one, in C++. We'd load all the instruments (stocks, etc.) on startup to preallocate the "universe", and use ring buffers as queues. Instruments don't change during trading hours, so restarting daily to pick up the new data is enough.
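The pre-allocated "universe" can be sketched as follows. ramchip's system was C++, so this Java version is only an illustration of the idea, and `Universe`, `onTrade`, and `onFill` are hypothetical names. Instruments are loaded once at startup into parallel primitive arrays (struct-of-arrays). The hot path then addresses them by `int` index and never chases pointers through a map of objects. The symbol-to-index map is assumed to be used only at the edges, for example when decoding an incoming message.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-allocated instrument universe, sized and filled once at startup.
// During the day, only primitive arrays are read and updated, by integer index.
final class Universe {
    private final String[] symbols;
    private final long[] lastPriceTicks;  // per-instrument state, updated in place
    private final long[] position;
    private final Map<String, Integer> indexBySymbol = new HashMap<>();

    Universe(List<String> startupSymbols) {
        int n = startupSymbols.size();
        symbols = startupSymbols.toArray(new String[0]);
        lastPriceTicks = new long[n];
        position = new long[n];
        for (int i = 0; i < n; i++) {
            indexBySymbol.put(symbols[i], i);  // built once, not modified afterwards
        }
    }

    /** Edge lookup; returns -1 for unknown symbols, since the universe is fixed for the day. */
    int indexOf(String symbol) {
        Integer i = indexBySymbol.get(symbol);
        return i == null ? -1 : i;
    }

    // Hot-path methods take an int index and touch only primitive arrays.
    void onTrade(int idx, long priceTicks) { lastPriceTicks[idx] = priceTicks; }
    void onFill(int idx, long signedQty) { position[idx] += signedQty; }
    long lastPrice(int idx) { return lastPriceTicks[idx]; }
    long position(int idx) { return position[idx]; }
    String symbol(int idx) { return symbols[idx]; }
}
```

Because the instrument list is fixed for the session, the arrays never need to grow. Picking up new instruments is handled by the daily restart rather than by resizing logic on the hot path.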

I saw a Java team do the second one in an order router (a system that connects to various exchanges and routes and translates orders to meet each exchange's requirements), and they wrote an interesting retrospective doc where they basically said it wasn't worth it: it caused a lot of trouble without giving a significant edge in performance. YMMV! That was around 2012.

bb88 · 2024-07-09T01:03:40 1720487020

I honestly don't know why real-time trading devs don't build their own OS or programming language for this. It's not like they don't have the money.

nine_k · 2024-07-09T06:40:30 1720507230

Making your own OS or language is hard, if you care about both performance and correctness.

But HFT people do a lot of OS-level hacking: squeezing the last bits of performance out of the kernel where the kernel is needed, and/or running parts of the kernel stack (like networking) in userspace to avoid context switches. CPU core pinning, offloading everything possible to the network cards, etc., go without saying.
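Core pinning and kernel-bypass networking are configured outside the JVM (for example with `taskset` or the `isolcpus` boot parameter, plus the NIC vendor's stack), so they don't fit in a Java snippet. What does show up in Java code is the consumer loop that busy-spins on its pinned core instead of blocking. That way the thread is never descheduled while waiting for work, and it never pays for a context switch on wake-up. A minimal sketch, assuming JDK 9+ for `Thread.onSpinWait()`. `SpinConsumer` and `awaitSequence` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical busy-spinning consumer: it never parks, sleeps, or waits,
// so it burns its (ideally pinned, isolated) core instead of yielding to the scheduler.
final class SpinConsumer {
    // Sequence published by a producer thread; AtomicLong gives volatile read/write semantics.
    final AtomicLong published = new AtomicLong(-1);

    /** Spins until the producer has published `target`; returns how many spin iterations it took. */
    long awaitSequence(long target) {
        long spins = 0;
        while (published.get() < target) {
            Thread.onSpinWait();  // CPU spin-loop hint; does not give up the core
            spins++;
        }
        return spins;
    }
}
```

The trade-off is a core at 100% utilization even when idle, which only makes sense when that core is dedicated to this thread.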

pjmlp · 2024-07-09T11:29:21 1720524561

That is basically what they do when deploying into FPGAs.

bostik · 2024-07-09T05:43:05 1720503785

> fewer or no allocations

Quite fitting in a thread about HFT that has already referenced game development as a parallel universe of similar techniques.

In the feature phone era, mobile phone games were written in Java (well, a subset: Java ME, a.k.a. J2ME). Practically all games followed the same memory management strategy. They allocated the memory they needed during early initialisation, and then never again during the actual runtime of the game. That was the only way to retain even a semblance of performance.
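The allocate-everything-at-init strategy usually takes the form of an object pool. A fixed number of objects is created up front, then handed out and taken back through a free list, and `new` is never called during play. A minimal sketch in plain Java, assuming a single thread. It uses no J2ME-specific API, and `BulletPool` and `Bullet` are hypothetical names.

```java
// Hypothetical init-time object pool with an index-based free list.
// All Bullet objects and the free-list array are allocated in the constructor;
// acquire/release afterwards only shuffle ints and overwrite fields.
final class BulletPool {
    static final class Bullet {
        int x, y, dx, dy;
        boolean active;
    }

    private final Bullet[] bullets;
    private final int[] freeStack;  // indices of currently unused bullets
    private int freeCount;

    BulletPool(int capacity) {
        bullets = new Bullet[capacity];
        freeStack = new int[capacity];
        for (int i = 0; i < capacity; i++) {
            bullets[i] = new Bullet();
            freeStack[i] = i;
        }
        freeCount = capacity;
    }

    /** Returns the index of a free bullet, or -1 if the pool is exhausted (it never grows). */
    int acquire(int x, int y, int dx, int dy) {
        if (freeCount == 0) return -1;
        int idx = freeStack[--freeCount];
        Bullet b = bullets[idx];
        b.x = x; b.y = y; b.dx = dx; b.dy = dy;
        b.active = true;
        return idx;
    }

    /** Marks the bullet unused and pushes its index back onto the free list. */
    void release(int idx) {
        Bullet b = bullets[idx];
        if (!b.active) return;  // ignore double release
        b.active = false;
        freeStack[freeCount++] = idx;
    }

    Bullet get(int idx) { return bullets[idx]; }
    int available() { return freeCount; }
}
```

When the pool runs out, the game simply doesn't spawn another bullet. That cap is the price of never allocating at runtime.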

sneilan1 · 2024-07-09T00:10:24 1720483824

So literally remove as much allocation as possible and then reboot each day. That makes a lot of sense to me!

bb88 · 2024-07-09T01:01:00 1720486860

None of the Java libs you use can ever do an allocation, ever! So you don't really get that much benefit from the Java ecosystem (other than the IDEs). You have to audit the code you use to make sure allocations never happen on the critical path.
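On HotSpot-based JDKs, one way to do that audit is to measure how many bytes the current thread allocates across a critical-path call. A minimal sketch, assuming `ManagementFactory.getThreadMXBean()` can be cast to `com.sun.management.ThreadMXBean`, a JDK-specific extension rather than part of the standard `java.lang.management` API. `AllocationAudit` and `bytesAllocatedBy` are hypothetical names. Measure after warm-up, because the JIT's escape analysis can eliminate some allocations that the interpreter still performs.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper for checking whether a code path allocates.
// Relies on com.sun.management.ThreadMXBean, a HotSpot/JDK-specific extension.
final class AllocationAudit {
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Approximate bytes allocated by the current thread while running `task`. */
    static long bytesAllocatedBy(Runnable task) {
        if (!THREADS.isThreadAllocatedMemorySupported()) {
            throw new UnsupportedOperationException("per-thread allocation tracking not supported");
        }
        THREADS.setThreadAllocatedMemoryEnabled(true);
        long tid = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(tid);
        task.run();
        long after = THREADS.getThreadAllocatedBytes(tid);
        return after - before;
    }
}
```

In a test suite, wrapping each critical-path entry point this way (after a warm-up loop) and failing on a non-zero delta is one way to catch a library update that quietly starts allocating.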

Fifteen years ago, the USN's DDX software program learned this the hard way when it had to meet a hard real-time requirement measured in milliseconds.

throwaway2037 · 2024-07-09T01:29:44 1720488584

In my experience: Allocation is OK, but garbage collection is bad.

bb88 · 2024-07-09T02:34:17 1720492457

I think back then, by default, any allocation could potentially trigger a GC run.

shared_ptr is a much better solution for garbage collection, and one I wish Java had implemented.
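Java has no `shared_ptr`, but the idea can be approximated by hand for specific resources. An explicit reference count frees the resource at a known point, the last `release()`, rather than whenever the GC runs. Netty's `ReferenceCounted` buffers follow this general pattern. A minimal sketch: `RefCounted` and its `onFree` callback are hypothetical names. Unlike `shared_ptr`, nothing calls `release()` automatically, so a missed call leaks.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical manual reference counting, loosely in the spirit of C++ shared_ptr.
// The resource is freed deterministically when the count drops from 1 to 0.
final class RefCounted<T> {
    private final T value;
    private final Runnable onFree;  // e.g. return a buffer to a pool
    private final AtomicInteger refs = new AtomicInteger(1);  // creator holds one reference

    RefCounted(T value, Runnable onFree) {
        this.value = value;
        this.onFree = onFree;
    }

    /** Adds a reference (like copying a shared_ptr); fails if already freed. */
    RefCounted<T> retain() {
        int prev;
        do {
            prev = refs.get();
            if (prev <= 0) throw new IllegalStateException("already freed");
        } while (!refs.compareAndSet(prev, prev + 1));
        return this;
    }

    /** Drops a reference; only the caller that takes the count from 1 to 0 runs onFree. */
    void release() {
        int prev;
        do {
            prev = refs.get();
            if (prev <= 0) throw new IllegalStateException("released too many times");
        } while (!refs.compareAndSet(prev, prev - 1));
        if (prev == 1) onFree.run();
    }

    T get() {
        if (refs.get() <= 0) throw new IllegalStateException("already freed");
        return value;
    }

    int refCount() { return refs.get(); }
}
```

The compare-and-set loops make retain and release safe across threads, and they guarantee that `onFree` runs exactly once.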

throwaway2037 · 2024-07-09T08:18:57 1720513137

> shared_ptr is a much better solution for garbage collection. One I wish that java had implemented.

I'm pretty sure there is a large body of computer science research on deterministic (reference-counted) vs. non-deterministic (tracing, non-reference-counted) garbage collection. There are lots of pros and cons on both sides. Also, I find it interesting that Java, C#, and Go all chose non-deterministic GC, but Perl and Python use deterministic GC. (I'm not sure what Ruby does.)

josefx · 2024-07-09T05:31:53 1720503113

I think CPython does use reference counting for its memory management, but it still has to run a GC because reference counting alone does not handle reference cycles.
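Here is a toy simulation of why reference counting alone cannot reclaim cycles. Java itself uses a tracing GC, so the counts are maintained by hand, and `RefCountCycleDemo`, `Node`, and `decRef` are hypothetical names. Two nodes that point at each other keep each other's count at 1 even after every external reference is gone. A pure refcounting scheme therefore never frees them, and that is the gap a separate cycle collector like CPython's fills.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of pure reference counting (counts are maintained by hand).
// Shows that a reference cycle keeps its members alive after external refs are gone.
final class RefCountCycleDemo {
    static final class Node {
        final String name;
        int refCount = 0;
        final List<Node> outgoing = new ArrayList<>();
        boolean freed = false;

        Node(String name) { this.name = name; }

        /** Adds a pointer from this node to `target`, incrementing target's count. */
        void pointTo(Node target) {
            outgoing.add(target);
            target.refCount++;
        }
    }

    /** Drops one reference to `n`; at zero, frees it and drops its outgoing references. */
    static void decRef(Node n) {
        if (--n.refCount == 0) {
            n.freed = true;
            for (Node child : n.outgoing) decRef(child);
            n.outgoing.clear();
        }
    }

    public static void main(String[] args) {
        // Acyclic: root -> leaf. Dropping the external ref to root frees both.
        Node root = new Node("root"), leaf = new Node("leaf");
        root.refCount++;  // external reference, e.g. a local variable
        root.pointTo(leaf);
        decRef(root);
        System.out.println("acyclic freed: " + root.freed + ", " + leaf.freed);

        // Cycle: a <-> b. Dropping the external ref leaves both counts at 1.
        Node a = new Node("a"), b = new Node("b");
        a.refCount++;     // external reference
        a.pointTo(b);
        b.pointTo(a);
        decRef(a);
        System.out.println("cycle freed: " + a.freed + ", " + b.freed
                + " (counts " + a.refCount + ", " + b.refCount + ")");
    }
}
```

Running `main` prints `acyclic freed: true, true` followed by `cycle freed: false, false (counts 1, 1)`.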