I think it made sense at the time. From what I understand, you can make Java run about as fast as C++ if you're careful with it and lean on the JIT compiler, and then you get the added benefit of the Java ecosystem. However, I have never tried such a thing, and my source is hearsay from friends who have worked in financial institutions.
From what I've heard, you absolutely can, given two things: fewer pointer-chasing data structures, and, most crucially, few or no allocations. Pre-allocate arrays of the things you need, and run ring buffers over them if the number of live items varies.
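For what it's worth, here is a minimal sketch of that idea in Java: a ring buffer over slots that are all allocated up front, so the steady state never calls `new`. The `Quote` type, its fields, and the `QuoteRing` name are made up for illustration, and this version is single-threaded only; a real cross-thread queue would also need proper memory ordering, which is not shown.

```java
// Hypothetical message type. Fields are mutable so each slot can be reused.
final class Quote {
    long instrumentId;
    long priceTicks;
    int quantity;
}

// Single-threaded ring buffer over preallocated Quote slots.
final class QuoteRing {
    private final Quote[] slots;
    private final int mask;
    private long head; // index of the next slot to read
    private long tail; // index of the next slot to write

    QuoteRing(int capacity) {
        // Power-of-two capacity lets us wrap with a bit mask instead of a modulo.
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new Quote[capacity];
        for (int i = 0; i < capacity; i++) {
            slots[i] = new Quote(); // the only allocations happen here, at startup
        }
        mask = capacity - 1;
    }

    // Copies the values into the next free slot. Returns false if the buffer is full.
    boolean offer(long instrumentId, long priceTicks, int quantity) {
        if (tail - head == slots.length) {
            return false;
        }
        Quote q = slots[(int) (tail & mask)];
        q.instrumentId = instrumentId;
        q.priceTicks = priceTicks;
        q.quantity = quantity;
        tail++;
        return true;
    }

    // Returns the oldest slot, or null if the buffer is empty.
    // The returned object is reused later, so read it before the next offer().
    Quote poll() {
        if (head == tail) {
            return null;
        }
        return slots[(int) (head++ & mask)];
    }

    int size() {
        return (int) (tail - head);
    }
}
```

The trade-off is that `poll()` hands back a shared, mutable slot rather than a fresh object, so the caller has to finish with it (or copy the fields out) before more items are offered.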
A fun but practical approach, which I have again only heard about second-hand, is to just drown your code in physical RAM and switch the GC off completely. Have enough RAM to run a trading day, then reboot. The cost is trivial compared to spending engineering hours on other approaches.
I worked in trading and we did the first one, in C++. We'd load all the instruments (stocks etc.) on startup to preallocate the "universe", and use ring buffers as queues. Instruments don't change during trading hours so restarting daily to pick up the new data is enough.
I saw a Java team do the second one in an order router (a system that connects to various exchanges and routes+translates orders for each exchange's requirements), and they wrote an interesting retrospective doc where they basically said it wasn't worth it - it caused a lot of trouble without giving a significant edge in performance. YMMV! That was around 2012.
Making your own OS or language is hard, if you care about both performance and correctness.
But HFT people do a lot of OS-level hacking, squeezing the last bits of performance from the kernel where the kernel is needed, and/or running parts of the kernel stack (like networking) in userspace to avoid context switches. CPU core pinning, offloading everything possible to the network cards, and so on go without saying.
Quite fitting in a thread about HFT that has already referenced game development as a parallel universe of similar techniques.
In the feature phone era, mobile phone games were written in Java (well, a subset: Java ME). Practically all games followed the same memory management strategy. They allocated the memory they needed during early initialisation, and then never again during the actual runtime of the game. That was the only way to retain even a semblance of performance.
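As a rough sketch of that allocate-at-init pattern, here is a tiny fixed-size object pool in Java. The `Particle` type and the `ParticlePool` name are invented for the example, and it is written in plain modern Java rather than against any actual Java ME API.

```java
// Hypothetical game object. All instances are created once, at startup.
final class Particle {
    float x, y, vx, vy;
    boolean active;
}

// Fixed-size pool: acquire() and release() only move references around,
// so nothing is allocated once the constructor has finished.
final class ParticlePool {
    private final Particle[] free;
    private int freeCount;

    ParticlePool(int capacity) {
        free = new Particle[capacity];
        for (int i = 0; i < capacity; i++) {
            free[i] = new Particle(); // all allocation happens during initialisation
        }
        freeCount = capacity;
    }

    // Returns a pooled particle, or null when the pool is exhausted.
    Particle acquire() {
        if (freeCount == 0) {
            return null;
        }
        Particle p = free[--freeCount];
        p.active = true;
        return p;
    }

    // Hands a particle back to the pool. Releasing an inactive particle is ignored.
    void release(Particle p) {
        if (!p.active) {
            return;
        }
        p.active = false;
        free[freeCount++] = p;
    }

    int available() {
        return freeCount;
    }
}
```

When the pool runs dry the caller has to cope with `null` (skip the effect, recycle the oldest particle, and so on), which is the price of never allocating mid-game.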
None of the Java libs that you use can ever do an allocation. Ever! So you don't really get that much benefit from the Java ecosystem (other than the IDEs). You have to audit the code you use to make sure allocations never happen on the critical path.
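One way such an audit could be partly automated, as a sketch: HotSpot-based JDKs expose a per-thread allocation counter through `com.sun.management.ThreadMXBean`, so you can measure how many bytes a piece of code allocated. The `AllocationProbe` class and its method name are made up here, and this assumes a HotSpot JVM where thread allocation measurement is supported and enabled.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports how many heap bytes the current thread
// allocated while running a task. Assumes a HotSpot JVM.
final class AllocationProbe {
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    static long allocatedBytesDuring(Runnable task) {
        long id = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(id);
        task.run();
        return THREADS.getThreadAllocatedBytes(id) - before;
    }
}
```

You would wrap a call into the library in `allocatedBytesDuring` and flag anything above zero. The first few calls can include one-time allocations (class loading and the like), so it makes more sense to measure after warm-up, and a clean result only covers the code paths you actually exercised.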
Fifteen years ago, the USN's DDX software program learned this the hard way when it had to meet a hard real-time requirement in the milliseconds.
> shared_ptr is a much better solution for garbage collection. One I wish that java had implemented.
I'm pretty sure there is a large body of (computer science) research around the topic of deterministic (reference-counted) vs non-deterministic (tracing) garbage collection. There are lots of pros and cons on both sides. Also, I find it interesting that Java, C#, and Go all chose non-deterministic GC, but Perl and Python use reference counting (CPython adds a cycle collector on top). (I'm not sure what Ruby does.)
https://martinfowler.com/articles/lmax.html