Hacker News

I work in HFT and now use Julia for all my research, and a couple of my colleagues now do too. Personally I'd rather retire and farm goats than go back to having to write Python professionally: there's just so much that can go wrong that doesn't happen in a typed language, so much unnecessary stuff you have to keep in your head when coding. It also seems incredibly counterproductive to use a language that's 100x slower than necessary just because it's the only language some people know; the difference in research speed between having to wait one second for a result and ten minutes is massive.

Of course, HFT is a somewhat different use case from pure ML, as we work with data in a format that's rarely seen elsewhere (high-frequency order book tick data). Python's probably less painful for working with data for which somebody else has already written a C/C++ library with a nice API, as then you don't need to write your own C/C++ library and interface with it. My choices are: write Python, and the research will take 100x longer; write Python and C++, and the development will take 2-4 times longer; or write Julia, and get similar performance to C++ with even faster development time than Python.




Is it just performance that puts you off Python? If so, did you try writing a native extension to accelerate it?

Where I work, we also have analysis work which involves sequentially reading gigabytes of binary data. We came up with a workflow where a tool written in Java runs a daily job to parse the raw data and write a simplified, fixed-width binary version of it, then we have a small Python C extension which can read that format. We can push a bit of filtering down into the C as well.

This has worked out pretty well! We get to write all the tricky parsing in a fast, typesafe language, we get to write the ever-changing analytics scripts in Python, and we only had to write a small amount of C, which is mostly I/O and VM API glue.
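To make the fixed-width idea concrete, here is a minimal sketch in pure Python. It is not the commenter's code: their reader is a C extension and their writer is a Java tool, while this uses the standard `struct` module for both sides. The record layout (timestamp, price, size, side) and the names `RECORD`, `write_records`, `read_records` and `min_size` are all assumptions made up for illustration.

```python
import io
import struct

# Hypothetical record layout (an assumption, not the format from the comment):
# little-endian, no padding: uint64 timestamp, float64 price, uint32 size, 1-byte side.
RECORD = struct.Struct("<QdIc")


def write_records(stream, ticks):
    """Write each tick as one fixed-width record (stands in for the daily parsing job)."""
    for ts, price, size, side in ticks:
        stream.write(RECORD.pack(ts, price, size, side))


def read_records(stream, min_size=0):
    """Yield records sequentially, dropping those whose size is below min_size.

    The filter here plays the role of the filtering pushed down into the reader.
    """
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of stream (or a truncated trailing record)
        ts, price, size, side = RECORD.unpack(chunk)
        if size >= min_size:
            yield ts, price, size, side


ticks = [(1, 100.5, 10, b"B"), (2, 100.25, 3, b"S"), (3, 100.75, 25, b"B")]
buf = io.BytesIO()
write_records(buf, ticks)
buf.seek(0)
big = list(read_records(buf, min_size=10))
print(RECORD.size, big)
```

Because every record has the same width, the reader never has to parse delimiters or variable-length fields, which is what keeps the native part small in a setup like the one described.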

The Java and C parts were both written by an intern in a month or two. He's a sharp guy, admittedly, but if an intern can do it, I bet an HFT developer could too.


>Is it just performance that puts you off Python? If so, did you try writing a native extension to accelerate it?

This is what I did originally, but it was way slower to have to write and maintain C++ and a Python interface for it than to just write Julia. Particularly because some of the business/trading logic basically has to be in the native layer (can't backtest an HFT algo without running it on HFT data, and that volume of high-frequency tick data is too slow to process in pure Python).


One of the close colleagues I was alluding to also works in HFT and they do in fact use Cython libraries built in house for extremely low latency applications, order book processing, etc.

Their main alternatives are coding in C++ directly or switching wholesale to Rust, but they prefer Cython. I know they evaluated Julia and found it entirely intractable to use for their production systems.


I'm curious why they found Julia intractable. In my experience it's much quicker to write than Rust, C++ or Cython. It's also much more expressive than Cython.

Is it because they tried embedding it in C++? That can be painful, because it needs its own thread, and can only have one per process, but it's certainly doable.


I’m not sure what you mean by saying Julia is more expressive than Cython, given that Cython is as expressive as C.

In this shop’s particular situation, it’s mostly the switching costs to Julia that cause it to lose the debate. The firm has lots of systems software, data fetchers, offline analytics jobs, research code, etc. With Python & Cython, they easily write all of it in one ecosystem, build shared libraries that span all these use cases, rely on shared testing frameworks, integration pipelines, packaging, virtual envs, etc.

If Julia offered some kind of crazy game changer advantage that required a huge amount of effort to get in Python/Cython, they might consider breaking off some subsystem that has to have new environment management, new tooling, etc., and is not sharable across as many use cases.

But there is no such case. They might get some sort of “5% more generic” or “5% benefit from seamless typing instead of a little rough around the edges typing in Cython”, and these differences would never justify the huge costs of switching or the missing third party packages that are heavily relied on.

I always like to remind people that in any professional setting, ~95% of the software you write is for reporting and testing, and 5% at best is for the actual application. Out of that 5%, another 95% never has serious resource bottlenecks and taking care to write super careful optimized code for the 5% of the 5% can be done in nearly any language. Choose your ecosystem based on what best solves your problems in that other 99.75% of cases.

This is especially true in HFT and quant finance, which is why so many of those firms use Python for everything except the 0.25% of the code where performance is insanely critical; for that part they just use anything that plugs easily into Python, usually C++ or Cython.
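For anyone checking the percentages in this comment, they come from multiplying the two 5% shares. A quick sketch with exact fractions (to avoid float rounding); the variable names are just illustrative:

```python
from fractions import Fraction

# Shares as stated in the comment: 5% of the code is the actual application,
# and 5% of that has serious resource bottlenecks.
app_share = Fraction(5, 100)
bottleneck_share = Fraction(5, 100)

critical = app_share * bottleneck_share   # fraction of all code that is performance-critical
critical_pct = float(critical * 100)      # as a percentage
rest_pct = float((1 - critical) * 100)    # everything else, as a percentage

print(critical_pct, rest_pct)  # prints "0.25 99.75"
```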


>I’m not sure what you mean by saying Julia is more expressive than Cython, given that Cython is as expressive as C.

I mean expressive in the sense of how much you can get done per unit code/time. Perhaps a better way of phrasing it: for most problems X that I encounter in my work, I can write code in Julia to solve X faster than I could write C/C++ to solve X, and also faster than I could write Cython to solve X. Excellent type inference is a big part of this, along with macros, multiple dispatch, and libraries designed with performance in mind (e.g. https://juliacollections.github.io/DataStructures.jl/latest/...).

>In this shop’s particular situation, it’s mostly the switching costs to Julia that cause it to lose the debate. The firm has lots of systems software, data fetchers, offline analytics jobs, research code, etc. With Python & Cython, they easily write all of it in one ecosystem, build shared libraries that span all these use cases, rely on shared testing frameworks, integration pipelines, packaging, virtual envs, etc.

>I always like to remind people that in any professional setting, ~95% of the software you write is for reporting and testing, and 5% at best is for the actual application. Out of that 5%, another 95% never has serious resource bottlenecks and taking care to write super careful optimized code for the 5% of the 5% can be done in nearly any language. Choose your ecosystem based on what best solves your problems in that other 99.75% of cases.

That makes sense then. In my firm at least (and in my team at least) the case is different: we're mostly full stack, so each member will be responsible for the whole pipeline from research->model_development->backtesting->production_algo_development->algo_testing/initial_trading. In this case 95% of my time is spent writing research code, running research, and writing production code, so if I can double the speed at which my research code runs or double the speed at which I can write it, that translates into a massive increase in my productivity/output.


Did you find something better than tensorflow+python by any chance? I'm desperately looking for something that is mature, stays in loop and does not require me to touch python.


Depends what you're trying to do, but Flux.jl is pretty nice: https://github.com/FluxML/Flux.jl . Failing that, the Julia Python FFI is very good, so it's possible to use PyTorch almost seamlessly in Julia (I previously used Tensorflow 1.x, and it was such a painful experience I'm not brave enough to touch 2.0).


Thanks. 2nd day playing with it and I guess now I'm hooked on Julia.



