Note that Julia's compile times are not included in the benchmarks. If you had read the website, you'd have seen that the graphs show the first run (excluding compilation) and the second run (with a hot cache).
Also, in the second run, Julia is not the fastest. Julia wouldn't be faster than Rust, because it has a garbage collector. You can see this in the join benchmarks, which really push the allocator.
Beyond that, the databases run in in-memory mode, so there is no disk overhead. Spark is slower because of the JVM and row-wise data.
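To make the "first run vs. second run" distinction concrete, here is a minimal Python sketch of a harness that times the same call twice and reports both numbers separately. It assumes nothing about the actual benchmark's harness: `time_first_and_second`, `build_lookup` and `workload` are hypothetical names, and the one-time cost is simulated with `functools.lru_cache` rather than JIT compilation. The absolute timings depend on the machine, so no output is shown.

```python
import time
from functools import lru_cache
from typing import Callable, Tuple


def time_first_and_second(fn: Callable[[], object]) -> Tuple[float, float]:
    # Call fn twice and return (first_run_seconds, second_run_seconds).
    timings = []
    for _ in range(2):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings[0], timings[1]


@lru_cache(maxsize=None)
def build_lookup() -> dict:
    # Hypothetical one-time setup cost: built on the first call, cached afterwards.
    return {i: str(i) for i in range(200_000)}


def workload() -> int:
    # Hypothetical string-heavy workload over the cached lookup table.
    lookup = build_lookup()
    return sum(len(v) for v in lookup.values() if v.endswith("7"))


first, second = time_first_and_second(workload)
print(f"first run: {first:.4f}s, second run: {second:.4f}s")
```

In Julia's case, the compilation step would be a separate cost in front of the first run, and according to the comment above, the site leaves it out of both numbers.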
> Note that Julia's compile times are not included in the benchmarks. If you had read the website, you'd have seen that the graphs show the first run (excluding compilation) and the second run (with a hot cache).
Here's my view: the author of that page has commented here on HN, and if my claim were as outrageously wrong as you say, he would have corrected it.
As mentioned in that thread, GC and strings, and especially the combination of the two, can really drag down Julia's performance. That's actually pretty surprising, since for a lot of data processing needs strings are often as important as numbers, if not more so.
I'd also say that, in terms of compilation time, some automatic caching layer beyond precompilation would do wonders.