I recently ported a reinforcement learning algorithm from PyTorch to Julia. I did my best to keep the implementations the same, with the same hyperparameters, network sizes, etc. I think I did a pretty good job, because the performance was similar, solving the CartPole environment in a similar number of steps, etc.
The Julia implementation ended up being about 2 to 3 times faster. I timed the core learning loops, the network evaluations and gradient calculations and applications, and PyTorch and Julia performed similarly here. So it wasn't that Julia was faster at learning. Instead, it was all the in-between: all the "bookkeeping" done in Python ended up being much faster in Julia, enough so that overall it was 2 to 3 times faster.
(I was training on a CPU though. Things may be different if you're using a GPU, I don't know.)
Similar experience over here. (G)ARCH models are severely underserved in Python, and I could not be bothered to learn a Probabilistic programming abstraction like Pyro or Stan just to build a quick prototype myself.
Chose Julia instead. Took 4 hours to get everything sorted out (including getting IT to allow Julia's package manager to actually download stuff) and have the first model running, just putting a paper into code. Since the code is essentially just the math written out, this is a vast communication improvement.
After fiddling around with it at home for a week, this was my first professional experience with it, and I'm blown away.
Julia is such a wonderful language. There are many design decisions that I like, but most importantly to me, its ingenious idea of combining multiple dispatch with JIT compilation still leaves me in awe. It is such an elegant solution to achieving efficient multiple dispatch.
Thanks to everyone who is working on this language!
Julia is the first language to really show that multiple dispatch can be efficient in performance-critical code, but I'm not really sure why: JIT concepts were certainly familiar to implementors of Common Lisp and Dylan.
The combination. E.g. multiple dispatch without JIT would be really slow, as you are picking a method to run at runtime based on the types of all the function arguments.
That requires a linear search through a list of all possible combinations of input arguments.
In a single dispatch language like most object oriented languages, you can do a simple dictionary/hash table lookup. Much faster.
With the JIT, Julia is able to optimize away most of these super slow lookups at runtime. Hence you get multiple dispatch for all functions but with fantastic performance. Nobody had done that before.
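To make that concrete, here's a rough sketch (the types and the `area` function are just illustrative, not from any particular package): once the compiler knows the concrete argument type, the method choice is resolved at compile time and usually inlined, so no lookup happens at runtime.

struct Circle; r::Float64; end
struct Square; s::Float64; end

area(c::Circle) = pi * c.r^2   # one method per concrete type
area(s::Square) = s.s^2

# For a concretely-typed call, dispatch is resolved statically:
@code_typed area(Circle(1.0))  # shows a direct, devirtualized method body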
FWIW, Julia does segment its method tables into multiple layers depending upon size and type. Multiple dispatch is a strict superset of single-dispatch, and indeed the first layer is just a dictionary/hash table lookup on the first argument. If there's only one result there, you're done (and have the same ~cost for the same ~complexity).
Thanks, I didn't know that! Has it always been like that? I wonder where I got the idea that it was always a linear search. Maybe that is just the conceptual way of explaining it.
It's like C++ template specialisation, but it happens when the compiler realises you need a particular version. Which may be at runtime, if you changed something.
Except the language can choose from suitable templates (eg instead of a generic matrix multiply template for floats, it can use a library like LAPACK) and does so in a systematic way.
It also has a feature (I can’t recall the name) which is a bit like fexprs (let’s say macros whose inputs are the types of the arguments of a function) that can generate customised code (eg an FFT depending on the input size) on the fly.
(but I don't find it helpful to compare to fexprs, which I think of as more about deferring evaluation, whereas generated functions are about "staged programming".)
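For reference, a minimal sketch of a generated function (the names are made up for illustration): the body runs at compile time, sees only the argument's type, and returns the expression that becomes the specialized method body.

@generated function sumfields(x)
    # here `x` is bound to the argument's *type*
    terms = [:(getfield(x, $i)) for i in 1:fieldcount(x)]
    :(+($(terms...)))    # in the returned expression, `x` is the runtime value
end

struct Point3; a::Int; b::Int; c::Int; end
sumfields(Point3(1, 2, 3))   # 6, computed by a body generated specifically for Point3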
What the OP is talking about is julia's method-based JIT strategy coupling very well to multiple dispatch.
JIT is not new, multiple dispatch is not new, and multiple dispatch + JIT also isn't new, but no existing languages combined them in a way that allows for the fantastic, efficient devirtualization of generic methods that julia is so good at.
This is why things like addition and multiplication are not generic functions in Common Lisp, it's too slow in CL because the CLOS is not able to efficiently devirtualize the dispatch. In julia, everything is a generic function, and we use this fact to great effect.
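A quick way to see this from the REPL (just illustrative, but these macros are standard):

length(methods(+))    # `+` is an ordinary generic function with hundreds of methods
@code_native 1 + 2    # yet a call on concrete types compiles to a plain integer add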
CLOS and Dylan laid a ton of important groundwork for these developments, but they're also not the same.
It’s not true that CLOS generic dispatch is slow: Robert Strandh and others have done a bunch of work showing that it’s possible to implement it efficiently without giving up the dynamic redefinition features that make CLOS such a nice system. There’s at least one video game project (Kandria) that’s been funding an implementation of these ideas so that generic functions can be used in a soft real-time system like a video game.
The really nice thing about CLOS, though, is that the meta-object protocol lets you choose an implementation of OOP that makes sense for your use-case.
Yes, I don't doubt at all that it's possible to make CLOS dispatch fast. What I'm saying is that because historically people using CLOS had to pay an (often negligible) runtime cost for dispatch, it limited the number of places developers were willing to allow generic dispatch.
Julia makes the runtime cost of (type stable) dispatch zero, and hence does not even give julia programmers an *option* to write non-generic functions (though it can be hacked in like with FunctionWrappers.jl). I'm not familiar with Strandh's work, but has it made the overhead of generic functions low, or has it completely eliminated it?
Another thing I'll mention is that Julia's type system is parametric, and we allow values (not just types) in our type parameters, which is immensely useful for writing generic high performance code. You can specialize methods on matrices of complex integers separately from rank-5 arrays of rational Int8s, for instance. This is not a capability that CLOS or Dylan has as far as I'm aware. The common refrain is that you can do it with macros, but that neglects that it's rather hard to get right, and it will have limited use because such macro implementations of type parameters won't be ubiquitous.
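A rough sketch of what I mean (the `kind` function is made up); note the value 5 appearing as a type parameter in the second method:

kind(x::Matrix{Complex{Int}}) = "matrix of complex integers"
kind(x::Array{Rational{Int8},5}) = "rank-5 array of rational Int8s"
kind(x::AbstractArray{T,N}) where {T,N} = "rank-$N array of $T"

kind(zeros(Complex{Int}, 2, 2))   # hits the specialized method
kind(rand(3, 3))                  # falls back to the generic one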
________________________________
To be clear though, I'm not hating on Common Lisp or CLOS. The Common Lisp ecosystem is awesome and can do all sorts of really cool things that I wish we had in julia. I'm mostly just pushing back on the notion that Julia doesn't do anything new or interesting.
In this context, I think there might be an argument to be made that Julia is to multiple dispatch (or multiple dispatch + JAOT) as the iPhone is to “touchscreen computers that can make phone calls”.
It’s not that it’s the first, but it seems to be the first where the use of multiple dispatch throughout the community was sufficiently pervasive to kick-start the emergence of the strong network effects we’re now seeing w/r/t composability.
I would not be surprised to see more languages working to emulate this kind of combination of multiple dispatch and JAOT compilation in the future.
Except everyone is forgetting that I also mentioned Dylan, from Apple, whose goal was to be a systems programming language for the Newton OS. The Dylan team won out over the C++ one, but internal politics made the decision to go with the C++ team's outcome alongside NewtonScript.
I directed most of my comments towards CL because I know more about it than Dylan. My understanding is that Dylan lacks parametric types, so that comment can be straightforwardly applied to Dylan. IMO Parametric types are a really important part of this.
Regarding performance, I don't know much about this in Dylan. Was Dylan able to completely remove the runtime overhead of multiple dispatch for type stable code?
I think Dylan would be (perhaps ironically) the Newton in this analogy. (or maybe General Magic?)
Pioneering and ahead of its time in many ways, but for whatever reason the use of multiple dispatch in Dylan seems to have not (yet?) led to the same level of ecosystem-wide composability.
Languages with multiple dispatch aren't rare, but a language having it as the core language paradigm, combined with a compiler capable of completely resolving the method calls during compile time, and therefore able to remove all runtime costs of the dispatch, and a community that fully embraced the idea of creating composable ecosystems is something unique to Julia. I don't think anyone has scaled multiple dispatch to the level of Julia's ecosystem before.
Common Lisp’s type system is just not really as useful for this sort of thing. In particular it doesn’t have parameterized types, so you can’t make eg a matrix of complex numbers. This breaks (1) a lot of the opportunity for optimisation by inlining (because you can’t assume that all the multiplications in your matrix{float} multiplication are regular float multiplications) or for generic code (because you can’t have a generic matrix type and instead need a special float-matrix); and (2) opportunities for saving memory with generic data structures, because the types must be associated with the smallest units rather than the container (eg every object in a float matrix must be tagged with the fact that it is a float, because in theory you could put a complex number in there and then you’d need to know to do a different multiplication operation).
I guess you could try to hack together some kind of templating feature to make new type-specific classes on the fly, but this won’t work well with subtyping. Your templating system could probably have (matrix float) as a subclass of matrix, but not of (matrix real) or (matrix number). I think you’d lose too much in Common Lisp’s hodge-podge type system.
A big innovation of Julia was figuring out how to make generic functions and multiple dispatch work in a good way with the kind of generic data structures you need for good performance. And this was not a trivial problem at all. Julia’s system lets you write generic numeric matrix code while still having float matrix multiplication done by LAPACK, which seems desirable.
The other thing is that Julia is a language where generic functions are a low-level thing all over the standard library whereas Common Lisp has a mix of a few generic functions (er, documentation is one; there are more in cltl2), a few “pre-clos” generic functions like mathematical functions, sequence functions and to some extent some array functions, and a whole lot of non-generic functions.
Wikipedia has a nice table [1] on the Multiple Dispatch page that describes one study's findings about how multiple dispatch is used in practice in languages supporting it.
Although CLOS and others do support it, Julia seems to take the cake by most metrics, highlighting that it is a core paradigm of the language, more so than in the others.
Even better, check out the Stanza language for a modern version and interpretation of Lisp, Scheme and Dylan. It supports multi-methods/multiple dispatch, hybrid dynamic and static typing, and high- and low-level programming, to name a few productive features.
I've been running the 1.6 release candidates, and the compilation speed improvements have been massive. There have been plenty of instances in the past where I've tried to 'quickly' show off some Julia code, and I end up waiting ~45 seconds for a plot to show or a minute for a Pluto notebook to run, and that's not to mention waiting for my imports to finish. It's still slower than Matlab for the first run, but it's at least in the same ballpark now.
I agree, this is a game changer. Previously time to first plot (TTFP) was >1 minute for me, which made julia completely unusable for my day-to-day exploratory data analysis, visualisation, quick random number experiments etc. Now TTFP is less than 10 seconds. I'm now ready (and excited) to jump ship from R and python!
I wonder how much Julia could be helped with some uneval/image-saving magic. So when you run the repl you instead get a pre-built binary with plot already loaded and several common specialisations already compiled.
We call these "system images" and you can generate them with PackageCompiler [0]. Unfortunately, it's still a little cumbersome to create them, but this is something that we're improving from release to release. One possible future is where an environment can be "baked", such that when you start Julia pointing to that environment (via `--project`) it loads all the packages more or less instantaneously.
The downside is that generating system images can be quite slow, so we're still working on ways to generate them incrementally. In any case, if you're inspired to work on this kind of stuff, it's definitely something the entire community is interested in!
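For anyone who wants to try it today, the workflow is roughly this (a sketch only; check the PackageCompiler docs for the exact, current API):

using PackageCompiler
create_sysimage([:Plots]; sysimage_path="sys_plots.so")
# then launch Julia with the baked image:
#   julia --sysimage sys_plots.so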
I’ve also been running the release candidates, and I get something like 6 seconds to first plot on my 2013 laptop, including the time for `using Plots` and the time to actually draw the first plot. A huge improvement; kudos to the developers.
On the package ecosystem side, 1.6 is required for JET.jl [0]. Despite being a dynamic language, the Julia compiler does a lot of static analysis (or "abstract interpretation" in Julia lingo). JET.jl exposes some of this to the user, opening a path for additional static analysis tools (or maybe even custom compilers).
Whatever improves loading times is more than welcome. It's not really acceptable to have to wait just because you import some libraries. I understand Julia does lots of things under the hood and that there's a price to pay for that, but coming from Python, it's a bit inconvenient.
But I'll sure give it a try, because Julia hits a sweet spot between expressiveness and speed (at least for the kind of stuff I do: matrix, algorithm, and graph computations).
I like Julia (mostly because of multiple dispatch). The only thing that's lacking is an industrial-strength garbage collector, something like what can be found in the JVM.
I know that you shouldn't produce garbage, but I happen to like immutable data structures and those work better with optimised GCs.
> I know that you shouldn't produce garbage, but I happen to like immutable data structures and those work better with optimised GCs.
If you use immutable data-structures in julia, you're rather unlikely to end up with any heap allocations at all. Unlike Java, Julia is very capable of stack allocating user defined types.
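A small sketch of what that looks like in practice (illustrative types; exact numbers depend on the compiler version):

struct Point           # immutable by default: no `mutable` keyword
    x::Float64
    y::Float64
end

pts = [Point(rand(), rand()) for _ in 1:10_000]   # one array, Points stored inline
total(ps) = sum(p.x + p.y for p in ps)

total(pts)             # first call compiles
@allocated total(pts)  # typically 0 bytes: no per-Point heap allocation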
Not just floats, and I'm not sure they have to be that small. All sorts of structs containing bitstypes/value types can be stack allocated. In fact, even some structs with pointers to heap-allocated memory can be stack-allocated (such as array views.)
It doesn’t, it just doesn’t have a $100B GC like Java does. Rather than spending that kind of money trying to compensate for a language design that generates massive amounts of garbage (ie Java), Julia takes the approach of making it easier to avoid generating garbage in the first place, eg by using immutable structures that can be stack allocated and having nice APIs for modifying pre-allocated data structures in place.
Obviously a 10-million-element array doesn't get stack allocated. But if individual objects of some type are immutable, then they can be stack allocated, or maybe not allocated at all (kept in registers).
Edit: reading your other post, it seems like you may mean persistent data structures, a la Clojure, rather than immutable structures, which are quite different. The former would indeed always be heap-allocated (it's necessary since they are quite pointer-heavy). Immutable structures, on the other hand are detached from any particular location in memory.
Moreover, if the elements in an array are mutable, eg Java objects, then each one needs to be individually heap allocated with a vtable pointer and the array has to be an array of pointers to those individually allocated objects. For pointer-sized objects (say an object that has a single pointer-sized field), that takes 3x memory to store x objects, so that's already brutal, but worse is that since the objects are all individually allocated, the GC needs to look at every single one, and freeing the space is a fragmentation nightmare. If the objects are immutable (and the type is final; btw all concrete types are final in Julia), then you can store them inline with no overhead and GC can deal with them a single big block.
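To illustrate the layout point with a concrete (made-up) type:

struct Pixel                 # immutable, with concrete (and therefore final) field types
    r::UInt8; g::UInt8; b::UInt8
end

isbitstype(Pixel)                   # true: instances can be stored inline
v = Vector{Pixel}(undef, 1_000)
sizeof(v)                           # 3000 bytes in one contiguous block,
                                    # not 1000 individually heap-allocated objects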
Btw, I had to vouch for you to undead your posts in order to reply. Looks like you got downvoted a bunch.
I mean I’m not trying to hate on Java — pointer-heavy programming was all the rage when it was designed, and GC was a hot research topic, so there was good reason to be optimistic about that approach. But it turns out that it’s very hard to make up for generating tons of garbage and pointer-heavy programming hasn’t aged well given the way hardware has evolved (pointers are large and indirection is expensive).
You're mixing two things here: memory management and memory layout. This "pointer-heavy programming" is, indeed, a bad fit for modern hardware in terms of processing speed due to cache misses, which is why even Java is now getting user-defined primitive types (aka inline types, aka value types), but in terms of memory management, in recent versions OpenJDK is pretty spectacular, not only in throughput but also latency (ZGC in JDK 16 has sub millisecond maximum pause time for any size of heap and up to a very respectable allocation rate: https://malloc.se/blog/zgc-jdk16 and both throughput and max allocation rate are expected to grow drastically in the coming year with ZGC becoming generational). As far as performance is concerned, GC can now be considered a solved problem (albeit one that requires a complex implementation); the only real price you pay is in footprint overhead.
I'm not — memory layout and memory management are (fairly obviously, I would think) intimately related. In particular, pointer-heavy memory layouts put way more stress on the garbage collector. Java's choice of making objects mutable, subtypeable and have reference semantics, basically forces them to be individually heap-allocated and accessed via pointers. On the other hand, if you design your language so that you can avoid heap allocating lots of individual objects, then you can get away with a much simpler garbage collector. Java only needs spectacular GC technology because the language is designed in such a way that it generates a spectacular amount of garbage.
I would say no. To have stellar performance, you'll need compaction, you'll need parallelism (of GC threads), and you'll need concurrency between the GC threads and mutator threads; and for good throughput/footprint tradeoff you'll need generational collection. True, you might not need to contend with allocation rates that are that high, but getting, say, concurrent compaction (as in ZGC) and/or partial collections (as in G1), requires a sophisticated GC. E.g. Go isn't as pointer-heavy as (pre-Valhalla) Java, and its GC is simple and offers very good latency, but it doesn't compact and it throttles, leading to lower throughput (I mean total program sluggishness) than you'd see in Java, even with a much higher allocation rate. The thing is that even with a low allocation rate, you'd get some challenging heaps, only later, say, every 10 seconds instead of every 5.
It's true that a simpler GC might get you acceptable performance for your requirements if your allocation rate is relatively low, but you still won't get OpenJDK performance. So I'd say that if you design your language to require fewer objects, then you can get by with a simple GC if your performance requirements aren't too demanding.
All that dereferencing puts a higher load on data structure traversal (which is why Java is getting "flattenable" types) than on the GC. The main reason for Java's particular GC challenges isn't its pointer-heavy (pre-Valhalla) design but the mere fact that it is the GCed platform that sees the heaviest workloads and most challenging requirements by far. Java's GC needs to work hard mostly for the simple reason that Java is asked to do a lot (and the better some automated mechanism works, the more people push it).
Of the features you talk about, Go has only concurrency, but it is often competitive with Java in benchmarks. In my experience I’ll take administration of Go processes any day of the week over Java — I’ve lost count of the number of hours lost to debugging GC stalls, runaway heaps and other garbage collector related issues, and I've never once had those problems in Go.
Go even reverted a generational collector because it had no performance benefits, since most short-lived objects would be stack allocated anyway — Julia’s JIT and way more advanced LLVM backend should do even better than Go in keeping objects stack-local and inline.
It's competitive in pretty forgiving benchmarks. And LLVM is way more advanced than Go's compiler, but not OpenJDK's. I'm not saying you have to prefer Java to Go, but its throughput is better. As to the stack-allocation claim, young generations might be hundreds of MBs; that might correspond to the stacks of 100K goroutines on some server workloads, but not of a few threads.
So I'm not saying you must prefer Java to Go (even though GC tuning is a thing of the past as of JDK 15 or 16), or that Go's performance isn't adequate for many reasonable workloads, only that 1. a flatter object landscape might still not match Java's memory management performance without sophisticated GCs, and 2. I wouldn't extrapolate from Go to Julia, as they are languages targeting very different workloads. E.g. Julia might well prefer higher throughput over lower latency, and Go's GC's throughput is not great.
Having a Lamborghini racing a Toyota Corolla is of course going to show the Lambo winning. But if I need to maintain a fleet of them to move 1000 passengers around a city with certain availability guarantees, I'm going with the Toyotas every time.
In other posts you actually argue that GCs help you reduce complexity because manual memory management is too much of a hassle.
Maybe immutable is not the correct term - persistent data structures are what I'd like support for: that is my use-case.
I think you can have efficient persistent data structures without a GC, but that requires fast reference counting and in turn, that requires a lot of work to be competitive with the JVM.
I also understand that my use-case is not Julia's focus. That's perfectly fine.
That's a major oversimplification. GC is good for ease of use and safety of a high level language. GC is never as performant as not requiring heap allocations at all. Julia has a GC, but also provides a lot of tools to avoid needing the GC in high performance computations. This combination gives ease of use and performance.
Java sacrifices some performance for having this "one paradigm" of all objects, and then heavily invested in the GC, but in many cases, like writing a BLAS, it still just will not give performance exactly matching a highly tuned code, whereas in Julia, for example, you can write really fast BLAS codes like Octavian.jl.
Julia is multi-paradigm in a way that is purposely designed for how these features compose. I think it's important to appreciate that design choice, in both its pros and cons.
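For context, Octavian.jl is a pure-Julia matmul; a rough usage sketch (assuming its exported in-place `matmul!`) looks like:

using Octavian
A = rand(200, 300)
B = rand(300, 100)
C = similar(A, 200, 100)
Octavian.matmul!(C, A, B)   # multi-threaded pure-Julia GEMM, no BLAS library call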
To tie Octavian.jl into this memory allocation discussion:
Octavian uses stack-allocated temporaries when "packing" the left matrix ("A" in "A*B").
These temporaries can have tens of thousands of elements, so that's a non-trivial stack allocation (the memory is mutable to boot). No heap allocations or GC activity needed (just a GC.@preserve to mark its lifetime).
If I understand correctly, this isn't something that'd be possible in Java?
To be fair, you can also just use preallocated global memory for your temporaries, since the maximum amount of memory needed is known ahead of time.
I don't know that the object model is why writing a BLAS in Java doesn't make sense. After all they special case `float` and `double` as primitives, which bifurcates the whole type system and is its own whole issue, but means that you can store them efficiently inline. I'm actually not sure what stops someone from writing a BLAS in Java except that it would be hard and there's no point.
I like your response, and yes, it was a major oversimplification and I'm sorry for that.
Indeed, it is always about design choices and trade-offs. I can see why BLAS code is important and why Julia is an optimal choice for computation heavy problems.
I love GC — it solves a ton of nasty problems in a programming language design with a single feature that users mostly don't have to worry about. Just because you have a GC, however, doesn't mean that it's a good idea to generate as much garbage as you can — garbage collection isn't free. That's where Java IMO went wrong. Java's design — objects are subtypeable (by default) and mutable with reference semantics — generates an absolutely epic amount of garbage. It seems like the hope was that improvements in GC technology would make this a non-issue in the future, but we're in the future and it hasn't turned out that way: even with vast amounts of money that have been spent on JVM GCs, garbage is still often an issue in Java. And this has given GC in general a bad name, IMO quite unfairly. It just happens that Java simultaneously popularized GC and gave it a bad name by having a design that made it virtually impossible for the GC to keep up with the amount of garbage that was generated.
It is entirely possible to design a garbage collected language that doesn't generate so goddamned much garbage — and this works much, much better because a relatively simple GC can easily keep up. Julia and Go are good examples of this. Julia uses immutable types extensively and by default, while Go uses value semantics, which has a similar effect on garbage (but has other issues). With a language design that doesn't spew so much garbage, if you only care about throughput, a relatively simple generational mark-and-sweep collector is totally fine. This is what Julia has. If you also want to minimize GC pause latency, then you need to get fancier like Go (I think they have a concurrent collector that can be paused when its time slice is up and resumed later).
Persistent data structures are a whole different question that I haven't really spent much time thinking about. Clojure seems to be the state of the art there but I have no idea if that's because of the JVM or despite it.
> If you also want to minimize GC pause latency, then you need to get fancier like Go (I think they have a concurrent collector that can be paused when its time slice is up and resumed later).
How possible would it be for Julia to add this? I keep thinking Julia would be great for graphical environments and gaming, but high GC latency won't work there.
Very doable, “just” a bunch of moderately tricky compiler work. Will happen at some point. Things that would make it happen sooner: someone interested in compiler work decides to do it; some company decides to fund it.
About the refcounting approach, you may want to look at the Perceus paper. It's refcounting with dynamic reuse of memory that isn't shared (like a sort of runtime linear typing), and it's used in Koka for functional programming.
No, but there's finally a JEP, which normally means that release is imminent (I'm not involved with that project, so I have no inside information): https://openjdk.java.net/jeps/401
As part of this change, existing built-in primitive types (like int or double) will retroactively become instances of these more general objects.
There are all sorts of limitations right now. E.g. you can't allocate an array, dynamic dispatch is prohibited, and there are some bugs too.
Most of this is just a relic from StaticCompiler.jl being a very straightforward slight repurposing of GPUCompiler.jl. It will take some work to make it robust on CPU code, but the path to doing so is pretty straightforward. It just requires dedicated work, and it's not a top priority for anyone who has the know-how currently.
I think this isn't really a great place for beginners though unfortunately. This project is tightly coupled to undocumented internals of not only julia's compiler, but also LLVM.jl and GPUCompiler.jl. It'll require a lot of learning to be able to meaningfully contribute at this stage.
Currently you can make a relocatable “bundle” / “app” with PackageCompiler.jl, but the bundle itself includes a Julia runtime.
Making a nice small static binary is technically possible using an approach similar to what GPUCompiler.jl does, but the CPU equivalent of that isn’t quite ready for primetime.
I think something to that effect was implicit in "the bundle itself includes a Julia runtime," but I vouched for this comment anyway since it's an important limitation and the parent comment evidently wasn't explicit enough to prevent confusion.
They are talking about two different systems. Static compilation is a separate project which is trying to include only the compiled code that is required. That isn't ready yet for normal people like me, but if you have the know-how and your program meets certain requirements, you can get a tiny binary.
PackageCompiler.jl just compiles everything and packages it up. It generates huge files, because it doesn't discriminate on which compiled stuff to include.
I see, thanks. Looks like static compilation will only work if the entire program is “type stable”, which AFAICT means that the type of every variable can be deduced statically.
> Its a suggestion to fix the awkwardness, one that will never get approved
You were courageous to even try :-)
From their refusal to see any use in explicit variable declarations, to their (somewhat related) huge scope debacle and its strange and irregular 'resolution', not to mention the original absurdly weird propositions they had made to resolve it: the scope and variable declaration subject is pretty hopeless in Julia land. I quickly gave up on it years ago (long before the scope debacle), as I had no intention of losing my time once I saw the arguments and the logic they used.
This is just a disagreement over basic design: should variable declarations be explicit or not. It is a choice, and something that reasonable people can disagree on.
Framing this as a case of irrational and illogical behaviour is unnecessary and unreasonable in my opinion. A lot of serious thought and debate went into the resolution. There is no need to disrespect and badmouth people because they have different priorities than you.
I don't have a strong opinion on this, one way or the other. It's less verbose (which I like), and more familiar to those used to dynamic languages like python, matlab, etc. But this isn't my decision, I'm ok with either.
The feature I'm most excited about is the parallel — and automatic — precompilation. Combined with the iterative latency improvements, Julia 1.6 has far fewer coffee breaks.
Ohh, is that what the programmers were doing all through Halt and Catch Fire? Waiting for compilation? I couldn't understand how they got away with acting like naughty 5 year olds, throwing things at each other constantly.
I think so - the Julia master branch (1.7 precursor) works on M1, but not all the dependencies that some packages require have been built for M1 yet. Though, I understand that the folks behind the wonderful packaging system are working on it.
Yeah, we've managed to get Julia itself running pretty well on the M1, there are still a few outstanding issues such as backtraces not being as high-quality as on other platforms. You can see the overall tracking issue [0] for a more granular status on the platform support.
For the package ecosystem as a whole, we will be slowly increasing the number of third-party packages that are built for aarch64-darwin, but this is a major undertaking, so I don't expect it to be truly "finished" for 3-6 months. This is due to both technical issues (packages may not build cleanly on aarch64-darwin and may need some patching/updating especially since some of our compilers like gfortran are prerelease testing builds, building for aarch64-darwin means that the packages must be marked as compatible with Julia 1.6+ only--due to a limitation in Julia 1.5-, etc...) as well as practical (Our packaging team is primarily volunteers and they only have so much bandwidth to help fix compilation issues).
I think it's more interesting to see what people do with the language instead of focusing on microbenchmarks. There's for instance this great package https://github.com/JuliaSIMD/LoopVectorization.jl which exports a simple macro `@avx` which you can stick to loops to vectorize them in ways better than the compiler (=LLVM). It's quite remarkable you can implement this in the language as a package as opposed to having LLVM improve or the julia compiler team figure this out.
And then replacing the matmul.jl with the following:
@avx for i = 1:m, j = 1:p
    z = 0.0
    for k = 1:n
        z += a[i, k] * b[k, j]
    end
    out[i, j] = z
end
I get a 4x speedup, from 2.72s to 0.63s. And with @avxt (threaded) using 8 threads it goes down to 0.082s on my AMD Ryzen CPU. (So this is not dispatching to MKL/OpenBLAS/etc.) Doing the same in native Python takes 403.781s on this system -- haven't tried the others.
I've rewritten two major pipelines from numpy-heavy, fairly optimized Python to Julia and gotten a 30x performance improvement in one, and 10x in the other. It's pretty fast!
Looks like they're just multiplying two 100x100 matrices, once? (Maybe I'm reading it wrong?) In Julia, runtime would be dominated by compilation + startup time.
A fair comparison with C++ would be to at least include the compilation/linking time into the time reported.
Ditto for Java or any JVM language (you'd have JVM startup cost but that doesn't count the compilation time for bytecode).
Generally, for stuff like this (scientific computing benchmarks) you want to run a lot of computation precisely to avoid such effects (i.e. you want to let the cost of compilation & startup amortize fairly).
This appears to be a set of benchmarks of how fast a brainfuck interpreter implemented in different programming languages is on a small set of brainfuck programs? What a bizarre thing to care about benchmarks for. Are you planning on using Julia by writing brainfuck code and then running it through an interpreter written in Julia?
Seems like you're the founder of Julia. Why such a knee jerk reaction? Did you read the benchmark page? The table of content is right at the top.
This type of reaction is seen everywhere in the Julia community, and the optics aren't good. My advice is to embrace negativity around the language, try to understand whether it is fabrication or legitimate, and address the shortcomings.
Julia is a beautiful language, and I hope some of its warts get fixed.
When I wrote that I was under the impression that the brainfuck interpreter implementations were the only benchmarks in the repo. There are, however (I now realize), also benchmarks for base64 decoding, JSON parsing, and writing your own matmul (rather than calling a BLAS matmul, which is not generally recommended), so this is more reasonable than I thought but still a somewhat odd collection of tasks to benchmark. Of course, microbenchmarks are hard — they are all fairly arbitrary and random.
In a delightful twist, it seems that there is a Julia implementation of a Brainfuck JIT that is much faster than the fastest interpreter that is benchmarked here, so even by this somewhat esoteric benchmark, Julia ends up being absurdly fast.
I'm a daily Julia user but tbh I've gotta agree with parent commenter. I think Jeff's attitude in the "What's bad about Julia" talk is the right way to handle criticism: listen to the person, ask about their use cases, understand how Julia could be improved for that user. Accepting criticism makes a good product, and seeing project leaders do it makes a good impression.
It's not that we're delicate, it's that poor communication between users and maintainers causes problems. As for "one comment", OP already mentioned that defensiveness is becoming an issue in the community.
Idk, but just a few weeks ago I started looking at Julia, partly because of the performance claims. I wanted to write a program a bit heavier than your average starter program, so I wrote a back-tracker (automatic layout for stripboards, to be precise). It was
* interesting (not fun) to find out how Julia works
* annoying AF to discover that much of the teaching material was hidden behind some 3rd party website, presumably in videos (I didn't bother to register, but started browsing the manual instead). What's wrong with text?
* unnecessarily complex because the documentation for the basic functions is nearly inaccessible to beginners.
But, I managed to get a simple layout system up and running, and it wasn't fast. I rewrote it in Go (the language in which I'm currently working most), and it was literally >100x faster. And that should not be due to the startup costs, because a backtracker shouldn't have that much overhead JIT-ing.
I think I can now say that I can't see the use case for Julia. "Faster than Python" is simply not good enough, and for the rest there are no redeeming features. Perhaps the fabled partial differential equation module is worth it, but that can get ported to other languages, I guess.
Your relative skill and time invested in Julia vs Go makes that a not very fair comparison, I think. A 100x difference in performance is probably a sign of something that could be fixed in your code (common one: type instability). In general, Julia is being used to implement things like competitive versions of BLAS. Your Julia code can almost certainly be made much faster.
Coming from a Python and C++ background, I found it sufficient to just read the docs and do some Advent of Code problems to get productive in Julia. What videos are you talking about? https://docs.julialang.org/en/v1/manual/performance-tips/ I found to be a pretty good document on why and when Julia can be slow.
I simply do not understand how some people are able to form such strong opinions in such a short time, and spew out disdain and negativity on the most flimsy basis. It's a matter of temperament, I guess.
Julia performance should be on par with Go, if it's slower, read the performance tips in the manual. As for teaching material on 3rd party websites, I don't know what you mean. The Julia manual is available from the julialang.org website.
As for re-writing DifferentialEquations, that is extremely strongly tied to the multiple dispatch paradigm, re-writing it would be hard. What you can get is wrappers like diffeqpy and diffeqr, which call out to Julia.
You can verify that the teaching materials are not really up to scratch. Even nim and zig, which have less resources behind them, I think, do a better job there. The manual is a reference manual, and it was difficult to find all the operations on arrays. E.g., the difference between Array{Int} and Array{Int,1} is not clarified from the start.
And as I said: I wrote a straightforward backtracker. It's just recursive function calls: check a possible state for the current item, and when successful, update the overall state and move on to the next item; on return, try another state for the current item, until the search space is exhausted. There's not a lot to optimize, nor is there a lot of work for a JIT compiler.
> on the most flimsy basis
I've got more gripes. Forward type declaration to name one. But I'm not spewing disdain: I just don't see Julia take a larger role in general software development.
I have no particular opinion on the teaching materials, I just use the manual and the discussion fora, so I don't know. But if a third party offers teaching materials, it's not so strange if it resides on their third party website.
As for performance, I'm not really talking about 'optimization'. Your implementation may simply have used some pattern that should be avoided, such as global variables, type instabilities, abstract types in structs, or some inappropriate data structures. If it's a microbenchmark, then there are some things to keep in mind.
These are not really optimizations, but basic performance principles. I cannot know that you are unaware of them, but your statement that 'there's not a lot to optimize' make me suspect that this could be the case. The unusual thing about Julia is that it's both dynamic and compiled, so that code that would simply not compile in static languages instead ends up slow.
If I had to guess, your problem is type stability. Are you using NamedTuples to store your state and the items you’re iterating over? If the keys are not all the same and the value types don’t stay the same (e.g. something initialized as zero(Int) and then accumulated into with Float64s), then performance will suffer. Another possibility is that you have a data type that is not concrete in an inner loop. For example, Array{Real} will be slower than Array{Float64} because an array of Reals has to support arrays mixing Float32 and Float64. If you had this in a function definition, the likely correct thing to do is Array{<:Real}, which means the element type of the array must be a subtype of Real. Maybe even better, just drop the type annotations. They very, very rarely improve performance because Julia relies on compile time type inference.
Failed or bad type inference is almost always the cause of performance issues in Julia. Getting a feel for when the compiler can infer things or not takes practice, but it’s a lot easier than the semantics of generic programming systems IMO.
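Here's a minimal sketch of that accumulator pitfall (function names made up):

# unstable: `acc` starts as an Int, then becomes a Float64 inside the loop
function sum_unstable(xs)
    acc = 0
    for x in xs
        acc += x
    end
    acc
end

# stable: the accumulator keeps one concrete type throughout
function sum_stable(xs)
    acc = zero(eltype(xs))
    for x in xs
        acc += x
    end
    acc
end

# @code_warntype sum_unstable(rand(100)) highlights the Union{Float64, Int64} inference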
The REPL is really great for learning. If you type “Array{Int} == Array{Int, 1}” the result is false. If you type “?Array” it prints the docstring which gives some guidance on how to use one versus the other.
I think this particular Julia code is pretty misleading, and I'm (probably) one of the most qualified people in this particular neck of the woods. I wrote a transpiler for Julia that converts a Brainfuck program to a native Julia function at parse time, which you can then call like you would any other julia function.
Here's code I ran, with results:
julia> using GalaxyBrain, BenchmarkTools
julia> bench = bf"""
>++[<+++++++++++++>-]<[[>+>+<<-]>[<+>-]++++++++
[>++++++++<-]>.[-]<<>++++++++++[>++++++++++[>++
++++++++[>++++++++++[>++++++++++[>++++++++++[>+
+++++++++[-]<-]<-]<-]<-]<-]<-]<-]++++++++++."""
julia> @benchmark $(bench)(; output=devnull, memory_size=100)
BenchmarkTools.Trial:
memory estimate: 352 bytes
allocs estimate: 3
--------------
minimum time: 96.706 ms (0.00% GC)
median time: 97.633 ms (0.00% GC)
mean time: 98.347 ms (0.00% GC)
maximum time: 102.814 ms (0.00% GC)
--------------
samples: 51
evals/sample: 1
julia> mandel = bf"(not printing for brevity's sake)"
julia> @benchmark $(mandel)(; output=devnull, memory_size=500)
BenchmarkTools.Trial:
memory estimate: 784 bytes
allocs estimate: 3
--------------
minimum time: 1.006 s (0.00% GC)
median time: 1.009 s (0.00% GC)
mean time: 1.011 s (0.00% GC)
maximum time: 1.022 s (0.00% GC)
--------------
samples: 5
evals/sample: 1
Note that, conservatively, GalaxyBrain is about 8 times faster than C++ on "bench.b" and 13 times faster than C on "mandel.b," with each being the fastest language for the respective benchmarks. In addition, it allocates almost no memory relative to the other programs, which measure memory usage in MiB.
You could argue that I might see similar speedup for other languages on my machine, assuming I have a spectacularly fast setup, but this person ran their benchmarks on a tenth generation Intel CPU, whereas mine's an eighth generation Intel CPU:
julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
But note that OP uses larger cells (`int` = 32 bit in the C version, `Int` = 64 bit in the Julia version) while GalaxyBrain seems to use 8 bit cells. Not that I expect this to make a major difference (but perhaps a minor one?)
The real issue is that the original brainfuck spec (as given by the Wikipedia entry) explicitly sets the size of each cell to a single byte, which means many of the interpreters used for this benchmark are using incorrect cell sizes!
Is that truly accurate though? I could see them comparing, say, load time of data files plus execution time, but combining compile times in there doesn't make much sense. You always have to pay for it in Julia, but not with a statically compiled binary.
I'm not a huge Julia user, but typically if they don't specifically mention they're segmenting runtime from compilation time with Julia, that's a bit of a red flag, because unlike Rust, Go, or C++ the compilation step isn't separate in Julia. To the user it just looks like it's running, when in reality it's compiling, then running, without really letting you know in between.
In the matrix multiplication example, the measurement is done via a simple
t = time()
results = calc(n)
elapsed = time() - t
So startup time at least isn't included.
One might argue that this is still biased against Julia due to its compilation strategy, but fixing that would mean you'd have to figure out what the appropriate way to get 'equivalent' timings for any of the other languages would be as well - something far more involved than just slapping a timer around a block of code in all cases...
edit: As pointed out below, the Julia code should indeed already have been 'warmed up' due to a preceding sanity check. My apologies for 'lying'...
Ah, I have to take that back, since benchmarks run in the order of seconds and they use sockets to start and stop the timer, which likely means compilation time is not included.
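For what it's worth, the usual way to keep JIT compilation out of a measurement is BenchmarkTools; a sketch, reusing the `calc(n)` call from the snippet above:

using BenchmarkTools
@btime calc($n)   # warms up, then samples many runs, so compile time isn't counted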
I think I can answer that. First of all, Julia isn't as fast as C/C++/Nim etc. in most cases; Julia is just fast in scientific computing, that's all. (There is only one "scientific" benchmark in the kostya benchmarks.)
Second, to write very fast Julia you need to know a lot of "tricks", and in most cases it won't be as easy as writing normal code.
And all the people claiming this benchmark measures compilation time (XD?) or doesn't account for JIT time could have just looked at the code/readme for 5 seconds before commenting.
Julia is fast and can be as fast as C, but not in all cases and not as easily as it seems.
> Second, to write very fast Julia you need to know a lot of "tricks", and in most cases it won't be as easy as writing normal code.
That's true in literally any language. Some languages require inlined assembly. Others require preprocessor directives. In almost all languages, you need to understand the difference between stack and heap, know how to minimize allocations, know how to minimize dynamic dispatch, know how to efficiently structure cache-friendly memory layouts. And of course, data structures & algorithms 101.
In terms of performance, Julia provides the following:
1. Zero-cost abstractions. And since it has homoiconic macros, users can create their own zero-cost abstractions, e.g. AoS to SoA conversions, auto-vectorization. Managing the complexity-performance trade-off is critical. But you don't see that in micro-benchmarks.
2. Fast iteration speed. Julia is optimized for interactive computing. I can compile any function and look at its typed SSA form, LLVM IR, or native assembly. And I can inspect this in a Pluto notebook. Optimizing Julia is fun, which is less true in other languages.
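Concretely, those inspection tools are one-liners (illustrative function):

f(x) = 2x + 1

@code_typed  f(1.0)   # typed SSA-form IR
@code_llvm   f(1.0)   # LLVM IR
@code_native f(1.0)   # native assembly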
> That's true in literally any language. Some languages require inlined assembly. Others require preprocessor directives. In almost all languages, you need to understand the difference between stack and heap, know how to minimize allocations, know how to minimize dynamic dispatch, know how to efficiently structure cache-friendly memory layouts. And of course, data structures & algorithms 101.
I think what s/he meant to say is that Julia is not "magically" faster than other languages. The real questions are:
1. Can unoptimised Julia code run as fast as unoptimised c/c++ code? I think the linked benchmark suggests this is not really the case.
2. Can optimised Julia code run faster than comparably (i.e. requiring similar amount of effort and expertise) optimised c/c++ code? If not, then why use Julia?
> Julia is not "magically" faster than other languages
That's somewhat true, and is at the end-point of some mismatched expectations when folks come to Julia. Julia is a high-level dynamic language whose semantics are conducive to creating the ~same performance as static languages.
So if your unoptimized Julia program relies upon traditional "dynamic" features like `Any[]` arrays, then you should expect to see dynamic- (read: python-) like performance out of Julia. Julia should match the performance of other dynamic languages here, but the compiler doesn't have all the typical dynamic optimizations because, well, it's often easy to write your code in a manner that ends up hitting the happy path that gets the static-like performance.
Conversely, if your dynamic language baseline is just glue to an optimized static library, then you should expect to see static-like (read: C/C++-like) performance out of your dynamic language. Julia really should match performance here, and if it doesn't, open an issue: it's a bug.
Where Julia truly excels are the cases where you don't have a library implementation (like numpy) to lean on and find yourself writing a hot `for` loop in a dynamic language. Further, it excels at facilitating library creation, leading to more and more first-class ecosystems that are best-in-class like DiffEq.
> So if your unoptimized Julia program relies upon traditional "dynamic" features like
Dynamic dispatch is slow in any language, including C/C++ (provided that the compiler can't devirtualize the method). This is why such things are never done in an inner loop.
In C++, it's harder to "accidentally" use dynamic dispatch because you have to explicitly annotate a function as being virtual. In Julia, which is much more concise, type stability or instability is implicit. But it can be inspected statically via @code_warntype. Good IDE plug-ins can make it easier.
Julia optimizes for a different thing. You can get your result, as in the actual useful thing that the code does/produces, much faster than with C/C++. You can skip type annotations, not worry about the memory usage, and write your code interactively using REPL or the excellent Revise.jl package.
If you have saved a couple of minutes or hours of coding and are only going to run that code a handful of times, it should not matter if it runs a second or two slower than C/C++. This is the same rationale that Python and other scripting languages have. But unlike Python, you should be able to match the speed of C/C++ or get pretty close by optimizing your code.
Yes I get your point. I guess I should have phrased my first question like the following
1. Can unoptimised Julia code run faster than unoptimised Python code (with numpy being used to do the heavy lifting)?
Let's say one is prototyping some algorithm so iteration speed is more relevant than running speed. Then one can choose either Julia or Python (with the help of numpy perhaps) and get an implementation in similar timeframes. So Julia won't necessarily be more attractive here.
Now if the prototype proved that running speed is very critical to the successful application of the algorithm, then it would mean the developer now has to optimise the hell out of it. One can either:
1. Optimise the Julia codebase, if Julia was used to prototype, following the many tips and tricks available (e.g. type stability, various macros, etc.).
2. Port the algorithm to C/C++, applying the many performance best practices that people have accumulated over the years.
So if the optimised C/C++ port is capable of being any faster than the optimised Julia code, then the rational choice would be to port the implementation using C/C++; it would also mean Python would have some advantage over Julia in the prototyping phase too due to its popularity. Otherwise I'd agree that using a single language to both do prototyping and production is the best.
This depends on what you mean by '(un)optimized code'. Because there's a difference between unoptimized and naive code.
'Unoptimized' code should still observe most of the performance tips in the manual (such as avoiding globals and type instability), while 'naive' code frequently does not. With some experience, you never write naive code, even for quick prototypes.
In those cases, Julia should outperform other dynamic languages significantly, and approach static languages in most cases.
Proper optimization means going in and removing allocations, ensuring that operations vectorize (simd), tailoring data structures for performance, adding parallelism etc. In the latter case Julia should virtually _always_ match static languages closely, otherwise it merits investigation.
Well, there are no type stability or scoping rules to worry about in Python, so just for the sake of comparing the two, I was indeed thinking of 'naive' Julia code vs 'naive' Python code.
The thing with Python is that 'naive' Python code is already pretty close to 'unoptimised' Python code, so one can write naive Python code with numpy and still ends up with not-too-shabby performance, provided they chose an efficient algorithm, of course. In other words, there are not as many performance mistakes one can make with Python (perhaps because it can't get any worse). I imagine that's also why so many Python users who tried Julia were disappointed that direct translations of their Python program fail to perform as fast as advertised.
The point is, once you've gotten used to Julia you tend to write good code most of the time without even thinking about it. And that good code still "looks good," meaning it takes advantage of Julia's expressiveness and brevity. Understandably, newcomers make many more performance mistakes.
So there's often a huge difference between "unoptimized code" (something written by an experienced developer who's deliberately taking the easy way out) and "naive code" (something a newcomer might write). There can literally be orders-of-magnitude performance difference.
I agree that there isn't as much to learn about Python. But of course that's largely because of the gap in opportunities.
To be fair, Julia gives you better tools to analyze your code and figure out how to write it more efficiently. Being able to look at all the steps the JIT compiler will perform on an individual function helps a lot in building an intuition about what you should and should not do while writing high-performance Julia code.
Is there a per-project way to manage dependencies yet? I find global package installation to be the biggest weakness of all the R projects out there. Anaconda can help, but it’s not widely used for R projects. And Docker... well, don’t get me started.
Yeah. Julia's had that since (at least) 1.0. Environments are built-in, and you specify project dependencies in a Project.toml file https://pkgdocs.julialang.org/v1/toml-files/.
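A rough sketch of the workflow from the REPL (the package name is just an example):

julia> ]                          # enter Pkg mode
(@v1.6) pkg> activate .           # use ./Project.toml in the current directory
(MyProject) pkg> add DataFrames   # records the dependency in Project.toml / Manifest.toml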
Since 0.7 (which was 1.0 with deprecations)
In julia 0.6 and before it was exactly as bad as described.
(though there were things like Playground.jl to kind of work around it)
I have heard it has to do with how Windows antivirus works.
Since the registry is like 10,000 separate files, it chokes on them.
I have heard there is an upcoming feature to allow the package manager to work with the registry kept inside a tarball, which is specifically being added to deal with this.
This is a Windows issue, I'm pretty sure. My solution is easy: install Julia under WSL. In fact, after moving from MacOS to Windows, this is my go-to solution: install as much as I can under WSL.
Many years of being on the receiving end of issues shows that there are a large number of different potential problems which manifest themselves in similar ways. Some are fixed, some persist, some may be new. If people don’t file issues describing exactly what they tried and what happened, this vague complaint just goes into the bucket of “Who knows? Hopefully someone else files a proper bug report.”
I’m not sure what issue you think you identified by googling such a short problem description, but it seems like it could be any of:
- slow internet connection
- firewall / proxy issues
- antivirus gumming things up
- file system being slow when dealing with lots of small files (Windows mostly)
- precompiling Plots took longer than expected
- precompiling Plots hit a deadlock
- loading Plots took longer than expected
- loading Plots hit a deadlock
- something else?
Worse, what “stuck” means is also ambiguous. Does that mean it failed with an error? Does that mean a download started but then was too slow for the user’s taste? Does it mean that a download started but never got any data at all? How long did the user wait?
My best guess is that git cloning the registry on Windows is taking a long time and isn’t actually stuck. There’s a fix for that being worked on for 1.7 (don’t unpack registries).
Maybe I misread this, but the milestone "1.6 blockers" still has 3 open issues with "1.6 now considered feature-complete. This milestone tracks release-blocking issues." - so how can 1.6 be ready?
You guys are doing great. Julia is really taking shape. Can't wait to jump ship from Python.
It's also nice to see that you (personally) are sponsoring zig development. There is so much more room for improvement in the arena of programming languages. Infrastructure like this is a huge multiplier.
I'm excited about Zig and I've spent many evenings looking at Andrew's streams. I haven't gotten the time to try it out properly myself but I'm looking forward to it.