Whilst the Julia version currently beats Mojo, I fully expect both to approach basically the same performance with enough tinkering, and for that performance to be on par with C or Fortran.
A more interesting question is which version is more elegant, ‘obvious’ and maintainable. (Deeply familiar with both, but money is on Julia).
> A more interesting question is which version is more elegant, ‘obvious’ and maintainable. (Deeply familiar with both, but money is on Julia).
Yes, more than raw speed, what impresses me is that the version of code in [1] is already a few times faster than the Mojo code - because that's pretty basic Julia code that anyone with a little Julia experience could write, and maintain easily.
The later versions with LoopVectorization require more specialized knowledge, and get into "how can we tune this particular benchmark" territory for me (I don't yet know how to evaluate the Mojo code in this regard, i.e., how 'obvious' it would be to an everyday Mojo developer). So [1] is a more impressive demonstration of how an average developer can write very performant code in Julia.
User lmiq articulates the sentiment well in a later comment in the OP thread:
> what I find more interesting here is that the reasoning and code type applies to any composite type of a similar structure, thus we can use that to optimize other code completely unrelated to calculations with complex numbers.
It matters less whether super-optimized code can be written in Julia for this particular case (though there's some value in that too), the more important and telling part to me is that the language has features and tools that can easily be adopted to a general class of problems like this.
An even more interesting question is: which version will actually entice millions of independent and variably motivated actors from all walks of life to commit to and invest in a particular ecosystem? Technical and usability aspects play only a minor role in technology adoption. In particular, the best technology doesn't always win.
My humble two pennies is that Julia is missing the influencer factor: endorsement by widely known entities that would attract the attention of both corporate eyes and the hordes of developers constantly looking for the next big thing.
Your money might be on Julia but $100mln was just placed on the Mojo/Modular bet...
I've tried julia a handful of times. IMO, the thing slowing adoption is that the usecases where julia feels like the most powerful, optimal choice are too limited. For example
- Slow startup times (e.g., time-to-first-plot) kill its appeal for scripting. For a long time, one was told that the "correct" way to use julia was in a notebook. Outside of that, nobody wanted to hear your complaints.
- Garbage collection kills its appeal for realtime applications.
- The potential for new code paths to trigger JIT compilation presents similar issues for domains that care about latency. Yes, I know there is supposedly static compilation for julia, but as you can read in other comments here, that's still a half-baked, brittle feature.
The last two points mean I still have the same two-language problem I had with c++ and python. I'm still going to write my robotics algorithms in c++, so julia just becomes a glue language; but there's nothing that makes it more compelling than python for that use. This is especially true when you consider the sub-par tooling. For example, the lsp is written in julia itself, so it suffers the same usability problems as TTFP: you won't start getting autocompletions for several minutes after opening a file. It is also insanely memory hungry, to the extent that it's basically unusable on a laptop with 8gb of ram (on the other hand, I have no problem with clangd). Similarly, auto-formatting a 40 line file takes 5 seconds. The debugging and stacktrace story is similarly frustrating.
When you take all of this together, julia just doesn't seem worth it outside of very specific uses, e.g., long running large scale simulations where startup time is amortized away and aggregate throughput is more important than P99 latency.
Some of what you have written seems pre-1.0 release, and some pre-1.9. I have never seen anybody in the community say the correct way to use Julia is in a notebook. As far as I have seen, some people use a simple editor with the REPL open, and most just use it in vscode.
You can do real-time applications just fine in Julia: just preallocate anything you need and avoid allocations in the hot loop. I am doing real-time stuff in Julia. There are some annoyances with the GC, but nothing to stop you from doing real-time. There are robotics packages in Julia and they are old; there is a talk about them that compares Julia with c++ (spoiler: developing in julia was both faster and easier, and the results were faster).
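That preallocation discipline isn't specific to Julia; here is a minimal sketch of the same pattern in Python (the per-sample computation is a hypothetical stand-in): allocate buffers up front, mutate them in place, and keep the collector out of the hot loop.

```python
import gc

def run_hot_loop(samples, out):
    # Fill the preallocated buffer `out` in place: no allocations in the loop.
    for i, x in enumerate(samples):
        out[i] = 2 * x + 1  # hypothetical stand-in for the real computation
    return out

# Preallocate everything up front, before entering the real-time section.
buf = [0] * 4
gc.disable()  # avoid collector pauses inside the hot loop
try:
    result = run_hot_loop([1, 2, 3, 4], buf)
finally:
    gc.enable()
```

The Julia version is the same idea: fill preallocated arrays with in-place (`!`) functions so the GC has nothing to collect during the loop.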
I have been using two Julia sessions on an 8gb laptop constantly while developing, no problem. LSP loads fine and fast in vscode no problem there either.
The debugger in vscode is slow and most don't use it; there is a package for that. The big binaries are a problem, and the focus is shifting there to solve that. Stacktraces will become much better in 1.10 but still need better hints (there are plans for 1.11). In general, we need better onboarding documentation for newcomers to make their experience as smooth as possible.
The recommended solution to slow startup times has always been to keep a repl open. That's basically the same workflow as a notebook in my mind. Like I said, this means there is a large class of tasks julia doesn't make sense for, because paying the startup cost is too expensive compared to python or go.
I just timed vscode with the lsp. From the point I open a 40 line file of the lorenz attractor example, it takes 45 seconds until navigation within that same file works, and the lsp hogs 1 GB of memory. That's 5x the memory of clangd and 20x worse performance; hardly what I would consider a snappy experience.
I have no doubt that julia can be shoe-horned into realtime applications. But when I read threads like this [1], it's pretty clear that doing so amounts to a hack (e.g., people recommending that you somehow call all your functions to get them JIT-compiled before the main loop actually starts). Even the mitigations you propose, i.e., pre-allocating everything, don't exploit any guarantees made by the language, so you're basically in cross-your-fingers-and-pray territory. I would never feel comfortable advocating for this in a commercial setting.
I don't know man, I just tested vscode and it's almost instant, loads every function from multiple files in less than 5 seconds. I'm on a 13-inch intel Mac and Julia 1.11 master (1.9 and 1.10 should be the same).
Having a REPL open is not the same thing as a notebook, if you feel like that, cool I guess.
That thread is old, and Julia can cache compiled code now, from 1.9 onward. However, it cannot distribute the cached code (yet).
Writing the fastest possible real-time application in c/c++ has the same principles as in Julia. It's not as shoe-horned as you might believe.
When developing Julia, the developers chose some design decisions that affected the workflow of using the language. If it doesn't fit your needs that's cool, don't use it. If you are frustrated and like to try the language come to discourse, people are friendly.
>I don't know man, I just tested vscode and it's almost instant, loads every function from multiple files in less than 5 seconds. I'm on a 13-inch intel Mac and Julia 1.11 master (1.9 and 1.10 should be the same).
I know, I'm always "holding it wrong". And that's the problem with julia.
> Having a REPL open is not the same thing as a notebook, if you feel like that, cool I guess.
Both workflows amortize the JIT times away by keeping an in-memory cache of compiled code. This makes a lot of smaller scripting tasks untenable in julia. So people choose python instead. That means julia needs a massive advantage elsewhere if they are going to incorporate both languages into their project.
> When developing Julia, the developers chose some design decisions that affected the workflow of using the language. If it doesn't fit your needs that's cool, don't use it. If you are frustrated and like to try the language come to discourse, people are friendly.
This thread was about why julia hasn't seen wider adoption. It's my contention that the original design decisions are one of the root causes of that.
I just tried it from the Windows command line, and this benchmark with the plots ran in what seemed like an instant; some simple timing showed it was under 2 seconds with a fresh Julia v1.10 beta installation. That seems to line up with what amj7e is saying, and I don't think anyone would call the Windows command line the pinnacle of performance? That's not to say Julia's startup is fast, but it has improved pretty significantly for workflows like this due to the package caching. It needs to keep improving, and the work to pull OpenBLAS safely out of the default system image will be a major step in that direction, but it's already almost an order of magnitude better than last year in most of the benchmarks that I run.
I think the person you are replying to was using notebook as shorthand for interactively. You don't write scripts that you call, you have to have a repl open to interactively feed code to.
Autocompletion in Julia is also just terrible and the tooling really is lacking compared to better funded languages. No harm in admitting that (when Julia had no working debugger some people were seriously arguing that you don't need one: Just think harder about your code! Let's please bury that attitude...)
This certainly has not been my experience with Julia people. Sure, there are opinionated people in every community, but most of the pain points are acknowledged and known.
I can confirm that there were multiple (heated) arguments on Discourse, where some posters completely dismissed the need for debuggers in general. I remember it quite well.
It was very strange, but I don't think it says anything about the community, except that people have different opinions and preferences, like in any community.
>> For a long time, one got told that the "correct" way to use julia was in a notebook. Outside of that, nobody wanted to hear your complaints.
> I have never seen anybody in the community say the correct way to use Julia is in a notebook.
patrick's comment is fully in the past tense for this part, and that was indeed a pretty common thing for a long while in the past. Especially pre-1.0, before Revise became mature and popular, an oft-recommended workflow was to use a notebook instead of an editor. Or it would come in the form of a reply to criticism about the slow startup or compilation latency - "the scientists who this language is made for use notebooks anyway, so it doesn't matter" was a common response, indirectly implying that that was the intended way to use the language.
You must mean REPL, not notebook. I've been following the community since before the move to Discourse, and "use the REPL" surely outnumbers "use a notebook" by orders of magnitude.
Here on HN (in threads about Julia) the focus was generally on notebooks as I remember it. That's the context I assumed, but if it's about Julia fora in general I agree, the REPL had/has been talked about much more often than notebooks.
About the slow startup times: they have been massively improved in the latest versions of Julia, mainly in version 1.9, which is the first version of Julia that saves native compiled code. You can read more about that in the release blog [1].
On garbage collection and real-time applications, there is this [2] talk where ASML (the manufacturer of photolithography machines for TSMC) uses Julia for it. Basically, it preallocates all the memory needed beforehand and turns off the garbage collector.
On the same note about real-time: if your call stack is all type stable [3], then you can be sure that after the first call (and its compilation), the JAOT compiler won't be triggered again.
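Type stability just means the return type is a fixed function of the argument types, not of their runtime values. The concept is Julia-specific in its consequences, but it can be sketched in any language; a hypothetical Python illustration:

```python
def unstable(flag):
    # Return type depends on the *value* of flag. In Julia, callers of such a
    # function cannot be compiled down to one specialized method body.
    return 1 if flag else "one"

def stable(flag):
    # Always returns an int: the compiled call stack stays fixed after the
    # first call, so no further compilation is triggered.
    return 1 if flag else 0
```

In Julia you would check this with `@code_warntype`, which flags unstable return types.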
About static compilation, there are two different approaches:
* PackageCompiler.jl [4], which is really stable and used in production today. It has the downside of generating huge executables, but you can do work to trim them. There is still work to do on their size.
* StaticCompiler.jl [5], which is still in the experimental phase, but far from being completely brittle. It does put several restrictions on the code you can write and compile with it, basically turning Julia into a statically typed language. But it has been successfully used to compile linkable libraries and executables.
Some of the concerns you have about usability in your third paragraph have been worked on in the 1.9 and 1.10 (coming) releases. The LSP usage is better thanks to native code caching; maybe you can try it again (if you have time). The debugging experience I honestly think is top notch if you're using Debugger.jl + Revise.jl [6] [7]; still, I know there are some caveats. About stack traces, there is also a lot of work being done to make them better and more readable; you can read about the work done in these PRs [8] [9] [10] [11, for the State of Julia talk].
Still, I can understand that Julia might not be able (yet) to cover all the usecases or workflows of different people.
IMO the reason Julia gets to be this fast is because of LLVM, and the guy who created LLVM is also the creator of Mojo so there is something to be said about that
My understanding is that Julia gets to be this fast because the language design was optimized for performance from the beginning, by clever use of its type hierarchy and multiple dispatch for compilation into very specific and efficient machine code (plus a thousand other little optimizations like constant propagation, auto-vectorization, etc.)
LLVM helps with its own optimizations, but more and more of those optimizations are being moved to the Julia side nowadays (since the language has more semantic understanding of the code and can do better optimizations). I believe the main thing LLVM helps with is portability across many platforms, without having to write individual backends for each one.
Not necessarily. Julia will always be 11 years older than Mojo, no matter how old both of them get, and that advantage won't shrink. Not to mention, Mojo is a superset of a 40-year-old language with billions of dollars of development poured into it, plus an extra hundred million poured directly into Mojo itself. If we go by resources spent on each, Mojo has gotten about 5x more investment than Julia.
yeah pretty much any strongly performance oriented modern language should be able to be massaged into emitting whatever instructions should give close to optimal performance here.
It's always fun though when one language does better naïvely in a benchmark to delve in and see how to match or surpass them, and see if it was worth the trouble.
Microbenchmark performance for languages in this class definitely shouldn't be seen as a strongly deciding factor though.
I disagree, actually. I have found microbenchmarks to be very informative to understand why a language is fast in some cases and slow in others.
It's not only the actual benchmark numbers, though. It's understanding the code that reaches those numbers: Can Julia do explicit SIMD? How awkward is that in one language or the other? Are there idiosyncratic bottlenecks? Bad design decisions that need to be worked around in one language but not the other? And so on.
In my opinion, the issue that will make more of a difference in the long run is Mojo's first-class support for AoT compiled binaries (as well as JIT compilation).
Julia's poor AoT support (with small binaries) is a major Achilles heel. I really wish that the Julia developers had taken that more seriously earlier on.
If anyone is interested in compiling small binaries with Julia, do check out StaticCompiler.jl and the supporting StaticTools.jl, which manage to produce small binaries without the Julia runtime.
Of course it's a WIP and not fully mature; I'm just putting it out there for people to know.
There are some really cool demonstrations that have come out of this...
Yep, I was aware of StaticCompiler.jl. I wish it was more mature.
Static compilation is indeed possible with Julia. But it's very limited in its capabilities and certainly not as effortless as a simple `mojo build myfile.mojo`.
Well, it is as simple as that... using PackageCompiler.jl [1] (PkgC.jl). It does create huge executables (you can easily trim them), but those are relocatable and portable between machines, as they include most (if not all) of the dependencies needed to run them. They are already used in production in several places. I don't have the link at hand right now, but maybe someone else might jump in to give the link to talks given by Chris Rackauckas on this.
From my personal experience. I've done graphical apps in GTK3 in Julia with PkgC.jl cross-compiling from Linux to Windows. And they worked. :)
PackageCompiler.jl is more fully featured, and I'm definitely looking forward to further developments :) But it still has some pain points:
* Massive executables (which you mentioned). This makes it very difficult to use with embedded systems.
* Functions are not precompiled by default. You need to write a precompile script [1], which leads to a "two script problem": one script to do what you actually want, and another script that (hopefully) hits all the types you'll possibly need at runtime. And yes you can use `--trace-compile=file.jl` or SnoopCompile.jl instead, but this is still another step I need to worry about when compiling something.
While the second part is true, it is a fundamental restriction, because Julia compiles a method only when it is actually hit and the types of the call stack are known. Similarly, for StaticCompiler.jl you need declarations of the methods/symbols you want to export. So, while it is true that you need an extra step compared with other languages, it is not a fundamental problem of PackageCompiler.jl, but rather a design restriction given how Julia works.
That's not a fundamental restriction or design problem with Julia, you're just describing JIT compilation. Plenty of languages can be compiled either ahead-of-time or just-in-time. Actually, this should be unusually easy for Julia, because their JIT compiler isn't a heuristic-based compiler like JITs for Python or JS.
The problem is funding. There are 0 full-time employees working on this issue because JuliaComputing has gotten about 10% of the funding Mojo has.
FWIW, the core Julia developers seem to be taking this more and more seriously, and AoT compilation to small binaries seems more of a "when" question than an "if" at this point. Open source development - without multi-million dollar support from outside - is unpredictable, but I wouldn't be surprised if a year from now, writing a restricted subset of Julia allowed you AoT compilation to reasonable binaries (and not something as restricted as StaticCompiler.jl requires, just avoiding some of the extreme dynamic features).
Boeing is probably interested in Julia for manufacturing optimization, so their interest is likely in improving the optimization ecosystem around JuMP [1], Optim [2], etc., and compiler improvements related to that.
But we can only guess from the outside, and it's ultimately up to JuliaHub to decide how to spend the money, so I'll cross my fingers and hope that this gets us AoT static compilation sooner!
13 million is substantial but not even close to the 100 million Modular got. Which really makes me wonder what Modular has been doing with that money, if they're still getting beat in benchmarks like this...
First of all, Mojo is quite new. Secondly, there might not be much CPU performance left on the table for that benchmark, no matter how much money you throw at it.
This is my current pain with Julia. It makes deploying code require the entire environment, or a PackageCompiler built sys-image. I've played with static compiler, and other techniques. They are sadly quite brittle for my previous use cases. Lack of ability to use threads in a static compiler built binary was a deal killer for me.
I think in the long run the real difference will be whether mojo gets accepted into industry usage, given that it initially looks closer to python.
Julia has struggled to get wider industry adoption, and mojo is currently selling itself as a minimal uplift from existing python, which will help the sell in industry.
IMO this is just not a great example on either side. As others have pointed out, the Julia implementation was refined to be 8x faster. The Mojo code has to run the CPython interpreter to run numpy.
The example Mojo code does not run the computations with numpy. It uses Mojo's extensions to do it, testing the capabilities of those extensions, which are what promise the speedup. I must admit, though, that not as much work has gone into optimising it as the optimised Julia version.
> The Mojo code has to run the CPython interpreter to run numpy.
Yes, the need to run CPython interpreter is what makes Mojo slow (and it will remain that way, unless they abandon their "superset of Python" promise).
A bit OT, but what is Julia's adoption rate nowadays? I know there are people who think it's the best, others think it's not going to cut it, but well... in your experience? (My experience is: a little too slow to load, type hierarchies sometimes lead to unbearable error messages, but it looks like a serious attempt to replace whatever language in the math/physics/stats/... space.)
Hard to know really since the language is open source and tries not to be too onerous with telemetry (though there is some limited opt-out telemetry in the package manager).
Regarding your negative experiences, the bad news is that we haven't solved those issues, but the good news is that we're making real progress on them. Version 1.9, released in May of this year, is the first version to cache native code from packages, which makes loading of julia code MUCH faster through more AOT compilation, and there are even more improvements coming in v1.10 later this year. https://julialang.org/blog/2023/04/julia-1.9-highlights/#cac...
Error messages are also receiving a fair amount of attention, but it's a hard problem and there's less agreement on what the best way forward is. However, there's been some good work going into improving the readability and clarity of error messages that I think will help alleviate these struggles.
I know you're doing a lot of great work, I'm following Julia rather closely. It's just that as of now, it's not easy enough to grasp to make quick tests at work (I'd have some use there, but I have to be on schedule with the projects).
I've used it a bit for various pet projects (mainly some graph search and JuMP stuff) and it was convincing. But now I can see Fortran perform in real production code (where it shines, at the cost of being so antiquated that it's not funny anymore), and my expectations for Julia are now higher.
I'll give it another try 'cos you spent some time answering my question :-) (and because the charts in the 2nd provided link are just really convincing)
oh cool, I'm interested to see what your impressions are. Depending on your exact workflows and comfort levels, I could imagine that you either find that things are much much better, or that only moderate progress has been made.
Let us know if you're experiencing any new painpoints too, or if things like code loading aren't as fast as you had hoped, there might be things we can do to help.
It is still fascinating that lisp languages lost to python for AI and data processing, and now pretty much everything else. In a perfect world, we would be using lisp or lisp-like languages for everything.
> It is still fascinating that lisp languages lost to python for AI and data processing
To a first approximation, the only people that love lisps are people with a solid computer science background, and most people working with AI and data processing day to day do not have a computer science background. They're scientists, engineers and mathematicians who see programming and programming languages as a tool needed to do their 'real' job and not as an end in itself. Python is the perfect language for people who want to learn as little programming as possible so that they can get on with what actually interests them.
I think the secret is that python is so slow that you have to vectorize and call a library written in C to do any serious math. In 2008 this was a serious downside, but it meant that a whole community got used to slicing, multi-indexing, specialized functions like cumsum, and shared idioms. As a result, when the GPGPU revolution hit, you could write vectorized gpu code in any language, but the shared idioms meant that python programmers had the unique superpower of being able to read each other's vectorized gpu code.
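For readers who haven't internalized those idioms, the shift in style can be sketched in pure Python (using `itertools.accumulate` as a stand-in for numpy's `cumsum`, so no third-party dependency is assumed):

```python
from itertools import accumulate

data = [3, 1, 4, 1, 5, 9]

# Loop style: explicit index bookkeeping, one element at a time.
running = []
total = 0
for x in data:
    total += x
    running.append(total)

# "Vectorized" style: one whole-array operation, the idiom numpy made standard.
running_vec = list(accumulate(data))

assert running == running_vec  # both give [3, 4, 8, 9, 14, 23]
```

The vectorized form is what transfers almost verbatim to GPU array libraries, which is the shared-idiom superpower described above.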
If you mean “primarily uses S-expressions”, then I guess I don't really see why that’s so important to you.
If you mean a language that is semantically similar to lisps and learned a lot of the important lessons that Lisp taught the programming world, I think Julia is one of the Lispiest languages in this space right now.
The syntax may not be S-expression based on the surface, but our Exprs are essentially just S-expressions, so writing syntactic macros is very easy. The language is about as dynamic as possible without major performance concessions, and is heavily influenced by design ideas from CLOS, with some features missing but also some cool features CLOS doesn't have.
I certainly consider Julia to be a Lisp, and I’m pretty sure that’s what the person you responded to meant, too. His point remains true: Julia appears to have little chance of overtaking Python, except in some tiny niche areas. And even in these niche areas, I fear that Julia will end up losing to Mojo.
I really hope I am wrong. I love Julia and would like to see it succeed everywhere, but it doesn’t seem to be happening.
I’m not sure I would consider the scientific computing that Julia targets a “tiny” niche. It’s worth remembering that Python is about 20 years older than Julia, and Julia has only been in a really usable state for 4 or 5 years. You can’t expect it to displace Numpy/Scipy/etc overnight. Especially if you include machine learning, where there’s just huge momentum, with large corporations having massively invested in Python frameworks for at least 10 years. Subjectively, I’m seeing quite a lot of growth in Julia. It’s certainly a much stronger language than the Python/C++ combination that currently has the biggest market share in that area.
Also, there’s nothing wrong with niches. Julia is undoubtedly less of a general-purpose language like Python, but it very much shines in its domain.
Can you elaborate some more on this? My worldview assumed that a lisp used a list as a primary code/data structure and Julia doesn't seem to be doing that... Of course it does provide a way to manipulate code and data because of its macros. But what makes a lisp a lisp?
I think the commenter is referring to the fact that in Julia, code is data, even if it is not represented as lists. This allows the existence of macros, which are essentially functions that transform code. Also, Common Lisp (via CLOS) has multiple dispatch as a fundamental part of the language, and Julia does too.
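Python can illustrate the "code is data" idea too, via its `ast` module (a rough analogy to Julia's `Expr`; illustrative only, not how Julia macros are implemented):

```python
import ast

# Parse source into a syntax tree: the code is now ordinary data to inspect.
tree = ast.parse("1 + 2", mode="eval")
node = tree.body
assert isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add)

# Rewrite the tree (turn + into *) and evaluate the transformed code,
# roughly what a Julia macro does when it returns a modified Expr.
node.op = ast.Mult()
result = eval(compile(ast.fix_missing_locations(tree), "<expr>", "eval"))
```

The difference is that Julia bakes this into the language: macros run at parse time, and `Expr` literals can be written directly with `:( ... )` quoting.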
Every time this pops up, I feel compelled to point out how different Julia macros look like from regular Julia code, especially once the macros get complicated.
"Code as data" is not quite the same thing as homoiconicity, which I feel is Julia's missing piece.
While I consider julia lispy, I do think it's quite a reach to call it a lisp. I also think it's quite a reach to say julia lost in this space. It's still working its way up.
Julia is growing and evolving and finding new users and niches. It certainly hasn't 'won', but it's a bit early to call it a loss.
The language is a "reasonable" Lisp, with the caveat that it doesn't have things like reader macros (which is a good thing IMO, and helps avoid the Curse of Lisp).
I'm not sure how much of the "breakloop" functionality Infiltrate.jl provides, but at least the runtime re-definition of types isn't supported in Julia, and is one of the shortcomings of the Revise.jl based workflow.
All this is not to take away from the original point, Julia does get you a big chunk of the way to being a Lisp and gives you a lot of expressive power. It's just to say that Julia is not just a reskinning of a Lisp with familiar syntax, it has some important design and implementation differences.
Not if you want to avoid condescending "I cannot look at the Python code my eyes hurt" comments. Good to know the Julia community hasn't made any progress in that regard, though.
Seems like we are speaking from different experiences. As someone who witnessed multiple requests like “please avoid hyperbole when it comes to criticizing other languages” in Slack, Discourse or Twitter, I interpret “I know I shouldn’t say so” differently.
I would say 1 in 32 is also about the experience. I stopped visiting Discourse, chatting in Slack because I found it exhausting that every time Python is mentioned someone came up with a different way of saying how much they hate Python. I know I wasn’t the only one disturbed by this but in the end communities make their own choices.
Someone feeling the need to bash other languages is universal. For example, every Julia post here has people bashing it, in a way that is often tangential to the topic of the post. I'm actually surprised this thread doesn't have someone complaining about 1-based indexing in Julia.
Only if it's the Correct opinion. Try being a julia user but thinking julia kinda sucks. That's a much more hostile experience than being a c++ dev and thinking c++ kinda sucks.
It's really all down to your tone and attitude. If you're hostile, demanding and negative, you will indeed get pushback, but that's human nature.
Some people feel that they should get to act like a prick, while everyone else should be humble and courteous.
If you act like a decent person, there's no problem pointing out weak points in Julia and requesting help to work around them. If your only input is "Julia kinda sucks", what kind of feedback do you feel you are owed?
There was a recent post on Julia's Discourse about why people think the language has not caught on as much as it should. There were around 550 comments, half of which talked about why Julia sucks and what to do about it, and there are spin-offs of the post continuing the discussion. Let's just cut the bullshit: these are all tools, and if one doesn't fit you, just use the other.
Mojo released an example of their new language, which means readability and simplicity compared to the Python implementation will surely have been a requirement… I get that someone within Modular doing some horrific-looking low-level Mojo stuff could get it much quicker.
The Mojo one is already doing some pretty horrific low-level stuff with fairly manual SIMD. That’s why it was faster than Julia in the initial example, and the edge is lost when a couple of posters did similar things for Julia.
We have yet to see Mojo do any "sufficiently smart compiler" optimizations that Julia or similar languages don't already do. The Mojo code in the blog post does the same ugly optimizations to get good SIMD as the Julia code.
Convincing LLVM to vectorize is still a problem in both languages. I do hope Mojo can make some headway there in the future. Especially since with MLIR they might be able to capture some higher level semantics Julia can't.
> Mojo released an example of their new language which will mean readability and simplicity compared to the Python implementation will surely have been a requirement…
Did you read the Mojo code? It’s very messy and low-level dealing with explicit SIMD intrinsics and such.
I don't really believe you ran either the Mojo or the Julia code. There's no way your single-threaded C code outperformed multi-threaded simd optimized Julia or Mojo. It's flat out impossible.
The only other explanation is if you ran the non-simd Julia version under a single thread.
I did. Running with threads improves performance by 50%, but is still nowhere near C performance. My machine only has two cores so threading doesn't help much.
That's interesting. It makes sense that a two-core machine doesn't benefit too much from multithreading, but "nowhere near C performance" is pretty surprising. I'll try out both the programs around this weekend on my own fairly anaemic machine, and see how they fare for me. Thanks for responding!
Cool. If Julia runs much faster for you than for me I'd be interested in hearing it. I was honestly surprised the performance was so bad so perhaps I did something wrong.
Would the C compiler automatically exploit vectorized instructions on the CPU, or loop/kernel fusion, etc? It’s unclear otherwise how it would be faster than Julia/Mojo code exploiting several hardware features.
In an HLL like Julia or Mojo you use special types and annotations to nudge the compiler into using the correct SIMD instructions. In C the instructions are directly usable via intrinsics. Julia's and Mojo's advantage is that the same code is portable over many SIMD instruction sets like SSE, AVX2, AVX-512, etc. But you generally never get close to the performance that hand-optimized C code gets you.
That is not "the same as C" and you certainly do not achieve the same performance as you do with C. Furthermore my point, which you missed, was that developers typically use different methods to vectorize performance-sensitive code in different languages (even Python has a SIMD wrapper but most people would use NumPy instead).
what's the difference? An LLVM (or assembly) intrinsic called from Julia and one called from C will have exactly the same performance. C isn't magic pixie dust that makes your CPU faster.
That SIMD.jl doesn't give you direct control over which SIMD instructions are emitted, and that SIMD code generated with that module is awful compared to what a C compiler would emit. The Mandelbrot benchmark is there. Prove me wrong by implementing it using SIMD.jl and achieving performance rivaling C. Bet you can't.
I wasn't talking about using SIMD.jl. I was talking about the implementation of the package (which is why I linked to a specific file in the package), which does directly (with some macros) generate SIMD intrinsics. As for the per-core performance difference you're seeing, it's only because your C code is using 32-bit floats compared to the 64-bit floats that Julia is using here.
He has a point. Currently there is no way in Julia of checking which CPU instructions are available. So in practice, it's impossible to write low-level assembly code in Julia.
IIUC, SIMD.jl only works because it only provides what is guaranteed by LLVM to work cross-platform, which is quite far from being able to use AVX2, for example.
IIRC it relies on HostCPUFeatures.jl which parses output from LLVM. However, this means it just crashes when used on a different CPU than it was compiled on (which can happen on compute clusters) and it crashes if the user sets JULIA_CPU_TARGET.
I don’t find this benchmark very relevant but I still enjoyed the article and comments.
I am keenly interested in Mojo, and have been running the local SDK for a few days on my Linux laptop.
I was also very keen on Julia a few years ago. I evaluated Julia as a true general purpose programming language: ML, DL, string processing, web use, etc., and it looked very good. Still, I didn't switch from my go-to languages: Common Lisp, Python, and Scheme.
I have some hope, but limited expectations, that Mojo may become my one general purpose programming language. I started a template for a Mojo AI and general programming book. It helps me to write about new tech I am very interested in.
Does anyone know of examples of Mojo for GPU computations? The website makes multiple claims about it being able to run seamlessly on the GPU but I couldn't find any documentation on it...
The website is more aspirational (as a public Todo list/roadmap) than descriptive in many ways. It's more of an investor-presentation than a developer-overview.
there are so many fractal benchmarks floating around, but i've never seen any cool interactive fractal applications, e.g. interactive visualizations that smoothly redraw changes over time, or respond to input.
Look at Taichi on GitHub. This library for Python seems not very popular or well known. Maybe because it is a Chinese development, but Taichi is simple, compiles directly down to kernels on CUDA, GPU, Metal, Vulkan, and has batteries included. It beats the fastest Mojo implementation of the Mandelbrot set by about 260 times.
https://github.com/taichi-dev/taichi
There is a package for Julia that does this, it's called InteractiveViz.jl [1]. It is really neat and uses GPU rendering underneath via (GL)Makie.jl [2].
As long as you can copy and paste Python code to Mojo and it is 1:1 compatible with all your existing libraries and is hundreds of times faster than Python, that is much better than wasting time rewriting it in another language that is 8x faster than Mojo (in its first release) with hand-optimizations from Julia language experts.
I expect Mojo's first release to be fast enough that it would get the Python folks using it over Julia and that the next Mojo release or future ones will get another 8x or 10x faster.
The same outcome happened with Bun 1.0, which already claimed the top spot in speed and compatibility in the Node ecosystem in its first release, and it isn't even done yet.
The least amount of effort to get something done much faster wins by default.
It's only closed-source for now, with plans to open-source the language when it's more finalized - similar to LLVM early on. Not sure if it says so explicitly on their website somewhere, but Chris Lattner has stated that several times.
This is not how Mojo works.
If you copy & paste Python code into Mojo, you will not benefit from the optimizations. You need to refactor your Python code into Mojo code to gain compiler efficiency.
But if you look at the refactored code in the end, the syntax is very ugly (this is my personal opinion and might change with the evolution of Mojo).
Please actually read the Mojo code. It is full of complex hand-optimized simd instructions.
By comparison, the simd-optimized Julia code (especially the first version) is significantly more elegant and transparent.
Impressively, the ComplexSIMD Julia struct was defined in a few simple lines, from scratch. I wonder what the, apparently built-in, complex SIMD functionality in Mojo looks like under the hood.
> Why not develop Mojo in the open from the beginning?
> Mojo is a big project and has several architectural differences from previous languages. We believe a tight-knit group of engineers with a common vision can move faster than a community effort. This development approach is also well-established from other projects that are now open source (such as LLVM, Clang, Swift, MLIR, etc.).
We have seen many languages cycle in popularity, but Julia is one of the few high-level languages that could actually match... or in some cases exceed C/C++ performance.
There are always tradeoffs, and it usually takes a few weeks for people to come to terms with why Julia is unique.
This is actually really easy. Most C/C++ code is pretty slow. Beating perfectly optimized C/C++ code by a notable margin is basically impossible (all relatively fast languages in the limit tend to converge to theoretical peak CPU performance), but real-world code isn't perfectly optimized. The better question is who wins on a performance-vs-effort graph, and Julia has a ton of major advantages here. The base language gives you fast implementations of common data structures (e.g. Dictionaries and BitSets) and BLAS/LAPACK wrappers to do linear algebra efficiently while still having your code look like math. The package manager makes it basically trivial to add packages for more complicated problems (no need to mess around with makefiles). The REPL makes it really easy to interactively tweak your algorithms and gives you easy ways to introspect the compilation process (@code_native and friends).

Another major advantage is that Julia has macros that make it really easy to make local changes to a block of code's semantics that are compiler flags in C/C++. For example, consider `@fastmath`. In C/C++ you can only opt in to fastmath at the compilation-unit level, so most projects that have one part requiring IEEE handling of non-finite numbers, or strict associativity, will globally opt out of the non-IEEE transforms. In Julia, you just write `@fastmath` before a function (or for loop, or single line) and you get the optimization.
All the other answers are true. But there is one thing I didn't see people saying: thanks to the existence of macros, you can create and compile code at runtime. This allows for faster solving of some problems that are too dynamic, thanks to Julia's fast compile times.
This might sound counterintuitive, given that latency is the problem most commonly mentioned about Julia everywhere else. But, if you think about it, Julia compiled a plotting library to native code from scratch in about 15 seconds every time you imported it (before Julia 1.9, where native caching of code was introduced and latency was cut down significantly).
This means that for problems where you would like to (for example) generate polynomials at runtime and evaluate each one a billion times, Julia can generate efficient code for every polynomial, compile it, and run it fast for those billion evaluations. C/C++/Fortran would have needed a (really fast) generic function to evaluate polynomials, but that would always (TM) have been less efficient than code generated and optimised for each one.
Edit: typos and added some remarks lacking originally
In general, parallelization was a messy kludge in older languages originally intended for single-CPU machine contexts. Additionally, many modern languages inherited the same old library-ecosystem issues through Simplified Wrapper and Interface Generator (SWIG) template code (Julia also offers similar support).
Only a few, like the Go ecosystem's developers, tended to take the time to refactor many useful core tools into clean parallelized versions in the native ecosystem; to a lesser extent, Julia devs seem to focus on similar goals due to the inherent ease of doing this correctly.
When one compares the complexity of a broadcast operator version of some function in Julia, and the amount of effort needed to achieve similar results in pure C/C++... the answer of where the efficiency gains arise should be self evident.
One could always embed a Julia program inside a C wrapper if it makes you happier. =)
I do not dispute that C is not the fastest language. However, C99 has the `restrict` keyword, which, combined with strict aliasing rules, gives non-aliasing function arguments (I believe).
There are a few cases where it's easier to get LLVM to generate certain code, I imagine. Semantic things like aliasing, inlining, and type information.
In general though it's just a question of which hoops you have to jump through for which language comparing C/C++/Julia/Fortran when using LLVM
If this wasn't their bread and butter and not an example they picked themselves, it would indeed not be fair... But they chose this example specifically and said this is how you get state of the art performance with Mojo, so...
One run, 7ms, 2ms? It is statistical fluctuation, not data, especially if it was run on "typical" developer laptop under "typical" session where browsers and other high-hitters are run in background and all these turbo-boosts and freq-governors are not turned off.
You need an OS where almost all software (including most system services) is killed, and CPU frequency is fixed (all power-saving technology turned off in both firmware and OS; I'm not sure that is possible on M-based Apple laptops, and many Intel-based laptops with castrated BIOSes are not suitable either).
You need warm-up loops to warm-up caches, or special code to flush caches, depends on what you want to measure.
You need a tight loop calling your function (and you must be sure that the compiler has not inlined it into the loop) which runs enough iterations to spend at least several seconds of wall time.
You need several such loop runs (10+, ideally), to have something which looks like statistics.
You need to calculate standard deviation and check that it is small enough (and you need to understand why it is not small enough if it is not).
I think you might not be familiar with the package used to benchmark Julia [1].
It does not pin processes to CPUs or set the kernel governor to performance, and there are fluctuations from usage of the computer. But it does run the function for several seconds and returns the distribution of the runs (the little graphics underneath the benchmarks). It calculates the standard deviation, and if some runs are too small (sub-nanosecond) it emits warnings saying the results might be caused by inlining and constant propagation.
The differences in runtimes you refer to are from use of different machines or different routines, which is completely expected. They also argue they need to run the Mojo code in the same machine as the Julia code to be able to give meaningful results and comparisons.
While to an outsider it might seem to be done without care, I can assure you that these people take extreme care in how they do benchmarks. Again, it might just be that you're not familiar with the tooling developed to do it.
I do think more benchmarking needs to be done, as the Mojo code hasn't been optimised yet and no one in that thread was able to run both the Julia code and the Mojo code on the same machine (outside of the OP). But I'm sure this will be done (I guess rather sooner than later). :)
I cannot edit my comment, but you're completely right about it being optimised. I will say that I'm not familiar with what idioms might the Mojo compiler be able to optimise better (a problem arising with the Mojo compiler still being closed source, in comparison with the open source nature of Julia), and, in this sense, I don't know if there is more "compiler friendly" code with the same semantics for Mojo that might allow it to get nearer to Julia results.
> I'm not familiar with what idioms might the Mojo compiler be able to optimise better
> I don't know if there is more "compiler friendly" code with the same semantics for Mojo
The Mojo code here is from the official docs [1], so it's from the people best placed to know what the most "compiler friendly" code for Mojo would be, and what idioms they should use to get the best performance Mojo can provide.
Thanks for the link! Then I guess there is not a lot more that can be said currently, except that maybe Mojo still needs more compiler work. Still, I prefer how readable the Julia code is. :)
You should read a little more closely before such strong condemnations.
The Julia macros @btime and the more verbose @benchmark are specially designed to benchmark code. They perform warm up iterations, then run hundreds of samples (ensuring there is no inlining) and output mean, median and std deviation.
This is all in evidence if you scroll down a bit, though I’m not sure what has been used to benchmark the Mojo code.
I returned to HN recently after a few years away, and I swear the average commenter has gotten much worse. Simultaneously more arrogant, more ignorant, and with worse reading comprehension.
I used to be able to count on commenters understanding what was written even if they disagreed, but lately I see many comments confidently responding to something that wasn't relevant or even present.
I suspect there's a bit of an Eternal September effect going on as a wider audience ends up here (possibly fleeing the continuing "enshittification" of most all for-profit online fora).
These are not single runs of the code. The Julia code uses `btime` from BenchmarkTools, which runs many iterations of the code until a certain number of seconds or iterations is reached. The Mojo code uses `Benchmark` from a `benchmark` package, which I assume does similar things.
Beyond that, this is one person getting curious about how a newly released language compares to an existing language in a similar space, and others chiming in with their versions of the code. If you have higher standards for benchmarks and think it will make a difference, you're welcome to contribute some perfect benchmarking results yourself.
Come on, of course this is not a thorough benchmark, but just a random thread in a forum, where someone wants to get a feeling for the performance of a new technology.
They could have used @benchmark instead of the @btime macro, though. The first gives you the statistics you asked for, whereas the second is a thin wrapper around @benchmark that just prints the minimal time across all runs.
Nevertheless the takeaway of this thread is pretty clear, even without @benchmark: The performance difference mainly stems from SIMD instructions.
This is using BenchmarkTools, which does most of these things ;) And the numbers are very nicely reproducible.
But yes, those benchmarks are hard to compare, especially if they don't run on the same machine!
But there is absolutely no reason to believe that Julia can't get optimal performance on given hardware, knowing Julia's compiler and architecture... This is much more a benchmark of Mojo, which hasn't been proven in that regard.