
Last time this was up I wrote a single-threaded version in C which I'm pretty sure beats both Julia and Mojo: https://github.com/bjourne/c-examples/blob/master/programs/m...



> which I'm pretty sure beats both Julia and Mojo

Sometimes "showing the code" is not enough. Show me the benchmark.


I'll pass - Hacker News comments are not dissertations. The C code ran faster on my machine. YMMV.


Ran faster on your machine than what? Mojo was only runnable locally starting 2 days ago, so it doesn't sound like you compared it with Mojo.


I don't really believe you ran either the Mojo or the Julia code. There's no way your single-threaded C code outperformed multi-threaded, SIMD-optimized Julia or Mojo. It's flat-out impossible.

The only other explanation is if you ran the non-SIMD Julia version under a single thread.


I ran this Julia code which one of the thread participants claimed to be the "best": https://discourse.julialang.org/t/julia-mojo-mandelbrot-benc... Run your own benchmarks if you don't believe me. IDGAF.


Just to make sure, did you start Julia with multithreading enabled? `julia --threads=auto` should do it (or `julia -tauto` if you prefer).

Without it, Julia starts single threaded, which means the code does all this work to enable multithreading and then doesn't get to benefit from it.
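A quick sanity check (just illustrative, not part of the benchmark code):

    # Run this after starting Julia with --threads=auto.
    # If it prints 1, the threaded code path in the benchmark gains nothing.
    println(Threads.nthreads())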


I did. Running with threads improves performance by 50%, but is still nowhere near C performance. My machine only has two cores so threading doesn't help much.


That's interesting. It makes sense that a two-core machine doesn't benefit much from multithreading, but "nowhere near C performance" is pretty surprising. I'll try out both programs this weekend on my own fairly anaemic machine and see how they fare for me. Thanks for responding!


Cool. If Julia runs much faster for you than for me I'd be interested in hearing it. I was honestly surprised the performance was so bad so perhaps I did something wrong.


If you want speed, why not use, say, GLSL, instead of C? The code is far simpler, and runs much faster than C.


I'm pretty sure it doesn't. That looks exactly like the single-threaded code for the good Julia versions.


> pretty sure beats both Julia and Mojo:

Would the C compiler automatically exploit vectorized instructions on the CPU, or loop/kernel fusion, etc? It’s unclear otherwise how it would be faster than Julia/Mojo code exploiting several hardware features.


In an HLL like Julia or Mojo you use special types and annotations to nudge the compiler into using the right SIMD instructions. In C the instructions are directly usable via intrinsics. Julia's and Mojo's advantage is that the same code is portable over many SIMD instruction sets like SSE, AVX2, AVX-512, etc. But you generally never get close to the performance hand-optimized C code gets you.
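To make that concrete, here is a rough sketch of the annotation style on the Julia side (illustrative only, not the benchmark code; axpy! is just a made-up example):

    # @inbounds and @simd merely hint the compiler; the SIMD width actually used
    # depends on the target CPU, which is the portability the HLLs buy you.
    function axpy!(y, a, x)
        @inbounds @simd for i in eachindex(x, y)
            y[i] = muladd(a, x[i], y[i])
        end
        return y
    end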


This just isn't true. Julia lets you write intrinsics (either LLVM intrinsics or native assembly code) just the same as C. For example, https://github.com/eschnett/SIMD.jl/blob/master/src/LLVM_int....
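A minimal sketch of what that looks like (hand-written LLVM IR called via Base.llvmcall, in the same spirit as the linked file; the names here are mine, not from SIMD.jl):

    const Vec4f = NTuple{4, VecElement{Float32}}

    # A <4 x float> add written as LLVM IR and called from Julia; it compiles
    # down to a single packed SIMD add, just like the equivalent C intrinsic.
    vadd(a::Vec4f, b::Vec4f) = Base.llvmcall("""
        %r = fadd <4 x float> %0, %1
        ret <4 x float> %r
        """, Vec4f, Tuple{Vec4f, Vec4f}, a, b)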


That is not "the same as C" and you certainly do not achieve the same performance as you do with C. Furthermore my point, which you missed, was that developers typically use different methods to vectorize performance-sensitive code in different languages (even Python has a SIMD wrapper but most people would use NumPy instead).


What's the difference? An LLVM (or assembly) intrinsic called from Julia and one called from C will have exactly the same performance. C isn't magic pixie dust that makes your CPU faster.


The difference is that SIMD.jl doesn't give you direct control over which SIMD instructions are emitted, and that the SIMD code generated with that module is awful compared to what a C compiler would emit. The Mandelbrot benchmark is there. Prove me wrong by implementing it using SIMD.jl and achieving performance rivaling C. Bet you can't.


I wasn't talking about using SIMD.jl. I was talking about the implementation of the package (which is why I linked to a specific file in it), which does directly (with some macros) generate SIMD intrinsics. As for the per-core performance difference you're seeing, it's only because your C code is using 32-bit floats compared to the 64-bit floats that Julia is using here.


He has a point. Currently there is no way in Julia of checking which CPU instructions are available. So in practice, it's impossible to write low-level assembly code in Julia.

IIUC, SIMD.jl only works because it only provides what is guaranteed by LLVM to work cross-platform, which is quite far from being able to use AVX2, for example.


LoopVectorization.jl exploits AVX-512 when available. How is that achieved?


IIRC it relies on HostCPUFeatures.jl, which parses output from LLVM. However, this means it just crashes when used on a different CPU than the one it was compiled on (which can happen on compute clusters), and it crashes if the user sets JULIA_CPU_TARGET.



