The Accelerating Adoption of Julia (lwn.net)
316 points by chmaynard on Oct 20, 2020 | 252 comments



I think one of Julia's greatest indicators of long-term success is the variety of commercial users/companies from a diverse pool of technical domains/industries that are all excited, willing, and capable of contributing back to the ecosystem.

We had a BoF at this year's JuliaCon revolving around this topic [1] and are now planning the first Annual Industry Julia Users Contributhon as a result.

I especially think that well-configured Julia + K8s setups have the capacity to really tighten exploratory data science <-> operational data engineering feedback loops in industrial settings in a way that is much more ergonomic, generically useful, and portable/extensible than using pre-baked frameworks to achieve something similar. Julia-centric tooling for coarse-grained workflow orchestration, experiment tracking, data provenance, etc. would be nice, though I also think existing generic tools in this vein (e.g. Argo) could probably compose well too :)

A few different companies have nice in-house implementations of these kinds of setups, and Julia Computing is building a nice looking commercial product suite in this vein that I look forward to exploring more in the future (especially JuliaHub and JuliaRun).

[1] https://julialang.org/blog/2020/09/juliacon-2020-open-source...


The important part of Julia is its programming model.

The implicitly parallel fork-join model is easy to program and incredibly general. And I'm glad to see a high-level language embrace it.

-------

I probably should note some other languages of this model: CUDA, OpenCL, ISPC, OpenMP, OpenACC. Most of these other languages are low-level, where you manage memory directly (and GPUs have very little memory per thread, so manual memory management is still hugely important).

But for speed of development, prototyping, and higher level reasons, a language like Julia that implements this "mindset" is going to be hugely important moving forward.

------

The fact that parallel fork-join scales from 2 CPU threads all the way up to thousands of GPU threads... or CPU + SIMD (such as AVX512), is proof that this methodology is useful. I feel like people are sleeping on this model: it's hugely useful for scaling on what I believe to be the computer of the future.


Could you be more precise about what you mean by the fork-join model? Is this similar to how in Go starting a thread is easy, and waiting for it to finish and collecting results is easy as well?


I don't program in Go, so I can't speak to that.

The fork-join model generally starts off with one thread, often called "The Single" or "The Master". It's where main() begins.

But somewhere along the line, you discover a portion of code that needs to be multithreaded. So you fork into many threads: maybe dozens of threads, or thousands in the case of SIMD or GPUs. This multithreaded portion is then "joined", that is, the "master" thread refuses to continue until all children are done executing.

Let's say you are writing a ray tracer.

    main() {
      // In the "Single" thread
      Foo();
      Bar();

      fork(doRayCastingInParallel, 5_000_000); // Spawns 5,000,000 threads, one for each ray
      // Implicitly wait for all 5 million threads to finish

      doSomethingElse();
    }
Now, raycasting has a bunch of things to do, but you can break it up into the same steps for all 5-million threads.

    doRayCastingInParallel(int myIdx){
      doTrigonometry(ray[myIdx]);
      findCollision(ray[myIdx]);
      outputColor[myIdx] = calculateColors(ray[myIdx]);
    }
So from both the "single" thread and your "child" thread, it feels like you're writing single-threaded code. But in actuality, your "child" code runs as 5 million spawned copies, executing in gross parallelism.

The "Fork-Join" model assigns a different "myIdx" number to each thread. So Ray#500 knows to do things differently than Ray#68121.

But otherwise, they run through the same code.
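
Since this thread is about Julia: a minimal sketch of the same shape using Threads.@threads (the helper functions are just stand-ins for the pseudocode above, and Julia maps the iterations onto a fixed pool of OS threads rather than literally spawning 5 million of them, but the fork/join semantics are the same):

    foo() = nothing; bar() = nothing; do_something_else() = nothing   # stand-in work
    output_color = zeros(5_000_000)
    do_ray_casting_in_parallel(i) = (output_color[i] = i)             # pretend per-ray work

    function main()
        foo(); bar()                              # still in the "single" thread
        Threads.@threads for i in 1:5_000_000     # fork: one iteration per ray
            do_ray_casting_in_parallel(i)
        end                                       # implicit join: wait for every iteration
        do_something_else()
    end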

---------------

The important bit here is how general this mindset is. It's commonly used for GPU programming (the CPU issues a GPU command to spawn millions of threads, often one per pixel, or even thousands per pixel). Then the CPU waits for the GPU threads to finish, and just carries on.

But ISPC proved that you can use this fork-join model on SIMD units like AVX512 or SSE. Julia and Python are showing off how you can use it in higher level languages. Etc. etc.

OpenMP further proves that fork-join works on normal CPU threads. Maybe there's some other CPU-threading languages (but OpenMP is the one I'm personally familiar with).


Isn't that how nearly all multi-threaded code is written? Wouldn't the only other option be fire-and-forget? I've been writing code like this in nearly every language I've ever used for decades.

Are you talking about the actual fork() function, instead of, say, spawning inside a loop explicitly?


> Isn't that how nearly all multi-threaded code is written

Producer-consumer queues say otherwise. As does Async. There's also message-passing. There's also thread-pools.

Task-based programming doesn't wait for the tasks to finish. So without the "join", you don't have an easy synchronization point to rely upon, and have to use explicit locks for synchronization.

------

Producer - consumer can't work with fork/join, because there's no "join" at all! The producer keeps producing, and the consumers just keep consuming.

Async is... a tangled mess. I'm pretty sure it's just the modern "unstructured goto" style of spaghetti that has been rediscovered. Yeah, it's super-efficient, but no, it's not really easy to program.

Task-based parallelism is probably my next favorite style. Fork-join is super easy, but the join causes a significant waiting period and underutilization of the processor. Task-based provides more flexibility (at the cost of a little bit more complexity).

------

There's a ton of other models of parallelism. But fork-join seems to be the methodology that more and more work is being focused on.


> Async is... a tangled mess. I'm pretty sure it's just the modern "unstructured goto" style of spaghetti that has been rediscovered. Yeah, it's super-efficient, but no, it's not really easy to program.

That's not inherent to async though, just a property of the implementations that have seen widespread adoption. There's nothing stopping async from implementing the same process tree semantics; in fact, there's at least one high-level async library that does exactly that (python-trio, https://trio.readthedocs.io/en/stable/)


As I mention in the article, Julia has task-based parallelism now, with a @spawn macro.


That's wonderful to hear!

I'm looking into it: and it looks like "Stupid Fibonacci" proves that it's working in Julia (https://julialang.org/blog/2019/07/multithreading/).

Task-based parallelism carries more overhead than other patterns. But it really is a powerful construct: it converts recursive ideas into multithreaded code rather easily.
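
If memory serves, the blog's "stupid Fibonacci" boils down to roughly this sketch:

    using Base.Threads: @spawn

    function fib(n::Int)
        n < 2 && return n
        t = @spawn fib(n - 2)          # runs as a task on whatever thread is free
        return fib(n - 1) + fetch(t)   # fetch() is the per-task "join"
    end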


"thousands in the case of SIMD or GPUs" What does 'Single Instruction Multiple Data' have to do with threads?

Also fork join sounds just like normal threads to me in the way you are describing it.

Sorry, what am I missing?


Fork join by and large doesn't use mutexes.

If you need to synchronize, wait for the next fork/join cycle instead of doing explicit mutexes. The "join" provides your synchronization point.

If you have a complicated set of highly-synchronous calculations, then you don't fork at all. You simply use the "Single Thread" to perform all those calculations (and therefore negate the need for cross-thread synchronization).

Fork-join has lower utilization, but it's very easy to program. In some cases, fork-join remains efficient (ex: spawn a thread per pixel on the screen), because all the pixels do NOT need to synchronize with each other (or take mutexes).
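
A minimal sketch of "wait for the next fork/join cycle instead of a mutex" in Julia (stage1/stage2 are stand-in work):

    N = 1_000
    stage1(i) = i^2
    stage2(x, total) = x / total
    results = zeros(N)

    Threads.@threads for i in 1:N
        results[i] = stage1(i)         # each index writes only its own slot: no locks needed
    end
    total = sum(results)               # back on the "single" thread; the join was the sync point
    Threads.@threads for i in 1:N
        results[i] = stage2(results[i], total)   # next fork/join cycle
    end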

-----

If a sync point within the children is needed, you usually make do with a barrier instead of a mutex.

> What does 'Single Instruction Multiple Data' have to do with threads?

SIMD processors, such as GPUs, emulate a thread in their SIMD units. It's called a "CUDA thread". It's not a true thread in the sense of CPU threads, but the performance by and large scales as if they were real threads (with the exception of the "thread divergence" problem).

Ultimately, the fork-join model translates trivially to SIMD. Any practitioner of CUDA, OpenCL, ROCm, or ISPC can prove that to you easily.


Yeah in this model you wouldn't need a mutex because each "thread" is independent and operates either on independent data or uses constant-but-shared data. As soon as the result of one thread depends on the result of another thread you have to have some mechanism for synchronization.

I mean it's not really any different from writing a C/C++ program that avoids use of mutexes by having each thread operate on separate parts of the process address space. I'm still intrigued but it's not mind-blowing to me to fork a bunch of threads and join them when the function execution completes.


There's nothing mindblowing about Edsger Dijkstra's "Go To Statement Considered Harmful", which largely argued that you should organize your code into easily composable function calls. Kinda obvious in hindsight.

It's more about discipline than anything else. A recognition that fork-join is much easier than other methodologies (such as async).


Thanks for the explanation. The only SIMD programming I've seen is where the programmer would carefully call the CPU-brand-specific instructions and painstakingly manage the registers, making sure the numbers to be added, multiplied, etc. are evenly divided and then given to the SIMD ALUs.

Sounds like what you are saying is that the fork-join model translates easily, by the compiler, to these SIMD instructions?

Some compilers can also vectorize plain loops, but you would advocate for fork join?


> Sounds like what you are saying is that the fork-join model translates easily, by the compiler, to these SIMD instructions?

Why do you think CUDA has become so popular recently? That's exactly what CUDA, OpenCL, and ISPC do.

> Some compilers can also vectorize plain loops, but you would advocate for fork join?

CUDA-style / OpenCL-style fork-join is clearly easier than reading compiler output, trying to debug why your loop failed to vectorize. That's the thing about auto-vectorizers: you end up having to grok through tons of compiler output, or check the assembly, to make sure it works.

ALL fork-join style CUDA / OpenCL code automagically compiles into SIMD instructions. Ditto with ISPC. Heck, GPU programmers have been doing this since the early HLSL / OpenGL shader days, nearly two decades ago.

There's no "failed to vectorize". There's no looking up SIMD-instructions or registers or intrinsics. (Well... GPU-assembly is allowed but not necessary). It just works.

-------

If you've never tried it, really try one of those languages. CUDA is for NVidia GPUs. OpenCL for AMD. ISPC for Intel CPUs (instead of SIMD intrinsics, ISPC was developed for an OpenCL-like fork-join SIMD programming environment).

And of course, Julia and Python have some CUDA plugins.


Must admit never tried it. Thanks for the insights I'll have a go at some point.


If you have an OpenMP 4.5 or later compiler (GCC and Clang both support OpenMP), you can also use #pragma omp simd.

https://www.openmp.org/spec-html/5.0/openmpsu42.html

It's not as reliable as a dedicated language like OpenCL or ISPC. But this might be easier for you to play with than learning another language.

OpenMP is just #pragmas on top of your standard C, C++, or Fortran code. So any C / C++ / Fortran compiler can give this sort of thing a whirl rather easily.
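
And for what it's worth, since this thread is about Julia: its rough analogue of `#pragma omp simd` is the `@simd` hint on a plain loop. A minimal sketch:

    function mysum(xs::Vector{Float64})
        s = 0.0
        @inbounds @simd for i in eachindex(xs)
            s += xs[i]                 # the compiler is allowed to vectorize this reduction
        end
        return s
    end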

---------

OpenMP always was a fork-join model #pragma add on to C / C++. They eventually realized that their fork-join model works for SIMD, and finally added SIMD explicitly to their specification.


And Fortran co-arrays, no?


Fortran coarrays go far beyond simple fork-join. They enable one-sided remote memory access, something that is impossible with OpenMP or CUDA as far as I am aware, and that requires the highest levels of skill to do right in MPI.


Do you mean it doesn't use explicit mutexes? I don't see any way this model would avoid using mutexes (or some mutex-like construct) under the hood, in which case I'm not sure I see the advantage.

The term "fork/join cycle" is intriguing and meaningless to me as a non-Julia user. What exactly is this cycle?


Well, I only visit Julia now and then. I have, though, been writing various parallel programs for myself in a variety of languages, trying to grok parallelism better.

> The term "fork/join cycle" is intriguing and meaningless to me as a non-Julia user. What exactly is this cycle?

https://www.researchgate.net/profile/Alina_Kiessling/publica...

There are many forks-and-joins across a program, when you're doing the fork-and-join paradigm. Each time the threads need to communicate, you join, and then use the "master" thread to pass data to all of the different units.

-------------

For example, most video-game engines issue a fork to calculate the vertices of all objects in the video game (the fork turns into a GPU call). This is called the vertex shader.

Once all the vertices are calculated, the GPU joins these threads together, and the main program / game engine continues.

The next step is the geometry shaders: the CPU forks (aka spawns thousands of GPU threads), and joins on the results of the geometry shaders. (Tessellation may spawn more vertices. Ex: you model a rope as a square, but then the geometry shader turns the square into a rope shape at this stage.)

Then the pixel shaders. For every pixel of your 1920 x 1080 screen, a GPU SIMD thread is forked off, and each pixel's final color is calculated based on the results of the vertex shaders and geometry shaders before it.

Each of these cycles is a fork-join cycle. Thousands of threads spawning, thousands of threads joining, the CPU calculating some synchronization data together, and then spawning thousands of threads again.

(In practice, modern game engines are now async for speed reasons. But the general fork/join model is still kinda there if you squint)


Classing ISPC, for example, as low-level invalidates this comment for me; ISPC, among others, provides a great interface to the SIMD model for CPUs that simply isn't available in Julia.


Julia's SIMD programming model is still very much a work in progress; I think we have a way to go in providing the kind of flexibility and control that languages such as ISPC, Halide, TVM, etc... provide.

That being said, packages such as SIMD.jl [0], and LoopVectorization.jl [1] are making fantastic progress, to the point that LoopVectorization forms the basis of a legitimate BLAS contender, in pure Julia [2]. It's not totally there yet, but it's close enough that real work is being done in LV at OpenBLAS-like speeds.

As an aside, I find it incredible that these kinds of extensions can be built in packages thanks to the fact that Julia's compiler is extensible enough to allow for direct manipulation of the LLVM intrinsics being emitted by user code.

[0] https://github.com/eschnett/SIMD.jl [1] https://github.com/chriselrod/LoopVectorization.jl [2] https://github.com/MasonProtter/Gaius.jl


It’s not a jab at Julia, really; rather, ISPC provides a workable model that it would be nice to see elsewhere later on.

> find it incredible that these kinds of extensions can be built in packages thanks to the fact that Julia's compiler is extensible

Come on, jeez.. Julia’s compiler is a Lisp-based LLVM driver: of course it can do these things.


ISPC can be really good at SIMD-ing complicated control flow (ray tracers being the archetypal example). I'm interested in eventually working on something like that for Julia. In the meantime, it should be possible to deliberately write code to be compatible with something like SIMD.jl. I think I'd start by trying to get that working via multiple dispatch on at least a moderately complex project, and let those experiences inform the kind of transforms an automatic compiler would need to both work and get good performance.


> It’s not a jab at Julia

I didn't take it as such; there are legitimate shortcomings to any tool, I just wanted to provide pointers to other readers that the devs are aware of it, and that there is ongoing development to address it. :)

> Come on, jeez.. Julia’s compiler is a Lisp-based LLVM driver: of course it can do these things.

As someone who, before Julia, was firmly entrenched in C/C++/Python land, I suppose I am discovering many of these "obvious" things for the first time. :)


> I suppose I am discovering many of these "obvious" things

sorry for the flippant remark then! it's great to be in discover mode, enjoy ;)


ISPC is higher level than its peers, but it's still new/delete-based manual memory allocation. So in the grand scheme of programming languages, it's still rather low level (since you're manually handling memory).

Indeed: ISPC provides constructs for structure-of-arrays and other low-level memory layout details. This is a good thing: these details have significant implications for the speed of your program.

Nonetheless, any language which wrangles with manual details of memory layout, or new/delete based memory allocation, is inevitably going to be classified as low level in my books.

> ISPC among others provides a great interface to SIMD model for CPUs that simply isn’t available in Julia.

If Julia can compile into GPU-assembly (which is innately SIMD), I'm sure an AVX-based compile could work eventually.

They may have to target AVX512 (since most GPU-assembly requires per-thread exec-masks), but the general concept is being proven as Julia can now compile down into PTX or AMDGPU assembly.

Julia's compile down to GPU-assembly / SIMD code is not supported in the "general case", only in select circumstances. But that's still an incredible boon for a high-level language.


> but it's still new/delete-based manual memory allocation

I guess we aren't solving the same problems: memory allocation is trivial in my domain; mapping complex nested control flow onto SIMD is the hard part.

> Julia can now compile down into PTX or AMDGPU assembly.

Sure, but Julia-as-syntax is nothing special now; Numba does this for Python as well.


You can and you can't. Julia is composable. Suppose I want to write a library to find compression polynomials for a Reed-Solomon encoding system (see Mary Wootters' talks on YouTube) for my storage product. I need an LU decomposition algorithm that operates on GF256 (which is just an 8-bit int), but the +/-/x/divide operations are all messed up. I'd have to rewrite LU decomposition. How confident are you that you can get even that right? I'm pretty good at implementing algorithms, but I've messed up LU decomposition before.

Then suppose I rewrite the LU decomposition algorithm in Python. Now I want to accelerate the search by running the search on GPUs. I have to re-rewrite the code from scratch. Each GF256 encoding has to have rejiggered operators, and so I need to write custom GPU kernels, then figure out how to resequence the operations (* looks different for each GF256 encoding), etc.

This is all SUPER easy in Julia.
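
To make that concrete, here's a rough sketch of the kind of type you'd define (the field arithmetic is abbreviated; a real GF(2^8) implementation needs log/antilog tables for * and inv):

    struct GF256 <: Number
        x::UInt8
    end
    Base.:+(a::GF256, b::GF256) = GF256(a.x ⊻ b.x)   # addition in GF(2^8) is xor
    Base.:-(a::GF256, b::GF256) = a + b              # subtraction is the same operation
    Base.zero(::Type{GF256}) = GF256(0x00)
    Base.one(::Type{GF256})  = GF256(0x01)
    # Base.:* and Base.inv would go through log/antilog tables (omitted here)

Once * and inv are filled in, generic code written against these operators can run on a Matrix{GF256} unchanged (pivoting needs a little care, since there is no natural ordering on field elements).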


> (see Mary Wootters' talks on YouTube)

https://www.youtube.com/watch?v=Gh578e98qAk

> Suppose I want to write a library to find compression polynomials for a Reed-Solomon encoding system (see Mary Wootters' talks on YouTube) for my storage product. I need an LU decomposition algorithm that operates on GF256

Surely you use isa-l[1].

[1] https://github.com/intel/isa-l

> Now I want to accelerate the search by running the search on GPUs.

GPUs are float oriented so I don't think you'll get the performance you hope for out of 8 bit integer operations. If you have interesting results to share I'd like to read them.


> GPUs are float oriented so I don't think you'll get the performance you hope for out of 8 bit integer operations. If you have interesting results to share I'd like to read them.

You've never seen GPU Hashcat, or GPU Bitcoin / Ethereum mining?

Vega now has a huge focus on INT8 operations. NVidia can perform int operations in parallel with float operations (superscalar GPU cores)


> You've never seen GPU Hashcat, or GPU Bitcoin / Ethereum mining?

I've heard of it, but afaik it hasn't been profitable to mine using a GPU for a long time due to the competition for hashes, power consumption, and rate they mine at. This is in contrast to ASICs which can mine even faster for less capex and opex.


Then use Julia, that's great. My comment was that the SIMD support in Julia isn't sufficient for my problem, not that you can't do your GF256 linear alg on a GPU..?


> I guess we aren't solving the same problems: memory allocation is trivial in my domain; mapping complex nested control flow onto SIMD is the hard part.

Those statements are just confusing to me. In most GPU-code, you have crazy amounts of parallelism and therefore don't really care what order those statements execute in.

As such, if you can allocate memory, you can map those if/else statements into a collection of queues and/or stacks (depending on whether you want a breadth-first or depth-first search pattern).

In effect: you use memory allocation to solve the complex control flow issue. Maybe it's more obvious with code:

Instead of doing:

    if(baz()){
      foo();
    } else {
      bar(); // Thread divergence!!
    }
Do:

    if(baz()){
      pushIntoFooQueue();
    } else {
      pushIntoBarQueue(); 
      // Thread divergence, but not much of a penalty
    }

    while(fooIsNotEmpty()){ // No thread divergence at all
        task = SIMDPopFoo();
        task.execute();
    }

    while(barIsNotEmpty()){
        task = SIMDPopBar();
        task.execute();
    }
This is heavier on the memory units, because you now have to manage the data-structures. But this style practically negates the thread-divergence penalty completely. If you're lucky, your fooQueue and barQueue fit in __shared__ memory.

Bonus points: Not only is thread-divergence negated, but you also achieve effective load-balancing across your workgroup. If Thread#0 spawns 20 items for FooQueue, after pushing/popping from the queue, those 20 Foo-tasks will be assigned to 20 different threads.

-----

From there on out, you're just pushing / popping different parts of your code to various queues and/or stacks. But this is only really a valid solution if you have a decent memory allocator that knows where and how to clean up these queues / stacks (especially if you have nodes starting to point to each other for dependency management)

I haven't really solved this problem "in general", but it is clear that the queues should be sorted into topological order, and that any tasks that depend on each other need to be run in different iterations. It really depends on how much you're willing to spend on organizing this execution information.

In any case: the memory allocation issue is one-and-the-same with the thread-divergence / complex instruction flow issue to me. You need to create a data-structure that organizes the instruction flow, and any complex data-structure will need memory management as soon as you start linking things together. The above uses a queue or stack, but things can get more complicated.

--------

Not that Julia, Python, ISPC or anything really... solves this problem. But memory management is very useful for this "style" (I'm mostly doing ref-counted C++ with a custom allocator myself. But such trees or graphs of links can grow into the CPU-side and end up using the default memory allocator)


> you use memory allocation to solve the complex control flow issue

This is an interesting remark I have to sleep on. I usually don't see this as possible, since I have a bunch of arrays which are allocated once, then a bunch of nested but static control flow. The only way I've found to go beyond single thread performance is with "whole program" SIMD which only worked in the ISPC/OpenCL/CUDA programming models.


It doesn't work all the time. Sometimes it's just faster to have thread divergence than to hit VRAM constantly.

But it's a different paradigm for doing things: something to try if your standard if/else stuff isn't working out.

EDIT: If you can make do with just stack allocation (or queue allocation), if you have singular-sized tasks with predictable sizes, if all the information fits inside of __shared__ memory, and if thread divergence is normally a problem... this methodology will probably work.

Oh, and for a hint:

    myIdx = prefixSum(__activemask());           // this lane's rank among the active lanes
    myTask = stack[stackTail - 1 - myIdx];       // each active lane pops one item off the top
    if(myIdx == 0){
      stackTail -= __popc(__activemask());       // lane 0 retires everything that was just popped
    }
    __syncthreads(); // Barrier is important
Stack pops and pushes are pretty easy. __activemask() provides your execution mask, and with it the logic needed to synchronously push / pop items to a stack (and probably to build more complicated memory allocation functions that I haven't figured out yet).


> I'm sure an AVX-based compile could work eventually.

Eventually is nice but it’s also easy to imagine that it never gets done. ISPC works today.


The CUDA portion of Julia works today, and you can get 10Tflop or 20Tflop GPUs under $1000 to play with it.

It's a different language for a different computer. But the fundamentals are still there, laying the groundwork for the future.

Sure, ISPC can win on latency. But the raw compute girth of GPU SIMD should not be underestimated.

In any case, Julia has proven itself capable of compiling down to a SIMD instruction set and achieving nearly the full performance of those GPUs. Even if it's not a computer you prefer, the programming model and technology demonstrated here is clear.


I agree with this and have the hardware but simply prefer to write CUDA kernels directly instead of in Julia syntax.

My comment was more geared to the CPU side of things since I write code whose users don’t usually have GPUs available.

There are also problem sizes which don’t fit the GPU’s more stark memory system separation: tens of CPU cores with tens of MBs of cache can move past GPUs in terms of memory bandwidth for such cases so banking on CUDA just doesn’t work for everyone.


OpenMP, in particular, is not limited to fork-join, and you don't want to be so limited for efficient, scalable numerical systems. Not that you get large-scale parallelism just with OpenMP. A recent keynote of Jack Dongarra's covers replacing the decades-old use of fork-join for (of course) dense linear algebra: https://www.iwomp2020.org/wp-content/uploads/iwomp-2020-K1-D... It's significantly in the past, not the future, of large-scale computation of the sort Julia particularly targets. I don't know why Julia should be stuck with that, though. Parallelism isn't new to Lisp-y systems, of course.


I guess I still see Julia as a quick and dirty prototyping language akin to Python and some others. Clearly, Julia has ambitions for greater capabilities.

Apparently, Julia now supports some more complicated forms of parallelism: async, @spawn (aka tasks), and others. But @threads-based for loops are just easier to grok.

Overall, Julia is one of the few high-level languages that is explicitly adding in highly-threaded concepts like Fork-join, SIMD, or even GPUs to the language.

Yeah, some Python libraries extend these concepts to Python. But something like @threads for ... end really demonstrates how fork-join parallelism is a first-class member of the Julia language.


Any language supporting (current) OpenMP supports fork-join, tasking, SIMD specification, and offload, though you still want distributed memory, and overlapped communication and computation. Baking particular technology into a system isn't a good idea long-term, or even medium-term, unless you wait for the cycle of re-invention, like vector computing. (I don't know if Julia really does that.)


Fork join parallelism is just one technique for using multiple cores and it is pretty ubiquitous. Most large arrays can be read by multiple threads at one time to let them all come up with part of the answer. I don't think this is specifically an advantage of julia.


Julia as a whole still lags behind Python or Matlab (unsurprising given the age difference) but there are some packages in Julia that just steal the show compared to Python et al, such as the SciML packages and the probabilistic programming packages. I switched to Julia for Turing.jl when I was previously using Pyro. Much happier with Julia.


The best thing about julia is the interop with essentially any language you can name. Want to call numpy?

    using PyCall
    np = pyimport("numpy")
    res = np.fft.fft(rand(ComplexF64, 10))

The interop with Matlab and C++ is similarly painless.

https://github.com/JuliaInterop/Cxx.jl


I agree that this is great, but I think saying it's Julia's best attribute is underselling the language quite a bit. Personally I like how effectively it can replace those languages, at least for future projects.


You are right. One of the best things.


Why is language interop not this simple for everything?

It seems insane to me that it's this easy in Julia, yet most other languages are completely incompatible without transpiling one into the other.


Magic of macros.


To be fair, there are no macros in that example. Just good ol' polymorphism and exceptionally hackable semantics.


PyCall does rely on macros, and so do essentially all the other interop packages.


Sure, they all provide nice macro interfaces, but nothing in the above example required those macros as far as I understand.

This is not to say macros aren't awesome.


I'm pretty sure that macros are involved in the invocation of the python function.


Why interop with a language that already solves the problems? Just use that language.


Because it's a PITA sometimes to use c++. It's easy to accidentally go out of bounds on stuff.

I've arguably got a lot more experience doing scientific programming in C++ and Fortran than I do in Python. Yet for quick prototyping where performance is almost never an issue, I still elect to use Python because it lets me wrap my head around the problem faster. I spend less time fixing stupid errors and thinking about coding, and more time thinking about the problem I really need to solve.

Julia might be even better, I've heard good things, I've just never used it though. In the scientific computing and data science world that I deal with, I'd say at least 95% of the time there's never a need to jump from a hacked together prototype to a more production level product.


My experience with Julia is not deep, but I’ve found it’s really nice to iterate on something that floats towards something that flies in one ecosystem. It’s not harder than Python to get the working implementation up and going - really there is a lower volume of idiosyncrasies.


The only annoying thing I've found is the startup time for the REPL / notebook environments, unless you keep one running all the time. Importing biggish libraries takes some time even if they're already compiled.

I ended up making a custom system image with the libraries I always use (Plots, etc) which does help on that front.
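
In case it saves someone a search, the incantation is roughly this (the exact arguments depend on your PackageCompiler.jl version):

    using PackageCompiler
    create_sysimage([:Plots]; sysimage_path="plots_sysimage.so")
    # then start Julia with:  julia --sysimage plots_sysimage.so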

That's a fairly minor gripe, mind you. It's still great.


Some libraries implement I/O formats that you'd never want to spend time implementing, but that are available in Python. Nibabel is an example: neuroimaging formats I never want to understand the details of, but that I would need to work with in Julia.


Hey neat! I use Nibabel in my work! Small world. :>


If language X solves 9 of my 10 problems, and language Y solves the last one of my problems, I can either, A: use language X and call out to language Y to solve all my 10 problems. Or, B: use just language Y to solve 1 of my 10 problems.

You suggest alternative B?


Because julia is a better language. Numpy is written in C, why use python when you can use C?


I do not want to use C/C++ at all. I would like to use Numpy and Pandas because those projects are essential to my workflow.



What is going on here?


Lengthy, nuanced discussion about benchmarking between Turing devs and Stan devs.


I love Julia, but I think comparing Pyro with Turing is pretty unfair.

Pyro can scale to pretty big datasets thanks to custom inference via guides. So you can do inference on pretty sophisticated models such as Deep Markov.

Whereas Turing is mostly for small or medium sized models. For learning purposes, for smaller models or for non-parametrics Turing might be a much better choice, though.


But isn’t the good scaling due to using variational inference rather than full MCMC? Turing has VI too.


Not just that. Aside from the fact that Pyro has a lot of guide tooling, you get for example (hybrid) message passing for HMMs. This is crucial to scale beyond toy data.

Within Julia, ForneyLab.jl is quite cool for non-deep state space models. I also like Turing.jl, but it has different tradeoffs.


the "good scaling" is due to variational inference, which turns inference problems into optimization problems, but one has to verify the variational approximation holds, etc.

In my experience, Pyro (generally variational inference) is more like a Bayesian optimization than full inference, since it will simply miss multimodality or non-Gaussian tails, etc.


I don't have a CS background, so maybe that's how I'm missing the point here, but I don't see what is supposed to be so special about Julia's multiple dispatch. In fact, reading more about it and knowing Python, single dispatch/polymorphism feels more natural to me. This would allow a `plot` function in Python, originally written for the float data type, to also work for float-like (quacks like a duck) data types, like `Decimal`. I struggle to see what about Julia improves on this situation or makes it conceptually different.

Lastly, the article mentions that `plot` also works for uncertainty-carrying floats, because the `Measurements` package has implemented "a simple recipe" for this case. But doesn't this mean very tight coupling? The `Measurements` author very specifically (if I understand this right) implemented a `plot` functionality. For the end user, this allows things to just work, which is great. But under the hood, it seems this depends on package authors having to be aware of all possible use cases for their library code.


The core problem with class-based object orientation is that methods go inside of the type instead of being added externally. That means everyone has to agree on what methods can be called on a class or they have to subtype it, which sounds harmless, but is actually a big problem when what you really wanted to do was just define a new method for an existing type. My best attempt at explaining: https://www.youtube.com/watch?v=kc9HwsxE1OY

There's also the issue that one sometimes needs to specialize generic operations in generic algorithms on something other than the receiver—sometimes even on more than one of the arguments. Single dispatch forces you to use something slow and awkward like double dispatch in cases like this. And that's assuming the person who wrote the generic code anticipates the need for specialization! If they don't allow for it, then you're just stuck. With multiple dispatch, you can just define the method you need the specialization for and you're done.


> This would allow a `plot` function in Python, originally written for the float data type, to also work for float-like (quacks like a duck) data types, like `Decimal`.

So let's say you do that. You define your own new number-like type, say Complex. You pass that in to plot(n).

Somewhere in the body of plot(), though, it turns out there's some code like:

    # Flip to put origin at bottom left.
    window_height = 400
    y = window_height - n
So now plot() tries to invoke "-" with an int on the left and your complex number on the right. It passes your complex number to the int class's minus operator, which has no idea about your type and barfs.

Python has a hacky workaround for this specific case where the right operand can implement __rsub__(), but that hacky workaround exists specifically because Python doesn't have multiple dispatch.

If it supported multiple dispatch natively, you'd just define an operator - that took a number on the left and a complex on the right, plot() would call it, and you'd be good.


Wow, nice example. Made possible also because arithmetic operators in Julia are functions, and you can extend them to work on any data types. (Operators like "+" have hundreds of methods.)
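
A minimal sketch of what that extension looks like (MyComplex is just a stand-in here; Julia of course already ships a Complex type):

    struct MyComplex <: Number
        re::Float64
        im::Float64
    end

    # Teach `-` about (built-in Int on the left, user type on the right),
    # without touching Int, the MyComplex "class", or plot() itself:
    Base.:-(a::Int, b::MyComplex) = MyComplex(a - b.re, -b.im)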


Plot recipes are a really good fundamental example of composability in the Julia ecosystem. Extending a plotting library cannot be a function because plotting functions don't compose. Want to support plotting quaternions in matplotlib? You'd write a `plotquaternion` function. Then the ODE solver makes a `plotode` function for the output object of its ODE solver. Now let's say you got ODEs solving on quaternions... what about plotting? Well you have to modify `plotode` to call `plotquaternion` or modify `plotquaternion` to call `plotode`, and this is how you get monorepos and the big functions we all know.

Plot recipes are different. The Plots.jl system calls the type recipe function recursively on the plotting data `X` until it sees a set of primitives that it knows. Libraries define dispatches for this function to denote how data types should transform into something more primitive. Quaternion numbers become four independent series of numbers (and add keyword arguments for doing things like modulus). The ODE solver library writes a recipe that says: if you try to plot an ODE solution, transform it into arrays of time series. Now plotting a quaternion ODE solution recurses into 4x time series of the solution, which is the plot you'd want to have. This is a feature I use all of the time: I don't generally do more than just `plot(sol)`, since recursive recipes generally give me the plot I want (or else I open an issue because someone is missing a recipe).
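
For the curious, a loose sketch of what such a recipe looks like with RecipesBase (the Quaternion type and the attribute choices here are purely illustrative):

    using RecipesBase

    struct Quaternion
        w::Float64; x::Float64; y::Float64; z::Float64
    end

    # User recipe: tell Plots how to lower a Vector{Quaternion} into series it already knows.
    @recipe function f(qs::Vector{Quaternion})
        label --> ["w" "x" "y" "z"]                              # overridable defaults
        hcat((getfield.(qs, s) for s in (:w, :x, :y, :z))...)    # four plain numeric columns
    end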


__rsub__ isn't a hacky workaround. Magic methods are a fundamental part of python's object model. If you think it's hacky, I think you don't grok python.

> If it supported multiple dispatch natively, you'd just define an operator - that took a number on the left and a complex on the right, plot() would call it, and you'd be good.

This doesn't come across as a win to me.


What happens when it's an operation that doesn't have magic fallback methods like the arithmetic operators do? Say the code is `max(400, n)` instead. Now what? The `max` function doesn't know about your special type, so the code fails. Magic methods simply don't scale: they solve this problem in a few very specific cases and nowhere else. With multiple dispatch, you have a completely general solution to this entire class of problems: you just define a new method of `max` that knows how to handle your new number type and everything works.


> This would allow a `plot` function in Python, originally written for the float data type, to also work for float-like (quacks like a duck) data types, like `Decimal`.

This might come with sacrificed performance. If you want a different implementation for a different data type to get the best performance, writing one method makes that hard.

> The `Measurements` author very specifically (if I understand this right) implemented a `plot` functionality.

This is a good point. But in that example, note that `Measurements` also works with all of differential equations, and they didn't implement any differential equations specific code, unlike as you pointed out with `plot`. The fact that uncertainty propagates through a differential equation solver is pretty impressive, imo.
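
A rough sketch of that composability, along the lines of the well-known Measurements + DifferentialEquations demo (the numbers here are made up):

    using Measurements, OrdinaryDiffEq

    g  = 9.79 ± 0.02                                  # a value with uncertainty attached
    u0 = 10.0 ± 0.1
    prob = ODEProblem((u, p, t) -> -g, u0, (0.0, 1.0))
    sol  = solve(prob, Tsit5())                       # uncertainties propagate through the solver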


Yeah as I was reading this it occurred to me that a startling number of lab reports I wrote as an undergrad would have been much easier to do in Julia. We normally used Excel for the calculations and had to check all of our propagation of errors by hand.


I really wish someone could get traction with a replacement for Excel with better correctness options. So many people use it because it's easy to get started with but then find themselves in situations like what you mentioned where there's a lot of manual work (or undetected mistakes) to maintain it.


I think we're getting there, slowly. Quite a lot of people who don't in any way think of themselves as programmers now reach for python/R/julia for tasks that would once have been done in Excel.

https://github.com/fonsp/Pluto.jl is a super-nice way of just starting, without too much setup cost. I think it will soon be self-contained as to what packages & versions are needed, too.


The biggest barrier to an alternative in these situations is the learning curve. If a professor can teach it to a bunch of lower-division undergraduates in a single class and receive their reports in a uniform format then it'll be a hit.

A bigger barrier here is that we had to know how to apply propagation of errors, so although we could use a tool like this to do the calculation we still had to generate a set of equations for the lower bound, best estimate, and upper bound. There's a seductive argument to be made for just plugging those equations into Excel instead of using them to validate that the Julia functions are working correctly.


Mesh can do this, somewhat - exceptions in table columns stick out like a sore thumb in the sheet source.

http://mesh-spreadsheet.com


Julia!


I realize you like it but think about what we're talking about here: someone has a room full of students who have not been trained as programmers and they're working on a particular experiment, not taking a class on software development. Most of them have probably used Excel/Numbers before and if not, the basic idea, especially given a template, is pretty easy to explain so you can get them up and running quickly and get back to the actual problem you're working on.

Now contrast that with how much stuff you need to learn to be proficient with any programming language — and no matter how much you think your favorite one is a friendly unicorn, teaching a room full of beginners will be eye-opening for the things they get stuck on. Which editor to use? How to install it? Are you saying “$LANG” but actually using that as a shorthand for “$LANG, Git and shell tools, and the conventions for using them that are common in my specialty”? How much time does it take to learn the basics of the language, how to interpret error messages, and debug things?

None of that is insurmountable, of course, but the difference in learning curves is why many people end up with the Excel file which started in a hurry and is now Frankenstein's workbook which everyone is afraid to touch or share. Over a sufficient time interval, it'd be much easier to pick any decent toolchain because maintaining that level of complexity in almost any major programming language is going to be less work but unless you're starting with experienced developers the short term answer will probably favor Excel. I've seen people who've been meaning to get around to rewriting it long enough that the target language has shifted as things fall in and out of favor.


Multiple dispatch is nice because it makes extending library (or any) code trivial.

The seamless way in which Julia libraries can be mixed and matched and used together is a testament to this. In Python, on the other hand, each library is its own little world and cannot be used in novel or unexpected ways.

Moreover, Python's poor performance means that you must use libraries in very cookie-cutter ways, or else performance falls off a cliff.


As the name implies, single dispatch is just a special case of multiple dispatch. Everything that can be done with single dispatch works exactly the same (arg1.f(arg2...) is just f(arg1, arg2...) in Julia), so there is no loss in expressivity. The only conceptual difference in multiple dispatch is that arg1 is not considered special, which makes sense for a scientific language: in 1 + 5, for example, the operator '+' isn't really owned by the "object" 1; it's a property of the whole expression (the combination of the operator itself and all operands).

And the consequence in terms of programming is exactly that it shifts the responsibility of handling interoperability to whoever creates a new type. For example, if I create MyType and want a method SomeoneType + MyType to work, I don't need to change the '+' implementation on SomeoneType for that to work (for example by making a PR on their library); I only define methods with the type I just created. This is why people say the Julia ecosystem composes so well: the creator of SomeoneType can just focus on what they want and possibly never even know about MyType. MyType will extend it with whatever I want, without having to re-implement stuff that SomeoneType already implemented (only the MyType stuff and the intersection between the libraries, defined by methods that use both MyType and SomeoneType), and the final user can just import all the stuff they want and have a library that looks like a monolith but is actually many smaller libraries working together this way.


I wrote two articles which may clarify this better. One is about creating custom temperature units in Julia and Python. It shows how multiple dispatch gives an advantage over single dispatch: https://medium.com/@Jernfrost/defining-custom-units-in-julia...

This story is more like a general intro to Julia, but it has an example with Knights, Pikemen and Archers fighting each other (sort of a rock, paper, scissors game), which also shows the utility of multiple dispatch.

https://levelup.gitconnected.com/knights-pikemen-archers-and...

But to give a quick idea here. Imagine writing functions for intersecting two geometric shapes. The algorithm for intersecting a circle and square is entirely different from intersecting a polygon and a line segment. The specific algorithm needed will depend on BOTH shapes not just one. Python can only dispatch on one of the shapes, not both.

So in Julia I could write a different function implementation like this:

    intersect(c::Circle, r::Rectangle)
    intersect(c1::Circle, c2::Circle)
    intersect(r1::Rectangle, r2::Rectangle)
and so on.


I appreciate your first article demonstrating the differences between Python's single dispatch and Julia's multiple dispatch, but your python example suffers from a lack of a child-aware parent class, Temperature, that could resolve the complaints raised. You're building inherently coupled classes in a decoupled way and complaining that they don't mix well.


Apart from CS and how software construction works with these things in practice, I reckon multiple dispatch is natural. At least it is to someone with a physics background, which is the context in which I encountered it long ago. I'm used to thinking essentially functionally and in terms of operation->arguments/objects (or verb->subjects), not something with the order and asymmetry of, say, object.operation(object). I wouldn't expect an operation to be tied just to one of the operands. I've seen the result of people trying to explain to scientists, say, Java-like "OO" as if it was somehow mirroring the sort of things they work on.


A couple of the other replies might have answered this for you already. You are correct that the recipe is specific to the Plots package. But consider: once written, it works not only for that specific plot function, but many others in the package as well, for example scatter. Also, I have written these recipes for my own data types, and they are incredibly easy to create. The recipe informs the functions in Plots how to handle your datatype, so you can just pass it in as an argument. The thing is that I've never even looked at the code in the Plots package. With Julia you can use functions on combinations of your own custom datatypes to do new things, without looking at the code for the functions. Often nothing like the Plots recipes is required; but here the recipe lets you specify how you want your datatype to be visualized.


The deal with multiple dispatch, as I understand it, is that instead of obj->somemethod, you define a function named somemethod, which is defined for obj. Maybe some other code calls it.

You can then define somemethod for other data types, without touching the original definition, and that makes your new definition available to the other code that used the original somemethod. It is a unique feature of Julia (at least unique in a sense that it is absent from mainstream languages)


The thing that distinguishes multiple dispatch from single is that you can dispatch based on the types of all arguments to the method, not just one. E.g. you can define

    function somemethod(x::Int, y::Int)
    function somemethod(x::Int, y::Float64)
    function somemethod(x::Float64, y::Float64)

Multiple dispatch is available at least in Common LISP, though I guess that's not considered particularly mainstream.


As far as I know, multiple dispatch is optional in Common Lisp.

It also exists in R, but again, is optional (and slow), and is therefore mostly unused, which limits its usefulness, making it less used, etc. etc.

In Julia, it is default, so everything participates in multiple dispatch. It completely pervades the language, which is what makes it so useful.


It is also available in Perl6, but there, as in CL, it is opt-in, and might slow things down.


Right. I (incorrectly) assumed that "multiple" meant multiple versions for different input types, not necessarily multiple arguments.


Isn't this the same as function overloading in some programming languages?


It's a form of function overloading, but it works on runtime types. For example, let's say we have a Dog and a Cat, both subclasses of Animal, and you define f(Cat, Dog) and f(Animal, Animal). In languages without multiple dispatch, if you call it with a Cat object and a Dog object it will choose the first as expected, but if you pass two Animal objects/pointers it will dispatch to the second regardless of what the runtime values are.

If the language has single dispatch, though, animal1.f(animal2) will accurately call the Cat or Dog method for the first argument (before the dot), but the second argument will work like above: it will call the Animal version and not the Cat or Dog ones. If you want a method to pick the Cat or Dog version based on runtime information you'd have to re-dispatch at runtime within f(Animal), which is basically the pattern known as the Visitor pattern [1].

In Julia (and other multiple dispatch languages), defining f(Cat, Dog) and f(Animal, Animal) means that even when I call f(animal1, animal2) the dispatch considers the runtime values: if the first animal is a Cat and the second a Dog it will call the first method (as a form of specialization), and otherwise it will go for f(Animal, Animal) (the more general method). And the Julia compiler is really good at inferring runtime types at compile time, so even though dispatch is on runtime values it is often still static dispatch (that's why it's also fast).

[1] https://en.wikipedia.org/wiki/Visitor_pattern
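
A minimal sketch of that in Julia:

    abstract type Animal end
    struct Cat <: Animal end
    struct Dog <: Animal end

    f(a::Cat, b::Dog)       = "cat meets dog"   # the specialized method
    f(a::Animal, b::Animal) = "two animals"     # the general fallback

    pets = Animal[Cat(), Dog()]
    f(pets[1], pets[2])    # "cat meets dog": dispatch looks at the runtime types of both arguments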


You can think of it as similar to operator overloading. It's not quite the same, but it's a subtle distinction, or rather, the explanation of the difference is subtle.

You can find examples where operator overloading and multiple dispatch actually behaves differently and gives different answers, but the comparison works as a first order approximation.


Yes, except dispatch happens at runtime, not compile time. It takes into account the dynamic type of each argument, not just their static types.


Well, almost. Dispatch can happen at compile time, and in fact does for type-stable code. But the dispatch is on run-time types, not static types.

(The fact that you can have static dispatch on dynamic types is a pretty subtle point that hurts my head, but the upshot is that you get semantically dynamic dispatch with static performance as an optimization.)


As the other answers note, multiple dispatch is largely a performance improvement. When writing Python, performance has usually been sidelined as a priority in the first place (within some constraints).

Personally, I hate this presentation. Multiple dispatch (to me) is cognitively heavier and not what I would default to. If it's a performance gain and I need performance, then sure. But too often people hold it up as a virtue because the need for speed is assumed.


What? I used to write python, and to me, MD is first and foremost a conceptual improvement. Makes writing code so much sweeter, more expressive and sensical.

It also heavily improves composability.

It's LESS performant if not done carefully. Certainly fastai, Matt Rocklin, etc. didn't reimplement a subset of it in Python just to get slower code.

See https://youtu.be/kc9HwsxE1OY


I agree, the performance is a very nice bonus, but it is primarily about expressiveness, composability and natural simplicity.

It just seems profoundly un-natural and arbitrary to limit dispatch to only the first argument.


There’s a current Fall 2020 class at MIT that uses Julia.

https://computationalthinking.mit.edu/Fall20/

There’s a live video on ray tracing starting in about 20 minutes

https://youtu.be/MkkZb5V6HqM


It's notable that the 3B1B guy is contributing to that class.


Julia is better suited to numerical computation. Matlab is expensive and not really a language. Python is not designed for numerical computation and can be verbose.

The issue for Julia is, if you stay within NumPy and SciPy in Python, and can use a JIT when applicable, you are basically using C. You can make it fast using C libraries and there is not a whole lot of room for Julia to shine. Even with C, I have lately had to be a bit careful to beat NumPy.


I have quite the opposite experience. Numpy is terrible in terms of allocation overhead (even when you use the available, but limited, inplace operators and memory views). In Julia it is 𝚝̶𝚛̶𝚒̶𝚟̶𝚒̶𝚊̶𝚕̶ much easier to write allocation free code. And only so many operations are trivially broadcastable, after which numpy becomes very cumbersome and slow.
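
A small sketch of what I mean, assuming nothing beyond base Julia:

    x, y = rand(10^6), rand(10^6)
    out  = similar(x)              # allocate the output once
    @. out = 3x + sin(y)           # fused, in-place broadcast: one pass, no temporaries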

Numpy is a great piece of engineering, but its limitations are very noticeable when compared to Julia.

edit: not trivial, but much easier


As much as I like Julia, I think "trivial to write allocation free code" is a bit of an overstatement. Depending on what you are doing, it can be difficult, for example iteratively calling any of the LinearAlgebra methods, since there is no interface for preallocating work arrays (doing the ccall on BLAS yourself is not a fun work-around). It is also not always clear why something is allocating, even with the debugging tools.


Perhaps I misunderstand what a 'work array' is, but you can easily pre-allocate arrays and then work on them in-place, including with LinearAlgebra functions like `mul!` et al.
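
For example, a minimal sketch with `mul!`:

    using LinearAlgebra

    A, B = rand(100, 100), rand(100, 100)
    C = similar(A)            # allocate the output once...
    for _ in 1:1_000
        mul!(C, A, B)         # ...then reuse it: no allocation inside the loop
    end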


Higher level BLAS operations, such as solving a linear system, or computing svd/eigen, cannot be done in-place the way matrix multiplication can, and require additional memory of a predetermined, fixed size, called the work array. This cannot be pre-allocated in Julia, as there is no interface to do so in LinearAlgebra, so these BLAS calls will always allocate memory.


Have you looked thoroughly over https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/ ?

There's a ton of in-place operations there, including in-place solving of linear systems, and at least some stuff related to svd (though the names are pretty un-intuitive, being wrappers for BLAS and LAPACK functions).


Yes, many of these are "in-place" but will still allocate. I typically use the geev!/ggev!/geevx! routines, if you look at the source code you will see that the work arrays are still allocated inside the call. The in-place here (unfortunately) means only that the input is overwritten, not that there is no allocation.


The purpose of in-place operations is to avoid allocations, so this seems like an oversight.

Is this a matter of simply wrapping the 'right' LAPACK routines, or is there something missing in the interfaces that could be trivially added in principle?


It is entirely possible, but is not trivial, especially for the user who then needs to know "arcane knowledge" of BLAS/LAPACK work array sizes and flags. There was some discussion about this on github, but it sort of trailed off without a real resolution. I think it is considered too complicated/niche to be in base, and was recommended to be an external library, but nobody (myself included) seems particularly interested in what amounts to maintaining a fork of the entire BLAS package. The base devs would have more insight, but this is my view from the outside at least.


Hah, you were not kidding about the names being unintuitive.

For anyone else interested there is a large list starting here [1]. The in-place versions will all end with `!`.

[1] https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#Linea...


Numpy isn't actually very fast, especially if the data is small.


Julia is a wonderfully designed language and I fully anticipate making it my main language in a few years. For now I will stick with R though, because R's ecosystem is simply unparalleled right now. And for someone who is not terribly interested in programming, except as a way to get things done, that is what matters most.

But I really think that in a few years' time, when Julia's ecosystem has had more time to grow and develop, it can rival R.


> And for someone who is not terribly interested in programming, except as a way to get things done, that is what matters most.

For what it's worth, I found that taking the plunge into julia is what made me love programming.

I'm constantly amazed by how easy it is to whip something up myself where before I would have relied on a built-in routine in another language. Doing this made me a better programmer and more capable of doing the things that 'matter most' as well.

Plus, whenever I whip something up myself, I try to share it with the community, either as its own package or as a PR to an existing package, so I get to feel involved in the progress as we get to the sort of ecosystem you would find acceptable.



Julia is an interesting language. I see that there's a lot of excitement for it, but most excitement seems to come from data scientists who are heavier on the engineer side. For almost all of the rest of the data scientists I know, Python seems "more than good enough".

Also to note, Julia has seen a rise over the last few years, but so has Python, its primary competitor in the data science field.


It's only good enough for data scientists because they don't know any better.

For existing projects, fine, stick with Python. But Julia is strictly superior to Python in every way; there is nothing that Python is better at than Julia. This is partially because Julia can call any Python library, but mostly because it's actually designed for data science and scientific computing.


"Everyway" seems like unneeded flamewar fuel. Here are a few things where it seems that Julia is not strictly superior to Python:

Julia doesn't run everywhere that Python does. Especially if you include all of the Python implementations, including MicroPython.

Last I checked, Julia's startup cost was high, making it the wrong choice for short-lived programs. Yes, people do scientific computing in short-lived programs.

Now, I don't know where Julia stands in this, but Python is used to teach people to program, including high school students and younger. By the time Python was 8 years old (as Julia is now), Python was being used in high school courses. Python's built-in "turtle" module exists as part of that education component - at one Python conference, an educator pointed out that it can be very hard to get anything installed at primary and secondary schools, so having a maintained "turtle" in the base installation was important.

I therefore suspect Julia is a worse teaching language than Python for high school level and below.


About your last point, you shouldn't conflate popularity with being better. There are languages with a heavy focus on education (both in the language itself and the tooling), like Racket, that are still not heavily used compared to Python, even though they provide a great environment for learning everything from basic to more complex comp sci.

Regarding Julia, yes the focus seems to be more on college, where the matlab-like math support makes it a lot closer to regular math. Though since it's not very different from Python someone can still pick it up easily from knowing Python in high school (or Racket).


I think Python is a better teaching language because it comes out of and is informed by ABC. ABC went through user testing to see how to develop a programming language that would make it easier for non-programmers to learn how to program.

Remember, I am presenting what I think are counter-examples to "is strictly superior to Python in every way" (excluding "existing projects"). I still think that's true even if Python weren't popular at all.

As an example of the influence, the ":" at the end of lines which start a block in Python is not required, in the language sense. However, ABC testing showed it made the language more useful.

As an example of improving Python for learners, one of the reasons for Python's change from 1/2 == 0 -> 1/2 == 0.5 was feedback from the Alice developers, where they found that students did not understand van Rossum's decision to follow C's semantics.

Python started being used in schools when it wasn't popular. One of the presenters at the Python conference in 1998 or so was a high school AP comp sci teacher. At that time C++ and Java were the AP languages. He reported better success teaching students Python first and then teaching C++ or Java, than spending all of the time in one of the latter languages.


I think Julia is actually easier. No classes. 1-based instead of 0-based indexing. Functional features (I think functional programming is easier to learn).
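
For example (a trivial sketch):

    v = [10, 20, 30]
    v[1]                # 10 -- the first element is at index 1
    map(x -> 2x, v)     # [20, 40, 60] -- functional style, no classes or explicit loop needed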


There are people who think Haskell is easier.

Which is why I attempted to support my personal beliefs via examples of Python uptake. Something you interpreted as an argument by popularity.

I observe that https://julialang.org/learning/classes/ lists only a single high school, and I didn't see a single course meant as a general introductory programming course for non-programmers.

Here's the IPC talk (from 2000) I mentioned about "Using Python in a High School Computer Science Program" https://legacy.python.org/workshops/2000-01/proceedings/pape...

It references the "Computer Programming for Everybody" essay at https://www.python.org/doc/essays/everybody/ , which I argue shows the stronger emphasis on the Python developers for beginning programs, inherited from ABC.

I also mentioned Alice. "Because Alice is targeted towards novice programmers, it is important that Python, more than languages such as Tcl or Scheme, can be mastered by new Alice programmers with little effort" (quoting the 1995 paper at http://www.cs.cmu.edu/~stage3/publications/95/journals/IEEEc... ). But that was at a university.

Python of course is much more established than Julia, which is why I linked to resources from when Python was only a year or two older than Julia is now.

That said, at https://www.python.org/community/sigs/current/edu-sig/ you can find other teaching resources for high school teaching and for "kids" and "young children". My local library here in Sweden has a book on programming for kids, which uses Python.

Is there a similar movement to CP4E within the Julia community? Nothing from https://julialang.org/learning/ suggests that the needs of non-programmers, such as high school students, play a strong role in its design.

Do you know of any such internal movement? Can you point me to any external resources which show Julia being used as a teaching language for non-programmers?


> But Julia is strictly superior to Python in every way

That is until you use the debugger.


The debugger has had a huge amount of work done on it recently.

Anyway, half the data scientists I know don't use the debugger; they're solidly in the print-to-debug camp, or they're using notebooks and going step by step.


Hell half the programmers I know don't ever use a debugger. I usually prefer to just think about my code and add some print statements, but occasionally I get super frustrated and throw up my hands. Feels like a personal failing on my part though.


>> But Julia is strictly superior to Python in every way
> That is until you use the debugger.

Or stumble into the first example in JuliaPlots, which doesn't work.


The first example here (https://docs.juliaplots.org/latest/) is the Lorenz attractor - just tried it in 1.5 and it works.


What example are you referring to?


I am pretty excited about Julia's potential for web apps


I wouldn't be. I love Julia, but it doesn't have a spectacular error-handling story. So if you have to start running a web server, and also have to worry about people launching a DDoS or finding sneaky ways to make your system consume a ton of resources on a handful of threads and lower availability, it's not going to be pretty.

Unless you mean compiling Julia or something like it to WASM and running it in the browser. That could get interesting, but Julia is still "a bit too chonky" for that.


To be fair, you wouldn't just expose your server directly nowadays; you'd run it in a container orchestrator like k8s with a liveness probe (and container usage tracking) that would appropriately scale/restart pods so as to not have the system go down, together with load balancers and DDoS protection systems like AWS Shield to detect such attempts.

And of course, you can also use a more low latency high availability oriented language handling the frontend (like Elixir, which will have trouble doing the heavy data oriented stuff Julia can), with Julia as a microservice to handle the actual backend logic and analytics, which is probably the setup I'd go for (using 2 high level lispy-languages with very different qualities, each doing what it does best).


> And of course, you can also use a more low latency high availability oriented language handling the frontend (like Elixir, which will have trouble doing the heavy data oriented stuff Julia can), with Julia as a microservice to handle the actual backend logic and analytics, which is probably the setup I'd go for (using 2 high level lispy-languages with very different qualities, each doing what it does best).

I would do this exactly too, if I had to do number crunching.


This is a problem with pretty much every language used for web dev. Most web apps are behind Nginx or some API proxy etc. to mitigate it.


Nim is a great general-purpose language I have been using lately for my numeric and data heavy processing needs.

Julia's overenthusiastic posts here in HN make me wary of the language instead of curious.


Yes, I like Nim more than Julia, but Julia has this big push behind it. If you put the same resources into Nim it would be a more usable (not saying better) language.


Can the DifferentialEquations package actually solve quaternion equations correctly? Quaternions are non-commutative and I'd hazard to guess that some solver methods would assume commutativity at some point.


My experience with various linear solvers (which a diffeq solver typically relies on) is that if they assume commutativity anywhere at all, which they often do, they almost definitely will not work for quaternions. Even if a derivation of the algorithm can be done with non-commutativity in mind, the implementation typically is not. Many NLA solvers are based on orthogonality transforms, which do not translate directly to quaternions, and even solvers which use only inner products and mat-vec multiplication like BiCG do not really work as-is for quaternion matrices. Luckily dense linear solves with LU are fine! One can usually use the complex matrix expansion for quaternions and solve that instead without much difficulty though, and for computing eigenvalues this is perhaps even preferable, as it gives a canonical representation. I have not tried to use DifferentialEquations.jl for quaternion problems, so maybe they have figured some of it out, but it is non-trivial.


ODE solvers only use LU


This is good to know, thanks Chris. I mostly solve sparse PDEs, being able to always use LU makes everything much simpler, especially with quaternions.


I would say it's probably worth converting your problem to its linear-algebraic representation and then using the solvers on that. They're quite fast there, and it's probably fairly easy to make a representation map and its inverse work well. (I will say that I don't usually deal with quaternions other than in a linear-algebraic form when doing some robotics stuff, so maybe this is more of a pain than warranted!)


I found this approach to work well for linear systems, here is some rough code I used for the representation map (note the jmag/kmag functions are part of my implementation, not sure what the equivalent is for Quaternions.jl).

  function cmatrix(Q::AbstractVecOrMat{Quaternion{T}}) where {T}
      [complex.( real.(Q),  imag.(Q)) complex.( jmag.(Q), kmag.(Q));
       complex.(-jmag.(Q),  kmag.(Q)) complex.( real.(Q), -imag.(Q))]
  end

  function qmatrix(C::AbstractMatrix{Complex{T}}) where {T}
      n, m = size(C)
      quat.(C[1:n÷2, 1:m÷2], C[1:n÷2, m÷2+1:m])
  end


Searching for Julia-related work on Upwork resulted in only 3 jobs. One of them was actually to port an existing application from Julia to C++.


Julia looks great but in practice I can pretend it doesn’t exist without much worry. Python and MATLAB are good enough for interactive work and for the performance stuff we use CUDA or similar directly.

Edit just to mention that Numba solves by and large the middle ground for the domain I work in, despite what Julia proponents have to say.


> Julia looks great but in practice I can pretend it doesn’t exist without much worry.

That's always the problem with a new programming language. The target for the language is the people that already have something that works. Then a decade passes and that language is no longer a shiny new object, so enthusiasm for it dies, it's just another old language with warts.


Not to sound dismissive but I remember when that was true of Python (vis-a-vis Perl and some other things). My recollection of discussions on forums when it was starting to gain mindshare were arguments largely about the aesthetics of the syntax.

Numerical computing is something where there's always been something that works. Before Python it was C and Fortran.

The problem people ran into is that when everything is wrapped around low-level libraries for speed, you eventually run into the catch 22 of using the slower language that you prefer for clarity, or the faster one that makes it acceptable performance-wise.

In other words, with Python, to get the performance you would have got in C or Fortran, you have to code in C or Fortran. Then you're not using Python anymore. The idea (in theory, and a lot in practice) with Julia (or Nim, or other LLVM-targeting languages) is that you don't have this penalty.

So in that sense Julia is providing something that isn't working in Python or Matlab.

I think Julia's not quite what it's cracked up to be, but mostly it is, and I'd probably prefer working with it over Python or Matlab for numerical stuff.


> I remember when that was true of Python

But Python didn't take off because of scientific computing. Google was a heavy user of Python from the start. Python seriously sucked for scientific computing in 2005. I was looking for a new language (the old one wasn't cutting it) and I'd been looking for a use for Python for years. It had some projects but it just was not in a useful state so I went with R, which was quite mature by that time, and it focused on statistical analysis. Years later Python became popular for scientific computing, likely due to the userbase it had built up in other areas.


While it's true in 2004 there was still a Numeric/numarray schism, at the time I was using Pyrex (which soon evolved into Cython) to have a gradually strongly typed Python-like thing cross-compatible with the rest of the ecosystem. That made it quite easy to write code which ran circles around R performance-wise without leaving the Python syntax domain and with a very simple FFI to call C to boot. I even had a tiny "pycc" script to create "executables" instead of "importable modules". Yes, those executables did depend on the installed base of Python stuff.

These days, Nim is a better Cython but with less dynamic temptations and more powerful metaprogramming (and, yes, a much smaller ecosystem..maybe not that much smaller than Python in the late 90s, though). { Not that this is all Nim is...It's actually a really good everything-language that's tricky to summarize in just a few words. }


Yeah I don't disagree re: scientific computing and Python (although I think from the beginning some of the advocacy for Python was coming from more "math-oriented" communities). I still think that when it started accelerating in use (which was mainly among scripting and then web applications, some other stuff too) there were existing solutions. A lot of discussions about established languages versus new ones are very similar to one another; the particular languages are just swapped out for different ones.


> My recollection of discussions on forums when it was starting to gain mindshare were arguments largely about the aesthetics of the syntax.

I remember dynamic languages people picking up on three things in particular: Python apparently being designed to be inherently inefficient, not having proper garbage collection, and having weird scoping. At least Python was something to point people at who rejected Lisp as an alternative to Tcl and Perl for scientific computing.


> Before Python it was C and Fortran

And before that, analog circuits. Hodgkin and Huxley computed their first results for neuronal activity without a computer.


Although you might have needed the distinction between an electronic computer and a human one then.


No. Before C and Fortran it was assembly language.


I think even this is overstated: Python via Numba and MATLAB both have excellent JIT compilers now, and for the maximum performance you need to hit SIMD. The latter is not reliable in Julia, and as for the former, well, you already have Python and MATLAB.


"SIMD not reliable" is an overstatement itself for a couple reasons:

1. It's very easy to inspect generated code of your kernels where you really need SIMD. For instance:

    julia> function my_kernel(xs)
             total = zero(eltype(xs))
             for x in xs
               @fastmath total += x
             end
             total
           end
    julia> @code_native debuginfo=:none my_kernel(rand(10))
    ...
    L96:
     vaddpd 8(%rcx,%rax,8), %ymm0, %ymm0
     vaddpd 40(%rcx,%rax,8), %ymm1, %ymm1
     vaddpd 72(%rcx,%rax,8), %ymm2, %ymm2
     vaddpd 104(%rcx,%rax,8), %ymm3, %ymm3
     addq $16, %rax
     cmpq %rax, %rdi
     jne L96
    ...
    
2. There's a Julia package that does the code gen for vectorization exactly because LLVM does not always get it right: https://chriselrod.github.io/LoopVectorization.jl/stable/exa...

3. You can make LLVM explain why it did not vectorize certain loops just like in clang.


SIMD convergence is a big issue here: how do you keep those lanes running together? If you have trivial kernels sure, but ISPC guarantees convergence across control flow which isn’t even available in CUDA (according to ISPC docs at least).

Just to be clear: I want to use Julia but without guessing about SIMD use in kernels. Until this is available for complex control flow in Julia, it’s not a game changer in my opinion.

Edit: that kernel is trivial. I have nested control flow to vectorize.


Right, I see what you're getting at. Just taking a slightly less trivial example from Intel's docs, and superficially comparing the assembly they generate to what LoopVectorization.jl can do, I guess there is some hope at least:

    julia> function simple!(ys, xs)
             @avx for i = eachindex(xs)
               ys[i] = if xs[i] < 3.
                  xs[i]^2
               else
                  sqrt(xs[i])
               end
             end
           end
This generates vbroadcastsd, vpcmpgtq, vmulpd, vsqrtpd, vblendvpd and vmaskmovpd AVX2 instructions (I don't have AVX512). But more complex control flow does indeed not work; the `@avx` macro errors it can't handle it at the moment.


Thanks for the example, I am interested in trying these packages in the future, but here's a current snippet of ISPC,

    varying float aff[nc], xh[nn], wij[nn], x[nc];
    varying int ih[nn];
    uniform int t_ = t & (nl - 1);
    uniform float k;
    for (int j=0; j<nc; j++)
        aff[j] = 0.0f;
    for (int j=0; j<nn; j++)
        for (int i=0; i<nc; i++)
            aff[i] += wij[j] * shuffle(xh[j], ih[j]);
    for (int i=0; i<nc; i++)
        x[i] = 0.1f*(x[i] + x[i]*x[i]*x[i]/3.0f + k*aff[i]);
    for (uniform int i=0; i<nn; i++)
        xh[i] = insert(rotate(xh[i], 1), 0, extract(x[i/nl], i&(nl-1)));
Just being able to absorb SIMD lanes into the notion of varying vs uniform makes this easy to write, not to mention SIMD operations like shuffle or rotate. In Julia or Numba or CUDA I have to index into arrays, ensure compatible data layout etc. I imagine this could be done with more macrology in Julia, but again why not use something which already works.


> but again why not use something which already works

Sure, but there's always somebody crazy enough to try to implement it :p

From the fact that ISPC can generate C++ (with --emit-c++) I would think their compiler is conceptually very high-level. Since so few new concepts are introduced, I wouldn't be surprised if someone implemented the same DSL in Julia at some point, just to see if one can get similar performance.


> ISPC can generate C++ (with --emit-c++) I would think their compiler is conceptually very high-level

It's a Clang-based frontend for LLVM, so yeah, no reason Julia couldn't make use of their work. I think the main thing to tap into is the uniform vs varying support, since this drives the whole thing.


I'm sorry, but as a regular Matlab user, I have to say that, no, Matlab does not have an "excellent jit compiler." It has a jit compiler which is really limited, and works for straight-line code with built-in functions.

For performance you virtually always need to write 'vectorized' code, which is often difficult and memory heavy.

The daily churn of handling input parsing, along with contorting code into vectorized form takes up an extraordinary amount of time when writing Matlab code.

Matlab is certainly not 'good enough' for me! It's just what I'm stuck with at work.


Maybe, depends on the work. I've seen Matlab JIT code match C performance. I think it's the language holding it back, mainly.


I have also seen Matlab code match Julia for simple loops. If you write your code properly vectorized and with loops that use only calls to 'built-in' operators and C-functions, then it can match C/Julia.

But that's just extremely restrictive. The jit does not work on arbitrary user code.


Everything is possible with enough effort. People wrote games in assembly language in the past too. The thing, though, is that as long as the domain you want to work in has Julia packages, there is a high chance your job will simply be a lot easier in Julia.

I have recently been working with a REST API, something I don't do with either Python or Julia normally. I spent probably 10x as long getting this to work in Python. It was not just a matter of the library not doing what I wanted, but also of it not supporting well-established conventions, as well as simply having far more complex APIs than the Julia equivalent.

This is something I frequently find. Julia APIs for doing the same thing as in Python will generally be much easier to work with.

Python APIs also tend to be very OOP-oriented and stateful, meaning there is a lot of mutating of hidden state. Julia APIs tend to show much more clearly what the data flow is, because they follow a more functional approach, where mutation of state is generally avoided. Although Julia is pragmatic. It isn't like working in Haskell.

NOTE: I am not saying Python is a bad language. It would usually be my second choice. I am just trying to say that there are REAL advantages to using Julia, which go far beyond mere performance. I never started using Julia due to performance but due to its expressiveness and clean APIs.


Being stuck in the middle like this can be a difficult if not deadly issue to resolve.


Sounds like Innovator's Dilemma.


Please allow Julia tooling to easily compile to a native binary/dynlib/dll that can be deployed on both desktop and mobile OS'es.

Right now, it appears like a niche interpreted language for numeric geeks only.


Matlab and its Simulink extension, which is widely used for designing control systems, has a feature for generating C code that can be deployed without the Matlab runtime. Someone with knowledge of code generation and compilers could take this Matlab feature as an inspiration and implement a C code generator for Julia functions.

> Right now, it appears like a niche interpreted language for numeric geeks only.

Yes, it was designed for numerical and scientific computing. In Julia, you can at least prototype the numerical algorithm before implementing in C++ or Fortran.


https://github.com/JuliaLang/PackageCompiler.jl

Note that it doesn't do slim binaries yet for embedded. Don't know if that's in the roadmap.


Julia is pretty neat, but I'm still using Python at the moment for numerical computing work. Julia does have built-in sparse arrays, but I don't think it's on par with scipy.sparse yet.


I would assume scipy uses MKL, and Julia has wrappers for that too: https://github.com/JuliaSparse/MKLSparse.jl. For the GPU CUDA.jl has support for sparse matrices as well.


My only problem too is I just want to write high level mathematical code and have it work. With Python's anaconda distribution, I just download it and the sparse stuff works. I don't want the hassle of dealing with wrappers and code that is not very mature. I know it's a chicken and the egg thing.


Have you even read the readme in that link?

Start a julia repl, hit ] for the package manager, paste `add MKLSparse`, write `using SparseArrays, MKLSparse`, run `sprand(100,100, 0.1) * rand(100)`
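
Spelled out as a rough, untested sketch of that session:

    julia> using Pkg; Pkg.add("MKLSparse")      # same effect as `] add MKLSparse`
    julia> using SparseArrays, MKLSparse
    julia> sprand(100, 100, 0.1) * rand(100)    # sparse matrix times dense vector, MKL-backed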


I think the functionality of Julia's sparse arrays is mostly on par with scipy.sparse, it's just a bit rough around some edges still, and spread out into non-base packages (e.g. into IterativeSolvers.jl, ARPACK.jl, MUMPS(*).jl etc) of various maturity. Having the sparse arrays built-in might (hopefully) lead to a large ecosystem of interoperable sparse packages, but as with much of the Julia ecosystem this is still a work in progress.

Personally, working on sparse eigensolvers, my only complaint is that IterativeSolvers.jl is still somewhat lacking (though the ARPACK.jl bindings are more comparable to what scipy does), but otherwise cannot imagine going back to python for this kind of low-level numerical research. Having code run comparably to fortran speed (barring some memory annoyances) is huge for working on numerical algorithms.


The nice thing about Julia is that it's much simpler to roll your own things. With numpy, you might have to be writing C.


Just the opposite, because more people know C than Julia.


A Julia user knows more about Julia than a Python user knows about C.


So more people knowing C than Julia somehow makes it easier for me? Absolute nonsense.


They do; what's your point?


My point is that there are far more C programmers and C code in the world than there are Julia programmers and Julia code. This is the advantage of Python: it was able to leverage billions of lines of C code by simply acting as a glue language.


But I don't know C, and it's easier for me to be able to write fast code in a language I actually know.

Having to switch to a language you don't know is not a good situation, it doesn't help that a lot of other people know that other language.


Just fyi, Julia has a fantastic and very fast C foreign function interface, just like Python.


> With numpy, you might have to be writing Python

FTFY cf Numba

http://numba.pydata.org


Numba is great (as well as Jax, which is even better), but its limitations are drastic compared to Julia. So much of what is great about Python (e.g., composability, introspection, simplicity) is lost with these libraries, while Julia still retains them.


The main thing missing from Numba is user-defined structures; the jit classes usually come with a performance penalty.

That said, in terms of composability you can jit over the closure to achieve a lot of what you might want, e.g.

    from numba import jit

    def make_loop(f):
        @jit
        def fn(x):
            # the loop is compiled; calls to the jit'd f are effectively inlined
            for i in range(x.shape[0]):
                x[i] = f(x[i])
        return fn
for any jit'd function f will be just as fast as if you had inlined the body of f.

For introspection and simplicity, I think, in high performance, you simply have to choose two of fast, simple and generic. Julia clearly chooses fast and generic.


Could you elaborate on why you are saying julia is not as simple? Certainly, there are performance tips one needs to be aware of, but that is the case with python too. I have used python for relatively high-performance numerics for research work for the last decade, and now that I am exploring Julia, it certainly addresses all of the concerns I mentioned above with code which is at least as legible/simple and much more introspective. It has its warts, but those warts are on the roadmap and I have seen significant improvement between v1.3 and v1.6 (e.g. compilation latency and debuggability).


Compiler errors can be on par with heavily templated C++, with similarly difficult-to-read code bases. It's of course easier to work with than C++, but not the panacea that seems to be implied in a lot of the comments.

In terms of deployment, you essentially have to have Julia installed wherever you want to run, which is frequently OK, and if not there's always PackageCompiler, except it's not that easy to get a working shared lib.

There are other things I find complex but probably because I’ve used Python too long so Julia is different etc. I think Julia is a great choice for HPC generally except for challenging SIMD codes where it’s hard to vectorize without an explicit ILP model.


How does Jax lose composability or introspection?


E.g. jax does not autodifferentiate anything that is not jax (scipy ode solvers, special functions, image processing libraries, special number types (mpmath), domain-specific libraries). Compare that to Zygote.jl


It is true that Jax cannot differentiate through C code. But it can differentiate through python code that was written to accept Numpy.


Which is extremely limited compared to Zygote, which can handle custom types, dicts, custom arrays, a complex type system, multiple dispatch uses, etc.


Try reading the docs before making sweeping negative comments about what a piece of software can and cannot do.

https://jax.readthedocs.io/en/latest/notebooks/autodiff_cook...


Are you talking about "Differentiating with respect to nested lists, tuples, and dicts" from that page? The comment to which you are responding covers quite a bit more. The jax documentation specifically says "standard Python containers". Zygote.jl and other less stable Julia auto-diff libraries go far beyond the built-ins and can work with structures defined by packages never designed to be used with automatic differentiation. Of course, there are limitations, but quite a bit less severe than the one in jax (and again, I am saying this while being a big jax fan).


As the document I linked to says, Jax autograd supports custom data types and custom gradients.

It’s honestly exhausting arguing with all you Julia boosters. You can down vote me to hell, I don’t care. I’m done engaging with this community.

You all are not winning over any market share from Python with your dismissive, arrogant, closed minded culture.


I understand you are frustrated, however, please remember

> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

> Please don't comment about the voting on comments. It never does any good, and it makes boring reading.

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

https://news.ycombinator.com/newsguidelines.html


I am confused why you assume I am a "Julia booster" or use such combative language. I love Python and Jax and use it for much of my research work, I just also like learning of other approaches. Please try to honestly address the sibling comments. We have repeatedly claimed that tools like Zygote.jl can autodifferentiate efficiently things that Jax can not (without a lot of extra special code and hand-defined backprop methods), e.g., an array of structs with scalar and vector properties over which a scalar cost is defined. Just give examples, so that we can both learn something new about these wonderful tools instead of using such offensive language. It is hard to not take your own comments as the ones being dismissive.

Also, look at where this conversation started. My claim was that jax does not work with "(scipy ode solvers, special functions, image processing libraries, special number types (mpmath), domain-specific libraries)". A Julia library does not need to know of Zygote.jl to be autodifferentiable. A Python library needs to be a pure-Python, numpy-based library to work with jax.

In order to try to contribute to the discussion: I think this paper describes relatively well what is so special about the Julia autodiff tools: https://arxiv.org/abs/1810.07951

For a separate approach, which is also very original, check out https://github.com/jrevels/Cassette.jl


I don't see anything there about a better type system or multiple dispatch. Try being less salty.

In what language are you defining these custom arrays or types? Certainly not in Python, or they'll be too slow to be worthwhile.


Indeed, but "python code written to accept numpy" is a pretty restrictive subset (comparatively; I do still enjoy using python). It does not even cover most of scipy, let alone the domain specific libraries, which frequently end up using cython or C for their tightest loops.


But then writing something in C is not really difficult, is it? (The really difficult part is the algorithm, not the implementation in a language.)


I'm a pretty competent C programmer, but doing even fairly simple matrix programming in C is fiendishly hard not to screw up. When I was porting this code [1] to this [2], it took an absurdly long time to iron out all the bugs. And just look at the difference in verbosity and difficulty. Worst of all, C has no guard rails: it's so easy to think you have it right, only to realize much later (or not at all, frankly) that you're computing complete nonsense.

[1] https://github.com/JuliaLang/Microbenchmarks/blob/af3d18f7b3...

[2] https://github.com/JuliaLang/Microbenchmarks/blob/af3d18f7b3...


Thanks for the example. Could you also say how much of a speedup this led to?


That C code is 39% faster than the Julia version, so faster but not massively so. You can optimize the Julia code to be as fast as the C code by using in-place operations, but it's probably not worth it for such a small speedup unless the code is a real bottleneck. By comparison, the Python version (using NumPy, of course) is 25x slower than C, which is a whole different situation.

Most of the speed advantage of the C here is due to reusing the same memory over and over, which you can do pretty easily in Julia as well. Here's the Julia code using in-place operations, and it's still quite a bit more readable than the C version: https://gist.github.com/StefanKarpinski/e57f5a36b7890b261a0d.... I timed this and it's the same speed as the C version. When developing this, it follows the same general outline as the C version, but you have several benefits: (1) you can use asserts to compare to the easy version; (2) there are niceties like bounds checks and array indexing. In C you can't do (1) because there is no easy version to compare with. And doing the array index computations in C is kind of a nightmare: it's so easy to accidentally screw them up.


I think you're overestimating how many Python developers are willing to write and compile C. Debugging gets more difficult across two languages. Shipping binaries with a Python package correctly can be a hassle too (what about -march=native performance).


That is not too hard if one uses the Pybind11 C++ library, which allows creating native-code Python modules (shared libraries) in a simple and declarative manner. Another library that can be used with Pybind11 is Eigen for linear algebra, which can perform loop fusion in numerical computation; for instance, it can add multiple vectors of the same size in a single loop.


Numba compiles Python to LLVM as a backend, why is this even a discussion?


Because numba (and jax), although being wonderful pieces of engineering, work on a fairly limited subset of python, do not permit the same level of introspection as python, and do not play nice with other python libraries. Julia does not make that sacrifice.


For Jax I believe this is false.

Jax is composable. In fact it’s a core design goal. Jax arrays implement the Numpy API. I routinely drop Jax arrays into other python libraries designed for Numpy. It works quite well. It’s not effortless 100% of the time but no library interop is (including Julia multiple dispatch).

I can introspect Jax. Until you wrap your function with jit(foo) it’s as introspectable as any other Python code, at least if I’m understanding what you mean by introspection.

Jax has implemented most of the Numpy functions, certainly most of the ones anyone needs to use on a regular basis. I rarely find anything missing. And if it is, I can write it myself, in python, and have it work seamlessly with the rest of Jax (autodiff, jit, etc)


Jax is awesome. But supporting most of numpy isn't enough, because numpy isn't composable. You want to add banded-block-banded matrices to numpy? Then you need to fork numpy (or in this case fork jax); this is a package in Julia and it works with everything. You want to add names to your array dimensions like PyTorch recently did? Then, like PyTorch, you need to fork numpy; again this is a package in Julia. You want to do both? You have to merge those two forks into each other. In Julia this isn't even a package, this is just using the aforementioned two packages.

You want to work with units or track measurement error (or both)? Basically the same story, except better in some ways and worse in others. Better because you don't have to fork numpy; it is extensible enough to allow that, and packages exist that use that extensibility for exactly this. Worse because those are scalar types: why are you even having to write code to deal with array support at all? Again, two Julia packages, and they don't even mention arrays internally.

The problem's not Jax. The problem is numpy. Or rather the problem is this level of composability is really hard most of the time in most languages (including the python + C combo. Especially so even).

It's true that this is not always trivial 100% of the time with Julia's multiple dispatch, but it is truer there than anywhere else I have seen.


> do not play nice with other python libraries.

Julia hasn't been around long enough to build an ecosystem with multiple commercial giants building competing products.

> Julia does not make that sacrifice.

Sure, Julia sacrifices your sanity on the altar of stack traces, method resolution, and time to first plot, among others.

> fairly limited subset of python

Fantastic comment, always brought up, except that this same subset of Python (really how to use CPU efficiently) is about the same that anyone writing about Julia performance is preaching.


I think a really good example of how Julian tools are composable and work with the whole of the language, while that is not the case with Pythonic tools, is autodifferentiation. The many different julian autodiff libraries have little trouble performing the autodiff through special functions, weird data structures, and 3rd party libraries. In python, none of the autodiff tools play nice with scipy or with mpmath or with sympy (or qutip, which is something I need). This is not about commercial competitors either - all of the libraries I mentioned are non-commercial.

Julia provides the interoperability between such libraries, including custom datatypes, frequently with fast zero-cost abstractions.


I agree but think that Julia had the benefit of watching Python and MATLAB suck at auto diff and do better as a community. That said PyTorch and Jax are great.

> fast zero-cost abstraction

Until you get a compiler error and the stack trace takes up a few screens of text. Who maintains that abstraction by the way? Nothing is free.


> Fantastic comment, always brought up, except that this same subset of Python (really how to use CPU efficiently) is about the same that anyone writing about Julia performance is preaching.

This is completely and demonstrably false. Julia allows fast programming with macros, multiple dispatch, abstract types and more.


All those features have a cost that, in a performance-oriented setting, requires restricting the code to a subset of Julia that matches what the hardware is fast at.


Where are you getting this from? None of those features have a performance cost, if used according to some simple guidelines.

In fact, multiple dispatch is essential to get good performance, abstract types have a cost if explicitly used inside struct definitions, but that's why you use parametric types, and abstract types do not have a cost in function signatures.
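
To illustrate the struct point, a minimal sketch (the type names are invented for the example):

    struct SlowBox             # abstract field type: the layout of x isn't known to the compiler
        x::Real
    end

    struct FastBox{T<:Real}    # parametric field type: concrete for each instantiation, no overhead
        x::T
    end

    double(b::FastBox) = 2 * b.x   # abstract/UnionAll types in signatures cost nothing;
                                   # a specialized method is compiled per concrete FastBox{T}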

Macros are expanded at compile-time, and at least have no runtime cost, in fact they are often used for improved performance.

I think there's some fundamental misunderstanding going on here, but I'm not sure what it is. Or do you mean to say that if it is possible to write slow code in a language, then only a subset of that language has good performance? If that is the case, I don't know how to respond.


No, that's not true. I specifically mentioned features that can easily be made zero cost on both CPU and GPU targets.

In fact, those aren't special features and there is no other subset. Those are the core abstractions on which everything in Julia, down to primitive types and wrappers for LLVM intrinsics, is built. Without them you wouldn't have Julia.

Julia's GPU ecosystem wouldn't be where it is with just one or two people maintaining it without being able to reuse those features to plug into existing machinery.


I beg your pardon. What parts of Julia are not supported by its jit compiler?


Numba isn't as slam-dunk a solution as it seems, at least when your code needs to run on something besides x86. I recently witnessed someone getting Numba-based code running on an ARM-based system, and they fought LLVM package and performance issues for weeks. The code in question was generally working and seemed pretty good on x86, but took up multiple CPU cores at 100% on the ARM platform. Doing a straightforward port of the algorithm to C++ and wrapping it with pybind11 resulted in the same algorithm using 14% of one CPU core. Perhaps the Numba implementation could have been made to work eventually, but it seemed pretty difficult to debug due to all the layers involved.


A subset of Python for one, and debugging stays awkward with gdb.

Makes me think of this little pun: https://github.com/FluxML/Flux.jl/blob/master/src/utils.jl#L...


The subset of Python compatible with Julia's whole performance story (mostly, structs are the missing piece), yes sure.


You mean the type system and multiple dispatch? I guess those are fairly minor points, yes.


One performance feature I find really nice in Julia is that you can write your own function (cost function, function to integrate, whatever) that you can pass into a solver/integrator/etc. Then it will be inlined and compiled into that external library.
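
As a sketch of what that looks like (assuming QuadGK.jl as the external library here):

    using QuadGK
    mycost(x) = exp(-x^2) * sin(3x)    # your own plain Julia function
    quadgk(mycost, 0.0, 1.0)           # the integrator compiles a version specialized on mycost,
                                       # rather than calling it through a generic callback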

Does this work with numba? Are your jit-compiled functions compiled into external modules?


Even if structs worked, they wouldn't be part of a good type system, and don't work with abstract interfaces/inheritance, traits, multiple dispatch, fast union types, zero-allocation immutables, or a myriad of other things.


That really depends, but on average it will be more difficult and there will be a lot of boilerplate.


Finally, Lisp wins ;).


It's not finally. It's more like, here we go again.

Ideas from Lisp have been creeping into "mainstream" programming languages for decades. As the article points out, Common Lisp introduced multiple dispatch over 40 years ago and now it's finding its way into Julia. There's lots more in Lisp that can still find its way into a modern programming language.

I'm particularly interested in the inclusion of a LispSyntax for Julia that might allow Lisp macros to be written at the LispSyntax level that can then be used at the surface Julia level.


What do you mean?


Julia has a heavy Lisp heritage. If you want a clearer comparison, see Dylan. On the topic of Lisp, Julia's parser is written in a custom version of Scheme called femtolisp.


And Dylan is at https://opendylan.org/ I never really forgave it for losing s-expressions, but it did pioneer hygienic, referentially transparent macros with infix syntax.


Perhaps, as well as Dylan, it's worth referencing another Lisp system, T, that pioneered fast dynamic language implementations on the typical Unix workstation-type hardware of the time, and was of interest for research computing: https://en.wikipedia.org/wiki/T_(programming_language)

(T has some historical interest just as the original implementation language of the Yale Haskell compiler.)


When I look at Google Trends or RedMonk rankings, Julia appears stable, not accelerating.


I don't understand Google Trends. Julia's progress doesn't look like much there, but even Python has had pretty modest progress there, which doesn't make sense.


Anyone have thoughts on the ecosystem around Julia? Suppose you want to hook it into a database, or implement production-grade logging, or hook into e.g. Spark. Are those things do-able these days?

I've heard from some colleagues that Julia as a language is pretty solid these days, but all the machinery that you might need to take a system to production _aside_ from the business logic is a little lacking. I'd be interested to hear if others agree!


My employer (Invenia) runs Memento.jl (a logging package) in production, hooking variously into SumoLogic and AWS CloudWatch and from there into PagerDuty. I would say it is production-grade logging, and it is integrated enough with the standard-lib logging that stuff logged using that also ends up in the right place.

I don't think it would actually be that hard to get the stdlib logging to work directly with that kind of system either. (Probably if we were starting today we might have built on top of the stdlib logging, but Memento is a bit older than that, and a bit different in philosophy.)

There are a bunch of database drivers. We use LibPQ.jl in production. I also hear good things about ODBC.jl and SQLite.jl. And anything to do with data is really nice because of how interoperable all the data-frames-like packages are. Everything just works together as long as the minimal API described by Tables.jl is implemented (and everything implements that).
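
As an illustration of that Tables.jl point (a sketch; the file name is made up):

    using CSV, DataFrames
    tbl = CSV.File("results.csv")    # any Tables.jl source...
    df  = DataFrame(tbl)             # ...materializes into any Tables.jl sink, no glue code needed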


It depends. There are quite a lot of really good libraries for things like that but there are still a few important things missing.


Pet curiosity ... if the inventors would humour me.

What was the thinking behind adopting the `end` notation for marking the end of blocks? I know that a certain family of PLs (Erlang, Ruby) uses them, but the dominant flavor is the `{}` syntax (C/C++/Scala/Rust etc.).

Was it an explicit choice or more based on your past familiarity with other languages?


The reason is that they want to reserve the curly braces for other uses and the language devs call using `{}` to delimit code a 'waste of valuable ascii characters'.

Julia currently uses {} mostly for type systemy things like parametric types. When I write Foo{T}, this says that T is a parameter in the type Foo.

E.g. a vector of Float64s in julia is an Array{Float64, 1} (1 means one dimensional) whereas a matrix of vectors of complex Ints is an Array{Array{Complex{Int}, 1}, 2}.
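
A quick REPL check of those relationships (sketch):

    julia> Vector{Float64} === Array{Float64, 1}
    true

    julia> Array{Array{Complex{Int}, 1}, 2} <: AbstractMatrix
    true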


It’s from Matlab. That’s really it. Well, and probably {} are perfectly good characters that shouldn’t be wasted on code blocks.


So it's from Fortran ;)


Honestly, I just wish that this was 0-indexed. 1-indexing annoys me enough that I haven't really started playing with Julia.


I don't get why people care about this. How often does that hit you in normal code? It never bothered me in Lua, and certainly not in Julia. For me I just do a context switch. I know when I am in Julia I have to think about things like in my mathematics books. Just like you index a matrix.

When I am in other languages like C/C++ I tell myself that I am working with memory addresses. For memory addresses 0-based indexing makes more sense. For tables, matrices etc. I honestly think 1-based indexing makes more sense.

Only time I dislike 1-based indexing in Julia is when I write code which is strongly related to how memory works. Like when I was implementing a CPU simulator and assembler.


It depends on which kind of maths you refer to.

After studying ordinal numbers, I feel like the normal way to start counting is from 0 (or the "empty set" ordinal).

This is also how you prove something by transfinite induction, you need to start by 0.

I wish I could easily recompile Julia so that it is 0-indexed. But then what about all the libraries...


Most well-written libraries do not depend on the specific type of indexing in an array. There's a strong convention for writing index-agnostic code, with eachindex(), enumerate(), firstindex(), lastindex(), begin, end, etc.

I try to write code to accept 0-, 1-, 2-, and StarWars-based indexing.
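
For example, a small index-agnostic sketch:

    function mysum(xs)
        total = zero(eltype(xs))
        for i in eachindex(xs)    # correct whatever the array's first index happens to be
            total += xs[i]
        end
        return total
    end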


There's nothing preventing you from using a 0-base, it's just extra work to implement. You can even use the fabled 2-base index! https://docs.julialang.org/en/v1/devdocs/offset-arrays/
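
For example, with the OffsetArrays.jl package (rough sketch):

    julia> using OffsetArrays
    julia> v = OffsetArray(rand(5), 0:4);    # a 0-based vector
    julia> firstindex(v), lastindex(v)
    (0, 4)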


For people complaining about 1-indexing, it is precisely because they wish that what they feel is the "natural" way to count would come without overhead.

So it's this "extra work" that is the problem.

It would be nice to have it as an environment variable or a runtime option. Then what about portability issues...

So let's just fork Julia and call it JuliB with 0-indexing =)


I wish Python was 1-indexed. I can't count the number of times I made an error because somehow programming languages count from 0, which is insane. I also keep having to do dumb things like len(x)-1, which is also dumb.


It's mostly about habit. Apart from that, I really enjoy this venture into the history of 0-based vs. 1-based indexing: http://exple.tive.org/blarg/2013/10/22/citation-needed/

> the technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.


So, has anyone tried making games with this yet, or no?


Yes.



