The Accelerating Adoption of Julia (lwn.net)
316 points by chmaynard on Oct 20, 2020 | 252 comments



I think one of Julia's greatest indicators of long-term success is the variety of commercial users/companies from a diverse pool of technical domains/industries that are all excited, willing, and capable of contributing back to the ecosystem.

We had a BoF at this year's JuliaCon revolving around this topic [1] and are now planning the first Annual Industry Julia Users Contributhon as a result.

I especially think that well-configured Julia + K8s setups have the capacity to really tighten exploratory data science <-> operational data engineering feedback loops in industrial settings in a way that is much more ergonomic, generically useful, and portable/extensible than using pre-baked frameworks to achieve something similar. Julia-centric tooling for coarse-grained workflow orchestration, experiment tracking, data provenance, etc. would be nice, though I also think existing generic tools in this vein (e.g. Argo) could probably compose well too :)

A few different companies have nice in-house implementations of these kinds of setups, and Julia Computing is building a nice looking commercial product suite in this vein that I look forward to exploring more in the future (especially JuliaHub and JuliaRun).

[1] https://julialang.org/blog/2020/09/juliacon-2020-open-source...


The important part of Julia is its programming model.

The implicitly parallel fork-join model is easy to program and incredibly general. And I'm glad to see a high-level language embrace it.

-------

I probably should note some other languages of this model: CUDA, OpenCL, ISPC, OpenMP, OpenACC. Most of these other languages are low-level, where you manage memory directly (and GPUs have very little memory per thread, so manual memory management is still hugely important).

But for speed of development, prototyping, and higher level reasons, a language like Julia that implements this "mindset" is going to be hugely important moving forward.

------

The fact that parallel fork-join scales from 2 CPU threads all the way up to thousands of GPU threads... or CPU + SIMD (such as AVX512), is proof that this methodology is useful. I feel like people are sleeping on this model: it's hugely useful for scaling on what I believe to be the computer of the future.


Could you be more precise about what you mean by the fork-join model? Is this similar to how in Go starting a thread is easy, and waiting for it to finish and collecting results is easy as well?


I don't program in Go, so I can't speak to that.

The fork-join model generally starts off with one thread, often called "The Single" or "The Master". It's where main() begins.

But somewhere along the line, you discover a portion of code that needs to be multithreaded. So you fork into many threads: maybe dozens of threads, or thousands in the case of SIMD or GPUs. This multithreaded portion is then "joined", that is, the "master" thread refuses to continue until all children are done executing.

Let's say you are writing a ray tracer.

    main() {
      // In the "Single" thread
      Foo();
      Bar();

      fork(doRayCastingInParallel, 5_000_000); // Spawns 5,000,000 threads, one for each ray
      // Implicitly wait for all 5 million threads to finish

      doSomethingElse();
    }
Now, raycasting has a bunch of things to do, but you can break it up into the same steps for all 5-million threads.

    doRayCastingInParallel(int myIdx){
      doTrigonometry(ray[myIdx]);
      findCollision(ray[myIdx]);
      outputColor[myIdx] = calculateColors(ray[myIdx]);
    }
So from both the "single" thread and your "child" thread, it feels like you're writing single-threaded code. But in actuality, your "child" code runs as 5 million spawned copies, executing in gross parallelism.

The "Fork-Join" model assigns a different "myIdx" number to each thread. So Ray#500 knows to do things differently than Ray#68121.

But otherwise, they run through the same code.
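
Since this thread is about Julia: a minimal sketch of the same shape using Threads.@threads (the helper functions are just stand-ins for the pseudocode above, and Julia maps the iterations onto a fixed pool of OS threads rather than literally spawning 5 million of them, but the fork/join semantics are the same):

    foo() = nothing; bar() = nothing; do_something_else() = nothing   # stand-in work
    output_color = zeros(5_000_000)
    do_ray_casting_in_parallel(i) = (output_color[i] = i)             # pretend per-ray work

    function main()
        foo(); bar()                              # still in the "single" thread
        Threads.@threads for i in 1:5_000_000     # fork: one iteration per ray
            do_ray_casting_in_parallel(i)
        end                                       # implicit join: wait for every iteration
        do_something_else()
    end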

---------------

The important bit here is how general this mindset is. It's commonly used for GPU programming (the CPU issues a GPU command to spawn millions of threads, often one per pixel, or even thousands per pixel). Then the CPU waits for the GPU threads to finish, and just carries on.

But ISPC proved that you can use this fork-join model on SIMD units like AVX512 or SSE. Julia and Python are showing off how you can use it in higher level languages. Etc. etc.

OpenMP further proves that fork-join works on normal CPU threads. Maybe there's some other CPU-threading languages (but OpenMP is the one I'm personally familiar with).


Isn't that how nearly all multi-threaded code is written? Wouldn't the only other option be fire-and-forget? I've been writing code like this in nearly every language I've ever used for decades.

Are you talking about the actual fork() function, instead of, say, spawning inside a loop explicitly?


> Isn't that how nearly all multi-threaded code is written

Producer-consumer queues say otherwise. As does Async. There's also message-passing. There's also thread-pools.

Task-based programming doesn't wait for the tasks to finish. So without the "join", you don't have an easy synchronization point to rely upon, and have to use explicit locks for synchronization.

------

Producer - consumer can't work with fork/join, because there's no "join" at all! The producer keeps producing, and the consumers just keep consuming.

Async is... a tangled mess. I'm pretty sure it's just the modern "unstructured goto" style of spaghetti that has been rediscovered. Yeah, it's super-efficient, but no, it's not really easy to program.

Task-based parallelism is probably my next favorite style. Fork-join is super easy, but the join causes a significant waiting period and underutilization of the processor. Task-based provides more flexibility (at the cost of a little bit more complexity).

------

There's a ton of other models of parallelism. But fork-join seems to be the methodology that more and more work is being focused on.


> Async is... a tangled mess. I'm pretty sure it's just the modern "unstructured goto" style of spaghetti that has been rediscovered. Yeah, it's super-efficient, but no, it's not really easy to program.

That's not inherent to async though, just a property of the implementations that have seen widespread adoption. There's nothing stopping async from implementing the same process tree semantics; in fact, there's at least one high-level async library that does exactly that (python-trio, https://trio.readthedocs.io/en/stable/)


As I mention in the article, Julia has task-based parallelism now, with a @spawn macro.


That's wonderful to hear!

I'm looking into it: and it looks like "Stupid Fibonacci" proves that it's working in Julia (https://julialang.org/blog/2019/07/multithreading/).

Task-based parallelism carries more overhead than other patterns. But it really is a powerful construct: it converts recursive ideas into multithreaded code rather easily.
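
If memory serves, the blog's "stupid Fibonacci" boils down to roughly this sketch:

    using Base.Threads: @spawn

    function fib(n::Int)
        n < 2 && return n
        t = @spawn fib(n - 2)          # runs as a task on whatever thread is free
        return fib(n - 1) + fetch(t)   # fetch() is the per-task "join"
    end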


"thousands in the case of SIMD or GPUs" What does 'Single Instruction Multiple Data' have to do with threads?

Also fork join sounds just like normal threads to me in the way you are describing it.

Sorry, what am I missing?


Fork join by and large doesn't use mutexes.

If you need to synchronize, wait for the next fork/join cycle instead of doing explicit mutexes. The "join" provides your synchronization point.

If you have a complicated set of highly-synchronous calculations, then you don't fork at all. You simply use the "Single Thread" to perform all those calculations (and therefore negate the need for cross-thread synchronization).

Fork-join has lower utilization, but it's very easy to program. In some cases, fork-join remains efficient (ex: spawn a thread per pixel on the screen), because all the pixels do NOT need to synchronize with each other (or take mutexes).
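
A minimal sketch of "wait for the next fork/join cycle instead of a mutex" in Julia (stage1/stage2 are stand-in work):

    N = 1_000
    stage1(i) = i^2
    stage2(x, total) = x / total
    results = zeros(N)

    Threads.@threads for i in 1:N
        results[i] = stage1(i)         # each index writes only its own slot: no locks needed
    end
    total = sum(results)               # back on the "single" thread; the join was the sync point
    Threads.@threads for i in 1:N
        results[i] = stage2(results[i], total)   # next fork/join cycle
    end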

-----

If a sync point within the children is needed, you usually make do with a barrier instead of a mutex.

> What does 'Single Instruction Multiple Data' have to do with threads?

SIMD processors, such as GPUs, emulate a thread in their SIMD units. It's called a "CUDA thread". It's not a true thread in the sense of CPU threads, but the performance by and large scales as if they were real threads (with the exception of the "thread divergence" problem).

Ultimately, the fork-join model translates trivially to SIMD. Any practitioner of CUDA, OpenCL, ROCm, or ISPC can prove that to you easily.


Yeah in this model you wouldn't need a mutex because each "thread" is independent and operates either on independent data or uses constant-but-shared data. As soon as the result of one thread depends on the result of another thread you have to have some mechanism for synchronization.

I mean it's not really any different from writing a C/C++ program that avoids use of mutexes by having each thread operate on separate parts of the process address space. I'm still intrigued but it's not mind-blowing to me to fork a bunch of threads and join them when the function execution completes.


There's nothing mindblowing about Edsger Dijkstra's "Go To Statement Considered Harmful", which largely argued that you should organize your code into easily composable function calls. Kinda obvious in hindsight.

It's more about discipline than anything else. A recognition that fork-join is much easier than other methodologies (such as async).


Thanks for the explanation. The only SIMD programming I've seen is where the programmer would carefully call the CPU-brand-specific instructions and painstakingly manage the registers, making sure the numbers to be added, multiplied, etc. are evenly divided and then given to the SIMD ALUs.

Sounds like what you are saying is that the fork-join model translates easily, by the compiler, to these SIMD instructions?

Some compilers can also vectorize plain loops, but you would advocate for fork join?


> Sounds like what you are saying is that the fork-join model translates easily, by the compiler, to these SIMD instructions?

Why do you think CUDA has become so popular recently? That's exactly what CUDA, OpenCL, and ISPC do.

> Some compilers can also vectorize plain loops, but you would advocate for fork join?

CUDA-style / OpenCL-style fork-join is clearly easier than reading compiler output, trying to debug why your loop failed to vectorize. That's the thing about auto-vectorizers: you end up having to grok through tons of compiler output, or check the assembly, to make sure it works.

ALL fork-join style CUDA / OpenCL code automagically compiles into SIMD instructions. Ditto with ISPC. Heck, GPU programmers have been doing this since the early HLSL / OpenGL shader days, nearly two decades ago.

There's no "failed to vectorize". There's no looking up SIMD-instructions or registers or intrinsics. (Well... GPU-assembly is allowed but not necessary). It just works.

-------

If you've never tried it, really try one of those languages. CUDA is for NVidia GPUs. OpenCL for AMD. ISPC for Intel CPUs (instead of SIMD intrinsics, ISPC was developed for an OpenCL-like fork-join SIMD programming environment).

And of course, Julia and Python have some CUDA plugins.


Must admit never tried it. Thanks for the insights I'll have a go at some point.


If you have an OpenMP 4.5 or later compiler (GCC and Clang both support OpenMP), you can also use #pragma omp simd.

https://www.openmp.org/spec-html/5.0/openmpsu42.html

It's not as reliable as a dedicated language like OpenCL or ISPC. But this might be easier for you to play with than learning another language.

OpenMP is just #pragmas on top of your standard C, C++, or Fortran code. So any C / C++ / Fortran compiler can give this sort of thing a whirl rather easily.
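
And for what it's worth, since this thread is about Julia: its rough analogue of `#pragma omp simd` is the `@simd` hint on a plain loop. A minimal sketch:

    function mysum(xs::Vector{Float64})
        s = 0.0
        @inbounds @simd for i in eachindex(xs)
            s += xs[i]                 # the compiler is allowed to vectorize this reduction
        end
        return s
    end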

---------

OpenMP always was a fork-join model #pragma add on to C / C++. They eventually realized that their fork-join model works for SIMD, and finally added SIMD explicitly to their specification.


And Fortran co-arrays, no?


Fortran coarrays go far beyond simple fork-join. They enable one-sided remote memory access, something that is impossible with OpenMP or CUDA as far as I am aware, and that requires the highest levels of skill to do right in MPI.


Do you mean it doesn't use explicit mutexes? I don't see any way this model would avoid using mutexes (or some mutex-like construct) under the hood, in which case I'm not sure I see the advantage.

The term "fork/join cycle" is intriguing and meaningless to me as a non-Julia user. What exactly is this cycle?


Well, I only visit Julia now and then. I have, though, been writing various parallel programs for myself in a variety of languages, trying to grok parallelism better.

> The term "fork/join cycle" is intriguing and meaningless to me as a non-Julia user. What exactly is this cycle?

https://www.researchgate.net/profile/Alina_Kiessling/publica...

There are many forks-and-joins across a program, when you're doing the fork-and-join paradigm. Each time the threads need to communicate, you join, and then use the "master" thread to pass data to all of the different units.

-------------

For example, most video-game engines issue a fork to calculate the vertices of all objects in the video game (the fork turns into a GPU call). This is called the vertex shader.

Once all the vertices are calculated, the GPU joins these threads together, and the main program / game engine continues.

The next step is the geometry shaders: the CPU forks (aka spawns thousands of GPU threads), and joins on the results of the geometry shaders. (Tessellation may spawn more vertices. Ex: you model a rope as a square, but then the geometry shader turns the square into a rope shape at this stage.)

Then the pixel shaders. For every pixel of your 1920 x 1080 screen, a GPU SIMD thread is forked off, and each pixel's final color is calculated based on the results of the vertex shaders and geometry shaders before it.

Each of these cycles is a fork-join cycle. Thousands of threads spawning, thousands of threads joining, the CPU calculating some synchronization data together, and then spawning thousands of threads again.

(In practice, modern game engines are now async for speed reasons. But the general fork/join model is still kinda there if you squint)


Classing ISPC, for example, as low-level invalidates this comment for me; ISPC, among others, provides a great interface to the SIMD model for CPUs that simply isn't available in Julia.


Julia's SIMD programming model is still very much a work in progress; I think we have a way to go in providing the kind of flexibility and control that languages such as ISPC, Halide, TVM, etc... provide.

That being said, packages such as SIMD.jl [0], and LoopVectorization.jl [1] are making fantastic progress, to the point that LoopVectorization forms the basis of a legitimate BLAS contender, in pure Julia [2]. It's not totally there yet, but it's close enough that real work is being done in LV at OpenBLAS-like speeds.

As an aside, I find it incredible that these kinds of extensions can be built in packages thanks to the fact that Julia's compiler is extensible enough to allow for direct manipulation of the LLVM intrinsics being emitted by user code.

[0] https://github.com/eschnett/SIMD.jl [1] https://github.com/chriselrod/LoopVectorization.jl [2] https://github.com/MasonProtter/Gaius.jl


It’s not a jab at Julia, really; rather, ISPC provides a workable model that it would be nice to see elsewhere later on.

> find it incredible that these kinds of extensions can be built in packages thanks to the fact that Julia's compiler is extensible

Come on, jeez.. Julia’s compiler is a Lisp-based LLVM driver: of course it can do these things.


ISPC can be really good at SIMD-ing complicated control flow (ray tracers being the archetypal example). I'm interested in eventually working on something like that for Julia. In the meantime, it should be possible to deliberately write code to be compatible with something like SIMD.jl. I think I'd start by trying to get that working via multiple dispatch on at least a moderately complex project, and let those experiences inform the kind of transforms an automatic compiler would need to both work and get good performance.


> It’s not a jab at Julia

I didn't take it as such; there are legitimate shortcomings to any tool, I just wanted to provide pointers to other readers that the devs are aware of it, and that there is ongoing development to address it. :)

> Come on, jeez.. Julia’s compiler is a Lisp-based LLVM driver: of course it can do these things.

As someone who, before Julia, was firmly entrenched in C/C++/Python land, I suppose I am discovering many of these "obvious" things for the first time. :)


> I suppose I am discovering many of these "obvious" things

sorry for the flippant remark then! it's great to be in discover mode, enjoy ;)


ISPC is higher level than its peers, but it's still new/delete-based manual memory allocation. So in the grand scheme of programming languages, it's still rather low level (since you're manually handling memory).

Indeed: ISPC provides constructs for structure-of-arrays and other low-level memory layout details. This is a good thing: these details have significant implications for the speed of your program.

Nonetheless, any language which wrangles with manual details of memory layout, or new/delete based memory allocation, is inevitably going to be classified as low level in my books.

> ISPC among others provides a great interface to SIMD model for CPUs that simply isn’t available in Julia.

If Julia can compile into GPU-assembly (which is innately SIMD), I'm sure an AVX-based compile could work eventually.

They may have to target AVX512 (since most GPU-assembly requires per-thread exec-masks), but the general concept is being proven as Julia can now compile down into PTX or AMDGPU assembly.

Julia's compile down to GPU-assembly / SIMD code is not supported in the "general case", only in select circumstances. But that's still an incredible boon for a high-level language.


> but it's still new/delete-based manual memory allocation

I guess we aren't solving the same problems: memory allocation is trivial in my domain; mapping complex nested control flow onto SIMD is the hard part.

> Julia can now compile down into PTX or AMDGPU assembly.

Sure, but Julia-as-syntax is nothing special now; Numba does this for Python as well.


You can and you can't. Julia is composable. Suppose I want to write a library to find compression polynomials for a Reed-Solomon encoding system (see Mary Wootters' talks on YouTube) for my storage product. I need an LU decomposition algorithm that operates on GF256 (which is just an 8-bit int), but the +/-/x/divide operations are all messed up. I'd have to rewrite LU decomposition. How confident are you that you can get even that right? I'm pretty good at implementing algorithms, but I've messed up LU decomposition before.

Then suppose I rewrite the LU decomposition algorithm in Python. Now I want to accelerate the search by running the search on GPUs. I have to re-rewrite the code from scratch. Each GF256 encoding has to have rejiggered operators, and so I need to write custom GPU kernels, then figure out how to resequence the operations (* looks different for each GF256 encoding), etc.

This is all SUPER easy in Julia.
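
To make that concrete, here's a rough sketch of the kind of type you'd define (the field arithmetic is abbreviated; a real GF(2^8) implementation needs log/antilog tables for * and inv):

    struct GF256 <: Number
        x::UInt8
    end
    Base.:+(a::GF256, b::GF256) = GF256(a.x ⊻ b.x)   # addition in GF(2^8) is xor
    Base.:-(a::GF256, b::GF256) = a + b              # subtraction is the same operation
    Base.zero(::Type{GF256}) = GF256(0x00)
    Base.one(::Type{GF256})  = GF256(0x01)
    # Base.:* and Base.inv would go through log/antilog tables (omitted here)

Once * and inv are filled in, generic code written against these operators can run on a Matrix{GF256} unchanged (pivoting needs a little care, since there is no natural ordering on field elements).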


> (see Mary Wootters' talks on YouTube)

https://www.youtube.com/watch?v=Gh578e98qAk

> Suppose I want to write a library to find compression polynomials for a Reed-Solomon encoding system (see Mary Wootters' talks on YouTube) for my storage product. I need an LU decomposition algorithm that operates on GF256

Surely you use isa-l[1].

[1] https://github.com/intel/isa-l

> Now I want to accelerate the search by running the search on GPUs.

GPUs are float oriented so I don't think you'll get the performance you hope for out of 8 bit integer operations. If you have interesting results to share I'd like to read them.


> GPUs are float oriented so I don't think you'll get the performance you hope for out of 8 bit integer operations. If you have interesting results to share I'd like to read them.

You've never seen GPU Hashcat, or GPU Bitcoin / Ethereum mining?

Vega now has a huge focus on INT8 operations. NVidia can perform int operations in parallel with float operations (superscalar GPU cores)


> You've never seen GPU Hashcat, or GPU Bitcoin / Ethereum mining?

I've heard of it, but afaik it hasn't been profitable to mine using a GPU for a long time due to the competition for hashes, power consumption, and rate they mine at. This is in contrast to ASICs which can mine even faster for less capex and opex.


Then use Julia, that's great. My comment was that the SIMD support in Julia isn't sufficient for my problem, not that you can't do your GF256 linear alg on a GPU..?


> I guess we aren't solving the same problems: memory allocation is trivial in my domain; mapping complex nested control flow onto SIMD is the hard part.

Those statements are just confusing to me. In most GPU-code, you have crazy amounts of parallelism and therefore don't really care what order those statements execute in.

As such, if you can allocate memory, you can map those if/else statements into a collection of queues and/or stacks (depending on whether you want a breadth-first or depth-first search pattern).

In effect: you use memory allocation to solve the complex control flow issue. Maybe it's more obvious with code:

Instead of doing:

    if(baz()){
      foo();
    } else {
      bar(); // Thread divergence!!
    }
Do:

    if(baz()){
      pushIntoFooQueue();
    } else {
      pushIntoBarQueue(); 
      // Thread divergence, but not much of a penalty
    }

    while(fooIsNotEmpty()){ // No thread divergence at all
        task = SIMDPopFoo();
        task.execute();
    }

    while(barIsNotEmpty()){
        task = SIMDPopBar();
        task.execute();
    }
This is heavier on the memory units, because you now have to manage the data-structures. But this style practically negates the thread-divergence penalty completely. If you're lucky, your fooQueue and barQueue fit in __shared__ memory.

Bonus points: Not only is thread-divergence negated, but you also achieve effective load-balancing across your workgroup. If Thread#0 spawns 20 items for FooQueue, after pushing/popping from the queue, those 20 Foo-tasks will be assigned to 20 different threads.

-----

From there on out, you're just pushing / popping different parts of your code to various queues and/or stacks. But this is only really a valid solution if you have a decent memory allocator that knows where and how to clean up these queues / stacks (especially if you have nodes starting to point to each other for dependency management)

I haven't really solved this problem "in general", but it is clear that the queues should be sorted into topological order, and that any tasks that depend on each other need to be run in different iterations. It really depends on how much you're willing to spend on organizing this execution information.

In any case: the memory allocation issue is one-and-the-same with the thread-divergence / complex instruction flow issue to me. You need to create a data-structure that organizes the instruction flow, and any complex data-structure will need memory management as soon as you start linking things together. The above uses a queue or stack, but things can get more complicated.

--------

Not that Julia, Python, ISPC or anything really... solves this problem. But memory management is very useful for this "style" (I'm mostly doing ref-counted C++ with a custom allocator myself. But such trees or graphs of links can grow into the CPU-side and end up using the default memory allocator)


> you use memory allocation to solve the complex control flow issue

This is an interesting remark I have to sleep on. I usually don't see this as possible, since I have a bunch of arrays which are allocated once, then a bunch of nested but static control flow. The only way I've found to go beyond single thread performance is with "whole program" SIMD which only worked in the ISPC/OpenCL/CUDA programming models.


It doesn't work all the time. Sometimes it's just faster to have thread divergence than to hit VRAM constantly.

But it's a different paradigm for doing things: something to try if your standard if/else stuff isn't working out.

EDIT: If you can make do with just stack allocation (or queue allocation), if you have singular-sized tasks with predictable sizes, if all the information fits inside of __shared__ memory, and if thread divergence is normally a problem... this methodology will probably work.

Oh, and for a hint:

    myIdx = prefixSum(__activemask());           // this lane's rank among the active lanes
    myTask = stack[stackTail - 1 - myIdx];       // each active lane pops one item off the top
    if(myIdx == 0){
      stackTail -= __popc(__activemask());       // lane 0 retires everything that was just popped
    }
    __syncthreads(); // Barrier is important
Stack pops and pushes are pretty easy. __activemask() provides your execution mask, and with it the logic needed to synchronously push / pop items to a stack (and probably to build more complicated memory allocation functions that I haven't figured out yet).


> I'm sure an AVX-based compile could work eventually.

Eventually is nice but it’s also easy to imagine that it never gets done. ISPC works today.


The CUDA portion of Julia works today, and you can get 10Tflop or 20Tflop GPUs under $1000 to play with it.

It's a different language for a different computer. But the fundamentals are still there, laying the groundwork for the future.

Sure, ISPC can win on latency. But the raw compute girth of GPU SIMD should not be underestimated.

In any case, Julia has proven itself capable of compiling down to a SIMD instruction set and achieving nearly the full performance of those GPUs. Even if it's not a computer you prefer, the programming model and technology demonstrated here is clear.


I agree with this and have the hardware but simply prefer to write CUDA kernels directly instead of in Julia syntax.

My comment was more geared to the CPU side of things since I write code whose users don’t usually have GPUs available.

There are also problem sizes which don’t fit the GPU’s more stark memory system separation: tens of CPU cores with tens of MBs of cache can move past GPUs in terms of memory bandwidth for such cases so banking on CUDA just doesn’t work for everyone.


OpenMP, in particular, is not limited to fork-join, and you don't want to be so limited for efficient, scalable numerical systems. Not that you get large-scale parallelism just with OpenMP. A recent keynote of Jack Dongarra's covers replacing the decades-old use of fork-join for (of course) dense linear algebra: https://www.iwomp2020.org/wp-content/uploads/iwomp-2020-K1-D... It's significantly in the past, not the future, of large-scale computation of the sort Julia particularly targets. I don't know why Julia should be stuck with that, though. Parallelism isn't new to Lisp-y systems, of course.


I guess I still see Julia as a quick and dirty prototyping language akin to Python and some others. Clearly, Julia has ambitions for greater capabilities.

Apparently, Julia now supports some more complicated forms of parallelism: async, @spawn (aka tasks), and others. But @threads-based for loops are just easier to grok.

Overall, Julia is one of the few high-level languages that is explicitly adding in highly-threaded concepts like Fork-join, SIMD, or even GPUs to the language.

Yeah, some Python libraries extend these concepts to Python. But something like @threads for ... end really demonstrates how fork-join parallelism is a first-class member of the Julia language.


Any language supporting (current) OpenMP supports fork-join, tasking, SIMD specification, and offload, though you still want distributed memory, and overlapped communication and computation. Baking particular technology into a system isn't a good idea long-term, or even medium-term, unless you wait for the cycle of re-invention, like vector computing. (I don't know if Julia really does that.)


Fork join parallelism is just one technique for using multiple cores and it is pretty ubiquitous. Most large arrays can be read by multiple threads at one time to let them all come up with part of the answer. I don't think this is specifically an advantage of julia.


Julia as a whole still lags behind Python or Matlab (unsurprising given the age difference) but there are some packages in Julia that just steal the show compared to Python et al, such as the SciML packages and the probabilistic programming packages. I switched to Julia for Turing.jl when I was previously using Pyro. Much happier with Julia.


The best thing about julia is the interop with essentially any language you can name. Want to call numpy?

    using PyCall
    np = pyimport("numpy")
    res = np.fft.fft(rand(ComplexF64, 10))

The interop with Matlab and C++ is similarly painless.

https://github.com/JuliaInterop/Cxx.jl


I agree that this is great, but I think saying it's Julia's best attribute is underselling the language quite a bit. Personally I like how effectively it can replace those languages, at least for future projects.


You are right. One of the best things.


Why is language interop not this simple for everything?

It seems insane to me that it's this easy in Julia, yet most other languages are completely incompatible without transpiling one into the other.


Magic of macros.


To be fair, there are no macros in that example. Just good ol' polymorphism and exceptionally hackable semantics.


PyCall does rely on macros, and so do essentially all the other interop packages.


Sure, they all provide nice macro interfaces, but nothing in the above example required those macros as far as I understand.

This is not to say macros aren't awesome.


I'm pretty sure that macros are involved in the invocation of the python function.


Why interop with a language that already solves the problems? Just use that language.


Because it's a PITA sometimes to use c++. It's easy to accidentally go out of bounds on stuff.

I've arguably got a lot more experience doing scientific programming in C++ and Fortran than I do in Python. Yet for quick prototyping where performance is almost never an issue, I still elect to use Python because it lets me wrap my head around the problem faster. I spend less time fixing stupid errors and thinking about coding, and more time thinking about the problem I really need to solve.

Julia might be even better, I've heard good things, I've just never used it though. In the scientific computing and data science world that I deal with, I'd say at least 95% of the time there's never a need to jump from a hacked together prototype to a more production level product.


My experience with Julia is not deep, but I’ve found it’s really nice to iterate on something that floats towards something that flies in one ecosystem. It’s not harder than Python to get the working implementation up and going - really there is a lower volume of idiosyncrasies.


The only annoying thing I've found is the startup time for the REPL / notebook environments, unless you keep one running all the time. Importing biggish libraries takes some time even if they're already compiled.

I ended up making a custom system image with the libraries I always use (Plots, etc) which does help on that front.
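
In case it saves someone a search, the incantation is roughly this (the exact arguments depend on your PackageCompiler.jl version):

    using PackageCompiler
    create_sysimage([:Plots]; sysimage_path="plots_sysimage.so")
    # then start Julia with:  julia --sysimage plots_sysimage.so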

That's a fairly minor gripe, mind you. It's still great.


Some libraries implement I/O formats that you'd never want to spend time implementing, but that are available in Python. Nibabel is an example: neuroimaging formats I never want to understand the details of, but that I would need to work with in Julia.


Hey neat! I use Nibabel in my work! Small world. :>


If language X solves 9 of my 10 problems, and language Y solves the last one of my problems, I can either, A: use language X and call out to language Y to solve all my 10 problems. Or, B: use just language Y to solve 1 of my 10 problems.

You suggest alternative B?


Because julia is a better language. Numpy is written in C, why use python when you can use C?


I do not want to use C/C++ at all. I would like to use Numpy and Pandas because those projects are essential to my workflow.



What is going on here?


Lengthy, nuanced discussion about benchmarking between Turing devs and Stan devs.


I love Julia, but I think comparing Pyro with Turing is pretty unfair.

Pyro can scale to pretty big datasets thanks to custom inference via guides. So you can do inference on pretty sophisticated models such as Deep Markov.

Whereas Turing is mostly for small or medium sized models. For learning purposes, for smaller models or for non-parametrics Turing might be a much better choice, though.


But isn’t the good scaling due to using variational inference rather than full MCMC? Turing has VI too.


Not just that. Aside from the fact that Pyro has a lot of guide tooling, you get for example (hybrid) message passing for HMMs. This is crucial to scale beyond toy data.

Within Julia, ForneyLab.jl is quite cool for non-deep state space models. I also like Turing.jl, but it has different tradeoffs.


the "good scaling" is due to variational inference, which turns inference problems into optimization problems, but one has to verify the variational approximation holds, etc.

In my experience, Pyro (generally variational inference) is more like a Bayesian optimization than full inference, since it will simply miss multimodality or non-Gaussian tails, etc.


I don't have a CS background, so maybe that's how I'm missing the point here, but I don't see what is supposed to be so special about Julia's multiple dispatch. In fact, reading more about it and knowing Python, single dispatch/polymorphism feels more natural to me. This would allow a `plot` function in Python, originally written for the float data type, to also work for float-like (quacks like a duck) data types, like `Decimal`. I struggle to see what about Julia improves on this situation or makes it conceptually different.

Lastly, the article mentions that `plot` also works for uncertainty-carrying floats, because the `Measurements` package has implemented "a simple recipe" for this case. But doesn't this mean very tight coupling? The `Measurements` author very specifically (if I understand this right) implemented a `plot` functionality. For the end user, this allows things to just work, which is great. But under the hood, it seems this depends on package authors having to be aware of all possible use cases for their library code.


The core problem with class-based object orientation is that methods go inside of the type instead of being added externally. That means everyone has to agree on what methods can be called on a class or they have to subtype it, which sounds harmless, but is actually a big problem when what you really wanted to do was just define a new method for an existing type. My best attempt at explaining: https://www.youtube.com/watch?v=kc9HwsxE1OY

There's also the issue that one sometimes needs to specialize generic operations in generic algorithms on something other than the receiver—sometimes even on more than one of the arguments. Single dispatch forces you to use something slow and awkward like double dispatch in cases like this. And that's assuming the person who wrote the generic code anticipates the need for specialization! If they don't allow for it, then you're just stuck. With multiple dispatch, you can just define the method you need the specialization for and you're done.


> This would allow a `plot` function in Python, originally written for the float data type, to also work for float-like (quacks like a duck) data types, like `Decimal`.

So let's say you do that. You define your own new number-like type, say Complex. You pass that in to plot(n).

Somewhere in the body of plot(), though, it turns out there's some code like:

    # Flip to put origin at bottom left.
    window_height = 400
    y = window_height - n
So now plot() tries to invoke "-" with an int on the left and your complex number on the right. It passes your complex number to the int class's minus operator, which has no idea about your type and barfs.

Python has a hacky workaround for this specific case where the right operand can implement __rsub__(), but that hacky workaround exists specifically because Python doesn't have multiple dispatch.

If it supported multiple dispatch natively, you'd just define an operator - that took a number on the left and a complex on the right, plot() would call it, and you'd be good.


Wow, nice example. Made possible also because arithmetic operators in Julia are functions, and you can extend them to work on any data types. (Operators like "+" have hundreds of methods.)
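
A minimal sketch of what that extension looks like (MyComplex is just a stand-in here; Julia of course already ships a Complex type):

    struct MyComplex <: Number
        re::Float64
        im::Float64
    end

    # Teach `-` about (built-in Int on the left, user type on the right),
    # without touching Int, the MyComplex "class", or plot() itself:
    Base.:-(a::Int, b::MyComplex) = MyComplex(a - b.re, -b.im)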


Plot recipes are a really good fundamental example of composability in the Julia ecosystem. Extending a plotting library cannot be a function because plotting functions don't compose. Want to support plotting quaternions in matplotlib? You'd write a `plotquaternion` function. Then the ODE solver makes a `plotode` function for the output object of its ODE solver. Now let's say you got ODEs solving on quaternions... what about plotting? Well you have to modify `plotode` to call `plotquaternion` or modify `plotquaternion` to call `plotode`, and this is how you get monorepos and the big functions we all know.

Plot recipes are different. The Plots.jl system calls the type recipe function recursively on the plotting data `X` until it sees a set of primitives that it knows. Libraries define dispatches for this function to denote how data types should transform into something more primitive. Quaternion numbers become four independent series of numbers (and add keyword arguments for doing things like modulus). The ODE solver library writes a recipe that says: if you try to plot an ODE solution, transform it into arrays of time series. Now plotting a quaternion ODE solution recurses into 4x time series of the solution, which is the plot you'd want to have. This is a feature I use all of the time: I don't generally do more than just `plot(sol)`, since recursive recipes generally give me the plot I want (or else I open an issue because someone is missing a recipe).
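
For the curious, a loose sketch of what such a recipe looks like with RecipesBase (the Quaternion type and the attribute choices here are purely illustrative):

    using RecipesBase

    struct Quaternion
        w::Float64; x::Float64; y::Float64; z::Float64
    end

    # User recipe: tell Plots how to lower a Vector{Quaternion} into series it already knows.
    @recipe function f(qs::Vector{Quaternion})
        label --> ["w" "x" "y" "z"]                              # overridable defaults
        hcat((getfield.(qs, s) for s in (:w, :x, :y, :z))...)    # four plain numeric columns
    end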


__rsub__ isn't a hacky workaround. Magic methods are a fundamental part of python's object model. If you think it's hacky, I think you don't grok python.

> If it supported multiple dispatch natively, you'd just define an operator - that took a number on the left and a complex on the right, plot() would call it, and you'd be good.

This doesn't come across as a win to me.


What happens when it's an operation that doesn't have magic fallback methods like the arithmetic operators do? Say the code is `max(400, n)` instead. Now what? The `max` function doesn't know about your special type, so the code fails. Magic methods simply don't scale: they solve this problem in a few very specific cases and nowhere else. With multiple dispatch, you have a completely general solution to this entire class of problems: you just define a new method of `max` that knows how to handle your new number type and everything works.


> This would allow a `plot` function in Python, originally written for the float data type, to also work for float-like (quacks like a duck) data types, like `Decimal`.

This might come with sacrificed performance. If you want a different implementation for a different data type to get the best performance, writing one method makes that hard.

> The `Measurements` author very specifically (if I understand this right) implemented a `plot` functionality.

This is a good point. But in that example, note that `Measurements` also works with all of differential equations, and they didn't implement any differential equations specific code, unlike as you pointed out with `plot`. The fact that uncertainty propagates through a differential equation solver is pretty impressive, imo.
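
A rough sketch of that composability, along the lines of the well-known Measurements + DifferentialEquations demo (the numbers here are made up):

    using Measurements, OrdinaryDiffEq

    g  = 9.79 ± 0.02                                  # a value with uncertainty attached
    u0 = 10.0 ± 0.1
    prob = ODEProblem((u, p, t) -> -g, u0, (0.0, 1.0))
    sol  = solve(prob, Tsit5())                       # uncertainties propagate through the solver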


Yeah as I was reading this it occurred to me that a startling number of lab reports I wrote as an undergrad would have been much easier to do in Julia. We normally used Excel for the calculations and had to check all of our propagation of errors by hand.


I really wish someone could get traction with a replacement for Excel with better correctness options. So many people use it because it's easy to get started with but then find themselves in situations like what you mentioned where there's a lot of manual work (or undetected mistakes) to maintain it.


I think we're getting there, slowly. Quite a lot of people who don't in any way think of themselves as programmers now reach for python/R/julia for tasks that would once have been done in Excel.

https://github.com/fonsp/Pluto.jl is a super-nice way of just starting, without too much setup cost. I think it will soon be self-contained as to what packages & versions are needed, too.


The biggest barrier to an alternative in these situations is the learning curve. If a professor can teach it to a bunch of lower-division undergraduates in a single class and receive their reports in a uniform format then it'll be a hit.

A bigger barrier here is that we had to know how to apply propagation of errors, so although we could use a tool like this to do the calculation we still had to generate a set of equations for the lower bound, best estimate, and upper bound. There's a seductive argument to be made for just plugging those equations into Excel instead of using them to validate that the Julia functions are working correctly.


Mesh can do this, somewhat - exceptions in table columns stick out like a sore thumb in the sheet source.

http://mesh-spreadsheet.com


Julia!


I realize you like it but think about what we're talking about here: someone has a room full of students who have not been trained as programmers and they're working on a particular experiment, not taking a class on software development. Most of them have probably used Excel/Numbers before and if not, the basic idea, especially given a template, is pretty easy to explain so you can get them up and running quickly and get back to the actual problem you're working on.

Now contrast that with how much stuff you need to learn to be proficient with any programming language — and no matter how much you think your favorite one is a friendly unicorn, teaching a room full of beginners will be eye-opening for the things they get stuck on. Which editor to use? How to install it? Are you saying “$LANG” but actually using that as a shorthand for “$LANG, Git and shell tools, and the conventions for using them that are common in my specialty”? How much time does it take to learn the basics of the language, how to interpret error messages, and debug things?

None of that is insurmountable, of course, but the difference in learning curves is why many people end up with the Excel file which started in a hurry and is now Frankenstein's workbook which everyone is afraid to touch or share. Over a sufficient time interval, it'd be much easier to pick any decent toolchain because maintaining that level of complexity in almost any major programming language is going to be less work but unless you're starting with experienced developers the short term answer will probably favor Excel. I've seen people who've been meaning to get around to rewriting it long enough that the target language has shifted as things fall in and out of favor.


Multiple dispatch is nice because it makes extending library (or any) code trivial.

The seamless way in which Julia libraries can be mixed and matched and used together is a testament to this. In Python, on the other hand, each library is its own little world and cannot be used in novel or unexpected ways.

Moreover, Python's poor performance means that you must use libraries in very cookie-cutter ways, or else performance falls off a cliff.


As the name implies, single dispatch is just a special case of multiple dispatch. Everything that can be done with single dispatch works exactly the same (arg1.f(arg2...) is just f(arg1, arg2...) in Julia), so there is no loss in expressivity. The only conceptual difference in multiple dispatch is that arg1 is not considered special, which makes sense for a scientific language: in 1 + 5, for example, the operator '+' isn't really owned by the "object" 1; it's a property of the whole expression (the combination of the operator itself and all operands).

And the consequence in terms of programming is exactly that it shifts the responsibility of handling interoperability to whoever creates a new type. For example, if I create MyType and want a method SomeoneType + MyType to work, I don't need to change the '+' implementation on SomeoneType for that to work (for example by making a PR on their library); I only define methods with the type I just created. This is why people say the Julia ecosystem composes so well: the creator of SomeoneType can just focus on what they want and possibly never even know about MyType. MyType will extend it with whatever I want, without having to re-implement stuff that SomeoneType already implemented (only the MyType stuff and the intersection between the libraries, defined by methods that use both MyType and SomeoneType), and the final user can just import all the stuff they want and have a library that looks like a monolith but is actually many smaller libraries working together this way.


I wrote two articles which may clarify this better. One is about creating custom temperature units in Julia and Python. It shows how multiple dispatch gives an advantage over single dispatch: https://medium.com/@Jernfrost/defining-custom-units-in-julia...

This story is more like a general intro to Julia, but it has an example with Knights, Pikemen and Archers fighting each other (sort of a rock, paper, scissors game), which also shows the utility of multiple dispatch.

https://levelup.gitconnected.com/knights-pikemen-archers-and...

But to give a quick idea here. Imagine writing functions for intersecting two geometric shapes. The algorithm for intersecting a circle and square is entirely different from intersecting a polygon and a line segment. The specific algorithm needed will depend on BOTH shapes not just one. Python can only dispatch on one of the shapes, not both.

So in Julia I could write a different function implementation like this:

    intersect(c::Circle, r::Rectangle)
    intersect(c1::Circle, c2::Circle)
    intersect(r1::Rectangle, r2::Rectangle)
and so on.


I appreciate your first article demonstrating the differences between Python's single dispatch and Julia's multiple dispatch, but your python example suffers from a lack of a child-aware parent class, Temperature, that could resolve the complaints raised. You're building inherently coupled classes in a decoupled way and complaining that they don't mix well.


Apart from CS and how software construction works with these things in practice, I reckon multiple dispatch is natural. At least it is to someone with a physics background, which is the context in which I encountered it long ago. I'm used to thinking essentially functionally and in terms of operation->arguments/objects (or verb->subjects), not something with the order and asymmetry of, say, object.operation(object). I wouldn't expect an operation to be tied just to one of the operands. I've seen the result of people trying to explain to scientists, say, Java-like "OO" as if it was somehow mirroring the sort of things they work on.


A couple of the other replies might have answered this for you already. You are correct that the recipe is specific to the Plots package. But consider: once written, it works not only for that specific plot function, but many others in the package as well, for example scatter. Also, I have written these recipes for my own data types, and they are incredibly easy to create. The recipe informs the functions in Plots how to handle your datatype, so you can just pass it in as an argument. The thing is that I've never even looked at the code in the Plots package. With Julia you can use functions on combinations of your own custom datatypes to do new things, without looking at the code for the functions. Often nothing like the Plots recipes is required; but here the recipe lets you specify how you want your datatype to be visualized.


The deal with multiple dispatch, as I understand it, is that instead of obj->somemethod, you define a function named somemethod, which is defined for obj. Maybe some other code calls it.

You can then define somemethod for other data types, without touching the original definition, and that makes your new definition available to the other code that used the original somemethod. It is a unique feature of Julia (at least unique in a sense that it is absent from mainstream languages)


The thing that distinguishes multiple dispatch from single is that you can dispatch based on the types of all arguments to the method, not just one. E.g. you can define

    function somemethod(x::Int, y::Int)
    function somemethod(x::Int, y::Float64)
    function somemethod(x::Float64, y::Float64)

Multiple dispatch is available at least in Common LISP, though I guess that's not considered particularly mainstream.


As far as I know, multiple dispatch is optional in Common Lisp.

It also exists in R, but again, is optional (and slow), and is therefore mostly unused, which limits its usefulness, making it less used, etc. etc.

In Julia, it is default, so everything participates in multiple dispatch. It completely pervades the language, which is what makes it so useful.


It is also available in Perl6, but there, as in CL, it is opt-in, and might slow things down.


Right. I (incorrectly) assumed that "multiple" meant multiple versions for different input types, not necessarily multiple arguments.


Isn't this the same as function overloading in some programming languages?


It's a form of function overloading, but it works on runtime types. For example, let's say we have a Dog and a Cat, both subclasses of Animal, and you define f(Cat, Dog) and f(Animal, Animal). In languages without multiple dispatch, if you call it with a Cat object and a Dog object it will choose the first as expected, but if you pass two Animal objects/pointers it will dispatch to the second regardless of what the runtime values are.

If the language has single dispatch, though, animal1.f(animal2) will accurately call the Cat or Dog method for the first argument (before the dot), but the second argument will work like above: it will call the Animal version and not the Cat or Dog ones. If you want a method to pick the Cat or Dog version based on runtime information you'd have to re-dispatch at runtime within f(Animal), which is basically the pattern known as the Visitor pattern [1].

In Julia (and other multiple dispatch languages), defining f(Cat, Dog) and f(Animal, Animal) means that even when I call f(animal1, animal2) the dispatch considers the runtime values: if the first animal is a Cat and the second a Dog it will call the first method (as a form of specialization), and otherwise it will go for f(Animal, Animal) (the more general method). And the Julia compiler is really good at inferring runtime types at compile time, so even though dispatch is on runtime values it is often still static dispatch (that's why it's also fast).

[1] https://en.wikipedia.org/wiki/Visitor_pattern
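
A minimal sketch of that in Julia:

    abstract type Animal end
    struct Cat <: Animal end
    struct Dog <: Animal end

    f(a::Cat, b::Dog)       = "cat meets dog"   # the specialized method
    f(a::Animal, b::Animal) = "two animals"     # the general fallback

    pets = Animal[Cat(), Dog()]
    f(pets[1], pets[2])    # "cat meets dog": dispatch looks at the runtime types of both arguments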


You can think of it as similar to operator overloading. It's not quite the same, but it's a subtle distinction, or rather, the explanation of the difference is subtle.

You can find examples where operator overloading and multiple dispatch actually behaves differently and gives different answers, but the comparison works as a first order approximation.


Yes, except dispatch happens at runtime, not compile time. It takes into account the dynamic type of each argument, not just their static types.


Well, almost. Dispatch can happen at compile time, and in fact does for type-stable code. But the dispatch is on run-time types, not static types.

(The fact that you can have static dispatch on dynamic types is a pretty subtle point that hurts my head, but the upshot is that you get semantically dynamic dispatch with static performance as an optimization.)


As the other answers note, multiple dispatch is largely a performance improvement. When writing Python, performance has usually been sidelined as a priority in the first place (within some constraints).

Personally, I hate this presentation. Multiple dispatch (to me) is cognitively heavier and not what I would default to. If it's a performance gain and I need performance, then sure. But too often people hold it up as a virtue because the need for speed is assumed.


What? I used to write python, and to me, MD is first and foremost a conceptual improvement. Makes writing code so much sweeter, more expressive and sensical.

It also heavily improves composability.

It's LESS performant if not done carefully. Certainly fastai, Matt Rocklin, etc. didn't reimplement a subset of it in Python just to get slower code.

See https://youtu.be/kc9HwsxE1OY


I agree, the performance is a very nice bonus, but it is primarily about expressiveness, composability and natural simplicity.

It just seems profoundly un-natural and arbitrary to limit dispatch to only the first argument.


There’s a current Fall 2020 class at MIT that uses Julia.

https://computationalthinking.mit.edu/Fall20/

There’s a live video on ray tracing starting in about 20 minutes

https://youtu.be/MkkZb5V6HqM


It's notable that the 3B1B guy is contributing to that class.


Julia is better suited to numerical computation. Matlab is expensive and not really a language. Python is not designed for numerical computation and can be verbose.

The issue for Julia is, if you stay within NumPy and SciPy in Python, and can use a JIT when applicable, you are basically using C. You can make it fast using C libraries and there is not a whole lot of room for Julia to shine. Even with C, I have lately had to be a bit careful to beat NumPy.


I have quite the opposite experience. Numpy is terrible in terms of allocation overhead (even when you use the available, but limited, inplace operators and memory views). In Julia it is 𝚝̶𝚛̶𝚒̶𝚟̶𝚒̶𝚊̶𝚕̶ much easier to write allocation free code. And only so many operations are trivially broadcastable, after which numpy becomes very cumbersome and slow.
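
A small sketch of what I mean, assuming nothing beyond base Julia:

    x, y = rand(10^6), rand(10^6)
    out  = similar(x)              # allocate the output once
    @. out = 3x + sin(y)           # fused, in-place broadcast: one pass, no temporaries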

Numpy is a great piece of engineering, but its limitations are very noticeable when compared to Julia.

edit: not trivial, but much easier


As much as I like Julia, I think "trivial to write allocation free code" is a bit of an overstatement. Depending on what you are doing, it can be difficult, for example iteratively calling any of the LinearAlgebra methods, since there is no interface for preallocating work arrays (doing the ccall on BLAS yourself is not a fun work-around). It is also not always clear why something is allocating, even with the debugging tools.


Perhaps I misunderstand what a 'work array' is, but you can easily pre-allocate arrays and then work on them in-place, including with LinearAlgebra functions like `mul!` et al.
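
For example, a minimal sketch with `mul!`:

    using LinearAlgebra

    A, B = rand(100, 100), rand(100, 100)
    C = similar(A)            # allocate the output once...
    for _ in 1:1_000
        mul!(C, A, B)         # ...then reuse it: no allocation inside the loop
    end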


Higher level BLAS operations, such as solving a linear system, or computing svd/eigen, cannot be done in-place the way matrix multiplication can, and require additional memory of a predetermined, fixed size, called the work array. This cannot be pre-allocated in Julia, as there is no interface to do so in LinearAlgebra, so these BLAS calls will always allocate memory.


Have you looked thoroughly over https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/ ?

There's a ton of in-place operations there, including in-place solving of linear systems, and at least some stuff related to svd (though the names are pretty un-intuitive, being wrappers for BLAS and LAPACK functions).


Yes, many of these are "in-place" but will still allocate. I typically use the geev!/ggev!/geevx! routines, if you look at the source code you will see that the work arrays are still allocated inside the call. The in-place here (unfortunately) means only that the input is overwritten, not that there is no allocation.


The purpose of in-place operations is to avoid allocations, so this seems like an oversight.

Is this a matter of simply wrapping the 'right' LAPACK routines, or is there something missing in the interfaces that could be trivially added in principle?


It is entirely possible, but is not trivial, especially for the user who then needs to know "arcane knowledge" of BLAS/LAPACK work array sizes and flags. There was some discussion about this on github, but it sort of trailed off without a real resolution. I think it is considered too complicated/niche to be in base, and was recommended to be an external library, but nobody (myself included) seems particularly interested in what amounts to maintaining a fork of the entire BLAS package. The base devs would have more insight, but this is my view from the outside at least.


Hah, you were not kidding about the names being unintuitive.

For anyone else interested there is a large list starting here [1]. The in-place versions will all end with `!`.

[1] https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#Linea...


Numpy isn't actually very fast, especially if the data is small.


Julia is a wonderfully designed language and I fully anticipate making it my main language in a few years. For now I will stick with R though, because R's ecosystem is simply unparalleled right now. And for someone who is not terribly interested in programming, except as a way to get things done, that is what matters most.

But I really think that in a few years' time, when Julia's ecosystem has had more time to grow and develop, it can rival R.


> And for someone who is not terribly interested in programming, except as a way to get things done, that is what matters most.

For what it's worth, I found that taking the plunge into julia is what made me love programming.

I'm constantly amazed by how easy it is to whip something up myself where before I would have relied on a built-in routine in another language. Doing this made me a better programmer and more capable of doing the things that 'matter most' as well.

Plus, whenever I whip something up myself, I try to share it with the community, either as its own package or as a PR to an existing package, so I get to feel involved in the progress as we get to the sort of ecosystem you would find acceptable.



Julia is an interesting language. I see that there's a lot of excitement for it, but most excitement seems to come from data scientists who are heavier on the engineer side. For almost all of the rest of the data scientists I know, Python seems "more than good enough".

Also to note, Julia has seen a rise over the last few years, but so has Python, its primary competitor in the data science field.


It's only good enough for data scientists because they don't know any better.

For existing projects, fine, stick with Python. But Julia is strictly superior to Python in every way; there is nothing that Python is better at than Julia. This is partially because Julia can call any Python library, but mostly because it's actually designed for data science and scientific computing.


"Everyway" seems like unneeded flamewar fuel. Here are a few things where it seems that Julia is not strictly superior to Python:

Julia doesn't run everywhere that Python does. Especially if you include all of the Python implementations, including MicroPython.

Last I checked, Julia's startup cost was high, making it the wrong choice for short-lived programs. Yes, people do scientific computing in short-lived programs.

Now, I don't know where Julia stands in this, but Python is used to teach people to program, including high school students and younger. By the time Python was 8 years old (as Julia is now), Python was being used in high school courses. Python's built-in "turtle" module exists as part of that education component - at one Python conference, an educator pointed out that it can be very hard to get anything installed at primary and secondary schools, so having a maintained "turtle" in the base installation was important.

I therefore suspect Julia is a worse teaching language than Python for high school level and below.


About your last point, you shouldn't conflate popularity with being better. There are languages with a heavy focus on education (both in the language itself and the tooling), like Racket, that are still not heavily used compared to Python, even though they provide a great environment for learning everything from basic to more complex comp sci.

Regarding Julia, yes the focus seems to be more on college, where the matlab-like math support makes it a lot closer to regular math. Though since it's not very different from Python someone can still pick it up easily from knowing Python in high school (or Racket).


I think Python is a better teaching language because it comes out of and is informed by ABC. ABC went through user testing to see how to develop a programming language that would make it easier for non-programmers to learn how to program.

Remember, I am presenting what I think are counter-examples to "is strictly superior to Python in every way" (excluding "existing projects"). I still think that's true even if Python weren't popular at all.

As an example of the influence, the ":" at the end of lines which start a block in Python is not required, in the language sense. However, ABC testing showed it made the language more useful.

As an example of improving Python for learners, one of the reasons for Python's change from 1/2 == 0 -> 1/2 == 0.5 was feedback from the Alice developers, where they found that students did not understand van Rossum's decision to follow C's semantics.

Python started being used in schools when it wasn't popular. One of the presenters at the Python conference in 1998 or so was a high school AP comp sci teacher. At that time C++ and Java were the AP languages. He reported better success teaching students Python first and then teaching C++ or Java, than spending all of the time in one of the latter languages.


I think Julia is actually easier. No classes. 1-based instead of 0-based indexing. Functional features (I think functional programming is easier to learn).
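
For example (a trivial sketch):

    v = [10, 20, 30]
    v[1]                # 10 -- the first element is at index 1
    map(x -> 2x, v)     # [20, 40, 60] -- functional style, no classes or explicit loop needed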


There are people who think Haskell is easier.

Which is why I attempted to support my personal beliefs via examples of Python uptake. Something you interpreted as an argument by popularity.

I observe that https://julialang.org/learning/classes/ lists only a single high school, and I didn't see a single course meant as a general introductory programming course for non-programmers.

Here's the IPC talk (from 2000) I mentioned about "Using Python in a High School Computer Science Program" https://legacy.python.org/workshops/2000-01/proceedings/pape...

It references the "Computer Programming for Everybody" essay at https://www.python.org/doc/essays/everybody/ , which I argue shows the stronger emphasis on the Python developers for beginning programs, inherited from ABC.

I also mentioned Alice. "Because Alice is targeted towards novice programmers, it is important that Python, more than languages such as Tcl or Scheme, can be mastered by new Alice programmers with little effort" (quoting the 1995 paper at http://www.cs.cmu.edu/~stage3/publications/95/journals/IEEEc... ). But that was at a university.

Python of course is much more established than Julia, which is why I linked to resources from when Python was only a year or two older than Julia is now.

That said, at https://www.python.org/community/sigs/current/edu-sig/ you can find other teaching resources for high school teaching and for "kids" and "young children". My local library here in Sweden has a book on programming for kids, which uses Python.

Is there a similar movement to CP4E within the Julia community? Nothing from https://julialang.org/learning/ suggests that the needs of non-programmers, such as high school students, play a strong role in its design.

Do you know of any such internal movement? Can you point me to any external resources which show Julia being used as a teaching language for non-programmers?


> But Julia is strictly superior to Python in every way

That is until you use the debugger.


The debugger has had a huge amount of work done on it recently.

Anyway, half the data scientists I know don't use the debugger; they're solidly in the print-to-debug camp, or they're using notebooks and going step by step.


Hell half the programmers I know don't ever use a debugger. I usually prefer to just think about my code and add some print statements, but occasionally I get super frustrated and throw up my hands. Feels like a personal failing on my part though.


>> But Julia is strictly superior to Python in every way
> That is until you use the debugger.

Or stumble into the first example in JuliaPlots, which doesn't work.


The first example here (https://docs.juliaplots.org/latest/) is the Lorenz attractor - just tried it in 1.5 and it works.


What example are you referring to?


I am pretty excited about Julia's potential for web apps


I wouldn't be. I love Julia, but it doesn't have a spectacular error-handling story. So if you have to start running a web server, and also have to worry about people launching a DDoS or finding sneaky ways to make your system consume a ton of resources on a handful of threads and lower availability, it's not going to be pretty.

Unless you mean compiling Julia or something like it to WASM and running it in the browser. That could get interesting, but Julia is still "a bit too chonky" for that.


To be fair, you wouldn't just expose your server directly nowadays; you'd run it in a container orchestrator like k8s with a liveness probe (and container usage tracking) that would appropriately scale/restart pods so as to not have the system go down, together with load balancers and DDoS protection systems like AWS Shield to detect such attempts.

And of course, you can also use a more low latency high availability oriented language handling the frontend (like Elixir, which will have trouble doing the heavy data oriented stuff Julia can), with Julia as a microservice to handle the actual backend logic and analytics, which is probably the setup I'd go for (using 2 high level lispy-languages with very different qualities, each doing what it does best).


> And of course, you can also use a more low latency high availability oriented language handling the frontend (like Elixir, which will have trouble doing the heavy data oriented stuff Julia can), with Julia as a microservice to handle the actual backend logic and analytics, which is probably the setup I'd go for (using 2 high level lispy-languages with very different qualities, each doing what it does best).

I would do this exactly too, if I had to do number crunching.


This is a problem with pretty much every language used for web dev. Most web apps are behind Nginx or some API proxy etc. to mitigate it.


Nim is a great general-purpose language I have been using lately for my numeric and data heavy processing needs.

Julia's overenthusiastic posts here in HN make me wary of the language instead of curious.


Yes, I like Nim more than Julia, but Julia has this big push behind it. If you put the same resources into Nim it would be a more usable (not saying better) language.


Can the DifferentialEquations package actually solve quaternion equations correctly? Quaternions are non-commutative and I'd hazard to guess that some solver methods would assume commutativity at some point.


My experience with various linear solvers (which a diffeq solver typically relies on) is that if they assume commutativity anywhere at all, which they often do, they almost definitely will not work for quaternions. Even if a derivation of the algorithm can be done with non-commutativity in mind, the implementation typically is not. Many NLA solvers are based on orthogonality transforms, which do not translate directly to quaternions, and even solvers which use only inner products and mat-vec multiplication like BiCG do not really work as-is for quaternion matrices. Luckily dense linear solves with LU are fine! One can usually use the complex matrix expansion for quaternions and solve that instead without much difficulty though, and for computing eigenvalues this is perhaps even preferable, as it gives a canonical representation. I have not tried to use DifferentialEquations.jl for quaternion problems, so maybe they have figured some of it out, but it is non-trivial.


ODE solvers only use LU


This is good to know, thanks Chris. I mostly solve sparse PDEs, being able to always use LU makes everything much simpler, especially with quaternions.


I would say it's probably worth converting your problem to its linear-algebraic representation and then using the solvers on that. They're quite fast there, and it's probably fairly easy to make a representation map and its inverse work well. (I will say that I don't usually deal with quaternions other than in a linear-algebraic form when doing some robotics stuff, so maybe this is more of a pain than warranted!)


I found this approach to work well for linear systems, here is some rough code I used for the representation map (note the jmag/kmag functions are part of my implementation, not sure what the equivalent is for Quaternions.jl).

  function cmatrix(Q::AbstractVecOrMat{Quaternion{T}}) where {T}
      [complex.( real.(Q),  imag.(Q)) complex.( jmag.(Q), kmag.(Q));
       complex.(-jmag.(Q),  kmag.(Q)) complex.( real.(Q), -imag.(Q))]
  end

  function qmatrix(C::AbstractMatrix{Complex{T}}) where {T}
      n, m = size(C)
      quat.(C[1:n÷2, 1:m÷2], C[1:n÷2, m÷2+1:m])
  end


Searching for Julia-related work on Upwork resulted in only 3 jobs. One of them was actually to port an existing application from Julia to C++.


Julia looks great but in practice I can pretend it doesn’t exist without much worry. Python and MATLAB are good enough for interactive work and for the performance stuff we use CUDA or similar directly.

Edit just to mention that Numba solves by and large the middle ground for the domain I work in, despite what Julia proponents have to say.


> Julia looks great but in practice I can pretend it doesn’t exist without much worry.

That's always the problem with a new programming language. The target for the language is the people that already have something that works. Then a decade passes and that language is no longer a shiny new object, so enthusiasm for it dies, it's just another old language with warts.


Not to sound dismissive but I remember when that was true of Python (vis-a-vis Perl and some other things). My recollection of discussions on forums when it was starting to gain mindshare were arguments largely about the aesthetics of the syntax.

Numerical computing is something where there's always been something that works. Before Python it was C and Fortran.

The problem people ran into is that when everything is wrapped around low-level libraries for speed, you eventually run into the catch 22 of using the slower language that you prefer for clarity, or the faster one that makes it acceptable performance-wise.

In other words, with Python, to get the performance you would have got in C or Fortran, you have to code in C or Fortran. Then you're not using Python anymore. The idea (in theory, and a lot in practice) with Julia (or Nim, or other LLVM-targeting languages) is that you don't have this penalty.

So in that sense Julia is providing something that isn't working in Python or Matlab.

I think Julia's not quite what it's cracked up to be, but mostly it is, and I'd probably prefer working with it over Python or Matlab for numerical stuff.


> I remember when that was true of Python

But Python didn't take off because of scientific computing. Google was a heavy user of Python from the start. Python seriously sucked for scientific computing in 2005. I was looking for a new language (the old one wasn't cutting it) and I'd been looking for a use for Python for years. It had some projects but it just was not in a useful state so I went with R, which was quite mature by that time, and it focused on statistical analysis. Years later Python became popular for scientific computing, likely due to the userbase it had built up in other areas.


While it's true in 2004 there was still a Numeric/numarray schism, at the time I was using Pyrex (which soon evolved into Cython) to have a gradually strongly typed Python-like thing cross-compatible with the rest of the ecosystem. That made it quite easy to write code which ran circles around R performance-wise without leaving the Python syntax domain and with a very simple FFI to call C to boot. I even had a tiny "pycc" script to create "executables" instead of "importable modules". Yes, those executables did depend on the installed base of Python stuff.

These days, Nim is a better Cython but with less dynamic temptations and more powerful metaprogramming (and, yes, a much smaller ecosystem..maybe not that much smaller than Python in the late 90s, though). { Not that this is all Nim is...It's actually a really good everything-language that's tricky to summarize in just a few words. }


Yeah I don't disagree re: scientific computing and Python (although I think from the beginning some of the advocacy for Python was coming from more "math-oriented" communities). I still think that when it started accelerating in use (which was mainly among scripting and then web applications, some other stuff too) there were existing solutions. A lot of discussions about established languages versus new ones are very similar to one another; the particular languages are just swapped out for different ones.


> My recollection of discussions on forums when it was starting to gain mindshare were arguments largely about the aesthetics of the syntax.

I remember dynamic languages people picking up on three things in particular: Python apparently being designed to be inherently inefficient, not having proper garbage collection, and having weird scoping. At least Python was something to point people at who rejected Lisp as an alternative to Tcl and Perl for scientific computing.


> Before Python it was C and Fortran

And before that, analog circuits. Hodgkin and Huxley computed their first results for neuronal activity without a computer.


Although you might have needed the distinction between an electronic computer and a human one then.


No. Before C and Fortran it was assembly language.


I think even this is overstated: Python via Numba and MATLAB both have excellent JIT compilers now, and for the maximum performance you need to hit SIMD. The latter is not reliable in Julia, and as for the former, well, you already have Python and MATLAB.


"SIMD not reliable" is an overstatement itself for a couple reasons:

1. It's very easy to inspect generated code of your kernels where you really need SIMD. For instance:

    julia> function my_kernel(xs)
             total = zero(eltype(xs))
             for x in xs
               @fastmath total += x
             end
             total
           end
    julia> @code_native debuginfo=:none my_kernel(rand(10))
    ...
    L96:
     vaddpd 8(%rcx,%rax,8), %ymm0, %ymm0
     vaddpd 40(%rcx,%rax,8), %ymm1, %ymm1
     vaddpd 72(%rcx,%rax,8), %ymm2, %ymm2
     vaddpd 104(%rcx,%rax,8), %ymm3, %ymm3
     addq $16, %rax
     cmpq %rax, %rdi
     jne L96
    ...
    
2. There's a Julia package that does the code gen for vectorization exactly because LLVM does not always get it right: https://chriselrod.github.io/LoopVectorization.jl/stable/exa...

3. You can make LLVM explain why it did not vectorize certain loops just like in clang.


SIMD convergence is a big issue here: how do you keep those lanes running together? If you have trivial kernels sure, but ISPC guarantees convergence across control flow which isn’t even available in CUDA (according to ISPC docs at least).

Just to be clear: I want to use Julia but without guessing about SIMD use in kernels. Until this is available for complex control flow in Julia, it’s not a game changer in my opinion.

Edit: that kernel is trivial. I have nested control flow to vectorize.


Right, I see what you're getting at. Just taking a slightly less trivial example from Intel's docs, and superficially comparing the assembly they generate to what LoopVectorization.jl can do, I guess there is some hope at least:

    julia> function simple!(ys, xs)
             @avx for i = eachindex(xs)
               ys[i] = if xs[i] < 3.
                  xs[i]^2
               else
                  sqrt(xs[i])
               end
             end
           end
This generates vbroadcastsd, vpcmpgtq, vmulpd, vsqrtpd, vblendvpd and vmaskmovpd AVX2 instructions (I don't have AVX512). But more complex control flow does indeed not work; the `@avx` macro errors it can't handle it at the moment.


Thanks for the example, I am interested in trying these packages in the future, but here's a current snippet of ISPC,

    varying float aff[nc], xh[nn], wij[nn], x[nc];
    varying int ih[nn];
    uniform int t_ = t & (nl - 1);
    uniform float k;
    for (int j=0; j<nc; j++)
        aff[j] = 0.0f;
    for (int j=0; j<nn; j++)
        for (int i=0; i<nc; i++)
            aff[i] += wij[j] * shuffle(xh[j], ih[j]);
    for (int i=0; i<nc; i++)
        x[i] = 0.1f*(x[i] + x[i]*x[i]*x[i]/3.0f + k*aff[i]);
    for (uniform int i=0; i<nn; i++)
        xh[i] = insert(rotate(xh[i], 1), 0, extract(x[i/nl], i&(nl-1)));
Just being able to absorb SIMD lanes into the notion of varying vs uniform makes this easy to write, not to mention SIMD operations like shuffle or rotate. In Julia or Numba or CUDA I have to index into arrays, ensure compatible data layout etc. I imagine this could be done with more macrology in Julia, but again why not use something which already works.


> but again why not use something which already works

Sure, but there's always somebody crazy enough to try to implement it :p

From the fact that ISPC can generate C++ (with --emit-c++) I would think their compiler is conceptually very high-level. Since so few new concepts are introduced, I wouldn't be surprised if someone implemented the same DSL in Julia at some point, just to see if one can get similar performance.


> ISPC can generate C++ (with --emit-c++) I would think their compiler is conceptually very high-level

It's a Clang-based frontend for LLVM, so yeah, no reason Julia couldn't make use of their work. I think the main thing to tap into is the uniform vs varying support, since this drives the whole thing.


I'm sorry, but as a regular Matlab user, I have to say that, no, Matlab does not have an "excellent jit compiler." It has a jit compiler which is really limited, and works for straight-line code with built-in functions.

For performance you virtually always need to write 'vectorized' code, which is often difficult and memory heavy.

The daily churn of handling input parsing, along with contorting code into vectorized form takes up an extraordinary amount of time when writing Matlab code.

Matlab is certainly not 'good enough' for me! It's just what I'm stuck with at work.


Maybe, depends on the work. I've seen Matlab JIT code match C performance. I think it's the language holding it back, mainly.


I have also seen Matlab code match Julia for simple loops. If you write your code properly vectorized and with loops that use only calls to 'built-in' operators and C-functions, then it can match C/Julia.

But that's just extremely restrictive. The jit does not work on arbitrary user code.


Everything is possible with enough effort. People wrote games in assembly language in the past too. The thing, though, is that as long as the domain you want to work in has Julia packages, there is a high chance your job will simply be a lot easier in Julia.

I have recently been working with a REST API, something I don't do with either Python or Julia normally. I spent probably 10x as long getting this to work in Python. It was not just a matter of the library not doing what I wanted, but also of it not supporting well-established conventions, as well as simply having far more complex APIs than the Julia equivalent.

This is something I frequently find. Julia APIs for doing the same thing as in Python will generally be much easier to work with.

Python APIs also tend to be very OOP-oriented and stateful, meaning there is a lot of mutating of hidden state. Julia APIs tend to show much more clearly what the data flow is, because they follow a more functional approach, where mutation of state is generally avoided. Although Julia is pragmatic. It isn't like working in Haskell.

NOTE: I am not saying Python is a bad language. It would usually be my second choice. I am just trying to say that there are REAL advantages to using Julia, which go far beyond mere performance. I never started using Julia due to performance but due to its expressiveness and clean APIs.


Being stuck in the middle like this can be a difficult if not deadly issue to resolve.


Sounds like Innovator's Dilemma.


Please allow Julia tooling to easily compile to a native binary/dynlib/dll that can be deployed on both desktop and mobile OS'es.

Right now, it appears like a niche interpreted language for numeric geeks only.


Matlab and its Simulink extension, which is widely used for designing control systems, has a feature for generating C code that can be deployed without the Matlab runtime. Someone with knowledge of code generation and compilers could take this Matlab feature as an inspiration and implement a C code generator for Julia functions.

> Right now, it appears like a niche interpreted language for numeric geeks only.

Yes, it was designed for numerical and scientific computing. In Julia, you can at least prototype the numerical algorithm before implementing in C++ or Fortran.


https://github.com/JuliaLang/PackageCompiler.jl

Note that it doesn't do slim binaries yet for embedded. Don't know if that's in the roadmap.


Julia is pretty neat, but I'm still using Python at the moment for numerical computing work. Julia does have built-in sparse arrays, but I don't think it's on par with scipy.sparse yet.


I would assume scipy uses MKL, and Julia has wrappers for that too: https://github.com/JuliaSparse/MKLSparse.jl. For the GPU CUDA.jl has support for sparse matrices as well.


My only problem too is I just want to write high level mathematical code and have it work. With Python's anaconda distribution, I just download it and the sparse stuff works. I don't want the hassle of dealing with wrappers and code that is not very mature. I know it's a chicken and the egg thing.


Have you even read the readme in that link?

Start a julia repl, hit ] for the package manager, paste `add MKLSparse`, write `using SparseArrays, MKLSparse`, run `sprand(100,100, 0.1) * rand(100)`
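
Spelled out as a rough, untested sketch of that session:

    julia> using Pkg; Pkg.add("MKLSparse")      # same effect as `] add MKLSparse`
    julia> using SparseArrays, MKLSparse
    julia> sprand(100, 100, 0.1) * rand(100)    # sparse matrix times dense vector, MKL-backed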


I think the functionality of Julia's sparse arrays is mostly on par with scipy.sparse, it's just a bit rough around some edges still, and spread out into non-base packages (e.g. into IterativeSolvers.jl, ARPACK.jl, MUMPS(*).jl etc) of various maturity. Having the sparse arrays built-in might (hopefully) lead to a large ecosystem of interoperable sparse packages, but as with much of the Julia ecosystem this is still a work in progress.

Personally, working on sparse eigensolvers, my only complaint is that IterativeSolvers.jl is still somewhat lacking (though the ARPACK.jl bindings are more comparable to what scipy does), but otherwise cannot imagine going back to python for this kind of low-level numerical research. Having code run comparably to fortran speed (barring some memory annoyances) is huge for working on numerical algorithms.


The nice thing about Julia is that it's much simpler to roll your own things. With numpy, you might have to be writing C.


Just the opposite, because more people know C than Julia.


A Julia user knows more about Julia than a Python user knows about C.


So more people knowing C than Julia somehow makes it easier for me? Absolute nonsense.


They do; what's your point?


My point is that there are far more C programmers and C code in the world than there are Julia programmers and Julia code. This is the advantage of Python: it was able to leverage billions of lines of C code by simply acting as a glue language.


But I don't know C, and it's easier for me to be able to write fast code in a language I actually know.

Having to switch to a language you don't know is not a good situation, it doesn't help that a lot of other people know that other language.


Just fyi, Julia has a fantastic and very fast C foreign function interface, just like Python.


> With numpy, you might have to be writing Python

FTFY cf Numba

http://numba.pydata.org


Numba is great (as well as Jax, which is even better), but its limitations are drastic compared to Julia. So much of what is great about Python (e.g., composability, introspection, simplicity) is lost with these libraries, while Julia still retains them.


The main thing missing from Numba is user-defined structures; the jit classes usually come with a performance penalty.

That said, in terms of composability you can jit over the closure to achieve a lot of what you might want, e.g.

    from numba import jit

    def make_loop(f):
        @jit
        def fn(x):
            # the loop is compiled; calls to the jit'd f are effectively inlined
            for i in range(x.shape[0]):
                x[i] = f(x[i])
        return fn
for any jit'd function f will be just as fast as if you had inlined the body of f.

For introspection and simplicity, I think, in high performance, you simply have to choose two of fast, simple and generic. Julia clearly chooses fast and generic.


Could you elaborate on why you are saying julia is not as simple? Certainly, there are performance tips one needs to be aware of, but that is the case with python too. I have used python for relatively high-performance numerics for research work for the last decade, and now that I am exploring Julia, it certainly addresses all of the concerns I mentioned above with code which is at least as legible/simple and much more introspective. It has its warts, but those warts are on the roadmap and I have seen significant improvement between v1.3 and v1.6 (e.g. compilation latency and debuggability).


Compiler errors can be on par with heavily templated C++, with similarly difficult-to-read code bases. It's of course easier to work with than C++, but not the panacea that seems to be implied in a lot of the comments.

In terms of deployment, you essentially have to have Julia installed wherever you want to run, which is frequently OK, and if not there's always PackageCompiler, except it's not that easy to get a working shared lib.

There are other things I find complex but probably because I’ve used Python too long so Julia is different etc. I think Julia is a great choice for HPC generally except for challenging SIMD codes where it’s hard to vectorize without an explicit ILP model.


How does Jax lose composability or introspection?


E.g. jax does not autodifferentiate anything that is not jax (scipy ode solvers, special functions, image processing libraries, special number types (mpmath), domain-specific libraries). Compare that to Zygote.jl


It is true that Jax cannot differentiate through C code. But it can differentiate through python code that was written to accept Numpy.


Which is extremely limited compared to Zygote, which can handle custom types, dicts, custom arrays, a complex type system, multiple dispatch uses, etc.


Try reading the docs before making sweeping negative comments about what a piece of software can and cannot do.

https://jax.readthedocs.io/en/latest/notebooks/autodiff_cook...


Are you talking about "Differentiating with respect to nested lists, tuples, and dicts" from that page? The comment to which you are responding covers quite a bit more. The jax documentation specifically says "standard Python containers". Zygote.jl and other less stable Julia auto-diff libraries go far beyond the built-ins and can work with structures defined by packages never designed to be used with automatic differentiation. Of course, there are limitations, but quite a bit less severe than the one in jax (and again, I am saying this while being a big jax fan).


As the document I linked to says, Jax autograd supports custom data types and custom gradients.

It’s honestly exhausting arguing with all you Julia boosters. You can down vote me to hell, I don’t care. I’m done engaging with this community.

You all are not winning over any market share from Python with your dismissive, arrogant, closed minded culture.


I understand you are frustrated, however, please remember

> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

> Please don't comment about the voting on comments. It never does any good, and it makes boring reading.

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

https://news.ycombinator.com/newsguidelines.html


I am confused why you assume I am a "Julia booster" or use such combative language. I love Python and Jax and use it for much of my research work, I just also like learning of other approaches. Please try to honestly address the sibling comments. We have repeatedly claimed that tools like Zygote.jl can autodifferentiate efficiently things that Jax can not (without a lot of extra special code and hand-defined backprop methods), e.g., an array of structs with scalar and vector properties over which a scalar cost is defined. Just give examples, so that we can both learn something new about these wonderful tools instead of using such offensive language. It is hard to not take your own comments as the ones being dismissive.

Also, look at where this conversation started. My claim was that jax does not work with "(scipy ode solvers, special functions, image processing libraries, special number types (mpmath), domain-specific libraries)". A Julia library does not need to know of Zygote.jl to be autodifferentiable. A Python library needs to be a pure-Python, numpy-based library to work with jax.

In order to try to contribute to the discussion: I think this paper describes relatively well what is so special about the Julia autodiff tools: https://arxiv.org/abs/1810.07951

For a separate approach, which is also very original, check out https://github.com/jrevels/Cassette.jl


I don't see anything there about a better type system or multiple dispatch. Try being less salty.

In what language are you defining these custom arrays or types? Certainly not in Python, or they'll be too slow to be worthwhile.


Indeed, but "python code written to accept numpy" is a pretty restrictive subset (comparatively; I do still enjoy using python). It does not even cover most of scipy, let alone the domain specific libraries, which frequently end up using cython or C for their tightest loops.


But then writing something in C is not really difficult, is it? (The really difficult part is the algorithm, not the implementation in a language.)


I'm a pretty competent C programmer, but doing even fairly simple matrix programming in C is fiendishly hard not to screw up. When I was porting this code [1] to this [2], it took an absurdly long time to iron out all the bugs. And just look at the difference in verbosity and difficulty. Worst of all, C has no guard rails: it's so easy to think you have it right, only to realize much later (or not at all, frankly) that you're computing complete nonsense.

[1] https://github.com/JuliaLang/Microbenchmarks/blob/af3d18f7b3...

[2] https://github.com/JuliaLang/Microbenchmarks/blob/af3d18f7b3...


Thanks for the example. Could you also say how much of a speedup this led to?


That C code is 39% faster than the Julia version, so faster but not massively so. You can optimize the Julia code to be as fast as the C code by using in-place operations, but it's probably not worth it for such a small speedup unless the code is a real bottleneck. By comparison, the Python version (using NumPy, of course) is 25x slower than C, which is a whole different situation.

Most of the speed advantage of the C here is due to reusing the same memory over and over, which you can do pretty easily in Julia as well. Here's the Julia code using in-place operations, and it's still quite a bit more readable than the C version: https://gist.github.com/StefanKarpinski/e57f5a36b7890b261a0d.... I timed this and it's the same speed as the C version. When developing this, it follows the same general outline as the C version, but you have several benefits: (1) you can use asserts to compare to the easy version; (2) there are niceties like bounds checks and array indexing. In C you can't do (1) because there is no easy version to compare with. And doing the array index computations in C is kind of a nightmare: it's so easy to accidentally screw them up.


I think you're overestimating how many Python developers are willing to write and compile C. Debugging gets more difficult across two languages. Shipping binaries with a Python package correctly can be a hassle too (what about -march=native performance).


That is not too hard if one uses the Pybind11 C++ library, which allows creating native-code Python modules (shared libraries) in a simple and declarative manner. Another library that can be used with Pybind11 is Eigen for linear algebra, which can perform loop fusion in numerical computation; for instance, it can add multiple vectors of the same size in a single loop.


Numba compiles Python to LLVM as a backend, why is this even a discussion?


Because numba (and jax), although being wonderful pieces of engineering, work on a fairly limited subset of python, do not permit the same level of introspection as python, and do not play nice with other python libraries. Julia does not make that sacrifice.


For Jax I believe this is false.

Jax is composable. In fact it’s a core design goal. Jax arrays implement the Numpy API. I routinely drop Jax arrays into other python libraries designed for Numpy. It works quite well. It’s not effortless 100% of the time but no library interop is (including Julia multiple dispatch).

I can introspect Jax. Until you wrap your function with jit(foo) it’s as introspectable as any other Python code, at least if I’m understanding what you mean by introspection.

Jax has implemented most of the Numpy functions, certainly most of the ones anyone needs to use on a regular basis. I rarely find anything missing. And if it is, I can write it myself, in python, and have it work seamlessly with the rest of Jax (autodiff, jit, etc)


Jax is awesome. But supporting most of numpy isn't enough, because numpy isn't composable. You want to add banded-block-banded matrices to numpy? Then you need to fork numpy (or in this case fork jax); this is a package in Julia and it works with everything. You want to add names to your array dimensions like PyTorch recently did? Then, like PyTorch, you need to fork numpy; again this is a package in Julia. You want to do both? You have to merge those two forks into each other. In Julia this isn't even a package, this is just using the aforementioned two packages.

You want to work with units or track measurement error (or both)? Basically the same story, except better in some ways and worse in others. Better because you don't have to fork numpy; it is extensible enough to allow that, and packages exist that use that extensibility for exactly this. Worse because those are scalar types: why are you even having to write code to deal with array support at all? Again, two Julia packages, and they don't even mention arrays internally.

The problem's not Jax. The problem is numpy. Or rather the problem is this level of composability is really hard most of the time in most languages (including the python + C combo. Especially so even).

It's true that this is not always trivial 100% of the time with Julia's multiple dispatch, but it is truer there than anywhere else I have seen.


> do not play nice with other python libraries.

Julia hasn't been around long enough to build an ecosystem with multiple commercial giants building competing products.

> Julia does not make that sacrifice.

Sure, Julia sacrifices your sanity on the altar of stack traces, method resolution, and time to first plot, among others.

> fairly limited subset of python

Fantastic comment, always brought up, except that this same subset of Python (really how to use CPU efficiently) is about the same that anyone writing about Julia performance is preaching.


I think a really good example of how Julian tools are composable and work with the whole of the language, while that is not the case with Pythonic tools, is autodifferentiation. The many different julian autodiff libraries have little trouble performing the autodiff through special functions, weird data structures, and 3rd party libraries. In python, none of the autodiff tools play nice with scipy or with mpmath or with sympy (or qutip, which is something I need). This is not about commercial competitors either - all of the libraries I mentioned are non-commercial.

Julia provides the interoperability between such libraries, including custom datatypes, frequently with fast zero-cost abstractions.


I agree but think that Julia had the benefit of watching Python and MATLAB suck at auto diff and do better as a community. That said PyTorch and Jax are great.

> fast zero-cost abstraction

Until you get a compiler error and the stack trace takes up a few screens of text. Who maintains that abstraction by the way? Nothing is free.


> Fantastic comment, always brought up, except that this same subset of Python (really how to use CPU efficiently) is about the same that anyone writing about Julia performance is preaching.

This is completely and demonstrably false. Julia allows fast programming with macros, multiple dispatch, abstract types and more.


All those features have a cost that, in a performance-oriented setting, requires restricting the code to a subset of Julia that matches what the hardware is fast at.


Where are you getting this from? None of those features have a performance cost, if used according to some simple guidelines.

In fact, multiple dispatch is essential to get good performance, abstract types have a cost if explicitly used inside struct definitions, but that's why you use parametric types, and abstract types do not have a cost in function signatures.
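
To illustrate the struct point, a minimal sketch (the type names are invented for the example):

    struct SlowBox             # abstract field type: the layout of x isn't known to the compiler
        x::Real
    end

    struct FastBox{T<:Real}    # parametric field type: concrete for each instantiation, no overhead
        x::T
    end

    double(b::FastBox) = 2 * b.x   # abstract/UnionAll types in signatures cost nothing;
                                   # a specialized method is compiled per concrete FastBox{T}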

Macros are expanded at compile-time, and at least have no runtime cost, in fact they are often used for improved performance.

I think there's some fundamental misunderstanding going on here, but I'm not sure what it is. Or do you mean to say that if it is possible to write slow code in a language, then only a subset of that language has good performance? If that is the case, I don't know how to respond.


No, that's not true. I specifically mentioned features that can easily be made zero cost on both CPU and GPU targets.

In fact, those aren't special features and there is no other subset. Those are the core abstractions on which everything in Julia, down to primitive types and wrappers for LLVM intrinsics, is built. Without them you wouldn't have Julia.

Julia's GPU ecosystem wouldn't be where it is with just one or two people maintaining it without being able to reuse those features to plug into existing machinery.


I beg your pardon. What parts of Julia are not supported by its jit compiler?


Numba isn't as slam-dunk a solution as it seems, at least when your code needs to run on something besides x86. I recently witnessed someone getting Numba-based code running on an ARM-based system, and they fought LLVM package and performance issues for weeks. The code in question was generally working and seemed pretty good on x86, but took up multiple CPU cores at 100% on the ARM platform. Doing a straightforward port of the algorithm to C++ and wrapping it with pybind11 resulted in the same algorithm using 14% of one CPU core. Perhaps the Numba implementation could have been made to work eventually, but it seemed pretty difficult to debug due to all the layers involved.


A subset of Python for one, and debugging stays awkward with gdb.

Makes me think of this little pun: https://github.com/FluxML/Flux.jl/blob/master/src/utils.jl#L...


The subset of Python compatible with Julia's whole performance story (mostly, structs are the missing piece), yes sure.


You mean the type system and multiple dispatch? I guess those are fairly minor points, yes.


One performance feature I find really nice in Julia is that you can write your own function (cost function, function to integrate, whatever) that you can pass into a solver/integrator/etc. Then it will be inlined and compiled into that external library.
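
As a sketch of what that looks like (assuming QuadGK.jl as the external library here):

    using QuadGK
    mycost(x) = exp(-x^2) * sin(3x)    # your own plain Julia function
    quadgk(mycost, 0.0, 1.0)           # the integrator compiles a version specialized on mycost,
                                       # rather than calling it through a generic callback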

Does this work with numba? Are your jit-compiled functions compiled into external modules?


Even if structs worked, they wouldn't be part of a good type system, and don't work with abstract interfaces/inheritance, traits, multiple dispatch, fast union types, zero-allocation immutables, or a myriad of other things.


That really depends, but on average it will be more difficult and there will be a lot of boilerplate.


Finally, Lisp wins ;).


It's not finally. It's more like, here we go again.

Ideas from Lisp have been creeping into "mainstream" programming languages for decades. As the article points out, Common Lisp introduced multiple dispatch over 40 years ago and now it's finding its way into Julia. There's lots more in Lisp that can still find its way into a modern programming language.

I'm particularly interested in the inclusion of a LispSyntax for Julia that might allow Lisp macros to be written at the LispSyntax level that can then be used at the surface Julia level.


What do you mean?


Julia has a heavy Lisp heritage. If you want a clearer comparison, see Dylan. On the topic of Lisp, Julia's parser is written in a custom version of Scheme called femtolisp.


And Dylan is at https://opendylan.org/ I never really forgave it for losing s-expressions, but it did pioneer hygienic, referentially transparent macros with infix syntax.


Perhaps, as well as Dylan, it's worth referencing another Lisp system, T, that pioneered fast dynamic language implementations on the typical Unix workstation-type hardware of the time, and was of interest for research computing: https://en.wikipedia.org/wiki/T_(programming_language)

(T has some historical interest just as the original implementation language of the Yale Haskell compiler.)


When I look at Google Trends or RedMonk rankings, Julia appears stable, not accelerating.


I don't understand Google Trends. Julia's progress doesn't look like much there, but even Python has had pretty modest progress there, which doesn't make sense.


Anyone have thoughts on the ecosystem around Julia? Suppose you want to hook it into a database, or implement production-grade logging, or hook into e.g. Spark. Are those things do-able these days?

I've heard from some colleagues that Julia as a language is pretty solid these days, but all the machinery that you might need to take a system to production _aside_ from the business logic is a little lacking. I'd be interested to hear if others agree!


My employer (Invenia) runs Memento.jl (a logging package) in production, hooking variously into SumoLogic and AWS CloudWatch and from there into PagerDuty. I would say it is production-grade logging, and it is integrated enough with the standard-lib logging that stuff logged using that also ends up in the right place.

I don't think it would actually be that hard to get the stdlib logging to work directly with that kind of system either. (Probably if we were starting today we might have built on top of the stdlib logging, but Memento is a bit older than that, and a bit different in philosophy.)

There are a bunch of database drivers. We use LibPQ.jl in production. I also hear good things about ODBC.jl and SQLite.jl. And anything to do with data is really nice because of how interoperable all the data-frames-like packages are. Everything just works together as long as the minimal API described by Tables.jl is implemented (and everything implements that).
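
As an illustration of that Tables.jl point (a sketch; the file name is made up):

    using CSV, DataFrames
    tbl = CSV.File("results.csv")    # any Tables.jl source...
    df  = DataFrame(tbl)             # ...materializes into any Tables.jl sink, no glue code needed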


It depends. There are quite a lot of really good libraries for things like that but there are still a few important things missing.


Pet curiosity ... if the inventors would humour me.

What was the thinking behind adopting the `end` notation for marking the end of blocks? I know that a certain family of PLs (Erlang, Ruby) uses them, but the dominant flavor is the `{}` syntax (C/C++/Scala/Rust etc.).

Was it an explicit choice or more based on your past familiarity with other languages?


The reason is that they want to reserve the curly braces for other uses and the language devs call using `{}` to delimit code a 'waste of valuable ascii characters'.

Julia currently uses {} mostly for type systemy things like parametric types. When I write Foo{T}, this says that T is a parameter in the type Foo.

E.g. a vector of Float64s in julia is an Array{Float64, 1} (1 means one dimensional) whereas a matrix of vectors of complex Ints is an Array{Array{Complex{Int}, 1}, 2}.
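
A quick REPL check of those relationships (sketch):

    julia> Vector{Float64} === Array{Float64, 1}
    true

    julia> Array{Array{Complex{Int}, 1}, 2} <: AbstractMatrix
    true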


It’s from Matlab. That’s really it. Well, and probably {} are perfectly good characters that shouldn’t be wasted on code blocks.


So it's from Fortran ;)


Honestly, I just wish that this was 0-indexed. 1-indexing annoys me enough that I haven't really started playing with Julia.


I don't get why people care about this. How often does that hit you in normal code? It never bothered me in Lua, and certainly not in Julia. For me I just do a context switch. I know when I am in Julia I have to think about things like in my mathematics books. Just like you index a matrix.

When I am in other languages like C/C++ I tell myself that I am working with memory addresses. For memory addresses 0-based indexing makes more sense. For tables, matrices etc. I honestly think 1-based indexing makes more sense.

Only time I dislike 1-based indexing in Julia is when I write code which is strongly related to how memory works. Like when I was implementing a CPU simulator and assembler.


It depends on which kind of maths you refer to.

After studying ordinal numbers, I feel like the normal way to start counting is from 0 (or the "empty set" ordinal).

This is also how you prove something by transfinite induction, you need to start by 0.

I wish I could easily recompile Julia so that it is 0-indexed. But then what about all the libraries...


Most well-written libraries do not depend on the specific type of indexing in an array. There's a strong convention for writing index-agnostic code, with eachindex(), enumerate(), firstindex(), lastindex(), begin, end, etc.

I try to write code to accept 0-, 1-, 2-, and StarWars-based indexing.
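
For example, a small index-agnostic sketch:

    function mysum(xs)
        total = zero(eltype(xs))
        for i in eachindex(xs)    # correct whatever the array's first index happens to be
            total += xs[i]
        end
        return total
    end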


There's nothing preventing you from using a 0-base, it's just extra work to implement. You can even use the fabled 2-base index! https://docs.julialang.org/en/v1/devdocs/offset-arrays/
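
For example, with the OffsetArrays.jl package (rough sketch):

    julia> using OffsetArrays
    julia> v = OffsetArray(rand(5), 0:4);    # a 0-based vector
    julia> firstindex(v), lastindex(v)
    (0, 4)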


For people complaining about 1-indexing, it is precisely because they wish that what they feel is the "natural" way to count would come without overhead.

So it's this "extra work" that is the problem.

It would be nice to have it as an environment variable or a runtime option. Then what about portability issues...

So let's just fork Julia and call it JuliB with 0-indexing =)


I wish Python was 1-indexed. I can't count the number of times I made an error because somehow programming languages count from 0, which is insane. I also keep having to do dumb things like len(x)-1, which is also dumb.


It's mostly about habit. Apart from that, I really enjoy this venture into the history of 0-based vs. 1-based indexing: http://exple.tive.org/blarg/2013/10/22/citation-needed/

> the technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.


So, has anyone tried making games with this yet, or no?


Yes.



