I like Cython, but I'm always surprised at how willing programmers seem to be to use the Cython syntax. In so many cases I look at the optimized program and think to myself, "I would understand that better if it was just written in C."
I agree, my vastly preferred option now is Python code and C code with cffi to glue them together. I find that much preferable to Cython which is basically a third language altogether (and IMO not an especially nice one, although that is a bit of a value judgement).
Even if you write the Cython naively (PyObjs everywhere), you get a modest speed boost simply by cutting out the Python bytecode interpreter. This is a very low friction process as opposed to carving out chunks of code that can be isolated from the Python runtime.
This is probably because you don't use much of the Python ecosystem in your C module. If you're trying to speed up something that both calls and is called from Python then Cython saves a lot of verbiage and some fiddly bits.
I couldn’t disagree more. I find Cython syntax allows me to express the thing I want at the C-level with significantly less work than the equivalent syntax directly in C.
For example consider using cdef to define a simple class (which compiles directly to a struct + helper functions). Organizing it with class-like syntax is so much easier and better mapped to the concept model, and (as it should be) the compiler worries about how to map that concept down to a thin struct with functions (something the programmer should not have to consciously think about but should still benefit from).
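A minimal sketch of the idea (the `Vec2` type here is hypothetical, not from the comment above): in Cython you'd write a `cdef class` with typed fields, and the compiler lowers it to roughly a C struct plus plain functions. Here's the same organization in plain Python for comparison, with the Cython annotations noted in comments.

```python
# In Cython this would be written roughly as:
#     cdef class Vec2:
#         cdef double x, y
#         cpdef double norm2(self):
#             return self.x * self.x + self.y * self.y
# and compiled down to a C struct { double x; double y; } plus C functions
# taking a pointer to it. The pure-Python analogue:

class Vec2:
    __slots__ = ("x", "y")  # closest pure-Python analogue of fixed C fields

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def norm2(self) -> float:
        # In the compiled version this is a direct C call on struct fields,
        # with no attribute-dict lookup or PyObject boxing per access.
        return self.x * self.x + self.y * self.y

print(Vec2(3.0, 4.0).norm2())  # 25.0
```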
You can go really far with this with Cython. For example this is a side project to write a “typeclass”-like pattern for polymorphism in Cython.
The function definitions hardly look different from plain Python, yet you’re getting auto-generated specialized functions for each resolvable child type of the type class (called “Ord” here), plus a dispatcher to invoke them from Python.
So you sort of get just as much type dynamism as plain Python, except each typed instance is much more performant with no PyObject or PyFunction overhead in the call sequence.
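For readers who haven't seen the pattern, here's a rough pure-Python analogue using `functools.singledispatch` (the `cmp` operation and types are hypothetical; the actual Cython version emits a specialized compiled function per child type instead of dispatching at Python speed):

```python
from functools import singledispatch

# A sketch of the "Ord" typeclass idea: one generic operation with a
# specialized implementation registered per concrete type, plus a
# dispatcher that picks the right one based on the argument's type.

@singledispatch
def cmp(a, b):
    raise TypeError(f"no Ord instance for {type(a).__name__}")

@cmp.register
def _(a: int, b) -> int:
    # specialization for ints; in Cython this would be a compiled C function
    return (a > b) - (a < b)

@cmp.register
def _(a: str, b) -> int:
    # specialization for strings
    return (a > b) - (a < b)

print(cmp(3, 5))      # -1
print(cmp("b", "a"))  # 1
```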
Nim is a language that mimics Python syntax but compiles down to C. Really cool, and it easily gives you 'fast code'. If you are into this kind of stuff, I'd recommend you take a look.
How's the standard library in Nim? What's the 3rd-party package ecosystem like? If I have questions, will I be able to find a good answer on Google? How about jobs?
There's a handful of languages out there you could consider "faster python". I just use cython though.
I have limited knowledge about it, but mainly it isn't at version 1.0 yet, so the standard library still changes and has some quirks, missing pieces, or pieces that don't quite fit together yet.
The package system is Nimble; it seems easy enough and fetches packages from GitHub given a version, though I'm not sure how secure that is.
Not sure about Google, as it's relatively new, but in general the Nim forum is pretty active. I have asked around there a few times and got answers, even from the language creators themselves.
It's quite complete for anything numerical/machine learning/statistical at this point, with almost 2000 packages. This is in addition to the very comprehensive standard library, which includes everything you get in Numpy/Scipy. Of course, outside of numerics, you're pretty much out of luck, which is not the case for Python.
It also has multiple dispatch like Common Lisp, as well as gradual typing. All functions are multimethods, actually. This is an enormous abstraction advantage over Python. It's much easier to do complicated things without pulling your hair out in comparison to Python.
I use it a fair amount in my job as a data scientist. It's also what I reach for if I need to write some custom algorithm myself that needs to be high performance, rather than doing it in C.
I don't find Julia to be nearly as programmer-friendly as Python, with more syntax and more cognitive overhead. This is on an admittedly small experience base.
This. If you need to drop out of regular Python for performance reasons, then Julia offers the same high level flexibility, but with types and performance.
There's a reason most high-performance Python libraries are not written that way, and core routines are just written in C instead. Proof of the pudding is in the eating!
See this talk by Armin Ronacher (creator of Flask) on why the design of python makes it fundamentally unfriendly to performance optimizations: https://youtu.be/qCGofLIzX6g?t=171
If your domain falls under the umbrella of numerical and scientific computing, writing Julia is as painless as writing python, with code that automatically runs roughly as fast as C. If you're used to writing numpy, you can hit the ground running in Julia, with maybe a few hours to become comfortable with the slightly different syntax and the names of useful libraries.
The point is that Cython provides a nice intermediate stage between C and CPython. Most optimizations need the first factor of 100, not the last factor of 2. You can usually achieve that in Cython with an effort measured in characters changed rather than lines of code changed.
I've played with Julia. It's nice enough, but it doesn't offer me anything I don't already get through the C/Cython/CPython hierarchy.
I haven't used it, but nuitka (http://nuitka.net/) translates python (2.6, 2.7, 3.3 to 3.7) to C and compiles that. It claims to be highly compliant and performant without any extra pragmas.
How does it compare to the two technologies in the article?
IIRC, this is false regarding NumPy. The fundamental problem NumPy has is that it vectorizes only one operation per pass through the data (or occasionally two or three, as in a dot product). It can't fuse an arbitrary hybrid expression (say, a[i] * b[i] / (|a[i]| + |b[i]|)) into a single pass over your data. This is an inherent limitation of NumPy, and one that Numba doesn't share. So you very much can expect speedups in some cases -- it just depends on what operations you are performing and how big your data is.
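A pure-Python sketch of the multi-pass vs. fused distinction (list comprehensions stand in for numpy's temporary arrays; all names here are illustrative):

```python
# Hypothetical hybrid operation from above: a[i] * b[i] / (|a[i]| + |b[i]|)
a = [1.0, -2.0, 3.0]
b = [4.0, 5.0, -6.0]

# NumPy-style evaluation: each sub-expression is a separate pass over the
# data, each allocating a temporary array (simulated here with lists).
t1 = [x * y for x, y in zip(a, b)]            # pass 1: a * b
t2 = [abs(x) + abs(y) for x, y in zip(a, b)]  # pass 2: |a| + |b|
multi_pass = [p / q for p, q in zip(t1, t2)]  # pass 3: divide

# Fused evaluation (what Numba or numexpr compile to): one pass, no
# intermediate arrays, so the data is only read from memory once.
fused = [x * y / (abs(x) + abs(y)) for x, y in zip(a, b)]

assert multi_pass == fused
```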
You're right, apart from np.einsum() and numexpr which is a separate package (albeit less drastic to use than Numba, because with numexpr you don't write your own loops).
I meant "occasionally 2-3, like dot product" to include stuff like einsum. FWIW, I found einsum was actually slower than tensordot last time I tried, so you may want to use tensordot instead if that's still the case.
Yes, but it can be difficult to figure out the best way to make said carefully optimized numpy operations for anything moderately complicated. It can also be difficult or impossible to avoid extra memory allocations in numpy. Sometimes it's just quicker to bang out the obvious loop-based implementation in numba or cython.
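As a sketch of the "obvious loop" approach (the `clipped_total` example is hypothetical; the `try`/`except` lets the same file run with or without Numba installed):

```python
# numba.njit compiles the loop below to machine code when available;
# the fallback decorator keeps the same semantics in plain Python.
try:
    from numba import njit
except ImportError:
    def njit(func):
        return func

@njit
def clipped_total(xs, lo, hi):
    # One explicit pass, no temporary arrays. The running total is clamped
    # at every step, so each iteration depends on the previous one -- this
    # is exactly the kind of loop that's awkward to express as a chain of
    # vectorized numpy operations.
    total = 0.0
    for x in xs:
        total = min(max(total + x, lo), hi)
    return total

print(clipped_total([1.0, 5.0, -2.0, 4.0], 0.0, 6.0))  # 6.0
```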
Also, numba does in fact support targeting the GPU, ~~but I think it requires a license for numbapro (i.e. not free, though last I used it they had free licenses for students)~~ (edit: it's free now, see below).
Numba is not made by NVidia. It is made by Anaconda (formerly Continuum Analytics), which was co-founded by Travis Oliphant, the primary author of NumPy.
They aren't drivers, it's just the ability to generate CUDA kernels in Numba. It has nothing to do with Nvidia supporting it, they were not involved AFAIK.
Interesting, thanks :-) Either way, we're all set to accelerate Python code on the GPU! Personally I intend to focus my efforts here rather than learning Julia.
The main use I've found for numba (I'm a theoretical physics/maths undergrad), is avoiding memory allocations in tight loops. There are some cases where I've found it hard to wrangle and arrange my numpy/scipy code such that all the vectorized operations happen in-place without extraneous allocations (in my case the difficulties have been with scipy's sparse arrays, although I can't remember the exact problems).
In particular, if you find you cannot use vectorized functions in numpy or scipy and absolutely MUST index, then typing the array in Cython is a life saver. Indexed operations on numpy arrays without Cython are very slow. (e.g. https://stackoverflow.com/q/22239199/300539)
Agree. It's a bit surprising to see that numpy indexed operations are even slower than the built-in list in this example. It seems the idiomatic numpy way to perform the iterations is through vectorization, but that often leads to code that is not straightforward to reason about. For this example, I'd prefer the simplicity of the Cython for-loop when it comes to optimization.
That's because with generic Python code the values in a numpy array need to be boxed into Python objects, leading to extra memory allocations (whereas in the built-in list case they are already boxed).
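A small pure-Python illustration of that boxing cost, using `array.array` as a stand-in for an unboxed numpy buffer (in Cython, typing the buffer, e.g. `cdef double[:] a`, removes the boxing entirely):

```python
import array
import timeit

data_list = list(range(100_000))                # elements are already PyObjects
data_arr = array.array('d', data_list)          # unboxed C doubles, like numpy

def sum_indexed(seq):
    # Naive indexed loop, as in the linked StackOverflow question.
    total = 0.0
    for i in range(len(seq)):
        # array.array must box each double into a fresh PyFloat here;
        # the list just hands back a reference to an existing object.
        total += seq[i]
    return total

t_list = timeit.timeit(lambda: sum_indexed(data_list), number=5)
t_arr = timeit.timeit(lambda: sum_indexed(data_arr), number=5)
print(f"list: {t_list:.3f}s  array: {t_arr:.3f}s")  # timings indicative only
```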
It depends. This is the only deviation I've ever found: https://github.com/cython/cython/issues/1936 ; that is, Cython will execute __prepare__ in Python 2; but __prepare__ doesn't exist in Python 2, so the normal interpreter (CPython) won't execute it. This can lead to deviations; in my case, the code crashes if run under Cython, and executes fine under CPython.
The Cython maintainers disagree with me that this is a bug, so, if you're under Python 2, I would say it is "very nearly" compatible. If you're in a recent version 3, AFAICT, it just makes Python code faster.