Passing the torch of NumPy and moving on to Blaze (technicaldiscovery.blogspot.ca)
91 points by cing on Dec 17, 2012 | 11 comments



Congrats to Travis and the rest of the Continuum Analytics team on the DARPA XDATA funding!

As someone working to build tools in the same space as Continuum (and perhaps as a competitor), having your competitors (Continuum) be intelligent, nice, interesting folks who really understand the problem domain is pretty darn great.

Point being: the numerical computing / data analysis landscape is going to be seeing a lot of great tools emerge and/or mature over the next year, and I have no doubt that 30-50% of them will be coming from Continuum Analytics. [edit: to the substantial enrichment of high-level tools for numerical computing within Python, and likely beyond!]

I can only hope that I execute my tool building work at WellPosed well enough that I can call them a competitor for years to come!


As someone who has dabbled in using Python for numerical computing in several small projects, I wonder: what would be the motivation for further investment in Python as a numerical platform, considering all of Python's problems with concurrency?

Real threads will never come to Python. MPI is a real pain unless you are running very large computations. This will only become more true as time progresses. Am I missing something?


Is concurrency really that important for numeric work? Surely parallelism is what you care about.

Many numpy primitives are already parallel, since numpy basically just hands off to your BLAS library. Beyond that there is numexpr, which is really good at doing parallel evaluation of large array expressions. If your problem isn't solved by any of these, there are other powerful solutions like IPython, Parallel Python, and even multiprocessing from the standard library.
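To give a flavor of the numexpr case, something like this (a rough sketch only; it assumes numexpr and numpy are installed, and the arrays and expression are made up for illustration):

    # numexpr compiles the expression string and evaluates it in chunks
    # across multiple threads, avoiding the large temporaries a plain
    # numpy expression would allocate.
    import numpy as np
    import numexpr as ne

    a = np.random.rand(10**7)
    b = np.random.rand(10**7)

    result = ne.evaluate("2*a**2 + 3*b + 1")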

If you need even more performance, Cython has some support for semi-automated parallelization, and if all else fails you can drop down to C and use OpenMP or whatever else you like.

So while concurrency is a problem in Python, numeric parallelism is an area where many good solutions exist.


Why should I have to resort to a C library in order to do anything in Python? What's the point of using Python if every time I want to do something in parallel I'm going to have to write a C library?

Python people have their heads in the sand! If we have hundred-core processors, running Python on a single core is not going to be a tractable solution to any problem. Your BLAS may be parallel, but any time you go back into the Python driver code it suddenly becomes a massive bottleneck.

Distributed Python is a messy hack that wastes all the amazingly tuned shared memory support in the processor. Writing C extensions goes against the whole point of using Python.

"if all else fails" The problem with Python is that the moment you want to do something in parallel, which in the next decade will be everyone, "all else fails" is your starting point!


The model that has worked amazingly well for Python (and Matlab, and R, and probably a number of others) is to encapsulate the hard stuff - say BLAS for linear algebra, or GraphLab for loopy belief propagation - together with all the amazingly tuned shared-memory support, concurrency, parallelism, data locality, and whatnot, in C-level modules written by experts, and to present a powerful API that shields you from the nontrivialities of concurrent or parallel programming. If you spend lots of time in the driver code, you most certainly won't be happy with Python, R, or Matlab, but then Cython (and possibly Numba at some point) helps push that "lots of time" further and further down.
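To make that concrete, here is a rough sketch (plain numpy, sizes made up) of the difference between staying in the driver code and handing the work to the library:

    import numpy as np

    n = 1000
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)

    # Driver-level Python: interpreted nested loops, single core.
    def matmul_driver(A, B):
        n = A.shape[0]
        C = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    C[i, j] += A[i, k] * B[k, j]
        return C

    # Library-level call: one line, executed inside an optimized
    # (and typically multithreaded) BLAS.
    C = np.dot(A, B)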

"all else fails" is the starting point of pretty much everyone doing real work. What's your alternative here? Most of the time, specialized libraries will be both more convenient and more efficient than rolling your own with fine-grained concurrency/parallelism in Java or PyPy.


The paper "Evaluating the Design of the R Language" [1] is a great read on this subject. A key figure they found (p. 17 end of top paragraph) is that in a realistic corpus of work, only 22% of compute time was spent in C/Fortran "kernels" as opposed to R code. So the effectiveness of the "two language" design is somewhat limited, even for scientific workloads where kernels like BLAS, FFTs, etc. apply (and there are many areas where they don't really).

[1] http://r.cs.purdue.edu/pub/ecoop12.pdf


R is (according to that study) 500x slower than C. But let's say we have a language that is just 10x slower than medium-optimized C. In that case, a 100-second program run spends 78 seconds in that language and 22 seconds in the compute kernels.

Now imagine that you speed up the language by 2x but have to forgo the use of efficient C code. The program would then spend 39 seconds outside the "kernel" stuff and 110 seconds in the stuff that used to be a C library but had to be reimplemented.
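Spelled out, using the numbers above (a language 10x slower than C, 22% of a 100-second run in C kernels, and a hypothetical 2x language speedup):

    c_kernel    = 22.0   # seconds spent in C/Fortran kernels
    interpreted = 78.0   # seconds spent in the high-level language

    slowdown = 10.0      # the language is 10x slower than C
    speedup  = 2.0       # hypothetical 2x improvement to the language

    new_interpreted = interpreted / speedup       # 39 seconds
    new_kernel = c_kernel * (slowdown / speedup)  # the old C kernels rewritten in
                                                  # the now 5x-slower language: 110 s
    total = new_interpreted + new_kernel          # 149 s, versus the original 100 s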

Then again, even if you consider a "one language" design such as Cython (where you can write code that sits between Python and C, both convenience-wise and performance-wise), performance-sensitive code looks markedly different from straightforwardly writing down a program.

This is why the "two language" design survives, even while you see very usable work in pure-C++ or even pure-Java.


In my view, this is the strongest evidence that Matlab, SciPy, R, etc. haven't found the right abstraction level for numerical computing. The high-level language is supposed to be the abstraction, yet in these systems you continually need to break through that abstraction and code in C for performance and scale. That's not a very good abstraction. This problem is precisely what Travis Oliphant and his team are tackling with Numba and Blaze, but it remains to be seen if they can produce a better abstraction.

If you're willing to try another language altogether, Julia [http://julialang.org] is a general purpose language with enough performance and expressiveness to be an effective abstraction layer for numerical programming – you never have to dip into C for speed, scale or control. In developing the language, we haven't allowed ourselves to resort to C – instead, we've worked at making Julia itself fast enough to implement things like I/O, Dicts, Strings, BitArrays (boolean arrays packed 8 values per byte), etc. – all in pure Julia code while getting C-like performance.


Indeed. I have pretty much stopped engaging with the standard dialogue, repeated ad infinitum, that goes along the lines of "code the bottleneck in C" and "the GIL is a non-issue, just use parallel processes".

For some workloads, the latter is actually good advice, but for my typical use case it does not help. These would be a tight-ish loop wrapped around a fork-join. Shared-memory handling can be quite clunky in numpy, and if you want to do message passing, the overheads bleed off any advantage that parallelism ought to have given you. I don't mind the message-passing abstraction, just that the overhead of doing it in Python/numpy is too much. As for the former, one major motivation to use numpy et al. was to avoid C with its explicit indexing over arrays, which is both verbose and error-prone.
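For what it's worth, the kind of fork-join I mean looks roughly like this (a sketch only; it relies on fork-based process start on Linux/macOS, and the worker and sizes are made up):

    import multiprocessing as mp
    import numpy as np

    N = 10**6
    shared = mp.Array('d', N, lock=False)            # raw shared-memory buffer
    data = np.frombuffer(shared, dtype=np.float64)   # numpy view, no copy

    def worker(bounds):
        start, stop = bounds
        # each process fills its own slice of the shared array
        data[start:stop] = np.sqrt(np.arange(start, stop))

    if __name__ == "__main__":
        nproc = 4
        edges = np.linspace(0, N, nproc + 1, dtype=int)
        chunks = list(zip(edges[:-1], edges[1:]))
        with mp.Pool(nproc) as pool:                 # fork, map, join
            pool.map(worker, chunks)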

It is never pleasant to drop into a different language, though it is much, much better than how bad it could be, thanks to SWIG, Cython, and Weave. Contrary to common wisdom, I prefer Weave because of its much more succinct syntax. In Cython I am back to writing C again, just with a different syntax. This is not a criticism of Cython; it's an excellent tool, and it is much, much more pleasant to parallelize from Cython than from numpy/Python.

Julia looks pretty good. I have one suggestion: the best way to get speed out of Julia is not to write vectorized expressions but to write out explicit loops. That's a little unfortunate, because although vectorization constructs evolved out of the necessity to avoid loops (which were slow in the older languages), they had the excellent byproduct of succinct code. Ideally I would like to retain that.
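To illustrate what I mean by that byproduct, compare the two forms of the same update (shown here in numpy, but the contrast is the same; the arrays are made up):

    import numpy as np

    a = 2.5
    x = np.random.rand(10**6)
    b = np.random.rand(10**6)

    # vectorized: one line that reads like the math
    y = a * x + b

    # explicit loop: what you write when loops are the fast path
    y2 = np.empty_like(x)
    for i in range(x.shape[0]):
        y2[i] = a * x[i] + b[i]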


Julia is a wonderful and elegantly designed language. Fast and a great type system. Intuitive. For several days, I was very excited about the prospect of moving my research to Julia.

And then I discovered that it too has no reasonable shared memory parallelism story, just the same manual distribution of arrays plus multiprocessing that exists in Python.

I will speculate that Julia's authors have the same attitude as many in the Python community -- namely, that there are small jobs, which can be run in one process, and large jobs, which need to be massively parallelized, and nothing in between. But in reality there are many scientific tasks that are medium-sized, for which an OpenMP-style solution is the best fit. Tasks which might take days can be reduced to hours. With new developments like the Xeon Phi, that ratio might further improve.

Also, many problems require a lot of heterogeneous shared state, and it is tedious to manually distribute each element of this shared state. Finally, there are many problems, such as natural language processing, that are only partially numerical. For these problems, distributing arrays is only part of the solution.


I totally agree as such. I certainly don't think Python has a particularly good solution, just the best current practical solution. I mean, it's pretty much an accident of history that Python became a popular language for numerics, and it's certainly not what it was designed for.

I'm following Numba and Blaze with interest and honestly consider Julia the most exciting new language out there. But until they reach a point where they are usable for me, I'll keep using Python and the incredibly powerful, if slightly kludgy, solutions it offers.




