
Maybe it's just me, but reimplementing NumPy on the PyPy platform in 6 months with an estimated 1000 hours of work seems extraordinarily ambitious. I don't mean to pooh-pooh this effort (which I would like to see happen), but the general feeling I've gotten from folks in the scientific Python community is that introducing a JIT is not going to be a magical solution for our performance problems, especially considering that many scientific Python programmers are already programming very close to the metal with Cython (or wrapping Fortran 90 with f2py). I didn't come up with this-- I'm just rehashing conversations I had at PyCodeConf last week. Other members of the SciPy community have some fairly different ideas about building a new architecture for array computing (http://conference.scipy.org/scipy2011/slides/wang_metagraph....), i.e. building a dynamic fusing compiler ("stream fusion") for array expressions.
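To make the fusion idea concrete, here's a minimal sketch (my own illustration, not the metagraph design): eager NumPy materializes a full-size temporary for every operation, while a fusing compiler would emit the equivalent of a single loop over the data.

    import numpy as np

    n = 100000
    a = np.random.rand(n)
    b = np.random.rand(n)

    # Eager NumPy: every operation is a separate pass over memory,
    # allocating a temporary array for each intermediate (a**2, b**2,
    # their sum) before the final sqrt.
    dist = np.sqrt(a ** 2 + b ** 2)

    # A fusing compiler would instead generate the equivalent of this
    # single loop: one pass, no intermediate arrays. (Plain Python here
    # for clarity; interpreted, it's far slower than the NumPy version,
    # which is why it has to be compiled to machine code to be a win.)
    out = np.empty_like(a)
    for i in range(n):
        out[i] = (a[i] ** 2 + b[i] ** 2) ** 0.5

    assert np.allclose(dist, out)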

However, having a numpy-lite in PyPy would let a lot of people who are well served by the current NumPy switch to PyPy, and thus benefit from the rest of their Python code getting a lot faster.

A lot of people have asked me recently if PyPy would help me with my library, pandas. My answer so far has been "even if NumPy worked on PyPy, probably not all that much". It'd be cool to be proved wrong :)




Even with no speedup, the main benefit would be the ability to use pypy for the rest of the "supporting" python code.


Exactly. Numpy is fast enough. It's the code calling it that is slow.


NumPy is not fast enough, that's the problem (http://technicaldiscovery.blogspot.com/2011/07/speeding-up-p...). In scientific applications the code calling it is rarely the bottleneck -- if it is, you might be doing something wrong. The biggest bottlenecks I encounter are a) computation and b) data serialization/deserialization (especially if a database of some kind is involved).
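For what it's worth, one standard stopgap (a sketch of a common trick, not taken from the linked post) is the ufuncs' out= argument, which reuses a preallocated buffer instead of allocating a fresh temporary for every intermediate:

    import numpy as np

    n = 1000000
    a = np.random.rand(n)
    b = np.random.rand(n)
    c = np.random.rand(n)

    # Naive form: allocates a hidden full-size temporary for a * b.
    d = a * b + c

    # In-place form: one preallocated buffer, no hidden temporaries.
    d = np.empty_like(a)
    np.multiply(a, b, out=d)
    np.add(d, c, out=d)

This only saves allocations, not passes over memory, which is part of why people keep reaching for fused approaches like numexpr.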


Theano is also very interesting with regard to array computing. I've used it for very fast convolutional-neural-network training on GPU, but it can do many things (automatic symbolic differentiation, code generation, etc.): http://deeplearning.net/software/theano/
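As a taste of the symbolic differentiation (a minimal sketch from memory, using Theano's public API): you declare symbolic variables, ask for a gradient, and compile the result into a callable.

    import theano
    import theano.tensor as T

    x = T.dscalar('x')            # a symbolic double-precision scalar
    y = x ** 2
    dy = T.grad(y, x)             # symbolic derivative: 2 * x
    f = theano.function([x], dy)  # compiled (to C, or GPU code)

    assert f(3.0) == 6.0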

I really do love the PyPy work on 'creating a faster Python'. I have a lot of Python scripts that do parsing and then some work with numpy; these would benefit hugely from it.


I hadn't heard about metagraph. Sounds useful. Where's the source?

As I understand it, the implementation in pypy is lazy by default and only "forces" a result when it's needed. So (again, IIRC) it potentially avoids intermediates, like numexpr does (http://code.google.com/p/numexpr/).
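numexpr's API makes the contrast easy to see; a small sketch (array names are mine):

    import numpy as np
    import numexpr as ne

    n = 1000000
    a = np.random.rand(n)
    b = np.random.rand(n)
    c = np.random.rand(n)

    eager = a * b + c                 # NumPy: full-size temporary for a * b
    fused = ne.evaluate("a * b + c")  # numexpr: one blocked, cache-sized pass

    assert np.allclose(eager, fused)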


The current code only implements a fairly trivial array structure in python, and it does not seem to implement any expression laziness. But pypy should obviously make it easier to try this kind of thing compared to the current numpy.


It does implement lazy evaluation of array expressions, and the JIT compiles them on the fly to assembler. Having an assembler generator that's not too bad helps immensely with such efforts.
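For anyone curious what "lazy by default, forced when needed" looks like, here's a toy sketch (my own illustration, not PyPy's actual code): operations build a deferred expression instead of computing anything, and only force() walks it, one fused evaluation per element -- exactly the kind of loop a tracing JIT can compile down to assembler.

    import numpy as np

    class Lazy:
        def __init__(self, compute):
            self.compute = compute    # maps an element index to a value

        def __add__(self, other):
            return Lazy(lambda i: self.compute(i) + other.compute(i))

        def __mul__(self, other):
            return Lazy(lambda i: self.compute(i) * other.compute(i))

        def force(self, n):
            # The only place arithmetic happens: a single fused pass
            # with no intermediate arrays.
            return np.fromiter((self.compute(i) for i in range(n)),
                               dtype=float, count=n)

    a = np.random.rand(100)
    b = np.random.rand(100)
    la, lb = Lazy(a.__getitem__), Lazy(b.__getitem__)

    expr = la * lb + la    # builds a deferred expression; computes nothing
    assert np.allclose(expr.force(100), a * b + a)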




