PyPy: NumPy funding and status update (morepypy.blogspot.com)
92 points by kingkilr on Oct 12, 2011 | 17 comments



Maybe it's just me, but reimplementing NumPy on the PyPy platform in 6 months with an estimated 1000 hours of work seems extraordinarily ambitious. I don't mean to pooh-pooh this effort (which I would like to see happen), but the general feeling I've gotten from folks in the scientific Python community is that introducing a JIT is not going to be a magical solution for our performance problems, especially considering that many scientific Python programmers are already programming very close to the metal with Cython (or wrapping Fortran 90 with f2py). I didn't come up with this-- I'm just rehashing conversations I had at PyCodeConf last week. Other members of the SciPy community have some fairly different ideas about building a new architecture for array computing (http://conference.scipy.org/scipy2011/slides/wang_metagraph....), i.e. building a dynamic fusing compiler ("stream fusion") for array expressions.

However, having a numpy-lite in PyPy would let a lot of people who are currently well served by the current version of NumPy switch to PyPy, and thus benefit from the rest of their Python code being a lot faster.

A lot of people have asked me recently if PyPy would help me with my library, pandas. My answer so far has been "even if NumPy worked on PyPy, probably not all that much". It'd be cool if I were proved wrong :)


Even with no speedup, the main benefit would be the ability to use pypy for the rest of the "supporting" python code.


Exactly. Numpy is fast enough. It's the code calling it that is slow.


NumPy is not fast enough, that's the problem (http://technicaldiscovery.blogspot.com/2011/07/speeding-up-p...). In scientific applications, the code calling it is rarely the bottleneck-- if it is, you might be doing something wrong. The biggest bottlenecks I encounter are a) computation and b) data serialization / deserialization (especially if a database of some kind is involved).
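To make this concrete, here's the kind of thing I mean (a rough sketch, not a benchmark): every operator in an all-numpy expression makes its own pass over memory and allocates a full-size temporary.

    import numpy as np

    a = np.random.rand(10000000)
    b = np.random.rand(10000000)

    # Evaluated one operator at a time: numpy allocates a full
    # temporary for 2*a, another for 3*b, another for their sum,
    # and so on -- several passes over memory for one expression.
    result = 2*a + 3*b - a*b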


Theano is also very interesting with regard to array computing. I've used it for very fast convolutional-neural-network training on GPU, but it can do many other things (automatic symbolic differentiation, code generation, etc.): http://deeplearning.net/software/theano/
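As a taste, the canonical example from Theano's tutorial: symbolically differentiate an expression, then compile it into a fast callable.

    import theano
    import theano.tensor as T

    x = T.dscalar('x')            # symbolic scalar
    y = x ** 2                    # symbolic expression
    gy = T.grad(y, x)             # symbolic derivative: 2*x
    f = theano.function([x], gy)  # compiles to native code
    print(f(4.0))                 # 8.0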

I do really love the PyPy work for 'creating a faster Python'. I have a lot of scripts in Python that do parsing and then some work with numpy. These would hugely benefit from this.


I hadn't heard about metagraph. Sounds useful. Where's the source?

As I understand it, the implementation in pypy is lazy by default and only "forces" a result when it's needed. So (again, IIRC) it can potentially avoid intermediates, like numexpr (http://code.google.com/p/numexpr/) does.
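numexpr's whole trick is doing that explicitly; a small sketch using its evaluate() API:

    import numpy as np
    import numexpr as ne

    a = np.random.rand(1000000)
    b = np.random.rand(1000000)

    r1 = 2*a + 3*b                 # plain numpy: temporaries for 2*a and 3*b
    r2 = ne.evaluate("2*a + 3*b")  # one blocked pass, no full-size temporaries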


The current code only implements a fairly trivial array structure in python, and it does not seem to implement any expression laziness. But pypy should obviously make it easier to try this kind of thing compared to the current numpy.


It does implement lazy evaluation of array expressions, and uses the JIT to compile them on the fly to assembler. Having an assembler generator that's not too bad helps immensely with such efforts.
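Roughly the idea, as a toy sketch in plain Python (the names are made up for illustration, this is not pypy's actual code -- and the real thing JIT-compiles the fused loop to assembler instead of interpreting it):

    class Lazy(object):
        """Toy lazy array: ops build a tree; force() runs one fused loop."""
        def __init__(self, data=None, op=None, args=()):
            self.data, self.op, self.args = data, op, args

        def __add__(self, other):
            return Lazy(op=lambda x, y: x + y, args=(self, other))

        def __mul__(self, other):
            return Lazy(op=lambda x, y: x * y, args=(self, other))

        def get(self, i):
            if self.data is not None:
                return self.data[i]
            return self.op(*[node.get(i) for node in self.args])

        def force(self, n):
            # a single pass over the inputs, no intermediate arrays
            return [self.get(i) for i in range(n)]

    a = Lazy(data=[1.0, 2.0, 3.0])
    b = Lazy(data=[4.0, 5.0, 6.0])
    expr = a * b + a        # nothing computed yet
    print(expr.force(3))    # [5.0, 12.0, 21.0]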


This is great news. There are so many great packages that depend on NumPy support.


The strategy followed by pypy of reimplementing numpy from scratch makes it rather unlikely that it will support packages depending on numpy, because so many of them depend on numpy's implementation details.


That's what they said about reimplementing Python ;)


Fair enough.

But you could also argue that this reinforces my point, as few people use pypy instead of python.


recursion overload!

That reinforces the goal of porting NumPy (and PyPy's track record at achieving such ports). That is, all the people not using pypy because it lacks numpy will, after this port, have the option to.


There is no question about the value of numpy on top of pypy. The issue is whether a reimplementation from scratch is the best way to achieve that.


There are two blog posts on why:

http://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-an...
http://morepypy.blogspot.com/2011/05/numpy-follow-up.html

It's not possible to do cool stuff - like parallelizing expressions, etc. - by reusing the existing code. The architecture as it is now can already score 2x wins over the original numpy with array expressions, and we expect it to only get better with SSE and more parallelizing. This requires reimplementing numpy.
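For example, for out = 2*a + 3*b - a*b, lazy evaluation lets us emit the equivalent of one fused loop (sketched in plain Python here; in practice it's generated assembler), instead of one loop plus one temporary per operator:

    # Hypothetical fused loop for: out = 2*a + 3*b - a*b
    def fused(a, b, out):
        for i in range(len(out)):
            out[i] = 2.0*a[i] + 3.0*b[i] - a[i]*b[i]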


This is again a different argument. I understand it is more fun, more rewarding and more challenging to implement a new array module on top of pypy. But it is seriously doubtful that it is the best way forward to make pypy usable for libraries which depend on numpy.


What about Scipy? Will Pypy support Scipy soon, too? Without Scipy, I don't need Numpy.



