[dupe] An open-source Python implementation using JIT techniques (github.com/dropbox)
83 points by daGrevis on April 4, 2014 | 27 comments



Google V8 is certainly a brilliant piece of engineering, but that by no means proves that the method JIT (which V8 is based on) has won or is the best approach. After all, the reigning king, LuaJIT, is based on a tracing JIT.

I have to wonder why the Python community has always been so interested in substantially speeding up Python itself, while the Ruby camp has always said that if you want more speed, you should do it in C ("best tool for the job" mantra?). Although I do hope Ruby MRI will one day get its own Google V8 / LuaJIT treatment.


"Write it in C if you need speed" is a mainstream, orthodox view in the Python community. Pypy is largely the result of hard work from a small number of people (same goes for Jython and IronPython).


Writing C extensions is a great optimization, but it is often a pain to port that platform-specific code to another platform (Windows/Linux/x86/x64...). However, speeding up Python itself might lessen the need for 'monolithic' and platform-specific extensions... which is a very good thing.


Both V8 and LuaJIT are extremely impressive, but I'm not sure either dominates the other, at least as of when I last benchmarked things: <http://tratt.net/laurie/research/pubs/files/metatracing_vms/... (the benchmark suite is on GitHub, so you can update/rerun it). To cut a long story short, each has some benchmarks where it blows the other out of the water, but overall I'd say that, from a performance perspective, it's a score draw.


> I have to wonder why the Python community has always been so interested in substantially speeding up Python itself, while the Ruby camp has always said that if you want more speed, you should do it in C.

Not sure I agree with that - a big part of the excitement about Ruby 1.9 was how much faster it was, and stock Python isn't significantly faster than YARV. Didn't Python 3 actually come with a modest reduction in performance when it was released?

PyPy's certainly ahead quite a bit, but it still seems rather ignored by most.

> I hope Ruby MRI will one day get its own Google V8 / LuaJIT treatment.

Rubinius is a JITing LLVM-based Ruby VM with accurate compacting generational GC and no global interpreter lock. And of course there's the similarly capable JRuby, which leverages the JVM JIT quite heavily.


The funny thing is that any module that needs to be "fast" is written in C and uses CPython's C API. So a JIT for Python can only act on half the code, and not on the part that might need to be really fast.


Hence PyPy people rewriting CPython's C extensions in Python to make them fast. For example, rewriting JSON encoding from C to Python resulted in a decent speedup.

http://morepypy.blogspot.com/2011/10/speeding-up-json-encodi...
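
To get a rough sense of what that post is measuring, here is a minimal micro-benchmark sketch of my own (the payload shape and iteration count are arbitrary choices, not taken from the post): run the same script under CPython and PyPy and compare. On PyPy the pure-Python encoder is JIT-compiled; on CPython, json relies on its C accelerator where available.

  # Micro-benchmark sketch: run under both CPython and PyPy to compare
  # JSON encoding throughput. Payload and iteration count are arbitrary.
  import json
  import timeit

  payload = {
      "users": [
          {"id": i, "name": "user%d" % i, "scores": [i * 0.5, i * 1.5, i * 2.5]}
          for i in range(1000)
      ]
  }

  elapsed = timeit.timeit(lambda: json.dumps(payload), number=200)
  print("200 encodes took %.3f s" % elapsed)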


To paint broad strokes, I think it's more accurate to say that the Ruby maintainers have generally prioritized user experience (where the user here is the programmer) over performance. I think the Python maintainers have maybe prioritized performance a bit higher than their Ruby counterparts, but there's definitely still a higher emphasis on ease of use. Given the goals of both languages and the need to make pragmatic choices, I think their priorities are well set. That said, I think a Java-fast Python or Ruby is an incredibly laudable goal, and I'm happy every time I see people working on it.


To use some other broad strokes: are there actually any large performance-sensitive Ruby projects? Python has quite a following in scientific computing, and there are people trying to do some large projects in Python where performance matters.

I can also tell you that the idea of simply using C for the slow parts is a good one, but it's not hard to end up with a largish project in a dynamic language where that becomes very difficult. A high-speed kernel with a slow control plane can still be slow.
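
A toy illustration of that last point (my own example; NumPy stands in for any fast C-backed kernel): if the fast kernel is driven one tiny call at a time from a Python loop, the interpreter overhead in the "control plane" dominates.

  # Toy sketch: the kernel (NumPy's C code) is fast, but a Python-level
  # control loop around it can still dominate the runtime.
  import time
  import numpy as np

  a = np.random.rand(1000000)
  b = np.random.rand(1000000)

  # Slow control plane: one tiny C call per element, driven from Python.
  start = time.time()
  out = np.empty_like(a)
  for i in range(len(a)):
      out[i] = a[i] + b[i]
  print("per-element loop: %.3f s" % (time.time() - start))

  # Fast path: one big C call, Python only orchestrates.
  start = time.time()
  out = a + b
  print("vectorized add:   %.3f s" % (time.time() - start))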

Pyston seems like it has some interesting goals, and it might achieve them. There are some substantial issues they haven't addressed yet, cough GIL cough. If they do well, it'll just be a faster Python, whereas PyPy is very good and, if the community embraces what they are doing, it's a new kind of thing.


Because people who are good at writing Python/Ruby/whatever dynamic language might not be best suited to writing C code (or because, seen from a Python programmer's point of view, C definitely sucks).


Cython, Psyco, others that I'm forgetting, the "rewritten in C for speed" libraries in the standard library, and the huge number of C-extension libs are all examples disproving your assumption that the Python community doesn't champion "do it in C, for speed".

I know of just three Python (re)implementations attempted at least in part for speed: one abandoned (Unladen Swallow), one just starting (the OP's), and one fairly far along but not yet complete (PyPy). All are niche and relatively small efforts compared to the rest of the Python community's "do it in C".


The faster the language, the more often you can afford to use it, and the more you get to use it.

Projects like PyPy let me code things in Python that I would otherwise have had to do in C. How is that not an awesome thing?
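
A toy example (mine, not the commenter's) of the kind of tight, pure-Python numeric loop that is painful on CPython but that PyPy's tracing JIT compiles down to fast machine code, removing much of the pressure to drop down to C:

  # Monte Carlo estimate of pi using a plain Python loop. Try it under
  # CPython and then PyPy; the loop body is exactly what a tracing JIT loves.
  import random

  def estimate_pi(samples=2000000):
      inside = 0
      for _ in range(samples):
          x = random.random()
          y = random.random()
          if x * x + y * y <= 1.0:
              inside += 1
      return 4.0 * inside / samples

  print("pi ~= %f" % estimate_pi())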


It's interesting, but it looks like it has a loooong way to go before reaching feature parity with PyPy, which itself has a number of challenges to overcome before becoming mainstream. I would be interested to read the rationale for starting work on Pyston vs. using/improving PyPy.


From the site: https://tech.dropbox.com/2014/04/introducing-pyston-an-upcom...

Why a new implementation

There are already a number of Python implementations using JIT techniques, often in sophisticated ways. PyPy has achieved impressive performance with its tracing JIT; Jython and IronPython are both built on top of mature VMs with extensive JIT support. So why do we think it’s worth starting a new implementation?

In short, it’s because we think the most promising techniques are incompatible with existing implementations. For instance, the JavaScript world has switched from tracing JITs to method-at-a-time JITs, due to the compelling performance benefits. Whether or not the same performance advantage holds for Python is an open question, but since the two approaches are fundamentally incompatible, the only way to start answering the question is to build a new method-at-a-time JIT.

Another point of differentiation is the planned use of a conservative garbage collector to support extension modules efficiently. Again, we won’t know until later whether this is a better approach or not, but it’s a decision that’s integral enough to a JIT that it is difficult to test in an existing implementation.

The downside of starting from scratch is, unsurprisingly, that creating a new language implementation is an enormous task. Luckily, tools are starting to come out that can help with this process; in particular, Pyston is built on top of LLVM, which lets us achieve top-tier code generation quality without having to deal with the details ourselves. Nonetheless, a new Python implementation is a huge undertaking, and Pyston will not be ready for use soon.


Thanks for that. Well, I won't say no to open-source R&D :)


Their technique (by-method JIT, LLVM toolchain) is more likely to produce a mature implementation in the short term, possibly with a modest (but welcome) speedup, without major incompatibilities with extensions. If it happens, it's better than "That Perfect JIT VM" that is never finished.


Except PyPy is here now and you can use it. Not necessarily perfect, but the design is sound.


The major issue with PyPy is that it has a radically different C API from CPython, leading to incompatibility with NumPy, Pandas, etc., to mention just a few prominent performance-critical Python packages that are written in C. PyPy works if you want to use pure Python code, but that's rarely the case in real-world high-performance Python projects.


Compatibility with CPython's C API will always be a problem for any other implementation. The only practical solution is to switch to a better FFI, like cffi.
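
For anyone who hasn't seen it, here is a minimal cffi sketch (ABI mode, close to the canonical example in the cffi documentation): declare the C signature, open the C library, call it. It runs on CPython and PyPy alike and never touches CPython's C API. Note that ffi.dlopen(None) loads the standard C library on POSIX systems only.

  # Minimal cffi sketch (ABI mode): declare a C signature, load libc, call it.
  from cffi import FFI

  ffi = FFI()
  ffi.cdef("int printf(const char *format, ...);")  # declare the function we want
  C = ffi.dlopen(None)                              # standard C library (POSIX)
  arg = ffi.new("char[]", b"cffi")                  # char* argument for the vararg
  C.printf(b"hello from %s\n", arg)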



Given that this is a new project, why focus on Python 2.7 rather than 3.4?


Not wishing to put words in the mouths of the creators, but there are a few reasons why one might target 2.7 rather than 3.x:

1. 2.7 is a stable target. There will be no further features added to the 2.x series.

2. 2.7 is a known target. Other experiments like PyPy have already tried many paths to improve 2.7, so what doesn't work is reasonably well known.

3. 2.7 is a widely deployed target. This is less important from an "adoption" point of view, more so that there is a lot of software written that targets 2.x, meaning there's more software to test with.

4. Finally, I suspect much of Dropbox's codebase is 2.7[1], and since this is an effort to improve things for Dropbox, it makes sense to spend the initial efforts there :)

[1] checking the version of Python bundled with the Dropbox client confirms that it, at least, is 2.7

  /Applications/Dropbox.app/Contents/MacOS/python --version
  Python 2.7.3


1 is a big item. It is not fun playing catch-up. CPython actually suspended language changes from 2009 to 2011 so that alternative implementations could catch up.


On the other hand, starting behind just means you have even more catching up to do. This happened in my personal VM project with Ruby. I targeted Ruby 1.8.7 when all these things held true for it, but by the time I got it running most of the RubySpec language specs, 1.8.7 was basically obsolete.


Their code base is written in Python 2.7. In the previous thread, they said that contributions to implement Python 3 compatibility were welcome.


Would this allow Python to be run in the browser (by going through Emscripten and asm.js)? Is that a goal that makes sense?


Not really any more than any other Python implementation, no.





