
For those with Python 2 baselines that aren't (or will never be) ported, PyPy will support version 2 basically forever [0]. PyPy supports a huge number (but not all) of Python packages, including numpy [1]. Moreover, PyPy is significantly faster in many cases [2], and for the numerical types of things I like to write [3], it's amazingly faster.

    > time pypy mandel.py > pypy.pgm
    seconds: 0.318101

    real        0m0.426s
    user        0m0.396s
    sys         0m0.013s

    > time python2 mandel.py > python2.pgm
    seconds: 30.141954

    real        0m30.156s
    user        0m30.136s
    sys         0m0.003s
That's just a silly Mandelbrot example, but for numerical algorithms like this PyPy is nearly 100 times faster than Python2, and that includes startup cost for the JIT!
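
For reference, the hot part of a benchmark like this is just plain Python arithmetic in nested loops, which is exactly where the JIT shines. A rough sketch of such an escape-time loop (not the exact code at [3]):

    def escape_time(cr, ci, max_iter=255):
        # iterate z = z**2 + c with plain floats; no numpy, no C extension
        zr = zi = 0.0
        for i in range(max_iter):
            zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
            if zr * zr + zi * zi > 4.0:
                return i
        return max_iter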

I'm not in any way associated with the PyPy project, but I can't help but believe a sane world would've moved from Python to PyPy in the same way everything moved from Brendan Eich's original JavaScript interpreter to the modern JIT ones like V8.

[0] https://doc.pypy.org/en/latest/faq.html#how-long-will-pypy-s...

[1] http://packages.pypy.org/

[2] https://speed.pypy.org/

[3] https://godbolt.org/z/J9Xwp6




> I can't help but believe a sane world would've moved from Python to PyPy

Yes, and this is my fundamental complaint with the Python 3 transition. It took probably millions of engineer-hours, and the result was better Unicode support and a handful of minor features most of which could have been added to Python 2 just as easily. I suspect most users would gladly have traded those benefits for a 10x performance improvement.


> millions of engineer-hours, and the result was better Unicode support and a handful of minor features

Yes, and I wish the better Unicode support had been implemented similarly to how Go does it - one string type, and you use it to hold UTF-8 if needed. In other words, they could've simply deprecated the Python 2.X unicode object and added libraries to extract code points, or grapheme clusters, or perform normalization etc... This seems much simpler and more "Pythonic".

I guess everything is 20/20 in hindsight.


I totally agree with you. I hope we still have a chance to do it well with PyPy.


> and the result was better Unicode support

Different Unicode support. And worse bytes support.

What could previously be done using python -c "..." is now long, horrible and ugly.
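
For instance, a byte-for-byte passthrough one-liner (in.bin/out.bin are just placeholder names here):

    # Python 2: bytes in, bytes out, no questions asked
    python2 -c "import sys; sys.stdout.write(sys.stdin.read())" < in.bin > out.bin

    # Python 3: the same thing has to go through the .buffer objects, or it
    # can blow up trying to decode input that was never text to begin with
    python3 -c "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())" < in.bin > out.bin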


> Different Unicode support. And worse bytes support.

I feel like you're the first person I've seen on the planet to echo my sentiments on these. I expect a lot of people will jump here to tell you you're wrong like they have to me, so just wanted to let you know I've felt exactly these pains and agree with you.


I am hoping this is in agreement. py2’s flexibility to handle utf8 bytes without fuss is amazing. Then people come up with all kinds of purity reasons to make it more complicated.


Take out "utf8" and I'll agree ;)

The fundamental problem as I see it is that "string" is a grossly leaky and misunderstood abstraction. The string type is not the same thing as a "text" type. It's being used in all the wrong places for that purpose. People treat "string" like it means "text", but in so many places where we deal with them, they just aren't (and should never be) text. Everything from stdio to argv to file paths to environment variables to "text" files to basically any interface with the outside world needs to be dealt with in bytes rather than text if you care about actually producing correct code that doesn't lose, corrupt, or otherwise choke on data.

C++ understood this and got it right, preferring to focus on optimizing rather than constraining the string type. Many other languages did pretty well by avoiding enforcing encodings on strings, too. And Python 2 defaulted to bytes as well, and only really cared about encoding/decoding at I/O boundaries where it thought it could assume it was dealing with text (though it sometimes didn't behave well there, and yes it got painful as a result). Then Python 3 came along and just made everyone start treating most data as if it's inherently (Unicode) text by default, when there really was no such constraint to begin with.

It boggles my mind that Python 3 folks like to beat the drum on how Python 3 got the bytes/unicode right without taking a single moment to even notice that most strings people deal with aren't (and never were!) actually guaranteed to be in a specific, known textual encoding a priori. They were just arrays of code units with few restrictions on them, and if you want to write correct code, you're going to have to deal with bytes by default (or something else with similar flexibility) instead of text. It would've been totally fine to introduce a text type, but it fundamentally can't take the place of a blob type, which is the language of the outside world.
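
A concrete illustration with file names (assuming a directory that contains a file whose name isn't valid UTF-8, which is perfectly legal on a Linux filesystem):

    import os

    # bytes in, bytes out: every name round-trips, whatever its encoding
    for name in os.listdir(b"."):
        print(name)                   # e.g. b'caf\xe9.txt', a latin-1 era leftover

    # forcing it into text means either an error or lossy escaping
    b"caf\xe9.txt".decode("utf-8")    # raises UnicodeDecodeError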


"The outside world", by and large, also speaks Unicode.

Java uses UTF-16 throughout, including file paths. So does .NET. All Apple platforms are UTF-16. C++ - if you just look at stdlib, sure, it's byte-centric; but then look at popular frameworks such as Qt.

In practice, this means that, yeah, you can have that odd filename that is technically not Unicode. But the vast majority of code running on the most popular desktop and mobile platforms is going to handle it in a way that expects it to be Unicode. Why should Python go against the trend, and make life more complicated for developers using it in the process?


File names? I listed so much more for you than file names.

That HTML you just fetched? How do you know it's Unicode?

That .txt file the user just asked to load? How do you know that's Unicode?

For heaven's sake, when can you actually guarantee that even sys.stdin.read() is going to read Unicode? You can only do that when you're the one piping your own stdin... which is not the common case.

What do you do when your fundamentally invalid assumptions break? Do you just not care and simply present a stack trace to the user and tell them to get lost?

I've gotten tired of these debates though, so just a heads up I may not have the energy to reply if you continue...


In the real world Python2 gave stack traces by default when presented with common strings. Python3 doesn't.


>That HTML you just fetched? How do you know it's Unicode?

Headers contain information about the charset. If the charset isn't specified, then only god knows which encoding was used. This applies to all encodings: if the encoding isn't specified, you can't interpret the bytes.

>That .txt file the user just asked to load? How do you know that's Unicode?

If you don't know the encoding used, then you simply cannot interpret the file as a string. If the encoding isn't specified, you can't interpret the file.

>For heaven's sake, when can you actually guarantee that even sys.stdin.read() is going to read Unicode?

Again, if the encoding isn't specified then all bets are off. This is an inherent problem with unix pipes. Text isn't any different from, say, a protobuf packet: you have to know how to interpret it, otherwise it's just a raw byte array without any meaning.

>What do you do when your fundamentally invalid assumptions break? Do you just not care and simply present a stack trace to the user and tell them to get lost?

I don't understand you at all. Just load it as a byte array if you don't care about the encoding. If you do care about the encoding then tough luck: you're never going to understand the meaning of that text unless it's in an agreed-upon encoding like UTF-8, and in that case the assumption of always choosing UTF-8 is part of the value proposition.

Let me tell you why reading a text file as a byte array and pretending that character encodings don't exist is a bad idea. There are lots of Asian character encodings that don't even contain the latin alphabet. Now imagine you are running source.replace("Donut", "Bagel"). What meaning does running this function have on a byte array? It doesn't have any.

That operation simply cannot be implemented at all if you don't know the encoding. So if you were to choose the python 2 way then you would have to either remove all string operations from the language or force the user to specify the encoding on every operation.

A string literal like "Donut" isn't just a string literal. It has a representation, and you first have to convert the logical string into a byte array that matches the representation of the source string. Let's say your python program is loading UTF-16 text. Instead of simply specifying the encoding you just load the text without any encoding. If you wanted to run the replace operation then it would have to look something like this: source.replace("Donut".encode("utf-16"), "Bagel".encode("utf-16")). This is because you need to convert all string literals to match the encoding of the text that you want to replace.

Well, doesn't this cause a pretty huge problem? You now need to have a special type just for string literals because the runtime string type can use any encoding and therefore isn't guaranteed to be able to represent the logical value of a literal. Isn't that extremely weird?
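
To make that concrete (utf-16-le is just an arbitrary example encoding here):

    # a byte-level replace only "works" if the literal is encoded the same
    # way as the data; with a mismatched encoding it silently does nothing
    source = u"I like Donut very much".encode("utf-16-le")

    assert source.replace(b"Donut", b"Bagel") == source        # no match, no error

    fixed = source.replace(u"Donut".encode("utf-16-le"),
                           u"Bagel".encode("utf-16-le"))
    assert fixed.decode("utf-16-le") == u"I like Bagel very much"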


I'm too tired of these to reply to everything, so I'll just reply to the first bit and rest my case. It's like you're completely ignoring the fact that <meta charset="UTF-8"> and <?xml encoding="UTF-8"...?> and all that are actually things in the real world. You can't just treat them as strings until you read their bytes, was my point. The notion that the user can or should always provide you out-of-band encoding info or otherwise let you assume UTF-8 everywhere every time you read a file or stdin is just a fantasy and not how so many of our tools work.


So treat them as bytes. It's not like Python 3 removed that type. It just made it impossible to inadvertently treat bytes as a string in a certain encoding - unlike Python 2, which would happily implicitly decode assuming ASCII.


> So treat them as bytes.

Which was my entire point!! You have to go to bytes to get correct behavior. They didn't fix the nonsense by changing the default data type to a string, they just made it even more roundabout to write correct code.

> It just made it impossible to inadvertently treat bytes as a string in a certain encoding

It most certainly did not! It's like you completely ignored what I just told you. I already gave you an example: sys.stdin.read(). It uses some encoding when you really can't ever guarantee any encoding, and when the encoding info itself being embedded in the byte stream is the normal case. How can you know a priori what the user piped in? Are you sure users magically know every stream's encoding and are just neglecting to provide it to you? At least if they were bytes by default, you'd maintain correct state and only have to worry about encoding/decoding at the I/O boundary. (And to top off the insanity, it's not even UTF-8 everywhere; on Windows it's CP-1252 or something, so you can't even rely on the default I/O being portable across platforms, even for text! Let alone arbitrary bytes. This insanity was there in Python 2, but they sure didn't make it better by moving from bytes to text as the default...)


Sure it did. Here's an easy test, using your own test case with stdin:

   Python 2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)] on 
   win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> s = raw_input()
   abc
   >>> s
   'abc'
   >>> s + u"!"
   u'abc!'
So it was bytes after reading it, and became Unicode implicitly as soon as it was mixed with a Unicode string. And guess what encoding it used to implicitly decode those bytes? It's not locale. It's ASCII. Which is why there's tons of code like this that works on ASCII inputs, and fails as soon as it sees something different - and the people who wrote it have no idea that it's broken.
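
And here's the same session with non-ASCII input (say, UTF-8 bytes coming from a pipe), which is where that implicit decode falls over:

    >>> s = '\xc3\xa9'       # the UTF-8 bytes for 'é'
    >>> s + u"!"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)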

Python 2 did this implicit conversion, because it allowed it to have APIs that returned either bytes or unicode objects, and the API client could basically pretend that there's no difference (again, only for ASCII in practice). By removing the conversion, Python 3 forced developers to think whether the data that they're working with is text or binary, and to apply the correct encoding if it's binary that is encoded text. This is exactly encoding/decoding at the I/O boundary!

The fact that sys.stdout encoding varies between platforms is a feature, not a bug. For text data, locale defines the encoding; so if you are treating stdin and stdout as text, then Python 3 will use locale encoding to encode/decode at the aforementioned I/O boundary, as other apps expect it to do (e.g. if you pipe the output). This is exactly how every other library or framework that deals with Unicode text works; how is that "insanity"?

Now, if you actually want to work with binary data for stdio, then you need to use the underlying binary buffer objects: sys.stdin.buffer and sys.stdout.buffer. Those have read() and write() that deal with raw bytes. The point, again, is that you are forced to consider your choices and their consequences. It's not the same API that tries to cover both binary and text input, and ends up with unsafe implicit conversions because that's the only way to make it look even remotely sane.
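
Roughly, the two modes look like this (a minimal sketch):

    import sys

    def cat_text():
        # text mode: decoded/encoded at the boundary using the stream's encoding
        sys.stdout.write(sys.stdin.read())

    def cat_bytes():
        # binary mode: the underlying buffer objects, raw bytes, no decoding at all
        sys.stdout.buffer.write(sys.stdin.buffer.read())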

The only thing I could blame Python 3 for here is that sys.stdin is implicitly text. It would be better to force API clients to be fully explicit - e.g. requiring people to use either sys.stdin.text or sys.stdin.binary. But either way, this is strictly better than Python 2.


> The fact that sys.stdout encoding varies between platforms is a feature, not a bug. [...] This is exactly how every other library or framework that deals with Unicode text works; how is that "insanity"?

No, it's utterly false that every other framework does it. Where do you even get this idea? Possibly the closest language to Python is Ruby. Have you tried to see what it does? Run ruby -e "$stdout.write(\"\u2713\")" > temp.txt in the Command Prompt and then tell me you face the same nonsensical Unicode error as you do in Python (python -c "import sys; sys.stdout.write(u\"\u2713\")" > temp.txt)? The notion that writing text on one platform and reading it back on another should produce complete garbage is absolute insanity. You're literally saying that even if I write some text to a file in Windows and then read it back on Linux with the same program on the same machine from the same file system, it is somehow the right thing to do to have an inconsistent behavior and interpret it as complete garbage?? Like this means if you install Linux for your grandma and have her open a note she saved in Windows, she will actively want to read mojibake?? I mean, I guess people are weird, so maybe you or your grandma find that to be a sane state of affairs, but neither me, nor my grandma, nor my programs (...are they my babies in this analogy?) would expect to see gibberish when reading the same file with the same program...

As for "Python 3 forced developers to think whether the data that they're working with is text or binary", well, it made them think even more than they already had to, alright. That happens as a result of breaking stuff even more than it happens as a result of fixing stuff. And what I've been trying to tell you repeatedly is that this puristic distinction between "text" and "binary" is a fantasy and utterly wrong in most of the scenarios where it's actually made, and that your "well then just use bytes" argument is literally what I've been pointing out is the only solution, and it's much closer to what Python 2 was doing. This isn't even something that's somehow tricky. If you write binary files at all, you know there's absolutely no reason why you can't mix and match encodings in a single stream. You also know it's entirely reasonable to record the encoding inside the file itself. But regardless, just in case this was a foreign notion, I gave you multiple examples of this that are incredibly common—HTML, XML, stdio, text files... and you just dodged my point. I'll repeat myself: when you read text—if you can even guarantee it's text in the first place (which you absolutely cannot do everywhere Python 3 does)—it is likely to have an encoding that neither you nor the user can know a priori until after you've read it and examined its bytes. XML/HTML/BOM/you name it. You have to deal with bytes until you make that determination. The fact that you might read complete garbage if you read back the same file your own program wrote on another platform just adds insult to the injury.

But anyway. You know full well that I never suggested everything was fine in Python 2 and that everything broke in Python 3. I was extremely clear that a lot of this was already a problem, and that some stuff did in fact improve. It's the other stuff that got worse and even harder to address that's the problem I've been talking about. So it's a pretty illegitimate counterargument to cherry-pick some random bit about some implicit conversion that actually happened to improve. At best you'll derail the argument into a discussion about alternative approaches for solving those problems (which BTW actually do exist) and distract me. But I'm not about to waste my energy like this, so I'm going to have to leave this as my last comment.


Every other language and framework as in Java, C#, everything Apple, and most popular C++ UI frameworks.

Ruby is actually the odd one out with its "string is bytes + encoding" approach, and that's mostly because its author is Japanese - Japan is not entirely sold on Unicode, for some legitimate reasons. This approach also has some interesting consequences - e.g. it's possible for string concatenation to fail, because there's no unified representation for both operands.


> Possibly the closest language to Python is Ruby.

Not really; they are similar in that they are dynamic scripting languages, but philosophically and in terms of almost every implementation decision, they are pretty radically opposed.


Many of us deal in bytes that simply aren't UTF8 and never could be. Because they're just bytes.

How many things are stored as binary files?

> All Apple platforms are UTF-16.

I'm glad all their executable files are apparently text files. How amazing.

> Why should Python go against the trend, and make life more complicated for developers using it in the process?

You tell me why Python3 did that.


> py2’s flexibility to handle utf8 bytes without fuss is amazing

Without fuss? No, sorry, it was anything but.

First of all, it would default the encoding to ASCII. Have any whiff of non-explicitly handled UTF-8 and it would just go bang at the worst possible time.

That was a stupid decision.

"Oh but there was setdefaultencoding" Yeah here's the first result for that https://stackoverflow.com/questions/3828723/why-should-we-no...

So no, Python2 way of dealing with Unicode was the most annoying way possible, because hey who needs anything but ASCII right?


> Python2 way of dealing with Unicode was the most annoying way possible

The part about defaulting to ASCII is annoying, yes. And using sys.setdefaultencoding to change the default would still be annoying, yes. The reason is that any default encoding will be annoying whenever the actual encoding at the time the program runs doesn't match the default.

The correct way to fix this problem is to not have a default encoding at all. Don't try to auto-detect encodings; don't try to guess encodings. Force every encode and decode operation to explicitly specify an encoding. That way the issue of what the encoding is, how to detect it, etc., is handled in the right place--in the code of the particular application that needs to use Unicode. It should not be handled in a language runtime or a standard library, precisely because there is no way for a language runtime or a library to properly deal with all use cases of all applications.
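
Something like this, where the application always names the encoding and nothing is guessed (the file names and encodings here are just for illustration):

    # read bytes, decode explicitly, work on text, encode explicitly, write bytes
    with open("notes.txt", "rb") as f:
        raw = f.read()                      # bytes; nothing has been decoded yet

    text = raw.decode("utf-8")              # the application decides the encoding
    text = text.replace("Donut", "Bagel")

    with open("notes-out.txt", "wb") as f:
        f.write(text.encode("utf-8"))       # and decides it again on the way out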

What Python 3 did, instead, was to change the rules of default encodings and auto-detection/guessing of encodings, so that they were nicer to some use cases, and even more annoying than before to others.


I agree that it was easy to shoot yourself in the foot, and if you did have to deal with unicode, it was often a pain, but at the same time, Python's simplicity and ease of use is what makes it great, with the ability to do something cleaner if you choose to. You can choose to type annotate all your code and make it better. You can choose to organize your code in however package structure you want or keep it all in one file. The language doesn't force any of that onto you. That's the approach I think would've been much nicer and Pythonic in my mind. Now you're forced to use a much clunkier bytes/str paradigm, which yes will make your life much nicer in the 10% of the time when you'll need it, but the other 90% of the time will just be slightly more annoying. Similarly, I may be alone in this, but having to put parens around print statements is also annoying 99% of the time, but nice that 1% of the time I need to pass it as a function or pass it some extra arguments.


> Have any whiff of non-explicitly handled UTF-8 and it would just go bang at the worse time possible.

Whereas python3 is just waiting to explode the moment there are bytes in your UTF-8 that are invalid.

Oh the http request got truncated to leave invalid utf-8 in an otherwise fine utf-8 response? FUCKING ERROR.


Except for that part where it would happily implicitly convert them to/from a Unicode string in any context where one was needed or present... using ASCII, rather than UTF-8, as the encoding.


> I feel like you're the first person I've seen on the planet to echo my sentiments on these.

There have been plenty of people with similar sentiments. I'm one of them. I have felt ever since I first looked at Python 3 that the ways in which it broke backward incompatibility were heavily skewed towards a few particular use cases and did not take into account the needs of all of the Python community.


Got any examples?


I didn't want to reply to the PyPy comment and be negative, but I haven't really gotten a speedup the few times I've tried PyPy. In fact I've generally gotten slowdowns - definitely not 10x improvements. I'm not sure what the reason might be, though, because the rest of the world seems to think differently.


It's been a while since I used PyPy, but JITs in general warm up over time. If you have hot loops with heavy arithmetic and no branches, that's usually the best-case scenario for a JIT. If you have branchy, non-uniform control flow, that's the worst case. So it really depends on your usage - you may be paying the JIT overhead costs with little benefit.


Spot on, and the difference between a tracing JIT vs a method JIT can be night and day too.


Really depends on the kind of work you're doing. As mentioned above, numerical and looping code will see the most benefit. I used it heavily when doing Project Euler problems, and those would easily see 10-100x speedups, taking some problems from 5-10m run time to seconds.


Yup. I like Python3, it's better than 2, and has lots of good new features, but so many of them could simply have been added onto 2 without requiring this huge painstaking migration that cost an unbelievable amount of effort worldwide.

Contrast with Java, which has made much more substantial changes to the language over all these years, but goes to great pains to support backwards compatibility. Upgrading major Java versions was never remotely as painful, and as a result, huge amounts of engineering time was saved.


Backporting Python 3 features into Python 2 was the entire purpose of Python 2.7. All the features it came with were backports of Python 3 features. They could have continued doing that, but then the Unicode handling would never have been fixed.

There's so much whining, when they gave over 5 years to do the migration. I think the whole problem was that they gave too much time, and people thought it would be like that forever.

Also, Java is not a dynamic language; its type system allows issues like these to be fixed easily. The Python solution for that is mypy, but it requires the extra work of adding types.


> There's so much whining, when they gave over 5 years to do the migration.

You make it sound so generous of the Python maintainers to "give over five years" before breaking backwards compatibility--something that C, Java, JavaScript, C++, and probably most other languages haven't done for decades, or in some cases ever. Programming languages are serious basic infrastructure, programmer time is valuable, and Python's installed base is large enough that many man-centuries were probably wasted on this migration.

If the Python maintainers had not given as much time as they did, I doubt the Python community or ecosystem would have transitioned any more quickly than they did. More likely would have been some fork or alternate implementation of Python 2 becoming the new standard.


Okay, but how many man-centuries have we spent fixing buffer overflows in C because the type system can't check that without breaking reverse compatibility?

Reverse compatibility isn't free.


It’s not like C is the only programming language to have never broken backwards compatibility, or that conflating byte arrays and strings is as dangerous as buffer overflow.


You're right, but it's the oldest that you mentioned, which is why it has the most serious signs of age. C originated 50 years ago: give Python 50 years of development without breaking any backwards compatibility and you'd have a programming language with many of the same kinds of problems.


People have made memory-safe compilers for ANSI C (e.g. Fail-Safe C.)

Unfortunately clang and gcc don't (yet?) have --safe or an x86-64-safe compilation target.


Yes, but without major semantic changes to the language to go the direction of i.e. Rust, you're relying on bounds checking to make it memory safe. If you're using C for performance reasons, adding bounds checking is a break in reverse compatibility, because it happens at run time and drastically degrades performance in some cases.


> better Unicode support

I guess it's a matter of emphasis, but I'd say it has different Unicode support. It's mildly better for some use cases, but worse for others.

It's bloody horrible for one use case in particular: when you know the text is readable and mostly ASCII based and you are only interested in the ASCII bits, but don't know the encoding. That is the position you find yourself in for anything designed in pre-Unicode times, and that happens to include just about every file in a Unix file system.

The solution in those circumstances is to treat everything as bytes (b''). That wasn't even possible in the beginning. Now it mostly works, but with hundreds of corner cases (like exception messages, so you can't easily include a Unix filename in an error message).
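
For example, picking an ASCII key out of a config file of unknown encoding (the file name and key are made up):

    # works whether the rest of the file is latin-1, UTF-8, Shift-JIS or
    # anything else, because only the ASCII bits are ever touched
    with open("/etc/example.conf", "rb") as f:
        for line in f:
            if line.startswith(b"Port="):
                port = int(line.split(b"=", 1)[1])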


It's only 10x for web. Anyone using python for batch processing won't see the gains because the work is done before pypy warms up.

But yes the only people who wanted better unicode were web people. So they could have been better served moving to pypy.


Pypy is great, but numpy is even faster (2-3x over your pypy example), and the code is less strange. Here's the mandel function in numpy:

    import numpy as np

    def mandel(data: np.ndarray):
        c = data
        z = c
        for j in range(255):
            z = z**2 + c
        return z
(from here: https://scipy-lectures.org/intro/numpy/auto_examples/plot_ma...)


Yeah, that's the thing with silly examples (like my Mandelbrot program). For real code, I frequently need to write low level loops which aren't easily expressed as parallel operations. If numpy doesn't have it, or if you can't figure out how to parallelize it, you're screwed.

Moreover, for some very common things in signal processing, like working with evens/odds or left/right parts of an array, the parallel numpy operation will create lots of temporary arrays and copies.

And for what it's worth, your version of mandel should work with PyPy. So you can have your cake and eat it too.

EDIT: I should add the reason my code is "strange" is because I wrote it so I could do a one-to-one comparison with other languages which don't have builtin complex numbers. Maybe I should've cleaned that up before posting.


> Moreover, for some very common things in signal processing, like working with evens/odds or left/right parts of an array, the parallel numpy operation will create lots of temporary arrays and copies.

iiuc, this shouldn't be correct. given an ndarray A, `A[::2, 1::2]` will provide a (no-copy) view of every other row/column of A. Same with A[:len(A)//2] to get only half of A.

> And for what it's worth, your version of mandel should work with PyPy. So you can have your cake and eat it too.

Indeed, most of the scipy stack works with Pypy, it's great.


As soon as you do any operations (addition, subtraction, etc...) on those views, you're going to get temporary arrays.

For instance, your Mandelbrot example doesn't even use views and it creates two temporaries the size of the entire array on each iteration:

    for j in range(255):
        t = z**2    # create a new squared array
        u = t + c   # create a new summed array
        z = u       # replace the old array
And all of this is ignoring how inconsistent numpy is about when it creates a view and when it creates a copy.


Unnecessary temporary arrays is definitely a major source of inefficiency when working with NumPy, but recent versions of NumPy go to heroic lengths (via Python reference counting) to avoid doing so in many cases: https://github.com/numpy/numpy/blob/v1.18.3/numpy/core/src/m...

So in this case, NumPy would actually only make one temporary copy, effectively translating the loop into the following:

    for j in range(255):
        u = z**2   # create a new squared array
        u += c     # add in-place 
        z = u      # replace the old array


Your general point is correct, although in this specific instance, replacing the loop body with

    z **= 2
    z += c
gets rid of the temps. But yes there are cases where that isn't possible.


This gets rid of temporary arrays, but this still isn't optimal if z is large. Memory locality means it's faster to apply a scalar operation like z**2 + c in a single pass, rather than in two separate passes.

Explicitly unrolling loopy code (e.g., in pypy or Numba) is one easy way to achieve this, but you have to write more code.
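
For example, a rough sketch of what that looks like in Numba (assuming a 1-D complex array c; the function name is made up and this isn't benchmarked here):

    import numpy as np
    from numba import njit

    @njit
    def mandel_fused(c, iters=255):
        # each element stays in a register for all iterations, instead of
        # streaming whole temporary arrays through memory on every pass
        z = np.empty_like(c)
        for i in range(c.shape[0]):
            zi = c[i]
            for _ in range(iters):
                zi = zi * zi + c[i]
            z[i] = zi
        return z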

Julia has some really nice syntax that lets you write things in this clean vectorized way but still get efficient code: https://julialang.org/blog/2017/01/moredots/


Gotcha. I feel like I remember some numpy or scipy way of creating complex ufunc ops and applying them simultaneously, but maybe I'm misremembering or thinking np.vectorize was fast?


Vectorization FTW


The Python interpreter itself is the least of the issues; you can just keep 2.7.18 and 10 years from now it will still work. The problem is dependencies. Package developers couldn't wait to drop all the cruft that supporting Python 2.7 required.

https://python3statement.org/

Just do yourself a favor and migrate your codebase; Py3 is much more enjoyable to program in. I feel like all those vocal Python 2 supporters never had a chance to write new code in Python 3. If you only did a 2->3 migration you might hate Python 3, because it doesn't let you run broken code that Python 2 happily executed, but if you write a Python 3 app from scratch you don't even notice the unicode - text is just text.


"A broken code that happily executed"?

I think that's unfair. There was plenty of code that was out there for 10 years or more that was working completely fine and had to be ported. One of the most frustrating things I had to do was completely rearchitect some legacy binary file reading/writing because of the changes to how Python handled bytes. That code was out there as open source in the wild, stable, and was being widely used, and it basically required a full rewrite underneath the API.

One of the most frustrating things was that many packages we used as dependencies took ~5 years to port to Python 3, and then dropped Python 2 support immediately, leaving us with no choice but to use the old versions for some time. We'd done a lot of the easy stuff already (2to3 on all files), but lots of the non-trivial things were interactions with other packages, so they couldn't be touched until those packages themselves had a Python 3 version.


> many packages we used as dependencies took ~5 years to port to Python 3, and then dropped Python 2 support immediately

A lot of packages were held back by early criticism of the move and the extended timeframe allowed to 2.x.

If 10% of the stop-energy dedicated to shitting on 3 had been put into supporting the effort, things would have gone differently. But most people dragged their feet, and this is the result.


I saw great results switching a fairly “standard” high-traffic Python web app (Flask & uwsgi) over to run on PyPy. We saw about 30% faster response times for HTTP requests, several times the number of tasks/second on the workers, and we were able to scale down the total number of instances needed. Most of the typical Python web app libraries just worked; I spent a couple of days making sure everything was stable, and it was a great success overall.


This is pretty interesting.

Years ago I had a bunch of code that was basically just matrix multiplication with some large-ish matrices, and then taking some eigenvectors/eigenvalues at the end. At the time I found the same thing -- if I decomposed things into simple lists of numbers for vectors and lists of lists for matrices, PyPy was way faster.

I just had the opportunity to brush this code off in Python 3 and run it as-is, and it performs much better than it used to. But I am always curious to hear about these cases.

PyPy really is a wonderful project.


Came here just to say that PyPy is amazing. From my somewhat limited exposure, it fulfilled my use case very well, with a speed gain.


I am interested and would like to learn more. So do you just

    sudo apt install pypy3
and then

    pypy3 -m pytest /my/python/app
and if things go well you either got a 5-10x speed up or an insufficient test suite?


Another casualty of python 3?


Honestly, I think Python's C API exposing so much of the internals of the implementation is the real problem. You can basically see every pointer in every struct, including tons of things you shouldn't need. Large packages inevitably end up using some unfortunate detail which couples them tightly to CPython, and this makes using those packages with PyPy nearly impossible. The fact that PyPy got so many of those to work as well as they did (numpy stands out) is a testament to their talent and stamina.

I believe this is also a huge part of the reason why migrating from CPython version 2 to version 3 was delayed. I've adapted a few small C extension modules to run under both, and using #ifdef for the special cases to support both was unpleasant. So I imagine that any large package which needed to support both through the transition really suffered for it.


Python 3 has a "limited API", which is much better in that regard.

https://www.python.org/dev/peps/pep-0384/



