Python 2.7.18, the last release of Python 2 (pythoninsider.blogspot.com)
339 points by vishnu_ks on April 20, 2020 | 338 comments



For those with Python 2 baselines that aren't (or will never be) ported, PyPy will support version 2 basically forever [0]. PyPy supports a huge number (but not all) of Python packages, including numpy [1]. Moreover, PyPy is significantly faster in many cases [2], and for the numerical types of things I like to write [3], it's amazingly faster.

    > time pypy mandel.py > pypy.pgm
    seconds: 0.318101

    real        0m0.426s
    user        0m0.396s
    sys         0m0.013s

    > time python2 mandel.py > python2.pgm
    seconds: 30.141954

    real        0m30.156s
    user        0m30.136s
    sys         0m0.003s
That's just a silly Mandelbrot example, but for numerical algorithms like this PyPy is nearly 100 times faster than Python2, and that includes startup cost for the JIT!
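For reference, the hot loop in a program like this is only a few lines. This is a minimal sketch of a Mandelbrot escape-time function in the same spirit, not the actual mandel.py linked at [3]:

```python
# Minimal sketch of a Mandelbrot escape-time loop, the kind of tight
# numeric code PyPy's JIT handles well (not the linked mandel.py itself).
def escape_count(cr, ci, max_iter=255):
    zr = zi = 0.0
    for n in range(max_iter):
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
        if zr * zr + zi * zi > 4.0:  # |z| > 2: the point diverges
            return n
    return max_iter
```

CPython interprets every iteration of this loop; PyPy's tracing JIT compiles it to machine code after a few passes, which is where the large speedups on numeric code come from.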

I'm not in any way associated with the PyPy project, but I can't help but believe a sane world would've moved from Python to PyPy in the same way everything moved from Brendan Eich's original JavaScript interpreter to the modern JIT ones like V8.

[0] https://doc.pypy.org/en/latest/faq.html#how-long-will-pypy-s...

[1] http://packages.pypy.org/

[2] https://speed.pypy.org/

[3] https://godbolt.org/z/J9Xwp6


I can't help but believe a sane world would've moved from Python to PyPy

Yes, and this is my fundamental complaint with the Python 3 transition. It took probably millions of engineer-hours, and the result was better Unicode support and a handful of minor features most of which could have been added to Python 2 just as easily. I suspect most users would gladly have traded those benefits for a 10x performance improvement.


> millions of engineer-hours, and the result was better Unicode support and a handful of minor features

Yes, and I wish the better Unicode support had been implemented similarly to how Go does it - one string type, and you use it to hold UTF-8 if needed. In other words, they could've simply deprecated the Python 2.X unicode object and added libraries to extract code points, or grapheme clusters, or perform normalization etc... This seems much simpler and more "Pythonic".
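The Go-style approach described here might look roughly like this in Python terms. The helper below is hypothetical, purely to illustrate "one string type plus libraries":

```python
# Hypothetical sketch of "one string type holding UTF-8": bytes are the
# only string type, and code points come from a library call, not a type.
raw = "café".encode("utf-8")  # what a UTF-8 "string" would hold

def code_points(b: bytes):
    # decode on demand to extract code points
    return [ord(ch) for ch in b.decode("utf-8")]
```

Libraries for grapheme clusters and normalization would layer on in the same way, rather than being baked into a separate text type.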

I guess everything is 20/20 in hindsight.


I totally agree with you. I hope we still have a chance to do it well with PyPy.


> and the result was better Unicode support

Different Unicode support. And worse bytes support.

What could previously be done using python -c "..." is now long, horrible and ugly.


> Different Unicode support. And worse bytes support.

I feel like you're the first person I've seen on the planet to echo my sentiments on these. I expect a lot of people will jump here to tell you you're wrong like they have to me, so just wanted to let you know I've felt exactly these pains and agree with you.


I am hoping this is in agreement. py2’s flexibility to handle utf8 bytes without fuss is amazing. Then people come up with all kinds of purity reasons to make it more complicated.


Take out "utf8" and I'll agree ;)

The fundamental problem as I see it is that "string" is a grossly leaky and misunderstood abstraction. The string type is not the same thing as a "text" type. It's being used in all the wrong places for that purpose. People treat "string" like it means "text", but in so many places where we deal with them, they just aren't (and should never be) text. Everything from stdio to argv to file paths to environment variables to "text" files to basically any interface with the outside world needs to be dealt with in bytes rather than text if you care about actually producing correct code that doesn't lose, corrupt, or otherwise choke on data.

C++ understood this and got it right, preferring to focus on optimizing rather than constraining the string type. Many other languages did pretty well by avoiding enforcing encodings on strings, too. And Python 2 defaulted to bytes as well, and only really cared about encoding/decoding at I/O boundaries where it thought it could assume it was dealing with text (though it sometimes didn't behave well there, and yes, it got painful as a result). Then Python 3 came along and just made everyone start treating most data as if it's inherently (Unicode) text by default, when it really had no such constraints to begin with.

It boggles my mind that Python 3 folks like to beat the drum on how Python 3 got the bytes/unicode right without taking a single moment to even notice that most strings people deal with aren't (and never were!) actually guaranteed to be in a specific, known textual encoding a priori. They were just arrays of code units with few restrictions on them, and if you want to write correct code, you're going to have to deal with bytes by default (or something else with similar flexibility) instead of text. It would've been totally fine to introduce a text type, but it fundamentally can't take the place of a blob type, which is the language of the outside world.


"The outside world", by and large, also speaks Unicode.

Java uses UTF-16 throughout, including file paths. So does .NET. All Apple platforms are UTF-16. C++ - if you just look at stdlib, sure, it's byte-centric; but then look at popular frameworks such as Qt.

In practice, this means that, yeah, you can have that odd filename that is technically not Unicode. But the vast majority of code running on the most popular desktop and mobile platforms is going to handle it in a way that expects it to be Unicode. Why should Python go against the trend, and make life more complicated for developers using it in the process?


File names? I listed so much more for you than file names.

That HTML you just fetched? How do you know it's Unicode?

That .txt file the user just asked to load? How do you know that's Unicode?

For heaven's sake, when can you actually guarantee that even sys.stdin.read() is going to read Unicode? You can only do that when you're the one piping your own stdin... which is not the common case.

What do you do when your fundamentally invalid assumptions break? Do you just not care and simply present a stack trace to the user and tell them to get lost?

I've gotten tired of these debates though, so just a heads up I may not have the energy to reply if you continue...


In the real world Python2 gave stack traces by default when presented with common strings. Python3 doesn't.


>That HTML you just fetched? How do you know it's Unicode?

Headers contain information about the charset. If the charset isn't specified then only god knows the used encoding. This applies to all encodings. If they aren't specified you can't interpret them.

>That .txt file the user just asked to load? How do you know that's Unicode?

If you don't know the used encoding then you simply cannot interpret the file as a string. If the encoding isn't specified you can't interpret the file.

>For heaven's sake, when can you actually guarantee that even sys.stdin.read() is going to read Unicode?

Again, if the encoding isn't specified then all bets are off. This is an inherent problem with Unix pipes. Text isn't any different from, say, a protobuf packet. You have to know how to interpret it, otherwise it's just a raw byte array without any meaning.

>What do you do when your fundamentally invalid assumptions break? Do you just not care and simply present a stack trace to the user and tell them to get lost?

I don't understand you at all. Just load it as a byte array if you don't care about the encoding. If you do care about the encoding then tough luck. You're never going to understand the meaning of that text unless it is an agreed upon encoding like UTF-8 and in that case the assumptions of always choosing UTF-8 are part of the value proposition.

Let me tell you why reading a text file as a byte array and pretending that character encodings don't exist is a bad idea. There are lots of Asian character encodings that don't even contain the latin alphabet. Now imagine you are running source.replace("Donut", "Bagel"). What meaning does running this function have on a byte array? It doesn't have any.

That operation simply cannot be implemented at all if you don't know the encoding. So if you were to choose the python 2 way then you would have to either remove all string operations from the language or force the user to specify the encoding on every operation.

A string literal like "Donut" isn't just a string literal. It has a representation, and you first have to convert the logical string into a byte array that matches the representation of the source string. Let's say your Python program is loading UTF-16 text. Instead of simply specifying the encoding you just load the text without any encoding. If you wanted to run the replace operation then it would have to look something like this: source.replace("Donut".getBytes("UTF-16"), "Bagel".getBytes("UTF-16")). This is because you need to convert all string literals to match the encoding of the text that you want to replace.

Well, doesn't this cause a pretty huge problem? You now need to have a special type just for string literals because the runtime string type can use any encoding and therefore isn't guaranteed to be able to represent the logical value of a literal. Isn't that extremely weird?
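The mismatch described above is easy to reproduce with Python 3's bytes type (the strings and encodings here are chosen purely for illustration):

```python
# A byte-level replace only matches when the literal is encoded the same
# way as the data it is searching.
source = "I ate a Donut".encode("utf-16-le")

# A UTF-8-encoded needle never matches UTF-16 data, so this is a no-op:
unchanged = source.replace("Donut".encode("utf-8"), b"Bagel")

# Encoding the literal to match the data's encoding works:
fixed = source.replace("Donut".encode("utf-16-le"),
                       "Bagel".encode("utf-16-le"))
```

This is exactly the "convert every literal to match the source encoding" burden the comment describes.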


I'm too tired of these to reply to everything, so I'll just reply to the first bit and rest my case. It's like you're completely ignoring the fact that <meta charset="UTF-8"> and <?xml encoding="UTF-8"...?> and all that are actually things in the real world. You can't just treat them as strings until you read their bytes, was my point. The notion that the user can or should always provide you out-of-band encoding info or otherwise let you assume UTF-8 everywhere every time you read a file or stdin is just a fantasy and not how so many of our tools work.
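A sketch of that point: the charset declaration lives inside the bytes, so you must read bytes before you can decode anything. The regex here is a crude illustration, not a real HTML parser:

```python
# Sniff the declared charset from raw bytes, then decode with it.
import re

def sniff_charset(raw: bytes, default="utf-8"):
    # look for charset=... in the first 1 KiB of the (still undecoded) data
    m = re.search(rb'charset=["\']?([A-Za-z0-9_-]+)', raw[:1024])
    return m.group(1).decode("ascii") if m else default

page = b'<meta charset="ISO-8859-1"><p>caf\xe9</p>'
text = page.decode(sniff_charset(page))  # only now is decoding possible
```

Real browsers and feed parsers do a more elaborate version of this same bytes-first dance.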


So treat them as bytes. It's not like Python 3 removed that type. It just made it impossible to inadvertently treat bytes as a string in a certain encoding - unlike Python 2, which would happily implicitly decode assuming ASCII.


> So treat them as bytes.

Which was my entire point!! You have to go to bytes to get correct behavior. They didn't fix the nonsense by changing the default data type to a string, they just made it even more roundabout to write correct code.

> It just made it impossible to inadvertently treat bytes as a string in a certain encoding

It most certainly did not! It's like you completely ignored what I just told you. I already gave you an example: sys.stdin.read(). It uses some encoding when you really can't ever guarantee any encoding; in fact, the case where the encoding info is itself embedded in the byte stream is the normal case. How can you know a priori what the user piped in? Are you sure users magically know every stream's encoding and are just neglecting to provide it to you? At least if they were bytes by default, you'd maintain correct state and only have to worry about encoding/decoding at the I/O boundary. (And to top off the insanity, it's not even UTF-8 everywhere; on Windows it's CP-1252 or something, so you can't even rely on the default I/O being portable across platforms, even for text! Let alone arbitrary bytes. This insanity was there in Python 2, but they sure didn't make it better by moving from bytes to text as the default...)


Sure it did. Here's an easy test, using your own test case with stdin:

   Python 2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)] on 
   win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> s = raw_input()
   abc
   >>> s
   'abc'
   >>> s + u"!"
   u'abc!'
So it was bytes after reading it, and became Unicode implicitly as soon as it was mixed with a Unicode string. And guess what encoding it used to implicitly decode those bytes? It's not locale. It's ASCII. Which is why there's tons of code like this that works on ASCII inputs, and fails as soon as it sees something different - and people who wrote it have no idea that it's broken.

Python 2 did this implicit conversion, because it allowed it to have APIs that returned either bytes or unicode objects, and the API client could basically pretend that there's no difference (again, only for ASCII in practice). By removing the conversion, Python 3 forced developers to think whether the data that they're working with is text or binary, and to apply the correct encoding if it's binary that is encoded text. This is exactly encoding/decoding at the I/O boundary!

The fact that sys.stdout encoding varies between platforms is a feature, not a bug. For text data, locale defines the encoding; so if you are treating stdin and stdout as text, then Python 3 will use locale encoding to encode/decode at the aforementioned I/O boundary, as other apps expect it to do (e.g. if you pipe the output). This is exactly how every other library or framework that deals with Unicode text works; how is that "insanity"?

Now, if you actually want to work with binary data for stdio, then you need to use the underlying BytesIO objects: sys.stdin.buffer and sys.stdout.buffer. Those have read() and write() that deal with raw bytes. The point, again, is that you are forced to consider your choices and their consequences. It's not the same API that tries to cover both binary and text input, and ends up with unsafe implicit conversions because that's the only way to make it look even remotely sane.

The only thing I could blame Python 3 for here is that sys.stdin is implicitly text. It would be better to force API clients to be fully explicit - e.g. requiring people to use either sys.stdin.text or sys.stdin.binary. But either way, this is strictly better than Python 2.
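The layering described above can be illustrated with io objects standing in for stdin (sys.stdin is essentially a TextIOWrapper over the bytes-level sys.stdin.buffer):

```python
# io objects standing in for stdin: the text layer wraps the bytes layer.
import io

raw = io.BytesIO("héllo\n".encode("utf-8"))      # the .buffer layer: bytes
text = io.TextIOWrapper(raw, encoding="utf-8")   # the text layer on top

line = text.readline()  # decoding happens here, at the I/O boundary
```

Reading from `raw` directly gives bytes with no decoding; reading from `text` decodes with the declared encoding, which is the choice the two APIs force you to make.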


> The fact that sys.stdout encoding varies between platforms is a feature, not a bug. [...] This is exactly how every other library or framework that deals with Unicode text works; how is that "insanity"?

No, it's utterly false that every other framework does it. Where do you even get this idea? Possibly the closest language to Python is Ruby. Have you tried to see what it does? Run ruby -e "$stdout.write(\"\u2713\")" > temp.txt in the Command Prompt and then tell me you face the same nonsensical Unicode error as you do in Python (python -c "import sys; sys.stdout.write(u\"\u2713\")" > temp.txt)? The notion that writing text on one platform and reading it back on another should produce complete garbage is absolute insanity. You're literally saying that even if I write some text to a file in Windows and then read it back on Linux with the same program on the same machine from the same file system, it is somehow the right thing to do to have an inconsistent behavior and interpret it as complete garbage?? Like this means if you install Linux for your grandma and have her open a note she saved in Windows, she will actively want to read mojibake?? I mean, I guess people are weird, so maybe you or your grandma find that to be a sane state of affairs, but neither me, nor my grandma, nor my programs (...are they my babies in this analogy?) would expect to see gibberish when reading the same file with the same program...

As for "Python 3 forced developers to think whether the data that they're working with is text or binary", well, it made them think even more than they already had to, alright. That happens as a result of breaking stuff even more than it happens as a result of fixing stuff. And what I've been trying to tell you repeatedly is that this purist distinction between "text" and "binary" is a fantasy and utterly wrong in most of the scenarios where it's actually made, and that your "well then just use bytes" argument is literally what I've been pointing out is the only solution, and it's much closer to what Python 2 was doing. This isn't even something that's somehow tricky. If you write binary files at all, you know there's absolutely no reason why you can't mix and match encodings in a single stream. You also know it's entirely reasonable to record the encoding inside the file itself. But regardless, just in case this was a foreign notion, I gave you multiple examples of this that are incredibly common (HTML, XML, stdio, text files...), and you just dodged my point. I'll repeat myself: when you read text, if you can even guarantee it's text in the first place (which you absolutely cannot do everywhere Python 3 does), it is likely to have an encoding that neither you nor the user can know a priori until after you've read it and examined its bytes. XML/HTML/BOM/you name it. You have to deal with bytes until you make that determination. The fact that you might read complete garbage if you read back the same file your own program wrote on another platform just adds insult to injury.

But anyway. You know full well that I never suggested everything was fine in Python 2 and that everything broke in Python 3. I was extremely clear that a lot of this was already a problem, and that some stuff did in fact improve. It's the other stuff, which got worse and even harder to address, that's the problem I've been talking about. So it's a pretty illegitimate counterargument to cherry-pick some random bit about some implicit conversion that actually happened to improve. At best you'll derail the argument into a discussion about alternative approaches for solving those problems (which BTW actually do exist) and distract me. But I'm not about to waste my energy like this, so I'm going to have to leave this as my last comment.


Every other language and framework as in Java, C#, everything Apple, and most popular C++ UI frameworks.

Ruby is actually the odd one out with its "string is bytes + encoding" approach, and that's mostly because its author is Japanese - Japan is not entirely sold on Unicode, for some legitimate reasons. This approach also has some interesting consequences - e.g. it's possible for string concatenation to fail, because there's no unified representation for both operands.


> Possibly the closest language to Python is Ruby.

Not really; they are similar in that they are dynamic scripting languages, but philosophically and in terms of almost every implementation decision, they are pretty radically opposed.


Many of us deal in bytes that simply aren't UTF8 and never could be. Because they're just bytes.

How many things are stored as binary files?

> All Apple platforms are UTF-16.

I'm glad all their executable files are apparently text files. How amazing.

> Why should Python go against the trend, and make life more complicated for developers using it in the process?

You tell me why Python3 did that.


> py2’s flexibility to handle utf8 bytes without fuss is amazing

Without fuss? No, sorry, it was anything but.

First of all, it would default the encoding to ASCII. Have any whiff of non-explicitly handled UTF-8 and it would just go bang at the worst time possible.

That was a stupid decision.

"Oh but there was setdefaultencoding" Yeah here's the first result for that https://stackoverflow.com/questions/3828723/why-should-we-no...

So no, Python2 way of dealing with Unicode was the most annoying way possible, because hey who needs anything but ASCII right?


> Python2 way of dealing with Unicode was the most annoying way possible

The part about defaulting to ASCII is annoying, yes. And using sys.setdefaultencoding to change the default would still be annoying, yes. The reason is that any default encoding will be annoying whenever the actual encoding at run time doesn't match the default.

The correct way to fix this problem is to not have a default encoding at all. Don't try to auto-detect encodings; don't try to guess encodings. Force every encode and decode operation to explicitly specify an encoding. That way the issue of what the encoding is, how to detect it, etc., is handled in the right place--in the code of the particular application that needs to use Unicode. It should not be handled in a language runtime or a standard library, precisely because there is no way for a language runtime or a library to properly deal with all use cases of all applications.

What Python 3 did, instead, was to change the rules of default encodings and auto-detection/guessing of encodings, so that they were nicer to some use cases, and even more annoying than before to others.


I agree that it was easy to shoot yourself in the foot, and if you did have to deal with unicode, it was often a pain, but at the same time, Python's simplicity and ease of use are what make it great, with the ability to do something cleaner if you choose to. You can choose to type annotate all your code and make it better. You can choose to organize your code in whatever package structure you want or keep it all in one file. The language doesn't force any of that onto you. That's the approach I think would've been much nicer and more Pythonic in my mind. Now you're forced to use a much clunkier bytes/str paradigm, which yes will make your life much nicer in the 10% of the time when you'll need it, but the other 90% of the time will just be slightly more annoying. Similarly, I may be alone in this, but having to put parens around print statements is also annoying 99% of the time, but nice that 1% of the time I need to pass it as a function or pass it some extra arguments.


> Have any whiff of non-explicitly handled UTF-8 and it would just go bang at the worse time possible.

Whereas python3 is just waiting to explode the moment there are bytes in your UTF-8 that are invalid.

Oh the http request got truncated to leave invalid utf-8 in an otherwise fine utf-8 response? FUCKING ERROR.
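That failure mode is easy to demonstrate, along with the errors= handlers Python 3 offers for surviving it (whether "replace" is acceptable is application-dependent):

```python
# A truncated UTF-8 stream: the final multibyte sequence is cut in half.
good = "résumé".encode("utf-8")
truncated = good[:-1]  # drop the last continuation byte

# Strict decoding raises, which is the failure being complained about:
try:
    truncated.decode("utf-8")
    raised = False
except UnicodeDecodeError:
    raised = True

# errors="replace" (or "surrogateescape") salvages the valid prefix:
salvaged = truncated.decode("utf-8", errors="replace")
```

With errors="replace" the dangling byte becomes U+FFFD; with "surrogateescape" it can even be re-encoded losslessly once the rest of the stream arrives.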


Except for that part where it would happily implicitly convert them to/from a Unicode string in any context where one was needed or present... using ASCII, rather than UTF-8, as the encoding.


> I feel like you're the first person I've seen on the planet to echo my sentiments on these.

There have been plenty of people with similar sentiments. I'm one of them. I have felt ever since I first looked at Python 3 that the ways in which it broke backward incompatibility were heavily skewed towards a few particular use cases and did not take into account the needs of all of the Python community.


Got any examples?


I didn't want to reply to the PyPy comment and be negative, but I haven't really gotten speedup from the few times I've tried PyPy. In fact I've generally gotten slowdowns. Definitely not 10x improvements. I'm not sure what the reason might be though, because the rest of the world seems to think differently.


It's been a while since I used PyPy, but JITs in general warm up over time. If you have hot loops with heavy arithmetic and few branches, that's usually the best-case scenario for JITs. If you have branchy, non-uniform control flow, that's the worst case. So it really depends on your usage - you may be paying the JIT overhead costs with little benefit.


Spot on, and the difference between a tracing JIT vs a method JIT can be night and day too.


Really depends on the kind of work you're doing. As mentioned above, numerical and looping code will see the most benefit. I used it heavily when doing Project Euler problems, and those would easily see 10-100x speedups, taking some problems from 5-10m run time to seconds.


Yup. I like Python3, it's better than 2, and has lots of good new features, but so many of them could simply have been added onto 2 without requiring this huge painstaking migration that cost an unbelievable amount of effort worldwide.

Contrast with Java, which has made much more substantial changes to the language over all these years, but goes to great pains to support backwards compatibility. Upgrading major Java versions was never remotely as painful, and as a result, huge amounts of engineering time was saved.


Backporting Python 3 features to Python 2 was the entire purpose of Python 2.7. All the features it came with were backports of Python 3 features. They could have continued doing that, but then we would never have fixed Unicode.

There's so much whining, when they gave over 5 years to do the migration. I think the whole problem was that they gave too much time, and people were thinking it would be like that forever.

Also, Java is not a dynamic language; its type system allows for an easy fix of issues like these. The Python solution for that is mypy, but it requires the work of adding types.


> There's so much whining, when they gave over 5 years to do the migration.

You make it sound so generous of the Python maintainers to "give over five years" before breaking backwards compatibility--something that C, Java, JavaScript, C++, and probably most other languages haven't done for decades, or in some cases ever. Programming languages are serious basic infrastructure, programmer time is valuable, and Python's installed base is large enough that many man-centuries were probably wasted on this migration.

If the Python maintainers had not given as much time as they did, I doubt the Python community or ecosystem would have transitioned any more quickly than they did. More likely would have been some fork or alternate implementation of Python 2 becoming the new standard.


Okay, but how many man-centuries have we spent fixing buffer overflows in C because the type system can't check that without breaking reverse compatibility?

Reverse compatibility isn't free.


It’s not like C is the only programming language to have never broken backwards compatibility, or that conflating byte arrays and strings is as dangerous as buffer overflow.


You're right, but it's the oldest that you mentioned, which is why it has the most serious signs of age. C originated 50 years ago: give Python 50 years of development without breaking any backwards compatibility and you'd have a programming language with many of the same kinds of problems.


People have made memory-safe compilers for ANSI C (e.g. Fail-Safe C.)

Unfortunately clang and gcc don't (yet?) have --safe or an x86-64-safe compilation target.


Yes, but without major semantic changes to the language to go in the direction of, e.g., Rust, you're relying on bounds checking to make it memory safe. If you're using C for performance reasons, adding bounds checking is a break in reverse compatibility, because it happens at run time and drastically degrades performance in some cases.


> better Unicode support

I guess it's a matter of emphasis, but I'd say it has different Unicode support. It's better mildly for some use cases, but worse for others.

It's bloody horrible for one use case in particular: when you know the text is readable and mostly ASCII-based and you are only interested in the ASCII bits, but don't know the encoding. That is the position you find yourself in for anything designed in pre-Unicode times, and that happens to include just about every file in a Unix file system.

The solution in those circumstances is to treat everything as bytes (b''). That wasn't even possible in the beginning. Now it mostly works, but with hundreds of corner-case exceptions (like exception messages, so you can't easily include a Unix filename in an error message).
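One of those corner cases can be worked around with the surrogateescape error handler (which is what os.fsdecode/os.fsencode use on POSIX); a sketch:

```python
# A Latin-1 era filename that is not valid UTF-8 still round-trips
# through str via surrogateescape, so it can appear in error messages.
name_bytes = b"caf\xe9.txt"  # \xe9 is not valid UTF-8 on its own
name_str = name_bytes.decode("utf-8", errors="surrogateescape")
roundtrip = name_str.encode("utf-8", errors="surrogateescape")

msg = "could not open %r" % name_str  # usable in an exception message
```

The lone byte becomes the surrogate U+DCE9 inside the str, and encoding with the same handler restores the original bytes exactly.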


It's only 10x for web. Anyone using python for batch processing won't see the gains because the work is done before pypy warms up.

But yes, the only people who wanted better unicode were web people. So they would have been better served moving to pypy.


Pypy is great, but numpy is even faster (2-3x over your pypy example), and the code is less strange. Here's the mandel function in numpy:

    import numpy as np

    def mandel(data: np.ndarray):
        c = data
        z = c
        for j in range(255):
            z = z**2 + c
(from here: https://scipy-lectures.org/intro/numpy/auto_examples/plot_ma...)


Yeah, that's the thing with silly examples (like my Mandelbrot program). For real code, I frequently need to write low level loops which aren't easily expressed as parallel operations. If numpy doesn't have it, or if you can't figure out how to parallelize it, you're screwed.

Moreover, for some very common things in signal processing, like working with evens/odds or left/right parts of an array, the parallel numpy operation will create lots of temporary arrays and copies.

And for what it's worth, your version of mandel should work with PyPy. So you can have your cake and eat it too.

EDIT: I should add the reason my code is "strange" is because I wrote it so I could do a one-to-one comparison with other languages which don't have builtin complex numbers. Maybe I should've cleaned that up before posting.


> Moreover, for some very common things in signal processing, like working with evens/odds or left/right parts of an array, the parallel numpy operation will create lots of temporary arrays and copies.

iiuc, this shouldn't be correct. Given an ndarray A, `A[::2, 1::2]` will provide a (no-copy) view of the even rows/odd columns of A. Same with A[:len(A)//2] to get only the first half of A.
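The no-copy claim can be checked with np.shares_memory; note, though, that arithmetic on a view still allocates a new array, which is the grandparent's point about temporaries:

```python
# Slicing makes views, but arithmetic on a view allocates a fresh array.
import numpy as np

A = np.arange(16).reshape(4, 4)
evens = A[::2, 1::2]        # strided view: even rows, odd columns
half = A[: len(A) // 2]     # slice view: first half of the rows

view_ok = np.shares_memory(A, evens) and np.shares_memory(A, half)
summed = evens + 1          # the arithmetic result is a new array
copy_made = not np.shares_memory(A, summed)
```

So the slicing itself is free, but each elementwise operation on the view still materializes a temporary.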

> And for what it's worth, your version of mandel should work with PyPy. So you can have your cake and eat it too.

Indeed, most of the scipy stack works with Pypy, it's great.


As soon as you do any operations (addition, subtraction, etc...) on those views, you're going to get temporary arrays.

For instance, your Mandelbrot example doesn't even use views and it creates two temporaries the size of the entire array on each iteration:

    for j in range(255):
        t = z**2    # create a new squared array
        u = t + c   # create a new summed array
        z = u       # replace the old array
And all of this is ignoring how inconsistent numpy is about when it creates a view and when it creates a copy.


Unnecessary temporary arrays is definitely a major source of inefficiency when working with NumPy, but recent versions of NumPy go to heroic lengths (via Python reference counting) to avoid doing so in many cases: https://github.com/numpy/numpy/blob/v1.18.3/numpy/core/src/m...

So in this case, NumPy would actually only make one temporary copy, effectively translating the loop into the following:

    for j in range(255):
        u = z**2   # create a new squared array
        u += c     # add in-place 
        z = u      # replace the old array


Your general point is correct, although in this specific instance, replacing the loop body with

    z **= 2
    z += c
gets rid of the temps. But yes there are cases where that isn't possible.


This gets rid of temporary arrays, but it still isn't optimal if z is large. Memory locality means it's faster to apply an elementwise operation like z**2 + c in a single pass, rather than in two separate passes.

Explicitly unrolling loopy code (e.g., in pypy or Numba) is one easy way to achieve this, but you have to write more code.

Julia has some really nice syntax that lets you write things in this clean vectorized way but still get efficient code: https://julialang.org/blog/2017/01/moredots/
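A rough sketch of the locality point: the vectorized form walks the array twice, while an explicit per-element loop (the kind PyPy or Numba compiles well) fuses the square and the add into one pass:

```python
# Two equivalent update steps: numpy makes two passes over the data,
# an explicit loop does one fused pass per element.
import numpy as np

def step_two_pass(z, c):
    z = z ** 2          # pass 1: writes every element
    z += c              # pass 2: reads and writes every element again
    return z

def step_one_pass(z, c):
    out = np.empty_like(z)
    for i in range(z.size):
        out[i] = z[i] * z[i] + c[i]   # square and add, one pass
    return out
```

In plain CPython the loop version is much slower despite the better locality; it only wins once a JIT compiles the loop, which is exactly the trade-off being discussed.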


Gotcha. I feel like I remember some numpy or scipy way of creating complex ufunc ops and applying them simultaneously, but maybe I'm misremembering or thinking np.vectorize was fast?


Vectorization FTW


Python itself is the least of the issues; you can just keep 2.7.18 and 10 years from now it will still work. The problem is dependencies. Package developers couldn't wait to drop all the cruft that supporting Python 2.7 required.

https://python3statement.org/

Just do yourself a favor and migrate your codebase; Py3 is much more enjoyable to program in. I feel like all those vocal Python 2 supporters never had a chance to write new code in Python 3. If you only did a 2->3 migration you might hate Python 3, because it doesn't let you run broken code that Python 2 happily executed, but if you write a Python 3 app from scratch you don't even notice the unicode; text is just text.


"A broken code that happily executed"?

I think that's unfair. There was plenty of code that was out there for 10 years or more that was working completely fine and had to be ported. One of the most frustrating things I had to do was completely rearchitect some legacy binary file reading/writing because of the changes to how Python handled bytes. That code was out there as open source in the wild, stable, and was being widely used, and it basically required a full rewrite underneath the API.

One of the most frustrating things was that many packages we used as dependencies took ~5 years to port to Python 3, and then dropped Python 2 support immediately, leaving us with no choice but to use the old versions for some time. We'd done a lot of the easy stuff already (2to3 on all files), but lots of the non-trivial things were interactions with other packages, so they couldn't be touched until those packages themselves had a Python 3 version.


> many packages we used as dependencies took ~5 years to port to Python 3, and then dropped Python 2 support immediately

A lot of packages were held back by early criticism of the move and the extended timeframe allowed for 2.x.

If 10% of the stop-energy dedicated to shitting on 3 had been put into supporting the effort, things would have gone differently. But most people dragged their feet and this is the result.


I saw pretty great results switching a pretty “standard” high traffic Python web app (Flask & uwsgi) over to run on PyPy. We saw about 30% faster response time for HTTP requests, several times the number of tasks/second on the workers, and we were able to scale down the total instances needed. Mostly the typical Python web app libraries just worked, and I spent a couple of days making sure everything was stable, but it was a great success overall.


This is pretty interesting.

Years ago I had a bunch of code that was basically just matrix multiplication with some large-ish matrices, and then taking some eigenvectors/eigenvalues at the end. At the time I found the same thing -- if I decomposed things into simple lists of numbers for vectors and lists of lists PyPy was way faster.

I just had the opportunity to brush this code off in Python 3 and run it as-is, and it performs much better than it used to. But I am always curious to hear about these cases.

PyPy really is a wonderful project.


Came here just to say that PyPy is amazing; from my somewhat limited exposure, it fulfilled my use case very well, with a speed gain.


I am interested and would like to learn more. So do you just

    sudo apt install pypy3
and then

    pypy3 -m pytest /my/python/app
and if things go well you either got a 5-10x speed up or an insufficient test suite?


Another casualty of python 3?


Honestly, I think Python's C API exposing so much of the internals of the implementation is the real problem. You can basically see every pointer in every struct, including tons of things you shouldn't need. Large packages inevitably end up using some unfortunate detail which couples them tightly to CPython, and this makes using those packages with PyPy nearly impossible. The fact that PyPy got so many of those to work as well as they did (numpy stands out) is a testament to their talent and stamina.

I believe this is also a huge part of the reason why migrating from CPython version 2 to version 3 was delayed. I've adapted a few small C extension modules to run under both, and using #ifdef for the special cases to support both was unpleasant. So I imagine that any large package which needed to support both through the transition really suffered for it.


Python 3 has "limited API", which is much better in that regard.

https://www.python.org/dev/peps/pep-0384/


Python is currently one of the most successful languages out there. Most of the growth at this point is coming from python 3. So, from that point of view, it's been a huge success.

If at this point you are still stuck on 2.7, you probably don't care a lot about updates in any case, including point releases. It's been well over a decade since it was made clear that this was going to end. So, IMHO the impact to remaining 2.7 users is minimal. They were in any case extremely conservative about updating and are probably also running lots of other outdated stuff, like Red Hat / Ubuntu versions that long ago dropped out of LTS, etc. That's fine and valid, but at this point you shouldn't be surprised that you are on your own. If you didn't plan for this, that's on you.

From a security point of view that just means you probably don't want to run unprotected 2.7 servers running e.g. a web server. But otherwise it's fine if you shield it a bit. Lots of python is more about other types of jobs where the impact of security vulnerabilities is much less.

And, I'm sure that if there's demand, somebody might actually step up to do the occasional patch release if it is really needed. This has also happened in the Java world where several companies provide support for openjdk 6, 7, and 8 where Oracle no longer supports that (v8 stopped getting public updates already; you can still pay for some extended support but that too is being ramped down). I imagine, e.g. Red Hat might step up here as they seem to have continued to ship this for quite long and their LTS cycles might out run the python 2.7 cut off date.


This isn't a knock on Perl, but looking back, it's really interesting to see how the two communities handled their respective transitions: Python 2 to Python 3, and Perl 5 to Perl 6 (now called Raku [1]).

I say it isn't a knock because I think both sets of goals were equally fine: Perl was looking to make a bold break towards an unknown future [2], and Python wanted a very slow and sustainable migration.

I'm glad to see Python 3 go mainstream, I'm glad that Python 2 succeeded so well, and I'm glad there are segments of computer science that still throw mugs and aim for the moon.

[1] http://blogs.perl.org/users/ovid/2019/10/larry-has-approved-...

[2] https://www.nntp.perl.org/group/perl.packrats/2002/07/msg3.h...


You have it backwards. Perl 6/Raku is a completely new language, the mistake there was to call it "Perl". Perl5 on the other hand has handled its evolution much more gently than Python 2 -> 3 did.


There's more to it than that. Looking at its early history [1], it is clear that Perl6 was conceived/intended to be the next version of Perl after Perl5 e.g.

First, Perl will support multiple syntaxes that map onto a single semantic model. Second, that single semantic model will in turn map to multiple platforms.

Multiple syntaxes sound like an evil thing, but they're really necessary for the evolution of the language. To some extent we already have a multi-syntax model in Perl 5; every time you use a pragma or module, you are warping the language you're using. As long as it's clear from the declarations at the top of the module which version of the language you're using, this causes little problem.

There were even plans for a translator [2] similar to Python's 2to3 tool

Larry Wall and others are already working on a Perl 5 to Perl 6 translator, which will be able to translate (most) Perl 5 source code to the equivalent Perl 6 syntax.

In addition, Perl 6 will provide a "Perl 5 compatibility mode", allowing the compiler to directly execute any code that it recognizes as being written in Perl 5.

[1] https://raku.org/archive/doc/design/apo/A01.html

[2] https://raku.org/archive/faq.html


Those were the plans. In the meantime, Perl 6 has been renamed to Raku (https://raku.org using the #rakulang tag on social media).

Integrating Perl code in Raku can be done with the excellent Inline::Perl5 module (https://modules.raku.org/dist/Inline::Perl5:cpan:NINE). In fact, the efficiency of that module basically killed the "parse Perl code in Raku" project.


Raku isn't a completely new language. It's very clearly a descendant of the Perl lineage.


Yeah but I think they have a point. As a former Perl user and current Python convert I do think the mistake was to call Perl 6 Perl.

It gave me at least a false sense of thinking Perl 5 was done and going to be replaced.

At the same time I found Python to be much easier to write, maintain and I became attached to the structure that PEP provided.

There were a lot of other factors that made me switch, but that particular point of not calling Perl 6 Perl made me think.


If you're interested in what actually changed in this update, the release notes are here: https://github.com/python/cpython/blob/2.7/Misc/NEWS.d/2.7.1...


Still know a heck of a lot of people using it. Even know one researcher using it for a new project.

It will be decades before the final Python 2 program goes offline.


Ya, there's no way this is the "final" release. Maybe by the core Python team, but it will be forked to fix bugs. Ten years from now there will still be Python 2 code running critical infrastructure at various companies, and the most responsible path to address discovered issues in the runtime will not be "rewrite the application to work in Python 3!" but "upgrade the interpreter to this community-vetted fork of 2.7.18".

Mumble mumble something about conflating languages with implementations.


What's the use of Python 2 if you can't use libraries[1]?

It will only get more difficult to maintain your app.

[1] https://python3statement.org/ - note many libraries weren't even waiting until 2020. It is a lot of work to maintain code with Python 2 cruft. Not all packages are listed there; for example, Django is Python 3 only starting from 2.0 (it's currently at 3.0).


> What's use of Python 2 if you can't use libraries[1]?

Unless some Python 3 fanatic goes out of his way to write a Python 2 library-deleting virus, the existing code won't disappear. Also, some of these pledges only limit feature releases; afaik numpy planned to still provide a long-term support version with bugfixes for Python 2. It also helps that Python already comes with a lot of built-in bells and whistles, so third-party libraries aren't always necessary either.


Sorry, I wasn't clear: nothing happens if your application doesn't change, but if you do change it, sooner or later you'll be forced to upgrade your dependencies (it could be for a bug you just found, or maybe a performance improvement), and if the updated version won't work on your Python it will be tough. You'll have the choice to either migrate your app to Python 3 or fork the library and backport fixes.

You might be lucky and someone else might do that for you, but it will get harder and harder with time. Already, according to a JetBrains survey in 2019 (I believe), about 80% of people surveyed said they already use Python 3.

As for numpy, I just checked[1] and the only wheels they are providing for the latest version are for Python 3.5+; the package metadata also says that it is Python 3 only.

[1] https://pypi.org/project/numpy/#files


The problem isn't that Python 2 is bad. Python 2 is a fantastic language. The problem is that the maintainers of Python decided to break backwards compatibility and force library developers to support what are essentially two different programming languages.


I don’t think there was any clear path around that, though. The single biggest change was that Python 2 pretended that text and binary data were the same datatype, where Python 3 correctly makes you distinguish between the two. There’s not really a great way to roll out that major change without breaking tons of stuff along the way. And, well, if you’re already making a backward-incompatible version, here’s this checklist of other breaking changes you might as well bring along for the ride.


And that raises an obvious question: why didn’t every other programming language immediately break backwards compatibility when UTF-8 became a de facto standard?

> And, well, if you’re already making a backward-incompatible version, here’s this checklist of other breaking changes you might as well bring along for the ride.

Sorry, that doesn’t track. Treating quoted strings as UTF-8 by default instead of ASCII-or-arbitrary-bytes would have been a small migration that would not have taken over a decade to complete.


The way Perl dealt with this was to have you declare when you are using UTF8.

    # declare that the code itself is written in utf8
    use utf8;

    my $ā = 'ā';
If you need unicode strings to work, you turn on the unicode strings feature.

    use feature 'unicode_strings';
Another way to turn it on:

    use v5.12;
(Declaring which version of the language you need is something you should do anyway.)

Really mostly what you have to do is declare the encodings of the file handles.

    # change it for all files
    use open ':encoding(UTF-8)';
To change it per file handle, you would use `binmode`. (Which was originally added to allow binary code to work on Windows.)

    open my $fh, '<', 'example.txt';
    binmode $fh, ':utf8';
(Declaring the encoding of an opened file is something you should do anyway.)

---

Basically Perl just defaults to the old original ways. If you need a new feature which would break old code, you just declare that you need it.

Because of that, most code that was written for an earlier form of Perl still works on the latest version.


Because many of these languages were created when Unicode already existed. Someone listed Java and JavaScript; both of them started from the point that Python 3 tries to reach.

When Python was created in 1989, Unicode didn't exist yet.

As for your second argument, many people bring up Go, which had the amazing idea of using UTF-8 for everything, and it works great. They don't realize that Go is pretty much doing the same thing that Python does (ignoring how the string is represented internally, since that shouldn't really be the programmer's concern).

Go clearly distinguishes between strings (the string type) and bytes (the []byte type); to use a string as bytes you have to cast it to []byte, and to convert bytes to a string you cast them to string.

That's the equivalent of doing variable.encode() to get bytes and variable.decode() to get a string.

What Python 3 introduced is two types, str and bytes, with any implicit casting between them blocked. That's exactly the same thing Go does.

The only difference is an implementation detail: Go stores strings as UTF-8, so casting doesn't require any work (the types exist so the compiler can catch errors), and it ignores environment variables and always uses UTF-8. Python has an internal[1] representation and does do conversion. It respects LANG and other variables and uses them for stdin/out/err. Initially, when those variables were undefined it assumed us-ascii, which created some issues, but I believe that has been fixed and UTF-8 is now the default.

[1] Python 3 actually tries to be smart and uses UCS1 (Latin-1), UCS2 or UCS4 depending on what characters are contained. If a UTF-8 conversion was requested, it will also cache that representation (as a C string) so it won't do the conversion next time.
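You can see the str/bytes split directly at a Python 3 REPL; this is standard behavior, nothing library-specific:

```python
text = "naïve"                 # str: a sequence of code points
data = text.encode("utf-8")    # bytes: the encoding step is explicit
print(data)                    # b'na\xc3\xafve'
print(data.decode("utf-8"))    # naïve

# Implicit mixing is blocked, unlike in Python 2:
try:
    "a" + b"b"
except TypeError as e:
    print("TypeError:", e)
```

That TypeError is exactly the "doesn't let you run broken code" behavior people hit during migrations.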


> Because many of these languages were created when Unicode already existed. Someone listed Java and Javascript, both of them started from the point that python 3 tries to bring.

That was me in a parallel thread. Java and JavaScript internally use UTF-16 encoding. I also mentioned C, which treats strings as byte arrays, and C++, which supports C strings as well as introducing a string class that is still just byte arrays.

> As for your second argument, many people bring out Go, that had such amazing idea of using everything as UTF-8 and it works great.

Has Go ever broken backwards compatibility? Let me clarify my second argument: if you are going to break backwards compatibility, you should do so in a minimal way that eases the pain of migration. The Python maintainers decided that breaking backwards compatibility meant throwing in the kitchen sink, succumbing to second system effect, and essentially forking the language for over a decade. The migration from Ruby 1.8 to 1.9 was less painful, though in fairness I suppose the migration from Perl 5 to Perl 6 was even more painful.


Actually migrating from Perl5 to Raku may be less painful than migrating from Python2 to Python3 for some codebases.

That is because you can easily use Perl5 modules in Raku.

    use v6;

    use Scalar::Util:from<Perl5> <looks_like_number>;

    say ?looks_like_number( '5.0' ); # True
Which means that all you have to do to start migrating is make sure that the majority of your Perl codebase is in modules and not in scripts.

Then you can migrate one module at a time.

You can even subclass Perl classes using this technology.

Basically you can use the old codebase to fill in the parts of the new codebase that you haven't transferred over yet.

---

By that same token you can transition from Python to Raku in much the same way. The module that handles that for Python isn't as featureful as the one for Perl yet.

    use v6;

    {
        # load the interface module
        use Inline::Python;

        use base64:from<Python>;

        my $b64 = base64::b64encode('ABCD');

        say $b64;
        # Buf:0x<51 55 4A 44 52 41 3D 3D>

        say $b64.decode;
        # QUJDRA==
    }

    {
        # Raku wrapper around a native library
        use Base64::Native;

        my $b64 = base64-encode('ABCD');

        say $b64;
        # Buf[uint8]:0x<51 55 4A 44 52 41 3D 3D>

        say $b64.decode;
        # QUJDRA==
    }

    { 
        use MIME::Base64:from<Perl5>;

        my $b64 = encode_base64('ABCD');

        say $b64;
        # QUJDRA==
    }

    {
        use Inline::Ruby;
        use base64:from<Ruby>;

        # workaround for apparent missing feature in Inline::Ruby
        my \Base64 = EVAL 「Base64」, :lang<Ruby>;

        my $b64 = Base64.encode64('ABCD');

        say $b64;
        # «QUJDRA==
        # »:rb

        say ~$b64;
        # QUJDRA==
    }
I just used four different modules from four different languages, and for the most part it was fairly seamless. (Updates to the various `Inline` modules could make it even more seamless.)

So if I had to I could transition from any of those other languages above to Raku at my leisure.

Not like Python2 to Python3 where it has to mostly be all or nothing.


> That was me in a parallel thread. Java and JavaScript internally use UTF-16 encoding. I also mentioned C, which treats strings as byte arrays, and C++, which supports C strings as well as introducing a string class that is still just byte arrays.

C and C++ don't really have Unicode support, and most C and C++ applications don't support Unicode. There are libraries that you need to use to get that kind of support.

> Has Go ever broken backwards compatibility? Let me clarify my second argument: if you are going to break backwards compatibility, you should do so in a minimal way that eases the pain of migration. The Python maintainers decided that breaking backwards compatibility meant throwing in the kitchen sink, succumbing to second system effect, and essentially forking the language for over a decade. The migration from Ruby 1.8 to 1.9 was less painful, though in fairness I suppose the migration from Perl 5 to Perl 6 was even more painful.

Go is only 10 years old; Python is 31. And in fact Go has had some breaking changes, for example in 1.4 and 1.12. Those are easy to fix since they show up during compilation. Python is a dynamic language, and unless you use something like mypy you don't have that luxury.

Going back to Python, what was broken in Python 2 is that the str type could represent both text and bytes, and the difficulty was that most Python 2 applications were broken (yes, they worked fine with ASCII text, but broke in interesting ways whenever Unicode was used). You might say: so what, why should I care if I don't use Unicode? The problem was that mixing these two types, with the implicit casting that Python 2 did, made it extremely hard to write correct code even when you knew what you were doing. With Python 3 it takes no effort.

There is a good write up by one of Python developers why python 3 was necessary[1].

[1] https://snarky.ca/why-python-3-exists/


> Going back to python, what was broken in Python 2 is that str type could represent both text and bytes...

You know, it’s astounding to me that you managed to quote my entire point and still didn’t even bother to acknowledge it, let alone respond to it.

If they had to break backwards compatibility to fix string encoding, that’s fine and I get it. That doesn’t explain or justify breaking backwards compatibility in a dozen additional ways that have nothing to do with string encoding.

Are you going to address that point or just go on another irrelevant tangent?


There is no migration from Perl 5 to Perl 6, but mainly because Perl 6 has been renamed to Raku (https://raku.org using the #rakulang tag on social media).

That being said, you can integrate Perl code in Raku (using the Inline::Perl5 module), and vice versa.


Yes, that was the joke :)


some would say that the distinction was in the wrong places, like assuming that the command line arguments or file paths were utf8


Fundamentally, the "right place" here differs between Windows and Linux. On Windows, command line arguments really are unicode (UTF-16 actually). On Linux, they're just bytes. In Python 2, on Linux you got the bytes as-is; but on Windows you got the command line arguments converted to the system codepage. Note that the Windows system codepage generally isn't a Unicode encoding, so there was unavoidable data loss even before the first line of your code started running (AFAIK neither sys.argv nor sys.environ had a unicode-supporting alternative in Python 2). However, on Linux, Python 2 was just fine.

Now with Python 3 it's the other way around -- Windows is fine but Linux has issues. However, the problems for linux are less severe: often you can get away with assuming that everything is UTF-8. And you can still work with bytes if you absolutely need to.
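The way Python 3 "gets away with it" on Linux is the surrogateescape error handler (PEP 383), which smuggles undecodable bytes through str and restores them exactly on the way back out. A small demonstration:

```python
# A byte sequence that isn't valid UTF-8 (e.g. from argv or a filename)
raw = b"caf\xff"

# surrogateescape maps the bad byte to a lone surrogate instead of failing
s = raw.decode("utf-8", "surrogateescape")
print(s.endswith("\udcff"))   # True: 0xff became U+DCFF

# Encoding with the same handler round-trips the original bytes exactly
print(s.encode("utf-8", "surrogateescape") == raw)  # True
```

This is the mechanism behind os.fsdecode/os.fsencode, and it's why sys.argv can still carry non-UTF-8 data on Linux without loss.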


> On Windows, command line arguments really are unicode (UTF-16 actually)

No, they're not. Windows can't magically send your program Unicode. It sends your program strings of bytes, which your program interprets as Unicode with the UTF-16 encoding. The actual raw data your program is being sent by Windows is still strings of bytes.

> you can still work with bytes if you absolutely need to

In your own code, yes, you can, but you can't tell the Standard Library to treat sys.std{in|out|err} as bytes, or fix their encodings (at least, not until Python 3.7, when you can do the latter), when it incorrectly detects the encoding of whatever Unicode the system is sending/receiving to/from them.

> AFAIK neither sys.argv nor sys.environ had a unicode-supporting alternative in Python 2)

That's because none was needed. You got strings of bytes and you could decode them to whatever you wanted, if you knew the encoding and wanted to work with them as Unicode. That's exactly what a language/library should do when it can't rely on a particular encoding or on detecting the encoding--work with the lowest common denominator, which is strings of bytes.


> In your own code, yes, you can, but you can't tell the Standard Library to treat sys.std{in|out|err} as bytes,

Actually you can, you should use sys.std{in,out,err}.buffer, which will be binary[1]
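As a runnable illustration (io.BytesIO stands in here for the real console stream), any TextIOWrapper exposes its underlying binary stream as .buffer, which is exactly what sys.stdout.buffer is:

```python
import io

binary = io.BytesIO()
text = io.TextIOWrapper(binary, encoding="utf-8")

text.write("hëllo\n")            # text API: str goes in, gets encoded
text.flush()
text.buffer.write(b"\x00\x01")   # .buffer bypasses encoding: raw bytes in

print(binary.getvalue())         # b'h\xc3\xabllo\n\x00\x01'
```

Note you have to flush the text layer before writing to .buffer, or the two layers' output can interleave out of order.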

> or fix their encodings (at least, not until Python 3.7, when you can do the latter), when it incorrectly detects the encoding of whatever Unicode the system is sending/receiving to/from them.

I'm assuming you're talking about the scenario where LANG/LC_* were not defined; then Python assumed us-ascii. I think in 3.7 they changed the default to UTF-8.

[1] https://docs.python.org/3/library/sys.html#sys.stdin


> Actually you can, you should use sys.std{in,out,err}.buffer,

That's fine for your own code, as I said. It doesn't help at all for code in standard library modules that uses the standard streams, which is what I was referring to.

> I think in 3.7 they changed default to UTF-8

Yes, they did, which is certainly a saner default in today's world than ASCII, but it still doesn't cover all use cases. It would have been better to not have a default at all and make application programs explicitly do encoding/decoding wherever it made the most sense for the application.


> That's fine for your own code, as I said. It doesn't help at all for code in standard library modules that uses the standard streams, which is what I was referring to.

I'm not aware of what code you're talking about. All functions I can think of expect you to provide streams explicitly.

> Yes, they did, which is certainly a saner default in today's world than ASCII, but it still doesn't cover all use cases. It would have been better to not have a default at all and make application programs explicitly do encoding/decoding wherever it made the most sense for the application.

I disagree; it would be far more confusing if stdin/stdout/stderr were sometimes text and sometimes binary. If you mean that they should always be binary, that's also suboptimal. In most use cases a user works with text.


> I'm not aware what code you're talking about.

All the places in the standard library that explicitly write output or error messages to sys.stdout or sys.stderr. (There are far fewer places that explicitly take input from sys.stdin, so there's that, I suppose.)

> it would be far more confusing when stdin/stdout/stderr were sometimes text sometimes binary

I am not suggesting that. They should always be binary, i.e., streams of bytes. That's the lowest common denominator for all use cases, so that's what a language runtime and a library should be doing.

> If you meant that they should always be binary that's also unoptimal. In most use cases an user works with text.

Users who work with text can easily wrap binary streams in a TextIOWrapper (or an appropriate alternative) if the basic streams are always binary.

Users who work with binary but can't control library code that insists on treating things as text are SOL if the basic streams are text, with buffer attributes that let user code use the binary version but only in code the user explicitly controls.
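Here's a small sketch of that wrapping direction (io.BytesIO stands in for a hypothetical always-binary stdout):

```python
import io

raw_stdout = io.BytesIO()   # imagine this is a binary-only sys.stdout

# Code that wants text wraps it once, choosing its own encoding:
out = io.TextIOWrapper(raw_stdout, encoding="utf-8", newline="\n")
out.write("héllo\n")
out.flush()

# Code that wants bytes just uses the raw stream directly:
raw_stdout.write(b"\xde\xad\xbe\xef")

print(raw_stdout.getvalue())
```

The text users pay one explicit line of wrapping; the binary users never fight a text layer they didn't ask for.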


Linux had issues whenever the LANG/LC_* variables weren't defined: Python assumed us-ascii. I believe that was changed recently to just assume UTF-8.


Python 3 does not make such assumptions; it uses the appropriate locale encoding.

(IIRC it used to do that in 3.0, but they backtracked very quickly - and 3.0 was effectively treated as a beta by the community at large, anyway.)


> correctly

Sometimes it’s better to be correct and also yield to the common good.


You can keep using the libraries you're already using. That's totally fine for many applications.


More likely by then the push to "RIIR"™ will be so overwhelming that most people won't be able to help themselves.


Glad to have struck a nerve with the Rust evangelists.


> Mumble mumble something about conflating languages with implementations.

So your claim is that "Python 2" is a language spec, not an implementation? And that there will be future releases of this language spec? I doubt it.

I agree it's likely that there will be people wasting their time maintaining an interpreter fork, but that will not be Python-the-language (a trademarked term BTW), it will be a fork of the implementation.


No, my claim is that, while ceasing development of the language Python 2 is wholly sensible, ceasing development of the implementation Python 2 (CPython specifically) is not (due the almost certain existence of latent bugs). My "mumble" at the end was meant exactly in reference to that.

I suppose one could argue that, the CPython implementation is the language specification. (And I seem to recall hearing that notion somewhere years ago.) In which case, it would not be possible to freeze development of the language without freezing the implementation as well. There are various reasons I wholeheartedly disagree with such a characterization, but I guess there's some self-consistency there at least.


> ceasing development of the language Python 2 is wholly sensible, ceasing development of the implementation Python 2 (CPython specifically) is not

Development of CPython 2 has ended, bugs and all. It's past its end of life, this is well known and has been known for a long time. Any remaining bugs are the problem of the users, not the responsibility of the former developers.

Sure people will fork it and do stuff with those forks, but those will no longer be new versions of CPython, they will be new versions of some-fork-of-CPython.


This is clear, but does the CPython team want to maintain Python 2? If not, it's time to either use an alternative or move on...


> So your claim is that "Python 2" is a language spec, not an implementation? And that there will be future releases of this language spec in the future?

PyPy maintains a Python 2 implementation and will continue to do so.


> "Python 2" is a language spec

Cython, IronPython, Brython, Stackless Python, Nuitka, etc.


Does anyone know what the main reason is for not updating from Python 2? I'm genuinely curious, as I don't really know of any modules that won't work under Python 3, and I can't really come up with any other blocking changes that would make upgrading that hard.


I did the work for a reasonably sized project recently - a few hundred thousand LOC. It was long, boring, risky work. Let me rattle off some of the tasks.

Audit all strings coming in and going out for encoding issues. Update all dependencies to their Python 3 equivalents. Replace dependencies that hadn't been updated (typically older Django dependencies). Use python-future to bulk-update incompatibilities. Changes to metaclasses were annoying. Force all uses of pickle to use protocol version 2. I documented some more during the migration on Twitter: https://twitter.com/jarshwah/status/1209381850822496256?s=21
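The pickle pin mentioned above matters because protocol 2 is the highest protocol Python 2 can read; Python 3 defaults to a newer one. A minimal sketch:

```python
import pickle

payload = {"user": "alice", "scores": [1, 2, 3]}

# Python 3's default protocol produces data Python 2 cannot read, so a
# mixed 2/3 fleet has to pin the protocol explicitly on every dumps():
blob = pickle.dumps(payload, protocol=2)

print(blob[:2] == b"\x80\x02")        # True: the header says protocol 2
print(pickle.loads(blob) == payload)  # True: round-trips fine in Python 3
```

Once everything is on Python 3 the pin can be dropped and the default protocol used again.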

We began getting the code base into a compatible position about 1.5 years earlier. A final push of 3-4 weeks of work got it over the line, with many bug fixes after the deployment.

Other older larger systems will have similar problems at a larger scale.

This isn’t a condemnation by the way. Python 3 is better. The only reason we held out so long was because of the business justification. Once we couldn’t wait any longer it got prioritised.


That's all uncannily similar to my experience porting our ~350k LOC app.

"It's a dirty job, but someone's got to do it".


Many internal tools for one, platforms, etc. Hard to tell.

One industry example is https://vfxplatform.com/ - they just (this year) moved to Python3, but with some delays, from the site:

    The move to Python 3 was delayed from CY2019 to CY2020 due to:

    - No supported combination of Qt 5.6, Python 3 and PySide 2, so Qt first needed to be upgraded.
    - Upgrade of both Qt and Python in the same year was too large a commitment for software vendors and large studios.
    - Python 3 in CY2020 is a firm commitment; it will be a required upgrade as Python 2 will no longer be supported beyond 2020. Software vendors are strongly encouraged to provide a tech preview release in 2019 to help studios with testing during their Python migration efforts.


Active development of Python 2.7 stopped in 2015; that was the time to start migrating. It seems like this application would never have been updated if Python 2 hadn't gone EOL.


There are still Classic Visual Basic programs out there that haven't been ported to VB.NET or C# yet, because they are so huge and the code is so hard to port that companies cannot afford to hire developers to do it for them. The same is true of many old languages like COBOL.

I heard that some places are still using Turbo Pascal for DOS and have to stick with 32-bit machines because 64-bit ones can't run 16-bit DOS code.


Yes, and you can similarly continue using Python 2.7.18 for the next 10 years; no one expects Microsoft to keep releasing new versions of classic VB. A lot of Python users have weird expectations.


Lots of code is simply unmaintained. The guy who wrote it is gone, it's still running fine, so nobody is touching it. Businesses don't want to take the risk and spend the money to upgrade it. Maybe you don't realize the insane amount of code that is in this state!!


If you depend on an unmaintained codebase where the original developers are no longer available, then that's a substantial business risk by itself.

Too many software development projects are treated as one-off events where people commission them and assume they will work forever without updates. Software requires maintenance, and people who commission software development projects without planning on how they are going to be maintained in the future are taking on risk. Any risk involved in updating that abandoned code in future is a consequence of that decision.


In some ways that's the whole point of code. You want to get to forget about it doing all these things without you.


If that's the case then 2.7.18 will continue working and probably it is a bad idea to port it. A lot of work for minimum gain.

But if you're actively changing the code, maintenance will get more and more expensive. With packages dropping Python 2 support, if you discover a bug in one of your dependencies and the fix is in a version that no longer works on Python 2, you'll need to backport the fix (and maintain your fork) or migrate your code.


This is in part due to a lack of foresight, but you can run into all sorts of weird issues that you'd never think of, like this one: we have a feature in our REST API that can return lists of items as CSV instead of JSON (yes, I know, it sounds weird). It requires no effort from our backend services; the api proxy takes care of it. Unfortunately, something changed with dict enumeration order between python 2 and 3, and so when we first tried to upgrade, the CSV files being spit out had a new column ordering, which of course would have broken customer code that relied on it.

The string-handling changes, while necessary, are also a bear to deal with. Since python is dynamically typed, you need to work to find all the places where you need to add a ".decode()" or ".encode()". If you don't have excellent test coverage already, you're going to miss some, and it'll be a game of whack-a-mole until you get them all... assuming you have actually gotten them all.
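One defensive fix for the column-ordering issue described above is to derive an explicit, sorted field order rather than relying on dict iteration order. This is just a sketch (the `rows_to_csv` helper is hypothetical, not the API in question):

```python
import csv
import io

def rows_to_csv(rows):
    # Sort the union of keys so column order is deterministic on any
    # Python version, instead of depending on dict iteration order
    # (which was undefined before CPython 3.7).
    fieldnames = sorted({key for row in rows for key in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With that in place, `rows_to_csv([{"b": 1, "a": 2}])` always emits the `a` column first, on Python 2 and 3 alike.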


> something changed with dict enumeration order between python 2 and 3

Dicts were by definition unordered until Python 3.7 [0], so you were relying on undefined behaviour. If you need an ordered dictionary and support Python 3.6 or below, you should use OrderedDict [1].

[0] https://mail.python.org/pipermail/python-dev/2017-December/1...

[1] https://docs.python.org/3/library/collections.html#collectio...
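For code that must run on 3.6 or earlier (or on Python 2), OrderedDict makes the ordering part of the contract instead of an implementation detail:

```python
from collections import OrderedDict

# Insertion order is guaranteed by OrderedDict on every Python
# version, unlike plain dicts before CPython 3.7.
columns = OrderedDict([("id", int), ("name", str), ("email", str)])
assert list(columns) == ["id", "name", "email"]
```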


> something changed with dict enumeration order between python 2 and 3

Enumeration order in dict keys was never guaranteed (until Python 3.7), even on 2. So basically that code relied on undocumented CPython behaviour that was strongly advised against, i.e. it was broken already. 3 simply made the brokenness more visible.


I imagine the reason for not upgrading from Python 2 is the same reason you don't upgrade your car just because there's a new model out. (Or maybe you're the type who does, but I guess you can hopefully understand why others don't do that.)


The Oil shell, at least, reverted to (a forked version of) 2 after upgrading to 3. http://www.oilshell.org/blog/2018/03/04.html#faq


How has that project not been sued into oblivion by Royal Dutch Shell yet...?


I think the biggest issue seems to be that you need to migrate the whole thing at once. If it could be done incrementally it would be easier.

There are ways, though: you can incrementally adapt the codebase to work on both Pythons. pylint with the --py3k option and mypy can also help. There's also the six package, but many people seem to have had good luck with futurize.

There's also something that I tried a while ago and it surprisingly worked (although it might not work that well on a larger codebase?): basically you can use Cython (not to be confused with CPython) to compile Python 2 code and then include it in Python 3, which would enable migration file by file.


Personally, the main software I use at $DAY_JOB doesn't support python 3 yet.

I'm not looking forward to the scramble when it does upgrade, as we're using a ton of community modules that may or may not be abandoned.


Your existing code works perfectly for now. Management fails to understand why they need to budget a team to upgrade when the project doesn't bring anything new to the table.


Our migration to Python 3 occurred with a new generation of the product. No new development is happening on the Python 2 product, and customers are being migrated off. The vendor who runs our old product (GAE) has promised ongoing Python 2 support, so there’s literally no reason to spend the time or money to migrate it, no matter how long it takes for the last customer to get off the old product.


Because giant codebase that would take effort++++ to do so for no perceived gain, with possible addition of new bugs.


> effort++++

effort += 1 + 1

Ported to Python3 for you.


Clearly C got it wrong. The number of +'s should match the increment, so e -> e + 1 should be e+, e+ -> e+ + 1 should be e++ ...


Correction:

  effort = (effort := effort + 1) + 1


Executable pseudocode!


That's how we get decades old cobol codebases that nobody understands.


In my work, it's software that offers a Python 2 module for scripting. I tried the naïve "upgrade" of copying the module into my Python 3 module library, but no dice. The software checked the version of Python, saw it wasn't 2.7.10 (yeah), and raised an error.


Some of the older hosting services still support only Python 2, or some early crappy version of Python 3.


The same reasons that an even larger codebase never upgraded from Visual Basic 6.


Because there's nothing to update to. Everyone who was working Python circa 2.7.0 has switched to working on a new, different language which they insist on misleadingly calling "Python 3" rather than come up with a new name like the Perl -> Raku folks did.


I think you are trolling, but in case you aren't; can you elaborate what is so different in Python 3? Granted, it is not a drop-in replacement but it is 99.9% the same thing.


There's a lot of code out there that's important, but not worth porting to a new language (which Python 3 is).


I think describing it as porting to a new language is misleading: on most of my projects, most of the work is a few minutes — run modernize/futurize, check the tests, etc. If the original developers were really sloppy about how they handle encoding, it can take longer but most of the problems I've seen have very little to do with Python rather than the fact that something still running Python 2 likely has significant technical debt issues — especially things like not having test coverage which make it a lot harder to ship changes.


> likely has significant technical debt issues

You're not wrong, but as a practical matter, that pretty much describes our entire industry.


Completely agreed — I would just argue that the “Python 2 vs. 3” argument is a distraction. Java hasn't had as breaking a change but there are still a ton of places running Java 6 or 7 because they like skimping on developers more than getting security updates.


Our product took about a year to port, from getting the go-ahead to the eventual production switch. Running modernize / futurize was like 0.01% of the work.

You're right though, we're fighting our way out of technical debt and switching to python 3 was absolutely necessary. It's forced us to sort out a lot of sketchy string / bytes handling. We do, mercifully, have ~90% test coverage.


A year is astronomical. Does it involve writing tests that did not exist before?


I don't remember us writing any new tests specifically for the python 3 port.

One issue with the test suite was that it made heavy use of a thing called django_any that isn't supported in python 3, so decided to replace it with Factory Boy. We have about 500 django models that needed new factories. Factory Boy works quite differently and it was a lot of work to make the factories behave similarly to the old ones where possible, and update most of our ~4000 tests for the new behaviour.

So that was one issue. It was tempting to just patch django_any, but we decided to tackle the technical debt instead.


> something still running Python 2 likely has significant technical debt issues

Most codebases basically.


There isn't anything inherently wrong with still using 2.7.x. Just don't expect updates. For new code, using 3.7 is probably the best bet at this time.


I thought there won't be any more security updates?


Red Hat, Ubuntu, etc. are going to support Python 2 for the duration of the operating system releases which shipped it. I would assume that Anaconda, et al. will have similar options for paid customers.


Red Hat has committed to keeping Python 2 on life support until 2024 as part of Red Hat Enterprise Linux 8 [1] so you can get security fixes for Python 2 until then if you use CentOS 8.

Canonical will not provide long-term support for Python 2 as part of Ubuntu 20.04 LTS. In Ubuntu 20.04, Python 2 is a "universe" package [2] that does not receive updates from Canonical. This means you will only get Python 2 security update guarantees on Ubuntu 18.04 LTS, until April 2023.

Debian is making an active effort [3] to remove Python 2 and packages that depend on it for its next release. It'll likely support Python 2 as part of Debian Buster until 2024.

Note that if you're reading this to delay your move to Python 3 by another few years, you're doing it wrong. This list shows even all slow enterprise-y distros have a deadline for Python 2, not that you can stretch your stuff for a couple of more years :)

[1]: https://access.redhat.com/solutions/4455511

[2]: https://packages.ubuntu.com/focal/python2

[3]: https://wiki.debian.org/Python/2Removal


I believe the biggest thing to worry about is your application's dependencies. If you only depend on packages that come with your system, you're probably fine (although I've noticed those are largely ignored: even when there are bugfixes, they don't get updated).

Otherwise, even if your Python has security patches for the next 4 years, it won't do you any good when you find a bug in one of your dependencies and the fix is in a version that's Python 3 only.


Thank you for providing the extra details — I especially agree with your conclusion: go to your boss and say “even if we pay, we're looking at a drop dead date no later than 2024”.


So? I'm sure my non-internet-facing scripts care.


Distro maintainers will be patching security bugs for the foreseeable future. Do you seriously think if there is a security bug found today Debian maintainers will be like "ah, tough luck, I suppose people need to upgrade to py3"?..


Broadly, distros have been ripping out Python2 left and right in advance of 2020. Debian may have a longer support cycle than most and still have Python2 in stable or oldstable.


> Do you seriously think if there is a security bug found today Debian maintainers will be like "ah, tough luck, I suppose people need to upgrade to py3"?..

> During DebConf19 we¹ have tried to figure out how to manage Python 2 and PyPy module removal from Debian and below is our proposal. [0]

Debian are in the midst of a large project [1] to remove Python 2 as quickly as they possibly can. Whilst some bugfixes may happen, Debian are already telling you in no uncertain terms:

> port upstream package to python3

> remove any Python 2 use

[0] https://lists.debian.org/debian-python/2019/07/msg00080.html

[1] https://wiki.debian.org/Python/2Removal


Python's open source. Anyone can do security updates. Teams at RedHat, Debian, Oracle, etc, will be doing security updates for many decades I'm sure. You may have to pay.


Can't do it under the "Python" trademark name, though.


A huge amount of the work of distro maintainers is actually just this kind of backporting and applying security fixes. You're right that technically, the python foundation (or whoever owns the trademark) could come after redhat for making these kinds of changes but it's very doubtful they would.

If redhat decided to add new features to python 2.7, I'm sure the PSF would make a stink


You can, that's what RedHat is doing. It will be still Python 2.7.18 + security patches.

You're probably confusing it with Tauthon (Python 2.7 with backported Python 3 features), which tried to position itself as Python 2.8. By backporting those changes they essentially created a third version of Python that was incompatible with the other two.


Oh no, it’s the whole “IceWeasel” fiasco all over again.

Maybe they could call the new 2.7 interpreter IceSnake.


Ice was used as an antonym for Fire(fox), Thunder(bird) and others. It should be some other reference to "Monty Python's Flying Circus" imo.


dead parrot


Not sure that's true?

> As such, stating accurately that software ... is compatible with the Python programming language, or that it contains the Python programming language, is always allowed.


There weren't many before this, either.


3.8 seems to have some new keywords and jazz introduced, so might be better to start with that instead of 3.7 for new projects.


3.8 seems to be much more twitchy about exact versions of dependencies, so I've had problems running the AWS cli stuff on 3.8 at times, because there's no set of non-conflicting dependencies. (oftentimes due to minor/patch level version mismatches)


I keep having issues with 3.8 and many dependencies. Two months back, I started out a new project in 3.8 and two days in was downgrading it due to compatibility problems with Pillow and a couple others.


What's typically happening is that a wheel was missing. When that happens, pip tries to compile the package from source, which requires extra dependencies such as a compiler, python-devel, and other *-devel packages; because those weren't available, it failed. This is very common when a new major version is released: it requires authors of C-based packages to build new wheels so installation doesn't need those extra dependencies.

Looks like Pillow has had a wheel for 3.8 since April 2nd, so it might work now (no compilation needed). I don't know the other packages so can't check them. Psycopg2 would probably be another one with this issue (also fixed, on April 6th).

[1] https://pypi.org/project/Pillow/#files look for cp38 wheel files.


The CentOS base yum repository only has python 3.6.x. I hope they will make later 3.x branches available for installation in the near future.


CentOS / RHEL 8.2 will ship with 3.8.


Just discovered a week or two ago a pretty important part of one of our internal systems is running on python 2.6


Caveat within 2.7: as of 2.7.9, HTTPS switched from non-validating by default to validating by default. Something to watch if your internal systems use SSL and you have a bunch of self-signed certs, for example.
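For anyone hitting this, the usual fix is to trust your own CA bundle via an SSLContext rather than disabling verification outright (the CA path in the comment is a placeholder):

```python
import ssl

# Since 2.7.9 / 3.4, default contexts validate both the certificate
# chain and the hostname.
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname

# For an internal CA, load your own bundle instead of turning
# verification off (path is a placeholder):
# ctx.load_verify_locations(cafile="/etc/pki/internal-ca.pem")
```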


I think we're prooobably good there, this system is primarily slinging around and transforming data between databases.

The far bigger worry would be unexpected encoding issues corrupting the data when going from 2->3.


> Even know one researcher using it for a new project.

Knowing researchers (at least in academia), they're the last people I would expect to change.


And those are the least worrying; once they publish their thesis, they don't give a damn about their code anymore and it just dies a natural death ;)


"So?" said COBOL.


"Hold my beer!" - COBOL


We've switched off Python 2, but I really miss it. For a glorious several years we were done with "production has Python X.Y, but the code needs X.Z" and always chasing the latest minor version on LTS OS releases.

But now we're back on that treadmill...


How long does python upgrading take you? I think Python 3.7 to 3.8 was change my docker version and push to CI... 5 minutes later it told me all tests passed and I deployed to production??


I upgraded from 3.7 to 3.8 and found out they removed a function in the time package required by crypto.

I do strongly feel that removing functions from the core library is a huge no-no for point releases.


From afar, Python 3.x releases seem more like major updates than point releases. There are entirely new blocks of functionality/syntax between 3.x releases.


Except it is mostly additive.

Adding new features will seldom break old stuff. It is the removing part that is hard.

(The exception being when a variable broke because its name became a keyword, but if you named a variable something like async I'm not sure you're entirely innocent.)


You're supposed to pay attention to DeprecationWarnings.


The problem is that the entire ecosystem needs to pay attention to such warnings and it doesn't happen. As a result, these changes end up breaking code in places the program authors never touched.


That's true in one respect, but you as an end user can use them to know not to upgrade before your deps have changed what they need to.


What? You upgrade your dependencies when there's a feature or improvement you need. Not because they don't have some warnings. Who cares?

The old version you implicitly claim is better isn't, it just doesn't have the warnings from their dependencies in place yet.


I mean, if you're going to upgrade your python version, you should check your logs to see if you have had any DWs recently. If you have, upgrade your deps before upgrading python. If there's no new version available, you can't upgrade python yet.


In my experience DeprecationWarnings get turned off so frequently because I'm not going to fork a big library to fix all of its deprecated uses of its dependencies.

The warnings are useless if they're not from my code, so they get turned off once globally.
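Rather than a single global off switch, the filters can be scoped so third-party DeprecationWarnings stay quiet while your own still surface. A sketch, where the `myapp` module name passed in is a placeholder for your own package:

```python
import warnings

def configure_warning_filters(app_module):
    # Ignore DeprecationWarnings wholesale...
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    # ...then re-enable them for our own package. filterwarnings()
    # prepends, and filters are checked front to back, so this rule
    # wins for modules matching the regex.
    warnings.filterwarnings("default", category=DeprecationWarning,
                            module=app_module + r"(\..*)?")
```

After `configure_warning_filters("myapp")`, a DeprecationWarning raised from `myapp.core` is shown, while one raised from an unrelated library is suppressed.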


They are still important, because even if it's a dependency that has the warning, it will still break your code.


but they are also not actionable


You can file an issue/PR in most cases, and if you don't get a response you may need to worry about moving libraries.


Well, actually they are; they're essentially saying that in a future version your application will break.

Yes, it is not your package that is responsible, but it still affects your application; you could open a ticket, or submit a PR. If you had that message you should also hold off on upgrading to a newer Python until this is resolved.


> Well, actually they are; they're essentially saying that in a future version your application will break.

No, you idiot. It's a deprecated usage in a dependency.

What happens in the future is I update pandas, it stops using the deprecated numpy method, and the warning just disappears with very little action on my part.

It's a useless warning.

And submit a PR every time this happens? How about I request you pull me?

> If you had that message you should also hold off on upgrading to a newer Python until this is resolved.

1. Upgrading to newer python? Who fucking cares

2. Upgrading resolves the warning entirely.


"The function time.clock() has been removed, after having been deprecated since Python 3.3"

Python 3.3 was released in 2012. You've had 8 years.
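For anyone hitting this, the replacements named in the deprecation notice have been available since 3.3:

```python
import time

# time.clock() is gone in 3.8; pick the timer that matches your intent.
start = time.perf_counter()    # high-resolution wall-clock interval timer
cpu = time.process_time()      # CPU time used by the current process
elapsed = time.perf_counter() - start
```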


Well, technically we already moved to 3.8, after which we needed a library that only works up to 3.7.

It would be nice if library developers kept their code up to date, but that doesn't always happen. Python core devs know this; we all know this, yet they consciously screw with the core libraries under the principle of caveat emptor.

I don't understand why they don't ear-mark these changes for 4.0. These kinds of things are a universal frustration within the community, and they are so easily avoidable.


What difference would it make whether the change had been called 3.8 or 4.0?


I'm confused, who exactly has had 8 years to do what? Should he have dumped the crypto module because it was using a deprecated feature?


The reason the crypto module was using this particular deprecated feature is that it hasn't been updated at all in 8 years.

The OP should have dropped it because it's unmaintained, and a maintained replacement has existed for a long time: https://cryptography.io/

This is an especially important consideration for security-critical libraries like cryptographic libraries.


See, when you explain what's wrong it's so much better than just blaming the victim with "You've had 8 years"!


One issue the ecosystem currently has, really (and it's not the only one, I believe it's difficult almost everywhere), is that tracking dependency rot is hard. Unless something breaks outright, you'll never know if a library has been abandoned; and manually checking dozens of github/gitlab repos is expensive and tedious.

Pypi has an api (https://pypi.org/pypi/<pkg-name>/json) that can be leveraged to implement alerts like "this pkg last released 5 years ago, it might be dead!". I guess that's what the "security" package uses already. It would be cool if they added an option to report on this sort of thing.


OP here, thanks a bunch for this! I will take your advice and dump the crypto library for this one.


This is the text from the manual:

> Deprecated since version 3.3, will be removed in version 3.8: The behaviour of this function depends on the platform: use perf_counter() or process_time() instead, depending on your requirements, to have a well defined behaviour.

I would be wary of any crypto library that continued to work with a warning for 8 years and no one bothered to fix it. Most likely no one was maintaining it.


Perl does it as well, but then you've only got point releases because it's stuck at 5.


I think at this point they should jump to perl7 and resume normal versioning


Python 3.7 to 3.8 broke some stuff. I was pretty annoyed to discover (the hard way) that e.g. this wouldn't run anymore:

  python3 -c "import ast; print(ast.arguments([]).args)"
It seemed completely unnecessary too. They could've just kept the structure the same across minor versions...


I think that's probably a lesson learned from the Python 2 -> 3 migration. It's probably better to introduce many small breaks than one big one?

Python, despite having three-digit version numbers, apparently doesn't follow semantic versioning. The major version number generally hasn't had much significance (Python 3 was the exception; 1 -> 2 wasn't a big change, and they assure us 3 -> 4 won't be either).


"Better" depends on your agenda. If it's to shove the next major version down users' throats, maybe it's easier to swallow small bites (bytes?) than one giant cow. If it's to provide a predictable and pleasant user experience, I'm less confident this is the right approach.


So I did look it up, they do provide a warning though (in the first paragraph):

"The ast module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like."

This module is a bit special because, as they say, it's supposed to reflect the current Python grammar. If they changed the grammar and didn't make this module reflect it, that would lead to different kinds of issues.


It's one thing to extend the AST and add a couple children because you added new syntax or something. That's what I think most sane people would expect from reading that quote as a language evolves. So yeah, don't assume a node has 4 children when it might have 5+ tomorrow. But it's another thing to outright remove a child entirely and to be so vague about it too. (They didn't even have it be a sentinel value like None, or even care to leave a useful error message in the stack trace! The message makes you think the code was broken all along and simply never triggered before.)

Lest you think informing users more clearly is a foreign concept to them, look at the dis module [1] for comparison. They're extremely clear the whole thing is implementation detail of CPython. If anything, after reading that, one would think your conclusion would be "ah, I should program against the AST then, not the bytecode". So you do that and then you're greeted with this nonsense! Obviously it's your fault for assuming there's anything stable to program against across minor releases.

[1] https://docs.python.org/3/library/dis.html


The child wasn't removed; a new posonlyargs field was added ahead of args, related to the change allowing functions to specify that given arguments are positional-only.

This module exposes internals of python, and providing such guarantees would cripple development, because it wouldn't even allow for refactoring the code.

Most languages don't run into this because they don't expose internals like that. You typically extract that yourself and you accept it can change every release.

As for dis, that's very different: the bytecode is just an optimization detail of CPython. Python code could work perfectly fine without it; the bytecode was introduced to make it faster, and other implementations won't use it. I would imagine Jython and IronPython most likely don't implement it, since they have their own bytecode (JVM and .NET).

ast, on the other hand, is expected to be identical if an implementation claims to be compatible with a given version, and you would expect PyPy, for example, to also provide this package.
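One way to insulate code from this class of break is to build nodes with keyword arguments, so a field inserted in a later release (as posonlyargs was in 3.8) doesn't shift positional meaning. A sketch assuming the 3.8+ field names:

```python
import ast

# ast.arguments fields as of Python 3.8; using keywords keeps this
# working even if a future release inserts new fields ahead of
# existing ones, unlike positional construction.
node = ast.arguments(posonlyargs=[], args=[], vararg=None,
                     kwonlyargs=[], kw_defaults=[], kwarg=None,
                     defaults=[])
```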


That is what is great about having a good test setup. You would know pretty quick if stuff broke. Which has been basically nothing for me (3.6 -> 3.7 -> 3.8)


To be clear, the pain here was hunting down the root cause and fixing it when it shouldn't have broken in the first place, not the speed of discovering the problem after the code had already broken.


You'd be amazed at how many stale Docker images are out there running old Python versions. Docker doesn't solve this issue, and ironically I've seen Docker images get updated less often rather than more!


> docker doesn't solve this issue

Well. The container doesn't magically update itself. But FROM python:3.6 -> FROM python:3.8 doesn't seem that hard. Then again a flexible package manager (with recent repos) can do this as well.


For a while my company was considering Docker precisely to keep an old Python app (described as Python 1.5) running. The dependencies of that app haven't been maintained in a decade, will never be upgraded, and will likely disappear soon as the hosts remove their Mercurial repos. Having the base machine running something modern with the old Python stuff in a Docker image seemed like a promising solution, but they stuck with RHEL7 instead.

Thankfully it's no longer my problem.


I am using my own docker image though.

I control what is installed on it.


But what version does production have, and all the other servers and projects?


That wouldn't matter if you're using docker, unless I'm missing something here.


Exactly. Each of my apps has their own python runtime. Running on top of kubernetes.

Man in 2020 if you are SSHing between boxes and upgrading python by hand, you really should invest in some devops. Even 5 hours a week.....


We have VMs and deploy using Ansible, devops doesn't have to mean running everything in containers using Kubernetes.

Our applications are mostly monoliths and the number of servers they use is quite constant.

But it does mean our Python version really only changes when we switch to the next Ubuntu LTS, and isn't always the same between different projects.


I hope you realize not everyone running python is running web apps. There are huge companies that use python for automation and they all run from some computer grid storage (usually NFS) and are imported in the same environment as the client instead of having a server side environment separated from the client side environment.


98% of my containers are not web apps.

Still a lot easier to manage as docker volumes in k8s.


Yes, I was only thinking of my own situation (we use Docker in development, but not in production except for one project).


It's not how long it takes me, it's how long it takes Ubuntu. Usually, a few years...

If I don't want to use OS packages, I have to decide to package, build, deploy, and maintain the Python packages, often in a way that separates it from the system Python package, while also managing any Python libraries we need (some of which may be C extensions and require compiling).

Whenever possible, I like to leverage upstream packaging so that I don't have to track security updates on my own.

I say this as someone who used to maintain the official Python.org RPM packages.


This is underestimating the number of packages not ported yet. Last time I checked, pygame and opencv were among those.


> Last time I checked

pygame has supported Python 3 since 2016, opencv-python since at least 2018


The comment I'm replying to is about the transition from 3.7 to 3.8

> I think Python 3.7 to 3.8 was change my docker version and push to CI


You could just stay on 2.7 and let the others have their "Python"


I still occasionally mistype `print s`.

RIP, python2, you will be missed. And a big thank you to all the contributors who have kept it alive for so long!


I still feel very wrong whenever I have to use print(), it was so much better without ()


Agreed. It was so much less painful to add and remove for debugging. Which is its only real use case.


Yes - it is incredible how big a deal this is. I never would have thought a few characters would mean so much to me. Ah well... :)


Is this the most infamous failure of a version upgrade ever in software history?


How is it a failure? When they were talking about Python 3 at PyCon 2008 and 2009, it was expected that the migration would take this long. Folks knew migrating would be long and hard, and that the approach would need to be revised.

It's only relatively recently that Python 3 passed a certain threshold that made deprecation of Python 2 something that could actually be executed upon. Had the growth been slower in the last five years I think the timeline would have been different.

If I was going to critique Python "failures" I think the bigger target for me would be the integration of async concepts into the platform and language. This could have been handled more smoothly and with better community cohesion, but most HNers only have a superficial understanding, i.e the GIL is bad and Python can't compete with Go.


The migration succeeded but the original plan didn't. People were supposed to write Python 2 and generate Python 3.[1] What worked was making it possible to have a single code base.

[1] http://python-notes.curiousefficiency.org/en/latest/python3/...


Agreed, all successful migrations I've seen started early and consistently used `from __future__` and python-future to write compatible code over a few years. This incremental approach almost made it trivial.
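The standard recipe was a header like this at the top of every module, which makes 2.7 behave like 3.x for the most divergent features:

```python
# Runs identically on Python 2.7 and 3.x.
from __future__ import absolute_import, division, print_function, unicode_literals

print("half:", 1 / 2)     # true division on both interpreters: 0.5
print("floor:", 1 // 2)   # explicit floor division: 0
```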


How is it a failure?

It was an immense amount of work for a fairly small payoff.


Wasn't the original timeline meant to be completed in 2014?


The original timeline was very much a guess. By the time that people got into the details in 2010-2014, it was clearer it would take much longer. This doesn't make it failure unless you lost a bet with someone on the timeline.


Upgrading a programming language with breaking changes is hard.

Ask the Perl and PHP communities how hard it is. PHP 6 ended up getting dumpstered, and the "next" version of Perl was so different they had to change the name, and the community never really recovered. Python has at least grown/survived through the upgrade.


Perl 6 may be a contender, now so different that it's renamed to Raku.


Python was roughly # 20 when “P3K” came out.

Python is now roughly #1.

Let the data tell the story, and not the other way around.


Not even close.

Also, it takes pretty vivid imagination / stout obstinacy to characterize the vast majority of pre-existing libs/projects, and almost all new project starts, using the new version as a "failure".

Bunch of whining and FUD does not equate to failure.


> Not even close.

What example would you point to as a bigger failure?


All of the C versions post-C99 or so probably have worse adoption. Same for new versions of ALGOL, Visual Basic, COBOL. Being infamous isn't failure; being unknown is. Raku is a success: I've never used Perl, but I checked out Raku and it seems alright.


Perl 6, which failed so hard it's not called Perl anymore


yeah, but to be fair people had a gut-hate reaction to the word 'perl' before the update.

Perl ranks up there at the top of 'lists of programming languages people love to hate.'


ALGOL 68?


Python 3 is an enormous success so I don't know what you could be referring to.

Everyone who ships software knows that there is always a very small but disproportionately vocal set of malcontents who can't accept changes; that's their issue, not anyone else's.


It was mostly a success. People could continue to use the old version for over a decade after release of the new one if they wanted to, and the end result is that Python is now one of the most popular programming languages.


They've also been given more than enough guidance to prepare for a smooth upgrade.


Perl 6?


People actually use Python 3, though. The Python developers succeeded in forking the language and community. No one really uses Perl 6, as far as I know. You're right that Perl 6 is another great example of a bad language fork decision.


The community is certainly not that much into Raku, but from this past thread https://news.ycombinator.com/item?id=22906703

> On the #raku channel on Freenode, several users are using Raku in production.


Some vocal members of the Perl community have not been into Perl 6. That didn't change with the name change. But if anything, the change in name should make clear that even though Raku is clearly a programming language in the Perl lineage, it is not aiming specifically for the Perl community. Anymore, at least.


I feel like the Python 3 experience could have been predicted by looking at the Perl 6 experience.


It's not too late to change the name of Python 3, just as Perl 6 has changed its name to Raku (https://raku.org using the #rakulang tag on social media). :-)


If only we all could "fail" so spectacularly. :D


php has entered the chat


since php6 was never released, it cannot be a failure.

