Sure it did. Here's an easy test, using your own test case with stdin: Python 2....

dataflow · on April 22, 2020

> The fact that sys.stdout encoding varies between platforms is a feature, not a bug. [...] This is exactly how every other library or framework that deals with Unicode text works; how is that "insanity"?

No, it's utterly false that every other framework does it. Where do you even get this idea? Possibly the closest language to Python is Ruby. Have you tried to see what it does? Run ruby -e "$stdout.write(\"\u2713\")" > temp.txt in the Command Prompt and then tell me you face the same nonsensical Unicode error as you do in Python (python -c "import sys; sys.stdout.write(u\"\u2713\")" > temp.txt)? The notion that writing text on one platform and reading it back on another should produce complete garbage is absolute insanity. You're literally saying that even if I write some text to a file in Windows and then read it back on Linux with the same program on the same machine from the same file system, it is somehow the right thing to do to have an inconsistent behavior and interpret it as complete garbage?? Like this means if you install Linux for your grandma and have her open a note she saved in Windows, she will actively want to read mojibake?? I mean, I guess people are weird, so maybe you or your grandma find that to be a sane state of affairs, but neither me, nor my grandma, nor my programs (...are they my babies in this analogy?) would expect to see gibberish when reading the same file with the same program...

As for "Python 3 forced developers to think whether the data that they're working with is text or binary", well, it made them think even more than they already had to, alright. That happens as a result of breaking stuff even more than it happens as a result of fixing stuff. And what I've been trying to tell you repeatedly is that this puristic distinction between "text" and "binary" is a fantasy and utterly wrong in most of the scenarios where it's actually made, and that your "well then just use bytes" argument is literally what I've been pointing out is the only solution, and it's much closer to what Python 2 was doing. This isn't even something that's somehow tricky. If you write binary files at all, you know there's absolutely no reason why you can't mix and match encodings in a single stream. You also know it's entirely reasonable to record the encoding inside the file itself. But regardless, just in case this was a foreign notion, I gave you multiple examples of this that are incredibly common—HTML, XML, stdio, text files... and you just dodged my point. I'll repeat myself: when you read text—if you can even guarantee it's text in the first place (which you absolutely cannot do everywhere Python 3 does)—it is likely to have an encoding that neither you nor the user can know a priori until after you've read it and examined its bytes. XML/HTML/BOM/you name it. You have to deal with bytes until you make that determination. The fact that you might read complete garbage if you read back the same file your own program wrote on another platform just adds insult to the injury.

But anyway. You know full well that I never suggested everything was fine in Python 2 and that everything broke in Python 3. I was extremely clear that a lot of this was already a problem, and that some stuff did in fact improve. It's the other stuff got worse and even harder to address that's the problem I've been talking about. So it's a pretty illegitimate counterargument to cherrypick some random bit about some implicit conversion that actually happened to improve. At best you'll derail the argument into a discussion about alternative approaches for solving those problems (which BTW actually do exist) and distract me. But I'm not about to waste my energy like this, so I'm going to have to leave this as my last comment.

int_19h · on April 22, 2020

Every other language and framework as in Java, C#, everything Apple, and most popular C++ UI frameworks.

Ruby is actually the odd one out with its "string is bytes + encoding" approach; and that mostly because its author is Japanese - Japan is not all sold on Unicode for some legitimate reasons. This approach also has some interesting consequences - e.g. it's possible for string concatenation to fail, because there's no unified representation for both operands.

dragonwriter · on April 22, 2020

> Possibly the closest language to Python is Ruby.

Not really; they are similar in that they are dynamic scripting languages, but philosophically and in terms of almost every implementation decision, they are pretty radically opposed.