...then str.strip and variants could be cleanly and logically extended to allow this functionality, because passing a string and a sequence of strings would be distinguishable.
Alas, clean and logical function design can be hard to do late in a language's life.
PEP 593 and PEP 585 are clean and logical... glad to see that :)
I agree with most of the article you link, but there's one thing I don't understand: The article quickly dismisses the obvious fix for recursive iterability, to make strings be composed of "characters":
> And an obvious "fix" for this is worse than the original problem: Common Lisp says that strings are composed of characters, a totally different type, which doesn't implement the same methods and has to be handled specially. It's really annoying.
It seems to me this contradicts most of what the article says. Sure, strings are rarely collections, so they should not be iterable by default. But the final solution offered admits that sometimes they are, and then you want to be able to iterate over something. For most such somethings, it does not make sense for the individual "elements" to be strings. Bytes are clearly not strings, code points are clearly not strings, grapheme clusters are clearly not strings. Each of those will provide very different methods, because they are very different things. Only after that point (words, sentences, etc.) does the idea of the element being the same type start making sense again.
Clearly the concept of a "character" is too ambiguous, and there is no clear "default" for what it should mean, but the idea of a string consisting of some kind of element that is not string appears obviously correct.
The basic idea is to have sum(string.foobars()) == string. Bytes and characters ('grapheme clusters') are then a specific subset of strings that can therefore support additional operations like byte.ord(), the same way e.g. positive numbers support num.sqrt().
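A tiny sketch of that invariant, with a hypothetical foobars() that here just yields code points, and join() standing in for sum() (which Python doesn't define for strings):

def foobars(s):
    return list(s)   # each piece is itself a (length-1) string

s = "héllo"
assert "".join(foobars(s)) == s
assert all(isinstance(piece, str) for piece in foobars(s))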
So I don't think that's a good argument for not accepting iterables of strings in str methods. Things like replace() would benefit a lot, and it's not that hard to do; you can even accept regexes optionally: https://wonderful-wrappers.readthedocs.io/en/latest/string_w...
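A rough sketch of the idea as a helper you could write today; multi_replace() is hypothetical, not an existing str method:

import re

def multi_replace(text, olds, new):
    # replace any of the given substrings with the same replacement
    pattern = "|".join(re.escape(old) for old in olds)
    return re.sub(pattern, new, text)

print(multi_replace("snake_case-name", ["_", "-"], " "))   # "snake case name"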
I agree that iterating over strings is not proper design, however. It's not very useful in practice, and O(1) character access has other performance consequences for more important things.
Swift did it right IMO, but it's a much younger language.
I also wish we had stolen the file API concepts from Swift, and that open() would return a file-like object that always gives you bytes. No "b" mode. If you want text, you call open().as_text() and get a decoding wrapper.
The idea that there are text files and binary files has been toxic for a whole generation of coders.
will clutter code that is otherwise clean. A single type has to be specially handled, which sticks out like a sore thumb.
As a second point, do you have more on your last sentence? ("The idea that there are text files and binary files has been toxic for a whole generation of coders.").
I have been thoroughly confused about text vs. bytes when learning Python/programming.
The two types are treated as siblings, when text files are really a child of binary files. Binary files are simply regular files, and sit as the single parent, without parents of their own, in the tree. Text files are just one of the many children, which happen to yield text when their byte patterns are interpreted using the correct encoding (or, in the spirit of Python, decoding when going from bytes to text), like UTF-8. This is just like, say, audio files yielding audio when interpreted with the correct encoding (say, MP3).
Is this a valid way of seeing it? I have to ask very carefully because I have never seen it explained this way, so that is just what I put together as a mental model over time. In opposition to that model, resources like books always treat binary and text files as polar opposites/siblings.
This leads me to the initial question of whether you know of resources that would support the above model (assuming it is correct)?
That sounds like a completely correct way to look at it. I'd put "stream of bytes" and "seekable stream of bytes" above files, but that's just nitpicking.
For me the toxic idea about text files is that they're a thing at all. They're just binary files containing encoded text, and without any encoding marker they make an ideal trap. Is a UTF-16 file a text file? Is a Shift-JIS file a text file? Have fun guessing edge cases. We've already accepted with Unicode that the "text", or letters, are something separate from the encoding.
Totally agree that everything should be a byte stream. Even with Python 3.x, text files are still confusing - if you open a UTF-8 file with a BOM at the front as a text file, should that BOM be part of the file contents, or transparently removed? By default, Python treats it as actual content, which can screw all sorts of things up. In my ideal world, every file is a binary file, and if you want it to be a text file, you just open it with whatever encoding scheme you think appropriate (typically UTF-8).
If you don't know the encoding? Just write a quick detect_bom function (should be part of the standard library, no idea why it isn't) and then open the file with that encoding. I.e.:
encoding = detect_bom(fn)
with open(fn, 'r', encoding=encoding) as f:
    ...
That also has the benefit of removing the BOM from your file.
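A minimal sketch of such a detect_bom helper (hypothetical, not in the standard library), built on the codecs module's BOM constants and falling back to UTF-8 when no BOM is found:

import codecs

def detect_bom(path, default="utf-8"):
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # decodes and strips the BOM
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"      # check before UTF-16: their BOMs share a prefix
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return default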
Ultimately, putting the responsibility for determining the codec on the user at least makes it clear to them what they are doing - opening a binary file and decoding it. That mental model prepares them for the first time they run into, say, a cp587 file.
I understand why Python doesn't do this - it adds a bit of complexity - though you could have an "auto-detect" encoding scheme that tries to determine the encoding and defaults to UTF-8. Not perfect, as you can't always determine the codec of a file just by reading it, but better than what we have today, where your code crashes when a BOM upsets the UTF-8 decoder.
The open() API is inherited from the C way of doing things, where the world is divided between text files and binary files. So you open a file in either "text" mode or "binary" mode, with "text" being the default.
This is, of course, utterly BS.
All files are binary files.
Some contain sound data, some image data, some zip data, some pdf data, and some raw encoded text data.
But we don't have a "jpg" mode for open(). We have higher-level APIs that we pass file objects to in order to decode their content as jpg, which is what we should be doing for text. Text is not an exceptional case.
VSCode does a lot of work to turn those bytes into pretty words, just like VLC does to turn bytes into video. They are not like that in the file. It's all a representation for human consumption.
The reasoning for this confusing API is that reading text from a file is a common use case, which is true. Especially on Unix, where C comes from. But a "mode" is the wrong abstraction to offer for it.
In fact, Python 3 gets it partially right. It has an io.FileIO object that just takes care of opening the file, and an io.BufferedReader that wraps FileIO to offer practical methods for accessing its content.
This is what open(mode="rb") returns.
If you open in text mode, which is the default, it wraps the BufferedReader in an io.TextIOWrapper that does the decoding transparently for you, and returns that.
In the proposed design, open() would always return the BufferedReader, and as_text() would always return the TextIOWrapper.
This completely separates I/O from decoding, removing the confusion in the minds of all those coders who would otherwise live by the illusory binary/text model. It also makes the API much less error-prone: you can easily see where the file-related arguments go (in open()) and where the text-related arguments go (in as_text()).
You can keep the mode, but only for "read", "write" and "append", removing the weird mix with "text" and "bytes" which are really related to a different set of operations.
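A minimal sketch of that layering using the real io classes; as_text() is hypothetical (it is not part of Python's API) and stands in for the proposed decoding step, and "example.txt" is just a placeholder path:

import io

def open_bytes(path):
    raw = io.FileIO(path, "r")       # raw OS-level file object
    return io.BufferedReader(raw)    # what open(path, "rb") gives you today

def as_text(buffered, encoding="utf-8"):
    return io.TextIOWrapper(buffered, encoding=encoding)   # decoding wrapper

with as_text(open_bytes("example.txt")) as f:   # opting in to text explicitly
    print(f.readline())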
I suspect it’s also to do with Python’s history as a scripting language. Because of Perl’s obvious strengths in this area, any scripting language pretty much has to make it very easy to work with text files. Ruby does something similar for instance.
Even languages like Java now recognise the need to provide convenient access to text files as part of the standard API, with Files.readAllLines() in 7, Files.lines() in 8, and Files.readString() in 11.
The first mistake I made as a beginner was dumping a bunch of binary data as text. Something would go wrong along the way and not all of the data would be written, because I was writing it in text mode.
It just never occurred to me that the default mode of writing the file would _not_ write the array I was passing it.
It's much more important for beginners to be able to learn clear recipes than to deal with double standards and a bunch of edge cases.
I’ve done worse. Using MySQL from php and not having the encoding right somewhere along the way so all my content was being mojibaked on the way in and un-mojibaked on the way out so I didn’t notice it until deep into a project when I needed to extract it to another system.
EDIT thanks, I knew that didn't look quite right. "Mojibaked" - such a great term.
More to the point, it's so common that it ought to be supported out of the box by any decent programming language, the same way you'd expect any language to support IEEE floats. That doesn't mean the mechanism for it shouldn't be (effectively) textfile(file("foo.txt")), though.
Strings being iterable can cause problems, and another commenter has pointed out that Swift handles it well.
However I think strings being iterable is one of the core ergonomics in the language and basic types of Python that make it so nice for many applications. Scripting, scraping, data cleanup, data science, even basic web development, all benefit hugely from little features like this. Without this sort of thing Python would be a different language with different uses.
While I normally like safety and types, I'm personally happy with things like this because it fits with Python's strengths.
I disagree, I don't think there's any meaningful benefit. For example, let's say we iterated over strings as follows.
for char in my_str.chars():
    foo()
That wouldn’t sacrifice any ergonomics, being consistent with how we already iterate over dictionary contents with d.items(), and it’d address all the concerns in the parent comment link
On non-iterable strings: the recursive type problem can be solved with something like what is proposed in [0]. (I have an implementation of a fix on GitHub, linked from that thread; there are edge cases to fix and PEPs scare me, but technically it's feasible.)
> Eric Fahlgren amusingly summed up the name fight this way:
> > I think name choice is easier if you write the documentation first:
> > cutprefix - Removes the specified prefix.
> > trimprefix - Removes the specified prefix.
> > stripprefix - Removes the specified prefix.
> > removeprefix - Removes the specified prefix. Duh. :)
I actually don't agree that it's so obvious, since it returns the prefix-removed string rather than modifying in-place. I think Fahlgren's argument would work better for `withoutprefix`.
If I have a string x = "this is a very long string..." and do y = x[:10], then it's a whole new string? If x is near my memory limits, and I do y = x[:-1] will it basically double my memory usage? Is that what you meant by every string is a new string?
> If I have a string x = "this is a very long string..." and do y = x[:10], then it's a whole new string?
Yes. And doing otherwise is pretty risky as the Java folks discovered, ultimately deciding to revert the optimisation of substring sharing storage rather than copying its data.
The issue is that while data-sharing substringing is essentially free, it also keeps the original string alive, so if you slurp in megabytes, slice out a few bytes you keep around and "throw away" the original string, that one is still kept alive by the substringing you perform, and you basically have a hard to diagnose memory leak due to completely implicit behaviour.
Languages which perform this sharing explicitly — and especially statically (e.g. Rust) — don't have this issue, but it's a risky move when you only have one string type.
Incidentally, Python provides for opting into that behaviour for bytes using memory views.
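A quick sketch of that opt-in, showing the same trade-off: the slice is free, but it pins the whole original buffer in memory.

data = bytes(range(256)) * 4096      # ~1 MB buffer
view = memoryview(data)[:10]         # no copy is made
print(view.obj is data)              # True: `data` stays alive as long as `view` does
small = bytes(view)                  # explicit copy when you want to let the buffer go
view.release()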
> it also keeps the original string alive, so if you slurp in megabytes, slice out a few bytes you keep around and "throw away" the original string, that one is still kept alive by the substringing you perform, and you basically have a hard to diagnose memory leak due to completely implicit behaviour.
You can get around that with a smarter garbage collector, though. On every mark-sweep pass (which you need for cycle detection even if you use refcounts for primary cleanup), add up the number of bytes of distinct string objects using the buffer. If it's less than the size of the buffer, you can save memory by transparently mutating each substring to use its own buffer. If it's not, then you actually are saving memory by sharing storage, so you should probably keep doing that.
The cpython mark-and-sweep garbage collector does very little on purpose. It basically only gets involved for reference cycles. Anything else is dealt with by reference counting. This way you prevent long GC pauses.
True, but that's by no means an inherent characteristic of garbage collectors, or even of garbage collectors operating on the objects of a Python implementation in particular.
> You can get around that with a smarter garbage collector, though.
That complicates the GC (already a complex beast), as it requires strong type-specific specialisation and more bookkeeping, and adds self-inflicted edge cases. Even the JVM folks didn't bother (though mayhaps they did not because they have more than one garbage collector).
Scheme does the right thing here (by convention), that mutating procedures end with a bang: (string-upcase str) returns a new string, whereas (string-upcase! str) mutates the string in place.
The details of data mutation in Scheme go beyond that, though. Sometimes procedures are "allowed but not required to mutate their argument". Most (all?) implementations do mutate, but it is still considered bad form to do something like:
(define a (list 1 2 3))
(append! a (list 4))
(display a)
As append! returns a list that is supposed to supersede the binding to a. Using a like that "is an error", as a valid implementation of append! might look like this:
(define append! append)
Which would make the earlier code snippet invalid.
IMO, this is a defect in the language: the lack of a "must_use" annotation or similar. If that annotation existed, and the .upper() method was annotated with it, the compiler could warn in that situation.
Notice in the first example, right after CALL_METHOD the return value on the stack is just immediately POP'd away. The parent is saying that when you run `python example.py` CPython should see that the return value is never used and emit a warning. This would only happen because `upper()` was manually marked using the suggested `must_use` annotation.
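For reference, roughly what that looks like (CPython 3.8/3.9; exact opcodes vary by version):

import dis

# The result pushed by CALL_METHOD is immediately discarded by POP_TOP
# because nothing uses it.
dis.dis(compile("msg.upper()", "<example>", "exec"))
# LOAD_NAME    msg
# LOAD_METHOD  upper
# CALL_METHOD  0
# POP_TOP      <- return value thrown away here
# LOAD_CONST   None
# RETURN_VALUE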
Ternaries don't discard results that are generated, they are just special short-circuiting operators;
x if y else z
Is effectively syntax sugar for:
y and x or z
Nothing is discarded after evaluation, one of three arms is never evaluated, just as one of two arms of a common short-circuiting Boolean operator often (but not always) is not. That's essentially the opposite of executing and producing possible side effects and then discarding the results.
That byte code is then interpreted at runtime, so the meaning of s.upper() could change. What something does, when it’s parsed, is not fixed.
You can definitely catch most cases at runtime. I've done something like this in a library, to catch a case where people were treating a copy of data as a mutable view.
> Python is interpreted, not compiled, and completely dynamic. You cannot check much statically.
The existence of mypy and other static type checkers for Python disproves that; given their existence, it should be possible to warn when an expression producing a type other than “any” or strictly “None” is used in a position where it will neither be passed to another function nor assigned to a variable that is used later. Heck, you could be stricter and only allow strictly “None” in that position.
> And honestly, I would be rich if I got a dollar every time a student does this:
> msg.upper()
> Instead of:
> msg = msg.upper()
> And then call me to say it doesn't work.
On this, isn't the student's reasoning sensible? E.g. "If msg is a String object that represents my string, then calling .upper() on it will change (mutate) the value, because I'm calling it on itself"?
If the syntax was upper(msg) or to a lesser extent String.upper(msg) then the new-to-programming me would have understood more clearly that msg was not going to change. Have you any insights into what your students are thinking?
That was the original syntax [0], before the string functions became methods. I agree that a method more strongly implies mutation than a function does.
Also, for consistency with `list.reverse` (which acts in place) and the `reversed` builtin (which gives you a new sequence instead), shouldn’t the method be called `uppered`?!
My favorite example of something similar to this, since you brought it up:
>>> a = [254, 255, 256, 257, 258]
>>> b = [254, 255, 256, 257, 258]
>>> for i in range(5): print(a[i] is b[i])
...
True
True
True
False
False
In CPython, integers in the range [-5, 256] are statically constructed by the interpreter and always refer to the same cached instances. All other integers are created dynamically and refer to a new object each time they are created.
I mean, people should be using `==` for this. The fact that `is` happens to work for small numbers is an implementation detail that shouldn't be relied upon.
Absolutely. But because it does work they might start using it without knowing it's wrong, then be surprised when it doesn't work. Python has other areas where the common English definition of a word leads to misunderstandings about how they're to be used.
What is the rationale behind this? '==' works all the time, and 'is' only works sometimes. Using 'is' wherever possible requires the user to know some rather arbitrary language details (which objects are singletons and which are not), whereas '==' will always give the correct answer regardless.
Correct. In Numpy the slices are views on the underlying memory. That’s why they’re so fast, there’s no copying involved. Incidentally that’s also why freeing up the original variable doesn’t release the memory (the slices are still using it).
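A small sketch of that view behaviour:

import numpy as np

a = np.arange(10)
b = a[2:5]                 # a view, not a copy
b[0] = 99                  # writes through to `a`
print(a[2])                # 99
print(b.base is a)         # True: `b` keeps `a`'s memory alive
print(a[2:5].copy().base)  # None: .copy() gives independent storage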
That's discussed in the article: the "strip" methods don't interpret strings of multiple characters as a single prefix or suffix to be removed, so it was felt to be too confusing to use "strip" type names for methods that do interpret strings that way.
> Another kind of clean up comes in PEP 585 ("Type Hinting Generics In Standard Collections"). It will allow the removal of a parallel set of type aliases maintained in the typing module in order to support generic types. For example, the typing.List type will no longer be needed to support annotations like "dict[str, list[int]]" (i.e., a dictionary with string keys and values that are lists of integers).
I think this will go a long way toward making type annotations feel less like a tacked-on feature.
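A small before/after sketch of what PEP 585 enables on Python 3.9+:

from typing import Dict, List            # pre-3.9 spelling

def total_old(scores: Dict[str, List[int]]) -> int:
    return sum(sum(v) for v in scores.values())

def total_new(scores: dict[str, list[int]]) -> int:   # 3.9+ spelling
    return sum(sum(v) for v in scores.values())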
Looking "back" now: it never occurred to me back then that importing List when there is list was particularly strange. Now it sticks out sorely. Very glad this change is happening.
That's because we're conditioned to think of constructors as functions rather than as types. I think that's not that odd honestly but I do see how counterintuitive it is for people that don't work much in typed languages. I'm not a Haskellite but there you can clearly see the distinction when defining/instantiating sum types (where the type and data constructor live in different namespaces).
I think getting them out there has already helped the ecosystem, both in terms of using the types to make working in python better, and in terms of figuring out what the typing system should really look like. This is the next iteration, and I think it's going in the right direction, so I don't really want to criticize the devs for it.
I'm extremely excited about this. I had been using a short-hand literal-based syntax like [int] for a while, but list[int] is obviously so much better.
> I had been using a short-hand literal-based syntax like [int] for a while,
Do you mean you'd been using that in comments?
Just to be clear, this isn't about ad-hoc syntaxes for use in comments, this is about syntax that parses when used in python code, and which can be used by type-checkers.
Thank you. I didn't know that the python grammar was so permissive regarding what goes in the annotation slots. I did wonder whether I was saying something wrong / sticking my neck out because I had a feeling that the person I was replying to knew what they were talking about.
In practice though we should probably all write annotations that do work with an existing type checker. False negatives are bad enough in mypy without people writing annotations for non-existent type checkers! (IMO --check-untyped-defs should always be used; mypy is misleading without it.)
Am I the only one who wants multi-lined anonymous functions in Python? I find myself really wanting to reach for arrow functions sometimes while writing Python, and end up disappointed that they aren't available.
> Am I the only one who wants multi-lined anonymous functions in Python?
Lots of people want them (lots of people don't, too), but no one has come up with a great syntax that plays nice with the rest of Python and saves you much over named functions.
It's more likely than you think. You need some end delimiter, it doesn't matter what it is as long as there is one. Just like list literals end with ] and dict literals end with } and str literals end with " or ' etc.
> it doesn't matter what it is as long as there is one
of course it matters, there are design decisions in languages. a certain pattern or syntax may feel just right in one language but very much wrong in another.
Throwing in an `end)` like that feels wrong in python
It doesn't matter functionally. You can make the end token "waffleiron" or "mariahcarey" or "%^*~$". But there does need to be an end token to make multiline lambdas work and recognizing that is the first step to solving the problem.
Once you have the scaffolding of the syntax you can turn it over to the bikeshedders on the mailing list to make it pretty.
I think the basic idea is "if you need multiple lines, you should declare a proper function", so I wouldn't hold my breath waiting for multiline anonymous functions in Python.
It's wrong because naming is hard. When writing inline, it is possible that not having a name does not impact readability. When defining the function out of line, naming it casually may confuse readers.
It's just more pleasant to be able to write anonymous functions and to be able to extend them past a single line. If I had to name all of my anonymous functions that I write in other languages, I'd find another way to accomplish what I was trying to do with them.
I believe Guido was against it, as he is mostly opposed to the functional style; as a matter of fact he was opposed to lambdas but begrudgingly added them after many requests.
"Guido was against multi line lambdas" is always brought up in these discussions and then someone digs up an old email from 10 years ago where it happens to be mentioned in passing by.
Given the recent success of anonymous functions and their widespread use and impact in other languages, like JavaScript, C#, Java, and C++11, maybe it's time to re-evaluate that opinion. I mean, what would JavaScript be today without promises and multi-line anonymous functions?
I dug up that “old email from 10 years ago” which I hadn’t seen before.
Guido laid out the challenges of multiline lambdas on a mailing list [1] and then followed up with a blog post [2] [3]. His chain of thought is worth reading in full, but the crux is lack of “Pythonicity” and his gut feel that named functions avoid the complexity and possible ambiguity of multiline lambdas:
def callback(x, y):
    print x
    print y

a = foo(callback)
`{key: val}` does look nice indeed, but then it takes more effort to replace e.g. `dict` with `Mapping`. Everything would look much nicer with Haskell-like syntax: `dict key val` or `list val`. Or maybe even prefer `{} key val` and `[] val` (no that doesn't look good, I agree).
I was quite surprised — only a few days ago in fact — to discover the standard Python library has no support for Olson (as in tzdata) timezones. Time arithmetic is impossible without them.
The ipaddress library also has no support for calculating subnets. It is quite hard to go from 2a00:aaaa:bbbb::/48 to 2a00:aaaa:bbbb:cccc::/64. It would be less weird if the essence of the documentation didn’t make it sound like the library was otherwise very thorough in the coverage of its implementation.
Can anyone write a PEP? Maybe I should get off my behind and actually submit a patch for proper IP calculations? Or maybe I missed it in the documentation (which, aside, I wish wasn’t written with such GNU-info style formality.)
Unless I misunderstand what you're looking for, I think that functionality is in there.
from ipaddress import ip_network

original_net_48 = ip_network("2a00:aaaa:bbbb::/48")
desired_subnet = ip_network("2a00:aaaa:bbbb:cccc::/64")
subnets_64 = original_net_48.subnets(prefixlen_diff=16)

print(f"{desired_subnet} is one of the computed subnets: {desired_subnet in subnets_64}")
#=> 2a00:aaaa:bbbb:cccc::/64 is one of the computed subnets: True
Oh, yeah, you're right. That's a shame — that function does exactly what you want, but it has to do it for every possible subnet up to the one you want, and the logic isn't included as a separate function.
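For what it's worth, you can get the specific subnet directly with plain address arithmetic rather than iterating; a sketch, not a dedicated API:

import ipaddress

net = ipaddress.ip_network("2a00:aaaa:bbbb::/48")
index = 0xcccc                                       # which /64 we want
addr = net.network_address + (index << (128 - 64))   # shift into the subnet-ID bits
subnet = ipaddress.ip_network((addr, 64))
print(subnet)                                        # 2a00:aaaa:bbbb:cccc::/64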
Could someone explain to me what kind of new language features the new parser will allow? I'm curious and very incompetent when it comes to understanding what LL(1) grammar would imply for the end-user (the python programmer like me).
The linked LWN article[1] mentions context-sensitive keywords, ie. a way to treat certain words as language keywords only in specific contexts. For example, a new match statement that wouldn't require reserving the `match` word as a language keyword, which would require a breaking change and break all existing code that uses `match` as a variable name.
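A sketch of what that enables, as it later landed with the soft keyword `match` in Python 3.10: the same word keeps working both as an identifier and as a statement.

match = "still a perfectly valid variable name"   # existing code keeps working

def dispatch(command):
    match command.split():
        case ["go", direction]:
            return f"moving {direction}"
        case _:
            return "unknown command"

print(dispatch("go north"))   # moving north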
One good example (for those who do not want to read the full article) is the async keyword. Introducing it as a keyword broke a few libraries that were already using it as a kwarg in some functions (e.g. pytorch).
I wonder if Python will go back and address other shortcomings which I assume are tied to the parser, such as the inability to use quotes inside the interpolated segments of f-strings.
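The limitation in question, for the curious; reusing the outer quote character inside the interpolated expression was only allowed once PEP 701 landed in Python 3.12:

data = {"name": "Ada"}

# f"hello {data["name"]}"        # SyntaxError before Python 3.12
print(f"hello {data['name']}")   # the usual workaround: switch quote style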
PEG parsers are definitely easier to implement than any of the LR(k)-and-ilk parsers. So if you're writing or debugging a PEG parser, that will be easier.
However, while shift-reduce conflicts are confusing, they are there to give the strong guarantee that the grammar is unambiguous. And the parser generator will tell you this as soon as you've defined the grammar, before you've even used it. PEG grammars instead remove the guarantee, and let you deal with any confusion that arises much later.
Here are some methods of reasoning that standard parsers will give you that PEG parsers will not:
1. A ::= B | C is exactly the same as A ::= C | B.
2. If A ::= B | C and you have a program containing a fragment that parses as an "A" because it matched "B", then you can replace that fragment with something that matches "C" and the program will still parse.
Neither of these rules hold in PEGs.
Here's a practical concern that (2) helps with. Say you have a grammar for html, and a grammar for js. And you want to be able to parse html with embedded JS. So you stick the js grammar into the html grammar at the right places. If you're using a standard (e.g. LR(k)) parser, and you don't get any shift-reduce (or other) conflicts, then the combined grammar works. In contrast, if you're using a PEG grammar, it's possible that you've ordered things wrong and there are valid JS programs that will never parse because they're clobbered by html parsing rules outside of them. Or vice-versa.
Also, realistically if you're using a PEG parser you'll want one that handles left recursion, because working without left recursion turns your grammar into a mess. And left recursion in PEGs can have some weird behavior.
> PEG is not unambiguous in any helpful sense of that word. BNF allows you to specify ambiguous grammars, and that feature is tied to its power and flexibility and often useful in itself. PEG will only deliver one of those parses. But without an easy way of knowing which parse, the underlying ambiguity is not addressed -- it is just ignored.
Can you provide an example of where Python has broken backwards compatibility recently between 3.x version? I'll admit (despite googling for 5 or so minutes) that i don't actually know if it does. It obviously breaks forward compatibility continuously all the time - new language features are landing, and they just aren't present in previous versions - but I don't know if I've ever run into people being tripped up by that.
I know some Python Libraries break backwards compatibility (Pandas being a big one) - but, for the most part, hasn't the language been backwards compatible since at least Python 3.4? (And possibly further back, for all I know).
Keep in mind they have deferred a number of them because of the impending EOL of Python 2.7. There are fewer breaking changes during the latter 3.X series, which should resume in 3.9 or 3.10 now that Python2 has passed on.
Here's a commonly mentioned one:
Changes in Python Behavior: async and await names are now reserved keywords. Code using these names as identifiers will now raise a SyntaxError. (Contributed by Jelle Zijlstra in bpo-30406.)
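A quick way to see that break; under 3.6 `async` was still an ordinary identifier (with a DeprecationWarning), under 3.7+ it is rejected outright:

try:
    compile("async = 1", "<example>", "exec")
    print("accepted: Python 3.6 or earlier")
except SyntaxError:
    print("rejected: async is a reserved keyword (3.7+)")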
Note: I think this is a bad idea; I'd rather all these small breaking changes and the new parser be deferred to 4.X. But they need to be small breaking changes, of course, not a new language.
If we look back at Python's history, the rolling breaking changes have been handled mostly fine, while the actual Python 3 transition caused a lot of pain in the ecosystem. So I hope they stay away from major versions and keep up the other things they are doing.
That was due to the scope of the breakage, not number format. A good way to handle that and maintain predictability is to constrain breaking changes, yet defer them to 4.X.
First, there are not a lot of interpreted (not compiled, that's another matter entirely) languages that are as old as Python.
And there are really few that come anywhere near Python's popularity, or are used with such diversity as Python.
I mean, you can get away with keeping AWK the way it was two decades ago; nobody is going to use it for machine learning, or to teach computing in all the universities in the world on 3 operating systems, utilizing C extensions, or processing Web APIs.
Among the few that would even compare, there are the ones that have accumulated so much cruft that they became unusable by today's standards (e.g. bash). Then you have those that did like Python did (e.g. Perl 6). The ones that just tried and failed (PHP 6). The ones that broke compat and told everybody to move or die (Ruby, in a point release, gave basically 2 years). And the ones that created a huge pile of horror they called full stack to keep going (e.g. JS). Also those that got hijacked by vendors and just exploded into myriads of proprietary syntaxes (e.g. SQL) or completely new concepts (e.g. Lisp).
At least, in Python you CAN write Python 2/3 compatible code, and you have a LOT of tooling to help you with that, or migrating.
So, yes, the Python 2 -> 3 transition could have been better. Hindsight is 20/20.
But I'm struggling to think of any other language in a similar situation that has done better.
Ruby did something like this around the same time Python did. Ruby's was a bit smaller, but overall a roughly similar amount of breaking changes. They forced you to think about encodings more with Strings, they changed the signatures of several operators, they changed some of the syntax for case statements, they drastically changed the scope rules for block variables, they restructured the base object hierarchy, etc. In both cases, it was a deliberate decision to make a clean break. I think Ruby's big break didn't make as big a schism mainly because Rails was very supportive, and Rails holds an enormous amount of influence in the Ruby world.
If Python 3 had been introduced as a separate language, I'm pretty sure everyone would have said "Why isn't this just called Python 3? It's 99.9% the same as Python and it's by the same people and they're deprecating Python in favor of it."
how will that number look when you first autoconvert via 2to3?
I did two migrations of >500k LOC projects in an afternoon each, plus admittedly some days of testing to gain confidence since there were few unit tests. But I found it to be very smooth sailing.
I was very familiar with both projects, so that helped a lot.
EDIT: I also want to add that I did this using Python 3.5, when the ecosystem seemed to be at a sweet spot, with most dependencies supporting both 2 and 3. I guess if one has been waiting until now, the divide between library versions will be a lot bigger.
As a big user of logging, and with little to do with character encoding, all of my admin/daemon stuff moved over with almost no changes necessary for 3.0 (actually ~3.3).
For some projects I did bigger refactors for 2.6/7 (exceptions) and 3.6 (fstrings).
Really? The same .py file runs under python2 and python3?
Googling quickly, I find this, which does a bit better than 2to3. I suppose one could write to a somewhat constrained intersection of Python2 and Python3, if one is willing to make at least some boilerplate changes to the original Python2 code.
That said, if you bring a Python2 script and feed it to a Python3 interpreter, no, in general that will not work. They simply aren't the same language. Even a simple "print x" will do you in.
> The same .py file runs under python2 and python3?
Sure, as long as it doesn't contain any syntax or spellings which are incompatible between the two. That's a fairly large subset of the language.
> if you bring a Python2 script and feed it to a Python3 interpreter, no, in general that will not work. They simply aren't the same language. Even a simple "print x" will do you in.
But this will work:
from __future__ import print_function
print(x)
This is valid under both Python 2 and Python 3.
Also, as I said above, there is a pretty large subset of the Python language that has the same syntax and spellings in both Python 2 and Python 3, and any script or module or package that only uses that subset will run just fine under both interpreters. You are drastically underestimating both the size and the usage of this subset of the language.
Say Django 1.11, a massive amount of .py files, works completely fine under both 2 and 3. As do many other libraries.
Yes you often need some precautions like "from __future__ import" statements and sometimes libraries like `six`, but it's been perfectly normal practice for most of the last decade.
A lot of projects write in that style, i.e. compatible with both python2 and python3; it's really common because there's so much py2 deployed (it was the default on CentOS until very recently, still the default on macOS, etc.).
Nearly every py3 feature was backported to 2; you just need to write it in a compatible way. I'm seeing some projects drop py2 support now, though. Which I'm fine with; I haven't written Python 2 code in maybe 6 or 7 years now.
PHP, which tried to address Unicode in version 6, but then abandoned it and went straight to 7. Perl, which amusingly also decided on a huge rewrite at version 6, but then just renamed that version as an actual new language, "Raku".
> Eventually, removeprefix() and removesuffix() seemed to gain the upper hand, which is what Sweeney eventually switched to.
Great naming... they missed their chance to make the functionality of strip/lstrip/rstrip clearer by naming the new methods stripword/lstripword/rstripword, which would also have had the benefit of consistency.
this article sums it up better than I ever could https://www.xanthir.com/b4wJ1