The PEPs of Python 3.9 (lwn.net)
250 points by zdw on May 28, 2020 | 185 comments



Sometimes I wish that python strings weren't directly iterable...

this article sums it up better than I ever could https://www.xanthir.com/b4wJ1

...then str.strip and variants could be cleanly and logically extended to allow this functionality, because passing a string and a sequence of strings would be distinguishable.

Alas, clean and logical function design can be hard to do late in a language's life.

PEP 593 and PEP 585 are clean and logical... glad to see that :)


I agree with most of the article you link, but there's one thing I don't understand: The article quickly dismisses the obvious fix for recursive iterability, to make strings be composed of "characters":

> And an obvious "fix" for this is worse than the original problem: Common Lisp says that strings are composed of characters, a totally different type, which doesn't implement the same methods and has to be handled specially. It's really annoying.

It seems to me this contradicts most of what the article says. Sure, strings are rarely collections, so they should not be iterable by default. But the final solution offered admits that sometimes they are, and then you want to be able to iterate over something. For most instances of something, It does not make sense for the individual "elements" to be strings. Bytes are clearly not strings, code points are clearly not strings, grapheme clusters are clearly not strings. Each of those will provide very different methods, because they are very different things. Only after that point (words, sentences, etc.) does the idea of the element being the same type start making sense again.

Clearly the concept of a "character" is too ambiguous, and there is no clear "default" for what it should mean, but the idea of a string consisting of some kind of element that is not string appears obviously correct.


> for the individual "elements" to be strings.

The basic idea is to have sum(string.foobars()) == string. Bytes and characters ('grapheme clusters') are then a specific subset of strings, that can therefore support additional operations like byte.ord(), the same way eg positive numbers support num.sqrt().


You can easily distinguish them:

    if isinstance(msg, str)
So I don't think that's a good argument for not accepting iterables of strings in str methods. Things like replace() would benefit a lot and it's not that hard to do, you can even accept regexes optionally: https://wonderful-wrappers.readthedocs.io/en/latest/string_w...
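
For illustration, a rough sketch of what an iterable-aware replace could look like in user code (replace_any is a made-up name, not a real str method):

    def replace_any(s, olds, new):
        # treat a lone string as a single pattern, anything else as an iterable of patterns
        if isinstance(olds, str):
            olds = [olds]
        for old in olds:
            s = s.replace(old, new)
        return s

    replace_any("a-b_c", ["-", "_"], " ")  # 'a b c'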

I agree that iterating on strings is not proper design, however. It's not very useful in practice, and the O(1) access has other performance consequences for more important things.

Swift did it right IMO, but it's a much younger language.

I also wish we stole the file API concepts from Swift, and that open() would return a file-like object that always gives you bytes. No "b" mode. If you want text, you call open().as_text() and get a decoding wrapper.

The idea that there are text files and binary files has been toxic for a whole generation of coders.


The issue is that

    if isinstance(msg, str)
will clutter code that is otherwise clean. A single type has to be specially handled, which sticks out like a sore thumb.

As a second point, do you have more on your last sentence? ("The idea that there are text files and binary files has been toxic for a whole generation of coders."). I have been thoroughly confused about text vs. bytes when learning Python/programming.

The two types are treated as siblings, when text files are really a child of binary files. Binary files are simply regular files, and sit as the single parent, without parents of their own, at the root of the tree. Text files are just one of the many children that happen to yield text when their byte patterns are interpreted using the correct encoding (or, in the spirit of Python, decoded when going from bytes to text), like UTF-8. This is just like, say, audio files yielding audio when interpreted with the correct encoding (say, MP3).

Is this a valid way of seeing it? I have to ask very carefully because I have never seen it explained this way, so that is just what I put together as a mental model over time. In opposition to that model, resources like books always treat binary and text files as polar opposites/siblings.

This leads me to the initial question of whether you know of resources that would support the above model (assuming it is correct)?


That sounds like a completely correct way to look at it. I'd put "stream of bytes" and "seekable stream of bytes" above files, but that's just nitpicking.

For me the toxic idea about text files is that they're a thing at all. They're just binary files containing encoded text, without any encoding marker, which makes them an ideal trap. Is a UTF-16 file a text file? Is a Shift-JIS file a text file? Have fun guessing edge cases. We've already accepted with Unicode that the "text", the letters, are something separate from the encoding.


Totally agree that everything should be a byte stream. Even with Python 3.x text files are still confusing - if you open a UTF-8 file with a BOM at the front as a text file, should that BOM be part of the file contents, or transparently removed? By default, Python treats it as actual content, which can screw all sorts of things up. In my ideal world, every file is a binary file, and if you want it to be a text file - just open it with whatever encoding scheme you think appropriate (typically UTF-8).

What if you don't know the encoding? Just write a quick detect_bom function (should be part of the standard library, no idea why it isn't) and then open it with that encoding, i.e.:

   encoding = detect_bom(fn)
   with open(fn, 'r', encoding=encoding) as f:
      ...
That also has the benefit of removing the BOM from your file.
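
For what it's worth, here is a rough sketch of such a detect_bom helper (hypothetical, just sniffing the codecs BOM constants and falling back to UTF-8):

   import codecs

   def detect_bom(fn):
       # sniff the first four bytes for a known BOM; default to utf-8
       with open(fn, 'rb') as f:
           head = f.read(4)
       if head.startswith(codecs.BOM_UTF8):
           return 'utf-8-sig'   # decodes and drops the BOM
       if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
           return 'utf-32'
       if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
           return 'utf-16'
       return 'utf-8'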

Ultimately, putting the responsibility for determining the codec on the user at least makes it clear to them what they are doing: opening a binary file and decoding it. That mental model prepares them for the first time they run into, say, a cp587 file.

I understand why Python doesn't do this - it adds a bit of complexity - though you could have an "auto-detect" encoding scheme that tries to determine the encoding and defaults to UTF-8. Not perfect, as you can't absolutely determine the codec of a file by reading it, but better than what we have today, where your code crashes when you have a BOM that upsets the UTF-8 decoder.

I finally wrote a library function to guess codecs and read text files, inspired by https://stackoverflow.com/a/24370596/1637450 and haven't been tripped up since.

But Python does not make it easy to open "text" files - and I know data engineers who've been doing this for years who are still tripped up.


Chardet, based on Mozilla's character detection code, already detects the encoding if you need such a thing.


The open() API is inherited from the C way, where the world is divided between text files and binary files. So you open a file in "text" mode or "binary" mode, "text" being the default behavior.

This is, of course, utterly BS.

All files are binary files.

Some contain sound data, some image data, some zip data, some pdf data, and some raw encoded text data.

But we don't have a "jpg" mode for open(). We have higher-level APIs we pass file objects to in order to decode their content as JPEG, which is what we should be doing for text. Text is not an exceptional case.

VSCode does a lot of work to turn those bytes into pretty words, just like VLC does to turn them into videos. They are not like that in the file. It's all a representation for human consumption.

The reasoning for this confusing API is that reading text from a file is a common use case, which is true, especially on Unix, where C comes from. But using a "mode" is the wrong abstraction to offer for it.

In fact, Python 3 does it partially right. It has an io.FileIO object that just takes care of opening the stuff, and an io.BufferedReader that wraps FileIO to offer practical methods to access its content.

This is what open(mode="rb") returns.

If you do open(mode="rt"), which is the default, it wraps the BufferedReader in a TextIOWrapper that does the decoding part transparently for you, and returns that.

There is a great explanation of this by the always excellent David Beazley: http://www.dabeaz.com/python3io_2010/MasteringIO.pdf

What it should do is offer something like this:

    with open('text.txt').as_text():
open() would always return a BufferedReader, as_text() would always return a TextIOWrapper.

This completely separates I/O from decoding, removing confusion in the minds of all those coders who would otherwise live by the illusory binary/text model. It also makes the API much less error prone: you can easily see where the file-related arguments go (in open()) and where the text-related arguments go (in as_text()).

You can keep the mode, but only for "read", "write" and "append", removing the weird mix with "text" and "bytes" which are really related to a different set of operations.
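
To make the layering concrete, here is roughly what those pieces look like with today's io module (as_text() itself is hypothetical, the classes are real):

    import io

    raw = io.FileIO('text.txt')                           # raw byte I/O
    buffered = io.BufferedReader(raw)                     # what open('text.txt', 'rb') gives you
    text = io.TextIOWrapper(buffered, encoding='utf-8')   # what text mode layers on top

    # the proposed API would just make that last step explicit:
    # open('text.txt').as_text(encoding='utf-8')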


Let’s be clear here that the fault is not with Python but with Windows.

Python uses text mode by default to avoid surprising beginners on Windows. If you only use Unix-like OSs you will never have this problem.


The problem is not "text mode by default". The problem is that the API offers a text mode at all.

Opening a file should return an object that gives you bytes, and that's it.

This "mode" thing is idiotic, and leak a low level API that makes no sense in a high level language with a strong abstraction for text like Python.

Text should be decoded by a wrapping object. See my other comments.


Splitting it into two parts like that would make seek() kind of funky, but I suppose it is already.


Sadly, there is no possible migration path. Because text is the default "mode".


How would this work

    with open('text.txt', 'w').as_text():


    with open('text.txt','w').as_text() as f:
       f.write("text")


it's just too weird and open-ended.

the next thing will be a bunch of "open" functions:

   with open_binary("filename") as f:
       ...


    with open_text("filename") as f:
        ...
How do I open these files in writeable mode?

    with open_text("filename").writeable() as f:
        ...
This is getting absurd.


Ultimately this 'frustration' is always caused by loose typing/a nonexistent data model, and not by the iterability of strings itself.


If that bothers you, use single dispatch.
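
Something like this, for example (strip_all is a made-up name, just to show the pattern):

    from functools import singledispatch

    @singledispatch
    def strip_all(value, chars=None):
        # fallback: treat the argument as an iterable of strings
        return [s.strip(chars) for s in value]

    @strip_all.register
    def _(value: str, chars=None):
        return value.strip(chars)

    strip_all("  hi  ")            # 'hi'
    strip_all(["  a ", " b  "])    # ['a', 'b']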


> The idea that there are text files and binary files has been toxic for a whole generation of coders.

It really is nonsense, isn't it? It's like asking a low-level API to open files as a .doc, or as a PDF. Why would that be part of the file I/O layer?


Well, I guess it’s easy to argue that it’s so common that beginners would expect to open files as text. You can see how it would evolve that way.

Now I’m more familiar with it I’m careful to be explicit with the decoding when using text to make it super obvious what’s going on.


I suspect it’s also to do with Python’s history as a scripting language. Because of Perl’s obvious strengths in this area, any scripting language pretty much has to make it very easy to work with text files. Ruby does something similar for instance.

Even languages like Java now recognise the need to provide convenient access to text files as part of the standard API, with Files.readAllLines() in 7, Files.lines() in 8, and Files.readString() in 11.


You can make it easy to deal with text file without lying to your API users. open("foo", mode="t") could become open("foo").as_text().

Besides, Python has pathlib now, which allows you to do Path("foo").read_text() for quick and dirty text handling already.


The first mistake I made as a beginner was dumping a bunch of binary data as text. Something would get mangled along the way and not all of the data would be written, because I was writing it in text mode.

It just never occurred to me that the default mode of writing the file would _not_ write the array I was passing it.

It’s much more important for beginners to be able to learn clear recipes rather than having double standards with a bunch of edge cases.


I’ve done worse. Using MySQL from php and not having the encoding right somewhere along the way so all my content was being mojibaked on the way in and un-mojibaked on the way out so I didn’t notice it until deep into a project when I needed to extract it to another system.

EDIT thanks, I knew that didn't look quite right. "Mojibaked" - such a great term.


The term is actually "Mojibake", not "emoji baked". https://en.wikipedia.org/wiki/Mojibake#Etymology


More to the point, it's so common that it ought to be supported out of the box by any decent programming language, the same way you'd expect any language to support IEEE floats. That doesn't mean the mechanism for it shouldn't be (effectively) textfile(file("foo.txt")), though.


Strings being iterable can cause problems, and another commenter has pointed out that Swift handles it well.

However I think strings being iterable is one of the core ergonomics in the language and basic types of Python that make it so nice for many applications. Scripting, scraping, data cleanup, data science, even basic web development, all benefit hugely from little features like this. Without this sort of thing Python would be a different language with different uses.

While I normally like safety and types, I'm personally happy with things like this because it fits with Python's strengths.


I disagree, I don't think there's any meaningful benefit. For example, let's say we iterated over strings as follows.

for char in my_str.chars(): foo()

That wouldn’t sacrifice any ergonomics, being consistent with how we already iterate over dictionary contents with d.items(), and it’d address all the concerns in the parent comment link


Didn't Haskell make this move? But I can't find it now so might be misremembering.

In Haskell, String is an alias for [Char].

Maybe it's not changing String, but rather encouraging other types which don't have this difficulty.


On non-iterable strings: the recursive type problem can be solved with something like what is proposed in [0]. (I have an implementation of a fix on GitHub linked from that thread; there are edge-case fixes needed and PEPs scare me, but technically it's feasible.)

[0]: https://mail.python.org/archives/list/typing-sig@python.org/...


While that may solve the recursive type problem, it doesn't really solve the "iterating over strings is rarely what you actually want to do" problem.


why not:

    assert(not isinstance(x, str))


> Eric Fahlgren amusingly summed up the name fight this way:

> > I think name choice is easier if you write the documentation first:

> > cutprefix - Removes the specified prefix.

> > trimprefix - Removes the specified prefix.

> > stripprefix - Removes the specified prefix.

> > removeprefix - Removes the specified prefix. Duh. :)

I actually don't agree that it's so obvious, since it returns the prefix-removed string rather than modifying in-place. I think Fahlgren's argument would work better for `withoutprefix`.


Strings are immutable in Python, and all string operations return new strings, including all string methods.

So there is no possible confusion.


If I have a string x = "this is a very long string..." and do y = x[:10], then it's a whole new string? If x is near my memory limits, and I do y = x[:-1] will it basically double my memory usage? Is that what you meant by every string is a new string?


> If I have a string x = "this is a very long string..." and do y = x[:10], then it's a whole new string?

Yes. And doing otherwise is pretty risky as the Java folks discovered, ultimately deciding to revert the optimisation of substring sharing storage rather than copying its data.

The issue is that while data-sharing substringing is essentially free, it also keeps the original string alive, so if you slurp in megabytes, slice out a few bytes you keep around and "throw away" the original string, that one is still kept alive by the substringing you perform, and you basically have a hard to diagnose memory leak due to completely implicit behaviour.

Languages which perform this sharing explicitly — and especially statically (e.g. Rust) — don't have this issue, but it's a risky move when you only have one string type.

Incidentally, Python provides for opting into that behaviour for bytes using memory views.
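
For example (a quick sketch):

    data = b"some fairly large blob of bytes"
    view = memoryview(data)[5:11]   # shares storage with `data`, no copy
    sub = data[5:11]                # plain bytes slicing copies
    view.tobytes() == sub           # True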


> it also keeps the original string alive, so if you slurp in megabytes, slice out a few bytes you keep around and "throw away" the original string, that one is still kept alive by the substringing you perform, and you basically have a hard to diagnose memory leak due to completely implicit behaviour.

You can get around that with a smarter garbage collector, though. On every mark-sweep pass (which you need for cycle detection even if you use refcounts for primary cleanup), add up the number of bytes of distinct string objects using the buffer. If it's less than the size of the buffer, you can save memory by transparently mutating each substring to use its own buffer. If it's not, then you actually are saving memory by sharing storage, so you should probably keep doing that.


The cpython mark-and-sweep garbage collector does very little on purpose. It basically only gets involved for reference cycles. Anything else is dealt with by reference counting. This way you prevent long GC pauses.


True, but that's by no means an inherent characteristic of garbage collectors, or even of garbage collectors operating on the objects of a Python implementation in particular.


True, but that is the current philosophy behind the garbage collector in the most used python implementation and it's unlikely to change.


> You can get around that with a smarter garbage collector, though.

That complexifies the GC (already a complex beast) as it requires strong type-specific specialisation, more bookkeeping, and adds self-inflicted edge cases. Even the JVM folks didn't bother (though mayhaps they did not because they have more than one garbage collector).


If x is near your memory limits, and you do y = x[:-1], you will get a MemoryError :)

For those situations, bytes() + memoryview() or bytearray() can be used, but then you are on your own.


Huh, I've had a wrong understanding of that for over a decade! TIL, thanks.


Hey!

https://xkcd.com/1053/

And honestly, I would be rich if I got a dollar every time a student does this:

    msg.upper()
Instead of:

    msg = msg.upper()
And then call me to say it doesn't work.


Scheme does the right thing here (by convention), in that mutating procedures end with a bang: (string-upcase str) returns a new string, whereas (string-upcase! str) mutates the string in place.

The details for mutation of data in Scheme go beyond that, though. Sometimes procedures are "allowed but not required to mutate their argument". Most (all?) implementations do mutate, but it is still considered bad form to do something like:

    (define a (list 1 2 3))
    (append! a (list 4))
    (display a)
As append! returns a list that is supposed to supersede the binding to a. Using a like that "is an error", as a valid implementation of append! may look like this:

    (define append! append)
Which would make the earlier code snippet invalid.


IMO, this is a defect in the language: the lack of a "must_use" annotation or similar. If that annotation existed, and the .upper() method was annotated with it, the compiler could warn in that situation.


But you are free to do

  if title == user_input.upper():
That is, you convert a string to upper without binding the result to a name. You just use it in-place and discard the result, which is fine.

With compiler, you mean mypy or linters?


That's still "using" the resulting value for a comparison. CPython isn't an optimizing compiler, or it would completely remove the call to upper().

    >>> def up(v):
    ...     v.upper()
    ...
    >>> dis.dis(up)
    2           0 LOAD_FAST                0 (v)
                2 LOAD_METHOD              0 (upper)
                4 CALL_METHOD              0
                6 POP_TOP
                8 LOAD_CONST               0 (None)
                10 RETURN_VALUE

    >>> def up(v):
    ...     if v.upper() == "HelloWorld":
    ...        return True
    ...
    >>> dis.dis(up)
    2           0 LOAD_FAST                0 (v)
                2 LOAD_METHOD              0 (upper)
                4 CALL_METHOD              0
                6 LOAD_CONST               1 ('HelloWorld')
                8 COMPARE_OP               2 (==)
                10 POP_JUMP_IF_FALSE       16

    3          12 LOAD_CONST               2 (True)
                14 RETURN_VALUE
            >>   16 LOAD_CONST               0 (None)
                18 RETURN_VALUE
Notice in the first example, right after CALL_METHOD the return value on the stack is just immediately POP'd away. The parent is saying that when you run `python example.py` CPython should see that the return value is never used and emit a warning. This would only happen because `upper()` was manually marked using the suggested `must_use` annotation.


He meant that writing a line of code with only contents:

    msg.upper()
should trigger a warning as this clearly doesn't do anything.


Python is interpreted, not compiled, and completely dynamic. You cannot check much statically.

In fact, any program can replace anything on the fly, and swap your string for something similar but mutable.

It's the trade off you make when choosing it.


I agree, there’s no way to issue a warning about a bare `s.upper()` at compile time. I wonder if it would be possible at runtime?


Don't think so, Python doesn't really care if you dispose of the results of an expression. Think about the problems you'd have with ternaries.


Ternaries don't discard results that are generated; they are just special short-circuiting operators:

  x if y else z

is effectively syntax sugar for:

  y and x or z

(at least when x is truthy). Nothing is discarded after evaluation; one of the three arms is simply never evaluated, just as one of the two arms of a common short-circuiting Boolean operator often (but not always) is not. That's essentially the opposite of executing, producing possible side effects, and then discarding the results.


What's the problem with ternaries?


One of the two possible sub-expressions isn't used.


It's also not evaluated. There is no discarding, so there would be no problem.


What is this “compile time” you speak of?


When the Python source code is compiled into bytecode.


That byte code is then interpreted at runtime, so the meaning of s.upper() could change. What something does, when it’s parsed, is not fixed.

You can definitely catch most cases at runtime. I've done something like this, in a library, to catch a case where people were treating the copy of data as a mutable view.

    interface[address][slice] = new_values # fancy noop
Where a read, modify, write was required:

    byte_values = interface[address]
    byte_values[slice] = new_values
    interface[address] = byte_values
It would log/raise a useful error if there was no assignment/passing of the return value.


> Python is interpreted, not compiled, and completely dynamic. You cannot check much statically.

The existence of mypy and other static type checkers for Python disproves that; given their existence, it should be possible to warn when an expression producing a type other than “Any” or strictly “None” is used in a position where it is neither passed to another function nor assigned to a variable that is used later. Heck, you could be stricter and only allow strictly “None” in that position.


so what are these annoying pyc files about?


> And honestly, I would be rich if I got a dollar every time a student does this:

> msg.upper()

> Instead of:

> msg = msg.upper()

> And then call me to say it doesn't work.

On this, isn't the student's reasoning sensible? E.g. "If msg is a String object that represents my string, then calling .upper() on it will change (mutate) the value, because I'm calling it on itself"?

If the syntax was upper(msg) or to a lesser extent String.upper(msg) then the new-to-programming me would have understood more clearly that msg was not going to change. Have you any insights into what your students are thinking?


> String.upper(msg)

That was the original syntax [0], before the string functions became methods. I agree that a method more strongly implies mutation than a function does.

Also, for consistency with list methods like `reverse` (which acts in place) and `reversed` (which makes a copy), shouldn’t the method be called `uppered`?!

[0] https://docs.python.org/2/library/string.html#deprecated-str...


'uppercased'


Ah, of course.

Also, it looks like that’s the name that Swift uses.


A student doesn't know anything about mutability, and since Python signatures are not explicit, there is no way for them to know they have to do that.

It's just something they have to be told: a design decision, like thousands of others in IT, that you just can't guess.


Yes. Although you can use `islice` from itertools to get around this problem, when it is a problem.


Slicing in Python always creates a new object. You can test it with a list of integers.


My favorite example of something similar to this, since you brought it up:

  >>> a = [254, 255, 256, 257, 258]
  >>> b = [254, 255, 256, 257, 258]
  >>> for i in range(5): print(a[i] is b[i])
  ...
  True
  True
  True
  False
  False
In Python, integers in the range [-5, 256] are statically constructed in the interpreter and refer to fixed instances of objects. All other integers are created dynamically and refer to a new object each time they are created.


Leaky abstractions at its finest.


I mean, people should be using `==` for this. The fact that `is` happens to work for small numbers is an implementation detail that shouldn't be relied upon.


Absolutely. But because it does work they might start using it without knowing it's wrong, then be surprised when it doesn't work. Python has other areas where the common English definition of a word leads to misunderstandings about how they're to be used.


Though "when" the object is created isn't always so straightforward:

  >>> x = 257
  >>> y = 257
  >>> x is y
  False
  >>> def f():
  ...     x = 257
  ...     y = 257
  ...     return x is y
  ...
  >>> f()
  True
The lesson being that `is` is essentially meaningless for immutable objects, and to always use `==`.


> `is` is essentially meaningless for immutable objects

OTOH it’s recommended to use `is` rather than `==` when comparing to “singletons like `None`”.

https://www.python.org/dev/peps/pep-0008/#programming-recomm...


What is the rationale behind this? '==' works all the time, and 'is' only works sometimes. Using 'is' wherever possible requires the user to know some rather arbitrary language details (which objects are singletons and which are not), whereas '==' will always give the correct answer regardless.


Classes can overload `==`:

    class C:
        def __eq__(self, other):
            return True

    print(C() == None)  # True
    print(C() is None)  # False


> Slicing in Python always create a new object.

It always creates a new object but it doesn't necessarily copy the contents (even shallowly).

For instance slicing a `memoryview` creates a subview which shares storage with its parent.


It'll always create a new object but my understanding is that at least in numpy the new and old object will share memory. Am I wrong there too?


Correct. In Numpy the slices are views on the underlying memory. That’s why they’re so fast, there’s no copying involved. Incidentally that’s also why freeing up the original variable doesn’t release the memory (the slices are still using it).


CPython is pretty terrible. Numpy has the concept of views, cpython doesn’t do anything sophisticated.


> Slicing in Python always create a new object.

Since you said "always", I'll be "that guy"...

For the builtin types, yeah. But not in general; user code can do whatever it wants:

    class MyType:
      def __getitem__(self, *args):
        return self
    my_obj = MyType()
    my_other_obj = my_obj[3:7]
    assert my_other_obj is my_obj


Yes.


Python strings are immutable, so in-place modification would violate lots of rules and conventions.


I would have preferred 'stripprefix' for consistency with 'strip', 'rstrip', and 'lstrip'.


That's discussed in the article: the "strip" methods don't interpret strings of multiple characters as a single prefix or suffix to be removed, so it was felt to be too confusing to use "strip" type names for methods that do interpret strings that way.
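
The classic gotcha, for example:

    "www.example.com".lstrip("w.e")         # 'xample.com' -- strips any of 'w', '.', 'e'
    "www.example.com".removeprefix("www.")  # 'example.com' -- removes that exact prefix once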


They work differently. Strip removes all given characters from the end in any order. Trim sounds better to me, like one operation.


Ugh, I do like withoutprefix a lot better. That makes it obvious it returns something new and that there’s no reason to raise ever.


> Another kind of clean up comes in PEP 585 ("Type Hinting Generics In Standard Collections"). It will allow the removal of a parallel set of type aliases maintained in the typing module in order to support generic types. For example, the typing.List type will no longer be needed to support annotations like "dict[str, list[int]]" (i.e., a dictionary with string keys and values that are lists of integers).

I think this will go a long way toward making type annotations feel less like a tacked-on feature.
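
A quick before/after of what PEP 585 allows (tally is just a made-up example function):

    # 3.8 and earlier
    from typing import Dict, List
    def tally(words: List[str]) -> Dict[str, int]: ...

    # 3.9 with PEP 585: the builtins are generic themselves
    def tally(words: list[str]) -> dict[str, int]: ...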


Looking "back" now, it never occurred to me that importing List when there is list is particularly strange. Now it sticks out sorely. Very glad this change is happening.


That's because we're conditioned to think of constructors as functions rather than as types. I think that's not that odd honestly but I do see how counterintuitive it is for people that don't work much in typed languages. I'm not a Haskellite but there you can clearly see the distinction when defining/instantiating sum types (where the type and data constructor live in different namespaces).


There have been so many improvements in the typing area, I wish they'd waited a while to get it right.


I think getting them out there has already helped the ecosystem, both in terms of using the types to make working in python better, and in terms of figuring out what the typing system should really look like. This is the next iteration, and I think it's going in the right direction, so I don't really want to criticize the devs for it.


I'm extremely excited about this. I had been using a short-hand literal-based syntax like [int] for a while, but list[int] is obviously so much better.


> I had been using a short-hand literal-based syntax like [int] for a while,

Do you mean you'd been using that in comments?

Just to be clear, this isn't about ad-hoc syntaxes for use in comments, this is about syntax that parses when used in python code, and which can be used by type-checkers.


technically, you can use any valid python expression in annotations.

  [int]
  {str: int}
{'x': float, 'y': float}
are all valid python syntax, so e.g. these are all valid:

  def f(xs: [int]) -> {str: int}:
    ...

  def g(x: 1+2+3, y: what('ever')) -> foo / bar:
    ...
it's just that tools like mypy won't be able to use that, because they expect a class or a `typing` type.

i'm pretty sure you could even write a mypy plugin that'd interpret `[int]` into `list[int]` etc


Thank you. I didn't know that the python grammar was so permissive regarding what goes in the annotation slots. I did wonder whether I was saying something wrong / sticking my neck out because I had a feeling that the person I was replying to knew what they were talking about.

In practice though we should probably all write annotations that do work with an existing type checker. False negatives are bad enough in mypy without people writing annotations for non-existent type checkers! (IMO --check-untyped-defs should always be used; mypy is misleading without it.)


> In practice though we should probably all write annotations that do work with an existing type checker.

oh, sure! "freeform annotations" that break mypy in a published library should be a punishable offense ;)


Am I the only one who wants multi-lined anonymous functions in Python? I find myself really wanting to reach for arrow functions sometimes while writing Python, and end up disappointed that they aren't available.


> Am I the only one who wants multi-lined anonymous functions in Python?

Lots of people want them (lots of people don't, too), but no one has come up with a great syntax that plays nice with the rest of Python and saves you much over named functions.


Hey Pythonistas

    x('.b', onclick=func ev:
        print("clicked")
    end)
You're welcome.


`end`? In my Python?!


It's more likely than you think. You need some end delimiter, it doesn't matter what it is as long as there is one. Just like list literals end with ] and dict literals end with } and str literals end with " or ' etc.


> it doesn't matter what it is as long as there is one

Of course it matters; there are design decisions in languages. A certain pattern or syntax may feel just right in one language but very much wrong in another.

Throwing in an `end)` like that feels wrong in python

https://www.python.org/dev/peps/pep-0020/


It doesn't matter functionally. You can make the end token "waffleiron" or "mariahcarey" or "%^*~$". But there does need to be an end token to make multiline lambdas work and recognizing that is the first step to solving the problem.

Once you have the scaffolding of the syntax you can turn it over to the bikeshedders on the mailing list to make it pretty.


I think the basic idea is "if you need multiple lines, you should declare a proper function", so I wouldn't hold my breath waiting for multiline anonymous functions in Python.


You can do it if you really want to in 3.8 :p :

  def begin(*args):
    return args[-1]

  begin(
      func := lambda x, y: begin(
          z := int(input()),
          x + y + z
      ),
      func(1, 2)
  )


What's wrong with using a nested named functions instead?

You may already be aware, but not everyone is: they capture variables from outer scopes in exactly the same way that lambdas do.


It's wrong because naming is hard. When writing inline, it is possible that not having a name does not impact readability. When defining the function out of line, naming it casually may confuse readers.


When we use lambdas, don't we usually end up assigning them to variables? In those cases, your point about naming still holds!


Lambdas are assigned to function arguments/parameters. Pretty sure assigning a lambda to a variable is an anti-pattern.


It's just more pleasant to be able to write anonymous functions and to be able to extend them past a single line. If I had to name all of my anonymous functions that I write in other languages, I'd find another way to accomplish what I was trying to do with them.


I believe Guido was against it as he is mostly opposed to the functional style; as a matter of fact he was opposed to lambdas but begrudgingly added them after many requests.


Guido’s original plan for Python 3 included removing lambda (along with reduce, filter and map) from the language.

https://www.artima.com/weblogs/viewpost.jsp?thread=98196


"Guido was against multi line lambdas" is always brought up in these discussions and then someone digs up an old email from 10 years ago where it happens to be mentioned in passing by.

Given the recent success of anonymous functions and their widespread use and impact in other languages, like JavaScript, C#, Java, C++11, maybe it's time to re-evaluate that opinion. I mean, what would JavaScript be today without promises and multi-line anonymous functions?


I dug up that “old email from 10 years ago” which I hadn’t seen before.

Guido laid out the challenges of multiline lambdas on a mailing list [1] and then followed up with a blog post [2] [3]. His chain of thought is worth reading in full, but the crux is lack of “Pythonicity” and his gut feel that named functions avoid the complexity and possible ambiguity of multiline lambdas:

    def callback(x, y):
        print x
        print y
    a = foo(callback)
[1]: https://mail.python.org/pipermail/python-dev/2006-February/0... "[Python-Dev] Let's just keep lambda"

[2]: https://www.artima.com/weblogs/viewpost.jsp?thread=147358 "Language Design Is Not Just Solving Puzzles"

[3]: https://news.ycombinator.com/item?id=20672739 "Language Design Is Not Just Solving Puzzles (2006) | Hacker News"


Thanks for putting this together.


Now that Guido is no longer BDFL nor on the Steering Committee, it doesn’t really matter what he thinks.

But, it’s still unlikely that we’ll see full anonymous functions in Python. They just don’t work with its significant-indentation syntax.


No PEP 554 (subinterpreters). That's been moved to 3.10: https://www.python.org/dev/peps/pep-0554/


Given how heated the debate about those was, it's good we didn't try to go too fast with it.

I'm full of hope for this feature, but it's going to be slow, hard work, and we'll only reap the benefit in the long run. So no use rushing it.

I feel like we rushed asyncio and type hints, and it took years to make them usable after they were introduced.


I welcome the terser type hints for generics. I was wishing for something terser, like:

    {str: [int]}
being equivalent to what is currently dict[str, list[int]] in the PEP, but I guess it will have to do.


`{key: val}` does look nice indeed, but then it takes more effort to replace e.g. `dict` with `Mapping`. Everything would look much nicer with Haskell-like syntax: `dict key val` or `list val`. Or maybe even prefer `{} key val` and `[] val` (no that doesn't look good, I agree).


This was glorious with Clojure's Schema. I miss it every day I work with inferior languages and data specification systems.


I was quite surprised — only a few days ago in fact — to discover the standard Python library has no support for Olson (as in tzdata) timezones. Time arithmetic is impossible without them.

The ipaddress library also has no support for calculating subnets. It is quite hard to go from 2a00:aaaa:bbbb::/48 to 2a00:aaaa:bbbb:cccc::/64. It would be less weird if the essence of the documentation didn’t make it sound like the library was otherwise very thorough in the coverage of its implementation.

Can anyone write a PEP? Maybe I should get off my behind and actually submit a patch for proper IP calculations? Or maybe I missed it in the documentation (which, aside, I wish wasn’t written with such GNU-info style formality.)


Unless I misunderstand what you're looking for, I think that functionality is in there.

    from ipaddress import ip_network
    original_net_48 = ip_network("2a00:aaaa:bbbb::/48")
    desired_subnet = ip_network('2a00:aaaa:bbbb:cccc::/64')
    subnets_64 = original_net_48.subnets(16)
    print(f"{desired_subnet} is one of the computed subnets: {desired_subnet in subnets_64}")
    #=> 2a00:aaaa:bbbb:cccc::/64 is one of the computed subnets: True


Thanks, but your second line kind of has the answer in it already. It’s more like...

  site = ip_network('2a00:aaaa:bbbb::/48')
  subnet = f(site, 64, 0xcccc)
...and I don’t think f() is in the standard library. But maybe I just index the calculated subnets, from your example? I’ll give it a go!

Edit: yes! But it’s a bit slow...

https://repl.it/repls/PoshPapayawhipNaturaldocs#main.py


Well, you can speed it up slightly by using the iterator directly instead of forcing the whole generator into a list.

But on the other hand, you can make it WAY faster by doing it with bit manipulation instead like you're writing C:

https://repl.it/repls/ClearcutElaborateEmbeds
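
Something along these lines (a sketch of the bit-twiddling approach, not necessarily what that repl does):

    from ipaddress import ip_network, IPv6Network

    def nth_subnet(site, new_prefix, index):
        # shift the subnet index into the bits between the old and new prefix
        offset = index << (128 - new_prefix)
        return IPv6Network((int(site.network_address) + offset, new_prefix))

    site = ip_network('2a00:aaaa:bbbb::/48')
    nth_subnet(site, 64, 0xcccc)   # IPv6Network('2a00:aaaa:bbbb:cccc::/64')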


Oh, yeah, you're right. That's a shame — that function does exactly what you want, but it has to do it for every possible subnet up to the one you want, and the logic isn't included as a separate function.


The ipaddress library doesn’t do link-local IPv6 addresses either.


Yes, I feel like PEP 615 for timezones is about 20 years late.


Could someone explain to me what kind of new language features the new parser will allow? I'm curious and very incompetent when it comes to understanding what LL(1) grammar would imply for the end-user (the python programmer like me).


The linked LWN article[1] mentions context-sensitive keywords, ie. a way to treat certain words as language keywords only in specific contexts. For example, a new match statement that wouldn't require reserving the `match` word as a language keyword, which would require a breaking change and break all existing code that uses `match` as a variable name.

Such a feature requires support from the parser.

[1]: https://lwn.net/Articles/816922/
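
For instance, this is exactly what the pattern-matching PEPs later did in 3.10: `match` is only a keyword in the statement position, so it stays usable as an ordinary name:

    match = ["go", "north"]     # still a perfectly valid variable name

    match match:                # and a soft keyword here
        case ["go", direction]:
            print("heading", direction)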


One good example (for those who do not want to read the full article) is the async keyword. Introducing it as a keyword broke a few libraries which were already using it as a kwarg name in some functions (e.g. pytorch).


I wonder if Python will go back and address other shortcomings which I assume are tied to the parser, such as the inability to use quotes inside f-string interpolated segments.


Extensive explanation from Python creator and PEG parser implementor Guido van Rossum himself can be found in this video:

https://youtu.be/QppWTvh7_sI

It's also just a fun video on language parsers in general.


I used a PEG parser for a language I designed because I was attracted to the linear time parsing achievable using a packrat parser.

However, I also found that parsing is rarely a performance bottleneck so that wasn't a big plus.

PEGs are, however, easier to reason about (no ambiguities), so that was a good enough reason.


> PEGs are however easier to reason about

Yes and no.

PEG parsers are definitely easier to implement than any of the LR(k)-and-ilk parsers. So if you're writing or debugging a PEG parser, that will be easier.

However, while shift-reduce conflicts are confusing, they are there to give the strong guarantee that the grammar is unambiguous. And the parser generator will tell you this as soon as you've defined the grammar, before you've even used it. PEG grammars instead remove the guarantee, and let you deal with any confusion that arises much later.

Here are some methods of reasoning that standard parsers will give you that PEG parers will not:

1. A ::= B | C is exactly the same as A ::= C | B.

2. If A ::= B | C and you have a program containing a fragment that parses as an "A" because it matched "B", then you can replace that fragment with something that matches "C" and the program will still parse.

Neither of these rules hold in PEGs.

Here's a practical concern that (2) helps with. Say you have a grammar for html, and a grammar for js. And you want to be able to parse html with embedded JS. So you stick the js grammar into the html grammar at the right places. If you're using a standard (e.g. LR(k)) parser, and you don't get any shift-reduce (or other) conflicts, then the combined grammar works. In contrast, if you're using a PEG grammar, it's possible that you've ordered things wrong and there are valid JS programs that will never parse because they're clobbered by html parsing rules outside of them. Or vice-versa.

Also, realistically if you're using a PEG parser you'll want one that handles left recursion, because working without left recursion turns your grammar into a mess. And left recursion in PEGs can have some weird behavior.


> PEGs are however are easier to reason about, no ambiguities, so that was good enough reason.

I frequently hear this mentioned ("PEGs don't have ambiguity"). It is literally true, but I don't think it's true in the sense that actually matters.

I've blogged about this in the past (https://blog.reverberate.org/2013/09/ll-and-lr-in-context-wh...), but I'm not the only person saying this:

> PEG is not unambiguous in any helpful sense of that word. BNF allows you to specify ambiguous grammars, and that feature is tied to its power and flexibility and often useful in itself. PEG will only deliver one of those parses. But without an easy way of knowing which parse, the underlying ambiguity is not addressed -- it is just ignored.

https://jeffreykegler.github.io/Ocean-of-Awareness-blog/indi...


https://pyfound.blogspot.com/2020/04/replacing-cpythons-pars... gives "Parenthesized with-statements" as an example.


Is the next release still planned to be called Python 4? I seem to recall GvR saying that at one point, though I could be mistaken.



After that will be Python 3.11, and after that, Python for Workgroups 3.11.


Python 95 is gonna blow everyone's minds.


Yeah, but PythoNT will be around for a long time.


With semver, it's going to be too long to reach. I suggest we switch to the Firefox versioning scheme, it seems to already be near this goal.


the first release with Plug and Pep


Start me up!


You make a grown man cry...


Followed by inexplicably jumping to pithon.


And very importantly, would Python 4 be a new language, or compatible with Python 3? (compare Python 3 vs Python 2)


There is no Python 4 planned for now.

Python broke compat once in 25 years and gave 13 years to migrate.

It's a very conservative language.


Python breaks compatibility often on minor point releases. Only once as big as 3.0.0 but it happens regularly.

I argued on the list that these should be kept for major releases for planning reasons but they appear to be convinced it is too hard.


Can you provide an example of where Python has broken backwards compatibility recently between 3.x version? I'll admit (despite googling for 5 or so minutes) that i don't actually know if it does. It obviously breaks forward compatibility continuously all the time - new language features are landing, and they just aren't present in previous versions - but I don't know if I've ever run into people being tripped up by that.

I know some Python Libraries break backwards compatibility (Pandas being a big one) - but, for the most part, hasn't the language been backwards compatible since at least Python 3.4? (And possibly further back, for all I know).


On this page you'll see them. Check the "Porting to X.X" items under each release:

https://docs.python.org/3/whatsnew/index.html

Keep in mind they have deferred a number of them because of the impending EOL of Python 2.7. There are fewer breaking changes during the latter 3.X series, which should resume in 3.9 or 3.10 now that Python2 has passed on.

Here's a commonly mentioned one:

Changes in Python Behavior: async and await names are now reserved keywords. Code using these names as identifiers will now raise a SyntaxError. (Contributed by Jelle Zijlstra in bpo-30406.)

Note: I think this is a bad idea, I'd rather all these small breaking changes and parser be deferred to 4.X. But they need to be small breaking changes, of course, not a new language.
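
Concretely, something like this was fine in 3.6 and became a SyntaxError in 3.7:

    def fetch(url, async=True):   # 'async' as a parameter name
        ...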


async keyword 3.5


If we look back in python history, the rolling breaking changes have been handled mostly fine, and the actual Python 3 caused a lot of pain in the ecosystem. So I hope they stay away from major versions and keep up the other things they are doing.


That was due to the scope of the breakage, not number format. A good way to handle that and maintain predictability is to constrain breaking changes, yet defer them to 4.X.


I'm struggling to think of any other language that has done something like this.

It might seem like a quibble, but it seems better to describe Python 3 as a different language versus Python 2. Newbies seem to get that.

(Or, alternatively, "How many Python 2 scripts will run on a Python 3 interpreter?" Answer: "None of them.")


First, there are not a lot of interpreted (not compiled, that's another matter entirely) languages that are as old as Python.

And there are really few that are even near Python popularity, or used with such diversity as Python.

I mean, you can get away with keeping AWK the way it was 2 decades ago; nobody is going to use it for machine learning, or to teach computing in all the universities in the world on 3 operating systems, utilizing C extensions, or processing Web APIs.

Among the few that would even compare, there are the ones that have accumulated so much cruft that they became unusable by today's standards (e.g. bash). Then you have those that have done like Python (e.g. Perl 6). The ones that just tried and failed (PHP 6). The ones that broke compat and told everybody to move or die (Ruby, in a point release, gave basically 2 years). And the ones that created a huge pile of horror they call a full stack to keep going (e.g. JS). Also those that got hijacked by vendors and just exploded into myriads of proprietary syntaxes (e.g. SQL) or completely new concepts (e.g. Lisp).

At least, in Python you CAN write Python 2/3 compatible code, and you have a LOT of tooling to help you with that, or migrating.

So, yes, the Python 2 -> 3 transition could have been better. Insight is 20/20.

But I'm struggling to think of any other language in a similar situation that has done better.


Hindsight 20/20, not insight.


Thanks. French here. Can't edit anymore though


Ruby did something like this around the same time Python did. Ruby's was a bit smaller, but overall a roughly similar amount of breaking changes. They forced you to think about encodings more with Strings, they changed the signatures of several operators, they changed some of the syntax for case statements, they drastically changed the scope rules for block variables, they restructured the base object hierarchy, etc. In both cases, it was a deliberate decision to make a clean break. I think Ruby's big break didn't make as big a schism mainly because Rails was very supportive, and Rails holds an enormous amount of influence in the Ruby world.

If Python 3 had been introduced as a separate language, I'm pretty sure everyone would have said "Why isn't this just called Python 3? It's 99.9% the same as Python and it's by the same people and they're deprecating Python in favor of it."


> Or, alternatively, "How many Python 2 scripts will run on a Python 3 interpreter?" Answer: "None of them."

That's obviously untrue. For example, consider the following Python 2 script:

    with open('a.txt', 'r') as a, open('b.txt', 'w') as b:
        for line in a:
            b.write(line)
It works identically on a Python 3 interpreter, and it doesn't even use "from __future__ import ...".


Not quite. If your a.txt contains utf-8 byte blocks you are screwed in Python 2.


Yes, I should have said, "a vanishingly small proportion, and even then mostly by accident".

My guess is that if you sweep GitHub for Python2 code and push it into Python3, that proportion would be under one percent.


how will that number look when you first autoconvert via 2to3?

I did two migrations of >500k loc projects in an afternoon each, and admittedly some days of testing to gain confidence since there were few unit tests. But I found it to be very smooth sailing.

I was very familiar with both projects, so that helped a lot.

EDIT: I also want to add that I did this using Python 3.5, when the ecosystem seemed to be at a sweet spot of dependencies mostly supporting both 2 and 3. I guess if one has been waiting until now, the divide between library versions will be a lot bigger.


As a big user of logging, with little to do with character encoding, all of my admin/daemon stuff moved over with almost no changes necessary for 3.0 (actually ~3.3).

For some projects I did bigger refactors for 2.6/7 (exceptions) and 3.6 (fstrings).


Fair point, but:

  #include <stdio.h>
  int main(int c, char** v)
    {
    printf("Hello, World!\n");
    return 0;
    }
doesn't make C++ the same language as C.


> Or, alternatively, "How many Python 2 scripts will run on a Python 3 interpreter?" Answer: "None of them."

That is completely false. So many libraries have support for both 2 and 3. That's code that runs just as well under both interpreters.


Really? The same .py file runs under python2 and python3?

Googling quickly, I find this, which does a bit better than 2to3. I suppose one could write to a somewhat constrained intersection of Python2 and Python3, if one is willing to make at least some boilerplate changes to the original Python2 code.

https://python-future.org/overview.html

That said, if you bring a Python2 script and feed it to a Python3 interpreter, no, in general that will not work. They simply aren't the same language. Even a simple "print x" will do you in.


> The same .py file runs under python2 and python3?

Sure, as long as it doesn't contain any syntax or spellings which are incompatible between the two. That's a fairly large subset of the language.

> if you bring a Python2 script and feed it to a Python3 interpreter, no, in general that will not work. They simply aren't the same language. Even a simple "print x" will do you in.

But this will work:

    from __future__ import print_function
    print(x)
This is valid under both Python 2 and Python 3.

Also, as I said above, there is a pretty large subset of the Python language that has the same syntax and spellings in both Python 2 and Python 3, and any script or module or package that only uses that subset will run just fine under both interpreters. You are drastically underestimating both the size and the usage of this subset of the language.


Say Django 1.11, a massive amount of .py files, works completely fine under both 2 and 3. As do many other libraries.

Yes you often need some precautions like "from __future__ import" statements and sometimes libraries like `six`, but it's been perfectly normal practice for most of the last decade.


No, but it is possible to write Python 2 code that runs in Python 3.

In fact, the vast majority of popular libs had a 2/3 compatible code base for a few years.

The hard part was not the syntax, in fact. It's pretty trivial: the languages are not that different.

The hard part is the I/O stack, because the stdlib is very different, especially for this part.


A lot of projects write in that style, i.e. compatible with both Python 2 and Python 3; it's really common because there's so much py2 deployed (it was the default on CentOS until very recently, still the default on OS X, etc.)

Nearly every py3 feature was backported to 2; you just need to write it in a compatible way. I'm seeing some drop py2 support now though, which I'm fine with; I haven't written Python 2 code in maybe 6 or 7 years now.


PHP, which tried to address Unicode in version 6, but then abandoned it and went straight to 7. Perl, which amusingly also decided on a huge rewrite at version 6, but then just renamed that version as an actual new language, "Raku".


Re Raku, in hindsight, this was the right call. Give a new language a new name, and the story stays simple.


Yep. Also people are not shamed and ridiculed online because they did not upgrade to the new language.


If it was such a right call, why did it die?


Why do you think it is dead?


> Or, alternatively, "How many Python 2 scripts will run on a Python 3 interpreter?" Answer: "None of them."

That is not true though; a lot of Python 2 scripts will run no problem, especially if __future__ imports are used.


Swift has had 3 backwards incompatible versions of larger scope in less time.


> I'm struggling to think of any other language that has done something like this.

Perl 6 (now Raku), of course, but they at least had the decency to admit it.


They decided they didn't like the backwards-incompatible changes they made in 3, so Python 4 will go back to how things were in 2.


> Eventually, removeprefix() and removesuffix() seemed to gain the upper hand, which is what Sweeney eventually switched to.

Great naming... they missed their chance to make the functionality of strip/lstrip/rstrip clearer by naming the new methods stripword/lstripword/rstripword, which would also have had the benefit of consistency.


stripwords could (/would) imply that it acts on words. As in, whitespace separated things.


re.sub?



