> >>> list(b'abc')
> [97, 98, 99]
>That's not a list of numbers... that's a list of bytes!
No, it's a list of numbers:
>>> type(list(b'abc')[0])
<class 'int'>
I think the GP mis-typed his last example. First, he showed that ''.join('abc') takes a string, busts it up, then concatenates it back to a string. Then, with ''.join(b'abc'), he appears to want to bust up a byte string and concatenate it back to a text string. But I suspect he meant to type this:
>>> b''.join(b'abc')
That is, bust up a byte string and concatenate back to what you started with: a byte string. But that doesn't work: when you bust up a byte string you get a list of ints, and you cannot concatenate them back to a byte string (at least not elegantly).
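For reference, here is a minimal sketch of that round trip in Python 3 (the variable names are mine, purely for illustration). Iterating over a bytes object yields ints, so b''.join() rejects them, though the bytes() constructor does accept an iterable of ints. Whether that counts as elegant is a matter of taste.

```python
data = b'abc'

# Iterating over bytes yields ints in Python 3, not length-1 byte strings.
parts = list(data)

# b''.join() expects bytes-like items, so joining ints raises TypeError.
try:
    b''.join(parts)
    join_failed = False
except TypeError:
    join_failed = True

# The bytes() constructor accepts an iterable of ints in range(256).
rebuilt = bytes(parts)

# Slicing (unlike indexing) keeps each piece as bytes, so these can be joined.
pieces = [data[i:i + 1] for i in range(len(data))]
rejoined = b''.join(pieces)
```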
Well, yes. Python chooses the decimal representation by default. So? It could be hex or octal and still be a byte.
My example was merely meant to be illustrative, not an example of real code. The bytes object is simply not a str, so I don't understand where this frustration with it not being a str is coming from. If you use the unicode objects in Python 3 you get the same API as before. The difference is that now you can't rely on Python implicitly promoting ASCII byte strings to unicode objects; you have to explicitly encode/decode from one to the other. It removes a lot of subtle encoding bugs.
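A small sketch of what that explicit conversion looks like in Python 3. The sample string and the choice of UTF-8 are my assumptions, picked only to make the byte/character difference visible.

```python
text = 'caf\u00e9'               # str: four Unicode code points
raw = text.encode('utf-8')       # bytes: five bytes, since the last char needs two

# Mixing str and bytes raises TypeError in Python 3 instead of
# silently attempting an ASCII decode as Python 2 did.
try:
    text + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Going back to text requires naming the encoding explicitly.
round_tripped = raw.decode('utf-8')
```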
Perhaps it's just that encoding is tricky for developers who never learned it in the first place when they started learning the string APIs in popular dynamic languages? I don't know. It makes a lot of sense to me since I've spent years fixing Python 2 code-bases that got unicode wrong.
You are not making a useful comment because you don't understand the use case. Python 2 is very useful for handling binary data. This complaint is not about unicode. It is about binary file manipulation.
I'm thrilled about the unicode support. If they had only added a unicode string, left the binary string alone, and just required an additional literal prefix b, it would have been an easy transition. Instead the design was changed for no good reason and existing code is broken too.
I have a hard time believing that the design was arbitrarily changed.
The request to add string APIs to the bytes object has been brought up before [0]. I think the reasoning is quite clear: byte strings are not textual objects. If you are sending textual data to a byte-string API, build your string as a string and encode it to bytes.
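As a rough sketch of "build it as a string, then encode": the helper below is hypothetical (the name, the HTTP-style request line, and the ASCII encoding are my choices for illustration, not something from the linked discussion). All formatting happens on str, and the conversion to bytes happens once at the boundary.

```python
def build_request_line(method, path):
    """Hypothetical helper: format as text, then encode once at the edge."""
    line = '{} {} HTTP/1.1\r\n'.format(method, path)
    # Encode only when handing the data to a bytes-oriented API (e.g. a socket).
    return line.encode('ascii')

request = build_request_line('GET', '/index.html')
```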
For working with binary data there's a much cleaner API in Python 3 that is less prone to subtle encoding errors.
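To give a flavor of what I mean, here are a few of the standard tools for binary data in Python 3 (bytes.hex() needs 3.5 or later). The header layout is made up for the example.

```python
import struct

# Pack a made-up header: 2-byte magic, then a big-endian unsigned 32-bit length.
header = struct.pack('>2sI', b'PK', 1024)
magic, length = struct.unpack('>2sI', header)

# Integers convert to and from raw bytes without going through str.
value = int.from_bytes(header[2:], 'big')

# bytearray is the mutable counterpart of bytes; items are ints.
buf = bytearray(header)
buf[0] = 0x5A                    # overwrite 'P' with 'Z'
patched = bytes(buf)

# Hex dump of the original header for inspection.
dump = header.hex()
```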
edit: I realize there is contention, but I'm of the mind that .format probably isn't the right method, and that if there were one it would need its own format control string syntax.