The open() API is inherited from the C way, where the world is divided between text files and binary files. So you open a file in "text" mode, and "binary" mode, "text" being the default behavior.
This is, of course, utterly BS.
All files are binary files.
Some contains sound data, some image data, some zip data, some pdf data, and some raw encoded text data.
But we don't have a "jpg" mode for open(). We do have higher API we pass file objects to in order to decode their content as jpg, which is what we should be doing to text. Text is not an exceptional case.
VSCode does a lot of work to turn those bytes into pretty words, just like VLC into videos. They are not like that in the file. It's all a representation for human consumption.
The reasoning for this confusing API is that reading text from a file is a common use, which is true. Espacially on Unix, from which C is from. But using a "mode" is the wrong abstraction to offer it.
If fact, Python 3 does it partially right. It has a io.FileIO object that just take care of opening the stuff, and a io.BufferedReader that wraps FileIO to offer practical methods to access its content.
This what what open(mode="b") returns.
If you do open(mode="t"), which is the default, it wraps the BufferedReader into a TextStream that does the decoding part transparently for you, and returns that.
open() would always return BufferedReadfer, as_text() would always return TextStream.
This completly separates I/O from decoding, removing confusion in the mind of all those coders that would otherwise live by the illusionary binary/text model. It also makes the API much less error prone: you can easily see where to the file related arguments go (in open()) and where to text related arguments go (in as_text()).
You can keep the mode, but only for "read", "write" and "append", removing the weird mix with "text" and "bytes" which are really related to a different set of operations.
This is, of course, utterly BS.
All files are binary files.
Some contains sound data, some image data, some zip data, some pdf data, and some raw encoded text data.
But we don't have a "jpg" mode for open(). We do have higher API we pass file objects to in order to decode their content as jpg, which is what we should be doing to text. Text is not an exceptional case.
VSCode does a lot of work to turn those bytes into pretty words, just like VLC into videos. They are not like that in the file. It's all a representation for human consumption.
The reasoning for this confusing API is that reading text from a file is a common use, which is true. Espacially on Unix, from which C is from. But using a "mode" is the wrong abstraction to offer it.
If fact, Python 3 does it partially right. It has a io.FileIO object that just take care of opening the stuff, and a io.BufferedReader that wraps FileIO to offer practical methods to access its content.
This what what open(mode="b") returns.
If you do open(mode="t"), which is the default, it wraps the BufferedReader into a TextStream that does the decoding part transparently for you, and returns that.
There is an great explanation of this by the always excellent David Beazley: http://www.dabeaz.com/python3io_2010/MasteringIO.pdf
What it should do is offering something this:
open() would always return BufferedReadfer, as_text() would always return TextStream.This completly separates I/O from decoding, removing confusion in the mind of all those coders that would otherwise live by the illusionary binary/text model. It also makes the API much less error prone: you can easily see where to the file related arguments go (in open()) and where to text related arguments go (in as_text()).
You can keep the mode, but only for "read", "write" and "append", removing the weird mix with "text" and "bytes" which are really related to a different set of operations.