
The open() API is inherited from the C way, where the world is divided between text files and binary files. So you open a file in either "text" mode or "binary" mode, with "text" being the default behavior.

This is, of course, utterly BS.

All files are binary files.

Some contain sound data, some image data, some zip data, some PDF data, and some raw encoded text data.

But we don't have a "jpg" mode for open(). We do have higher-level APIs we pass file objects to in order to decode their content as JPEG, which is what we should be doing with text. Text is not an exceptional case.
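For what it's worth, the standard library already works this way for zip data, and io.TextIOWrapper can be used exactly the same way for text: both take a binary file object and decode it. A minimal sketch, using in-memory io.BytesIO buffers instead of real files (the member name and contents are made up for illustration):

```python
import io
import zipfile

# Build a small zip archive in memory so the example needs no files on disk.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as zf:
    zf.writestr("hello.txt", "hello")

# "Zip decoder": hand ZipFile a binary file object, get archive members back.
# ZipFile does not close a file object it was given, so raw stays usable.
raw.seek(0)
with zipfile.ZipFile(raw) as zf:
    names = zf.namelist()

# "Text decoder": hand TextIOWrapper a binary file object, get str back.
text_source = io.BytesIO("héllo\n".encode("utf-8"))
with io.TextIOWrapper(text_source, encoding="utf-8") as tf:
    text = tf.read()

print(names)
print(text)
```

Nothing about the text case needs a special flag on open(); it is one more decoder layered on top of bytes.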

VSCode does a lot of work to turn those bytes into pretty words, just as VLC does to turn them into videos. They are not like that in the file. It's all a representation for human consumption.
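A quick way to see that the "text" is a choice made by the decoder and not something stored in the file: the same bytes read back as different strings depending on the encoding you assume. A small sketch (the byte string is just an example):

```python
# Five bytes, as they might sit in a file.
data = b"caf\xc3\xa9"

# The same bytes become different "text" depending on which decoder you pick.
as_utf8 = data.decode("utf-8")      # 'é' is the two-byte sequence C3 A9
as_latin1 = data.decode("latin-1")  # every byte maps to one character

print(as_utf8)
print(as_latin1)
```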

The reasoning behind this confusing API is that reading text from a file is a common use case, which is true, especially on Unix, where C comes from. But a "mode" is the wrong abstraction for offering it.

In fact, Python 3 does it partially right. It has an io.FileIO object that just takes care of opening the file, and an io.BufferedReader that wraps FileIO to offer practical methods for accessing its content.

This is what open(mode="rb") returns.

If you use text mode (mode="rt", or just "r", which is the default), it wraps the BufferedReader in an io.TextIOWrapper that does the decoding transparently for you, and returns that.
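You can inspect the layers directly: in binary mode, the object's .raw attribute is the FileIO; in text mode, .buffer is the BufferedReader underneath. A small sketch that creates a throwaway file with tempfile so it runs anywhere:

```python
import os
import tempfile

# Create a small throwaway file to inspect.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"hello\n")

with open(path, "rb") as f:
    # Binary mode: a BufferedReader wrapping a raw FileIO.
    binary_types = (type(f).__name__, type(f.raw).__name__)

with open(path, "r", encoding="utf-8") as f:
    # Text mode: a TextIOWrapper around the same BufferedReader/FileIO stack.
    text_types = (
        type(f).__name__,
        type(f.buffer).__name__,
        type(f.buffer.raw).__name__,
    )

os.remove(path)
print(binary_types)
print(text_types)
```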

There is a great explanation of this by the always excellent David Beazley: http://www.dabeaz.com/python3io_2010/MasteringIO.pdf

What it should do is offer something like this:

    with open('text.txt').as_text() as f:  # proposed API, not real Python
        content = f.read()

open() would always return a BufferedReader, and as_text() would always return a TextIOWrapper.

This completely separates I/O from decoding, removing confusion in the minds of all those coders who would otherwise live by the illusory binary/text model. It also makes the API much less error-prone: you can easily see where the file-related arguments go (in open()) and where the text-related arguments go (in as_text()).
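You can approximate this today with a plain function, since we can't add a method to BufferedReader itself. A minimal sketch, where as_text() is a hypothetical helper (not part of Python) that just wraps an already-open binary file in io.TextIOWrapper, so file arguments go to open() and text arguments go to as_text():

```python
import io
import os
import tempfile


def as_text(binary_file, encoding="utf-8", errors="strict", newline=None):
    """Hypothetical helper: wrap an open binary file in a text decoder.

    Only text-related arguments live here; file-related ones stay in open().
    """
    return io.TextIOWrapper(binary_file, encoding=encoding,
                            errors=errors, newline=newline)


fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write("naïve\n".encode("utf-8"))

# open() does I/O (always binary here); as_text() does decoding.
# Closing the wrapper also closes the underlying binary file.
with as_text(open(path, "rb"), encoding="utf-8") as f:
    content = f.read()

os.remove(path)
print(content)
```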

You could keep the mode, but only for "read", "write", and "append", removing the weird mix with "text" and "bytes", which really belong to a different set of operations.
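A rough sketch of what a mode restricted to read/write/append could look like. open_bytes() is a hypothetical name used only for illustration; underneath it always calls the real open() in binary mode:

```python
import os
import tempfile

_ALLOWED_MODES = {"r", "w", "a"}


def open_bytes(path, mode="r"):
    """Hypothetical open(): the mode only says read/write/append."""
    if mode not in _ALLOWED_MODES:
        raise ValueError(f"mode must be one of 'r', 'w', 'a', got {mode!r}")
    # Always binary underneath; decoding is a separate wrapper's job.
    return open(path, mode + "b")


fd, path = tempfile.mkstemp()
os.close(fd)
with open_bytes(path, "w") as f:
    f.write(b"abc")
with open_bytes(path, "a") as f:
    f.write(b"def")
with open_bytes(path) as f:
    data = f.read()
os.remove(path)

# "text" vs "bytes" is simply not a mode any more.
rejected = None
try:
    open_bytes(path, "rt")
except ValueError as exc:
    rejected = str(exc)
```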




Let’s be clear here that the fault is not with Python but with Windows.

Python uses text mode by default to avoid surprising beginners on Windows. If you only use Unix-like OSes, you will never have this problem.
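The Windows-specific part is newline translation: when writing in text mode with the default newline=None, Python turns "\n" into os.linesep, which is "\r\n" on Windows. This sketch forces newline="\r\n" on an in-memory buffer so the same translation shows up on any OS:

```python
import io

# Simulate writing a text file with Windows line endings, on any OS.
buf = io.BytesIO()
writer = io.TextIOWrapper(buf, encoding="utf-8", newline="\r\n")
writer.write("line 1\nline 2\n")
writer.flush()
on_disk = buf.getvalue()  # each '\n' was written as b'\r\n'
writer.detach()           # release buf without closing it

# Reading back in text mode (universal newlines) hides the difference again.
reader = io.TextIOWrapper(io.BytesIO(on_disk), encoding="utf-8")
round_trip = reader.read()
```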


The problem is not "text mode by default". The problem is that the API offers a text mode at all.

Opening a file should return an object that gives you bytes, and that's it.

This "mode" thing is idiotic, and it leaks a low-level API that makes no sense in a high-level language with a strong abstraction for text like Python.

Text should be decoded by a wrapping object. See my other comments.


Splitting it into two parts like that would make seek() kind of funky, but I suppose it already is.
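It already is: on a text file, tell() returns an opaque cookie rather than a character index, because a character position can't be mapped to a byte offset without decoding. The only portable use of that cookie is passing it back to seek(). A small sketch with an in-memory buffer and a two-byte UTF-8 character:

```python
import io

data = "héllo".encode("utf-8")  # 5 characters, 6 bytes ('é' takes two)

text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
first_two = text.read(2)  # 'hé'
cookie = text.tell()      # opaque cookie, not "2 characters"
text.seek(cookie)         # valid: the cookie came from tell()
rest = text.read()        # 'llo'

binary = io.BytesIO(data)
binary.seek(3)              # in binary mode, positions are plain byte offsets
rest_bytes = binary.read()  # b'llo'
```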


Sadly, there is no possible migration path, because text is the default "mode".


How would this work?

    with open('text.txt', 'w').as_text():


    with open('text.txt', 'w').as_text() as f:
        f.write("text")
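For comparison, the same idea spelled with today's real API: open the file for binary writing, then wrap it in io.TextIOWrapper for encoding. This is only an approximation of the proposed .as_text(), which doesn't exist; a temporary file stands in for 'text.txt':

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Roughly what open('text.txt', 'w').as_text() could mean:
# binary write mode from open(), text encoding from the wrapper.
with io.TextIOWrapper(open(path, "wb"), encoding="utf-8") as f:
    f.write("text")

with open(path, "rb") as f:
    written = f.read()
os.remove(path)
```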


It's just too weird and open-ended.

The next thing will be a bunch of "open" functions:

    with open_binary("filename") as f:
        ...


    with open_text("filename") as f:
        ...

How do I open these files in writeable mode?

    with open_text("filename").writeable() as f:
        ...

This is getting absurd.



