
In this example the dichotomy is between String (which is guaranteed by the type system to be valid UTF-8) and OsStr (which might be in an unknown encoding or otherwise not decodable to valid Unicode).

This is exactly when you want a systems language to require explicit conversions, rather than converting things silently and possibly losing or corrupting data.
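Python's bytes/str split plays a role similar to OsStr/String here. Below is a minimal sketch that contrasts a strict decode, which refuses to guess, with an errors="replace" decode, which silently loses data. The input bytes are made up for illustration.

```python
# Hypothetical input: "café" encoded as Latin-1, which is NOT valid UTF-8.
raw = b"caf\xe9"

def decode_strict(data: bytes) -> str:
    """Explicit conversion: raise UnicodeDecodeError rather than guess."""
    return data.decode("utf-8")  # errors="strict" is the default

def decode_lossy(data: bytes) -> str:
    """Silent conversion: undecodable bytes become U+FFFD and are gone for good."""
    return data.decode("utf-8", errors="replace")

try:
    decode_strict(raw)
except UnicodeDecodeError as err:
    print("strict decode refused:", err)

lossy = decode_lossy(raw)
print(repr(lossy))                    # the original 0xE9 byte is no longer recoverable
print(lossy.encode("utf-8") == raw)   # False: re-encoding does not give the input back
```

The strict version pushes the decision to the caller, which is the behaviour being argued for. The lossy version "works" but corrupts the data without any signal.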




rather than converting things silently and possibly losing or corrupting data

Exactly. Python 3 went down the "silently converting" route, and it's not pretty[1]. I would go so far as to call it harmful.

[1] http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

I understand the difficulty in this space; much of it is caused by forcing the Windows Unicode filesystem API onto Python as its world-view, rather than sticking to the traditional Unix bytes world-view. I'm unixy, so I'm completely biased, but I think adopting the Windows approach is fundamentally broken.
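For what it's worth, Python 3 keeps a bytes escape hatch on Unix. os.fsdecode() and os.fsencode() map undecodable filename bytes to lone surrogates and back again. This is a small sketch that assumes a POSIX system. On Windows the filesystem codec uses a different error handler, so the round-trip below is not expected to hold. The filename and the encodes_cleanly helper are hypothetical.

```python
import os

# Hypothetical filename bytes that are not valid UTF-8
# (for example, a file created under a Latin-1 locale).
name_bytes = b"report-\xe9.txt"

# On POSIX, os.fsdecode() uses the filesystem encoding with the
# 'surrogateescape' error handler: an undecodable byte 0xNN becomes
# the lone surrogate U+DCNN instead of raising or being replaced.
name_str = os.fsdecode(name_bytes)

# os.fsencode() applies the inverse mapping, so the original bytes survive.
round_tripped = os.fsencode(name_str)

def encodes_cleanly(text: str, encoding: str = "utf-8") -> bool:
    """Return False if `text` holds escaped surrogates that a strict encoder rejects."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(repr(name_str), round_tripped == name_bytes, encodes_cleanly(name_str))
```

The catch is the last check. The str looks like an ordinary string, but it blows up later when it reaches any strict UTF-8 encoder, such as a text-mode stdout.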


The problem there is overblown; it's basically all due to the idea that sys.stdin or sys.stdout might get replaced with streams that don't have a binary .buffer (an io.StringIO, for example). The simple solution is just not to do that, and it's pretty easy: instead of replacing them with a StringIO, replace them with a wrapped binary buffer. Then the code is quite simple:

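    import sys
    import shutil

    # Everything stays bytes end to end, so file contents are never decoded.
    for filename in sys.argv[1:] or ['-']:  # like cat, read stdin when no files are given
        if filename == '-':
            f = sys.stdin.buffer
        else:
            try:
                f = open(filename, 'rb')
            except OSError as err:  # IOError is just an alias of OSError in Python 3
                # On POSIX with a UTF-8 locale, argv was decoded with surrogateescape,
                # so encoding the message the same way restores the filename's bytes.
                msg = 'cat.py: {}: {}\n'.format(filename, err)
                sys.stderr.buffer.write(msg.encode('utf8', 'surrogateescape'))
                sys.stderr.buffer.flush()
                continue

        if f is sys.stdin.buffer:
            # Don't close stdin: '-' may legitimately appear more than once.
            shutil.copyfileobj(f, sys.stdout.buffer)
        else:
            with f:
                shutil.copyfileobj(f, sys.stdout.buffer)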

Python's surrogateescape'd strings aren't the best solution, but I personally believe that treating unicode output streams as binary ones is even worse.
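Here is one way to make the "wrapped binary buffer" suggestion concrete. This minimal sketch captures output by swapping sys.stdout for an io.TextIOWrapper around an io.BytesIO instead of an io.StringIO. That way sys.stdout.buffer still exists for code like cat.py. The helper names make_fake_stdout, run_with_fake_stdout and emit are hypothetical.

```python
import io
import sys
from typing import Callable, Tuple

def make_fake_stdout() -> Tuple[io.TextIOWrapper, io.BytesIO]:
    """A text stream that, unlike io.StringIO, still exposes a binary .buffer."""
    raw = io.BytesIO()
    # newline="\n" disables newline translation so captured bytes match on every OS.
    text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
    return text, raw

def run_with_fake_stdout(func: Callable[[], None]) -> bytes:
    """Run func with sys.stdout swapped out and return everything it wrote, as bytes."""
    fake, raw = make_fake_stdout()
    saved = sys.stdout
    sys.stdout = fake
    try:
        func()
    finally:
        fake.flush()          # push any pending text down into the byte buffer
        sys.stdout = saved
    return raw.getvalue()

def emit() -> None:
    """Hypothetical program that mixes text output with raw bytes, like cat.py."""
    sys.stdout.write("text line\n")
    sys.stdout.flush()        # flush the text layer before writing to .buffer directly
    sys.stdout.buffer.write(b"\xff raw bytes\n")

print(run_with_fake_stdout(emit))
```

The design choice is simply that the replacement has the same shape as the real sys.stdout: a text layer over a byte layer. Code that writes bytes therefore keeps working under capture. Note the explicit flush before mixing text writes with direct .buffer writes; without it the two can land out of order.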



