
What's the difference between the approaches?



In Python 3, all strings are Unicode, period. When you need a sequence of bytes that's something else, there's a separate type for that (called, unsurprisingly, "bytes"). If you need to treat it as a string, you call its decode() method and pass the encoding that should be used to interpret it. If you need to get bytes out of a string, it's the reverse process: you call encode() and, again, specify the encoding.
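A minimal sketch of the Python 3 model described above, using only the built-in str.encode() and bytes.decode(). The helper name roundtrip and the sample text are illustrative choices, not anything from the original comment.

```python
def roundtrip(text: str, encoding: str = "utf-8") -> str:
    """Encode a str to bytes and decode it back with the same encoding."""
    data: bytes = text.encode(encoding)  # str -> bytes: you must name an encoding
    return data.decode(encoding)         # bytes -> str: same encoding to interpret it

text = "naïve café"
utf8 = text.encode("utf-8")

print(type(text).__name__, type(utf8).__name__)  # str bytes
print(len(text), len(utf8))  # 10 characters, but 12 bytes ("ï" and "é" take 2 each)

# str and bytes never mix implicitly; this raises TypeError.
try:
    text + utf8
except TypeError as exc:
    print("TypeError:", exc)

# Decoding with the wrong encoding "succeeds" but produces mojibake.
print(utf8.decode("latin-1"))  # naÃ¯ve cafÃ©
```

The point is that the encoding only appears at the boundary (encode/decode); inside the program, every str is just Unicode text.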

In Ruby, strings are byte sequences with encoding attached. Unicode is not special - it's just one of many available encodings. And different strings in the program can have different encodings. This makes it possible to represent data richer than what Unicode allows (e.g. the various East Asian encodings that avoid CJK unification issues). But it also means that it might be impossible to e.g. concatenate two random strings, or even compare them for equality in a meaningful way, because their encodings are incompatible.
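To make the contrast concrete, here is a toy Python model of the "bytes plus an encoding label" idea. TaggedString is a hypothetical class written for illustration; it is not how Ruby implements String, and it is deliberately stricter than Ruby (which, for example, lets an ASCII-only string join any ASCII-compatible one).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedString:
    """Hypothetical toy model of a Ruby-style string: raw bytes plus an encoding label."""
    data: bytes
    encoding: str

    def __add__(self, other: "TaggedString") -> "TaggedString":
        # Simplification: refuse to join strings whose labels differ at all.
        if self.encoding != other.encoding:
            raise ValueError(f"incompatible encodings: {self.encoding} and {other.encoding}")
        return TaggedString(self.data + other.data, self.encoding)

    def text(self) -> str:
        # Interpret the raw bytes using the attached label (Python codec names).
        return self.data.decode(self.encoding)

# The same two characters, stored under two different encodings.
a = TaggedString("日本".encode("utf-8"), "utf-8")
b = TaggedString("日本".encode("shift_jis"), "shift_jis")

print(a.text() == b.text())  # True: same characters once interpreted
print(a == b)                # False: dataclass equality compares bytes and label

try:
    a + b                    # mixing labels is rejected
except ValueError as exc:
    print(exc)
```

In this model each value carries its own encoding, so the program can hold text in many encodings at once, at the cost of having to check compatibility before combining or comparing values.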




