
Why? Why should it be conceptually different if it's easy to work with the encoded form?

Many languages with Unicode strings use UTF-8 or UTF-16 internally, so working with the "transfer format" is common practice.

While it may not be necessary to know how the languages you program in work under the hood, expert programmers do want/need to know. That way they can write better code, switch to another language, or push the language devs to improve the internal handling.
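To make that concrete, here is a minimal Python sketch (Python chosen only for brevity, and the string is an arbitrary example) of how the same text measures differently depending on whether you count code points, UTF-8 bytes, or UTF-16 code units:

    # One string, three "lengths", depending on the representation.
    s = "a\u00e9\U0001F600"               # 'a', 'e'-acute, and an emoji
    print(len(s))                          # 3 code points
    print(len(s.encode("utf-8")))          # 7 UTF-8 bytes
    print(len(s.encode("utf-16-le")) // 2) # 4 UTF-16 code units (surrogate pair)

Which of these a language reports as "length" is exactly the kind of under-the-hood detail being described.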




Think of JSON.

The programmer shouldn't have to know that a newline character is written as \n in a JSON string.

The JSON string "a\nb" takes 6 characters to write, but its length should be given as 3.

99% of people want to manipulate a JSON model, not the JSON (or BSON) serialization itself. The 1% can still use a byte array and do whatever hacks they like.
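A minimal Python sketch of that distinction, using the same example string as above:

    import json
    text = '"a\\nb"'          # the serialized JSON document: " a \ n b " -> 6 chars
    value = json.loads(text)  # the decoded model value
    print(len(text))          # 6
    print(len(value))         # 3  -> 'a', newline, 'b'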


Bad example. If you want to embed that string in your code you have to type those 6 characters anyway.

A better example is finding a newline in a string. If you search for it in a UTF-16 string it may be at position 8, and the same search in UTF-8 may give position 12. Does it matter what the actual number is? No. You just pass it to the next function or whatever.
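A minimal Python sketch of that point (the string is a made-up example chosen so the positions differ; only code-point vs UTF-8 byte indices are compared here):

    s = "héllo wörld\nrest"
    i = s.find("\n")                    # 11, counted in code points
    j = s.encode("utf-8").find(b"\n")   # 13, counted in UTF-8 bytes
    # Different numbers, same newline; each index works fine as long as
    # you keep using it in the representation it came from.
    print(s[i + 1:])                                   # rest
    print(s.encode("utf-8")[j + 1:].decode("utf-8"))   # rest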



