
> in cases where you know that the string contains only characters from some simple restricted subset

But I'm saying, even in cases where you don't!

A byte-oriented string function, written before Unicode existed, can still do useful processing on UTF-8 data, including data containing characters that didn't exist when that code was written.
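To make that concrete, here's a rough sketch in Python rather than the C such old code would more likely be in. `split_on_byte` is a made-up helper, not from any library; it compares raw byte values and has no idea what UTF-8 is. It still works because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so an ASCII comma (0x2C) can never show up inside one.

```python
def split_on_byte(data: bytes, sep: int) -> list[bytes]:
    """Hypothetical byte-oriented split: compares raw byte values only."""
    fields = []
    start = 0
    for i, b in enumerate(data):
        if b == sep:
            fields.append(data[start:i])
            start = i + 1
    fields.append(data[start:])  # trailing field after the last separator
    return fields


# Sample text with 2-, 3- and 4-byte UTF-8 characters.
text = "naïve,日本語,🙂"
raw_fields = split_on_byte(text.encode("utf-8"), ord(","))

# Each field is still valid UTF-8, because no split landed mid-character.
fields = [f.decode("utf-8") for f in raw_fields]
```

The splitter never decodes anything; decoding only happens afterwards, to show that the pieces came out intact.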




That's true, but in most of those cases you don't need to be able to use numeric character-count-based indexes into the string (which is what the article is arguing that you don't need).

You'd typically be happy if the parsing function that you're using to find the location of (say) each comma in the string gives you an opaque token for each such location, with a way to use those tokens to get slices of the string back.

So in practice we can use byte offsets into the UTF-8 string as those tokens, and the programmer doesn't really have to care that that's what they are.
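Something like this, as a sketch. `Pos`, `find_all` and `text_between` are names I'm making up for illustration. `Pos` is a byte offset underneath, but the calling code only ever passes tokens back in and never does arithmetic on them.

```python
from typing import NewType

# Hypothetical opaque token. It is a byte offset underneath, but callers
# are meant to treat it as a handle, not as a number.
Pos = NewType("Pos", int)


def find_all(data: bytes, needle: bytes) -> list[tuple[Pos, Pos]]:
    """Return (start, end) tokens for each non-overlapping match of needle."""
    if not needle:
        raise ValueError("needle must not be empty")
    matches = []
    i = data.find(needle)
    while i != -1:
        matches.append((Pos(i), Pos(i + len(needle))))
        i = data.find(needle, i + len(needle))
    return matches


def text_between(data: bytes, start: Pos, end: Pos) -> str:
    """Turn a pair of tokens back into the text they delimit."""
    return data[start:end].decode("utf-8")


data = "naïve,日本語,🙂".encode("utf-8")
commas = find_all(data, b",")

# Text from the end of the first comma to the start of the second.
middle = text_between(data, commas[0][1], commas[1][0])
```

The second comma is at byte offset 16 but character index 9, and the caller never needs to know either number. The slices come out as valid UTF-8 because the tokens were produced by matching a valid UTF-8 needle, which can only match at character boundaries.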



