
I think that throwing people into the deep end is the only way to get them to do it right in this case. Defaulting to graphemes is also too often the wrong answer (e.g. you don't want that in a parser).

Really, this is not dissimilar to the "what do you mean, there are more letters than A to Z?" issue that plagued software written in the US before Unicode became dominant. The way we (my perspective on this is as a native speaker of a language with a non-Latin alphabet) eventually solved it was by basically forcing Unicode onto those people. It broke their simple and convenient picture of the world, and replaced it with something much more complicated. But it was necessary.

My position is that letting programmers get away with a simplistic view of text processing (by allowing defaults that "mostly" work) is what creates those issues. So adjusting the abstractions such that they expose more of the underlying complexity is a good thing. People SHOULD believe that doing text processing the right way is hard, because it is.
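
To make that concrete, here is a rough Python sketch of how one string has several different "lengths" depending on which unit you count. The function names are mine, made up for illustration. Note that `approx_graphemes` is NOT real grapheme segmentation (that is UAX #29, which the Python standard library doesn't implement); it only skips combining marks, so it will be wrong for things like ZWJ emoji sequences or Hangul jamo.

```python
import unicodedata


def utf8_bytes(s: str) -> int:
    # Number of bytes in the UTF-8 encoding.
    return len(s.encode("utf-8"))


def utf16_units(s: str) -> int:
    # Number of 16-bit code units in UTF-16 (no BOM).
    return len(s.encode("utf-16-le")) // 2


def code_points(s: str) -> int:
    # Python's len() on str counts Unicode code points.
    return len(s)


def approx_graphemes(s: str) -> int:
    # ASSUMPTION / crude approximation: count code points whose canonical
    # combining class is 0. This is not UAX #29 segmentation.
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    samples = {
        "e + combining acute": "e\u0301",
        "grinning face emoji": "\U0001F600",
    }
    for label, s in samples.items():
        print(label, utf8_bytes(s), utf16_units(s), code_points(s), approx_graphemes(s))
```

None of the four counts is "the" length. A parser usually wants code points or code units, while a text cursor wants graphemes, which is why I'd rather have the API make you pick.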



