Right. "Character" is almost never what you meant, unless your goal was to be as vague as possible. In human languages I like the word "squiggle" to mean this thing you have fuzzy intuitive beliefs about, rather than "character". In Unicode the Code Unit, and Code Point are maybe things to know about, but neither of them is a "character".

In programming languages or APIs where precision matters, your goal should be to avoid this notion of characters as much as practical. In a high-level language with types, just don't offer a built-in "char" data type. Sub-strings are all anybody in a high-level language actually needs to get their job done: "A" is a perfectly good sub-string of "CAT", and there's no need to pretend you can slice strings up into "characters" like 'A' that have any distinct properties worth inventing a whole datatype for.
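
Sketching what that looks like in practice (Rust again, and assuming ordinary &str slicing is the "sub-string" operation in question):

    fn main() {
        let word = "CAT";
        // Find "A" and take it back out as a sub-string (&str), never a char:
        if let Some(start) = word.find("A") {
            let a: &str = &word[start..start + "A".len()];
            assert_eq!(a, "A");
        }
        // Splitting also hands back sub-strings, not "characters":
        let pieces: Vec<&str> = word.split("A").collect();
        assert_eq!(pieces, ["C", "T"]);
    }

Anything finer-grained (user-perceived squiggles) would come from a segmentation library, and that still hands you sub-strings.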

If you're writing device drivers, once again, why would you care about "characters"? You want a byte data type, most likely, some address types, that sort of thing, but who wants a "character"? However, somewhere down in the guts a low-level language will need to think about Unicode encoding, and so eventually it does need a datatype for that, because a plain 32-bit integer doesn't really cut it. I think Rust's "char" is a little too prominent, for example; it needn't be any more in your face than, say, std::num::NonZeroUsize. Most people won't need it most of the time, and that's as it should be.
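
The "32-bit integer doesn't cut it" part is easy to show: Rust's char only admits Unicode scalar values, so plenty of u32 values are simply rejected (a minimal sketch using the standard char::from_u32):

    fn main() {
        assert!(char::from_u32(0x61).is_some());      // 'a': a valid scalar value
        assert!(char::from_u32(0xD800).is_none());    // surrogate: not a scalar value
        assert!(char::from_u32(0x11_0000).is_none()); // beyond U+10FFFF: rejected
    }

That restriction is what makes it a distinct type rather than an alias for u32, and also why most code never needs to touch it directly.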
