
If you want to do Unicode correctly, you shouldn't ask for the "length" of a string. There is no single true definition of length. If you want to know how many bytes it uses in storage, ask for that. If you want to know how wide it will be on the screen, ask for that. Do not iterate over strings character by character.
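To make the ambiguity concrete, here's a quick Python sketch (standard library only) showing that the same visible character reports several different "lengths" depending on what you count:

    import unicodedata

    s_nfc = unicodedata.normalize("NFC", "e\u0301")  # "é" as one precomposed code point
    s_nfd = unicodedata.normalize("NFD", "\u00e9")   # "e" + U+0301 combining acute accent

    print(len(s_nfc))                  # 1 code point
    print(len(s_nfd))                  # 2 code points
    print(len(s_nfc.encode("utf-8")))  # 2 bytes in UTF-8
    print(len(s_nfd.encode("utf-8")))  # 3 bytes in UTF-8

Both strings render identically on screen, yet every count differs.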



How many dots/stars should one display for a password? That's a question that can't be answered by your two valid questions. Are you suggesting that dots/stars shouldn't be displayed for passwords, since you can't ask how many "characters" it contains?


You could divide the rendered width of the string by the width of the '*' character in a monospaced font. It doesn't really make sense for a combining or other invisible character to get its own asterisk.


If you have an entry indicator, it should probably be about the same width as the entered text; or if you're concerned about leaking precise length information for fields that aren't monospaced, you could add a dot each time the rendered text would increase in width.
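A rough sketch of that approach in Python, assuming a hypothetical text_width(s) callback that returns the rendered width of s in the field's font (your UI toolkit's font-metrics API would supply the real thing):

    def dot_count(password: str, text_width) -> int:
        # Add a dot only when appending the next code point actually
        # makes the rendered text wider; combining characters and other
        # zero-width code points therefore get no dot of their own.
        dots = 0
        prev_width = 0.0
        for i in range(1, len(password) + 1):
            width = text_width(password[:i])
            if width > prev_width:
                dots += 1
                prev_width = width
        return dots

Note this still leaks coarse width information, which may or may not matter for your threat model.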


I'd avoid the term "character", but I'd argue there are valid reasons to consume a Unicode string grapheme by grapheme. For example, a regex engine trying to match "e + combining_acute_accent" wants to match both the pre-combined version and the version that uses combining characters.
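A minimal sketch of how that can work in Python, for literal matches only: normalize both pattern and subject to the same form before comparing, so precomposed and decomposed inputs match either way.

    import re
    import unicodedata

    def nfc_search(pattern: str, text: str):
        # Normalize both sides to NFC so "é" and "e" + U+0301 compare equal.
        nfc_pat = unicodedata.normalize("NFC", pattern)
        nfc_text = unicodedata.normalize("NFC", text)
        return re.search(re.escape(nfc_pat), nfc_text)

    print(bool(nfc_search("é", "caf\u00e9")))   # True: precomposed input
    print(bool(nfc_search("é", "cafe\u0301")))  # True: decomposed input

(For actual grapheme-by-grapheme iteration, the third-party regex module supports \X for extended grapheme clusters, e.g. regex.findall(r"\X", s).)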

I agree with the main thrust of your point: "length" is meaningless without specifying which measure of length you mean.



