
Confusingly, there are two different collation standards when treating German umlauts: One for lists of names and one for everything else.

When reciting the alphabet German school kids don't add the umlauts (at least I didn't – I don't think that has changed in the last decades) and if you ask someone how many letters the alphabet has they will answer "26" without hesitation while in Sweden they are treated as distinct characters of the alphabet in every sense.
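
A rough Python sketch of the two orderings. It assumes the usual description (commonly associated with DIN 5007): in general word lists an umlaut sorts like its base vowel, and in lists of names it sorts like the vowel followed by "e". The helper names are made up for illustration, and real collation libraries add tie-breaking levels that this ignores.

```python
import unicodedata

def dictionary_key(word: str) -> str:
    """General word lists: ä sorts like a, ö like o, ü like u, ß like ss."""
    lowered = word.lower().replace("ß", "ss")
    decomposed = unicodedata.normalize("NFD", lowered)
    # Simplification: this drops every combining mark, not only the umlaut dots.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Name lists: ä sorts like ae, ö like oe, ü like ue, ß like ss.
_NAME_LIST_TABLE = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def name_list_key(word: str) -> str:
    """Name lists (phone-book style): expand each umlaut to vowel + e."""
    composed = unicodedata.normalize("NFC", word.lower())
    return composed.translate(_NAME_LIST_TABLE)

names = ["Muth", "Müller", "Mufti"]
print(sorted(names, key=dictionary_key))  # ['Mufti', 'Müller', 'Muth']
print(sorted(names, key=name_list_key))   # ['Müller', 'Mufti', 'Muth']
```

"Müller" moves ahead of "Mufti" in the name-list order because it is compared as "mueller".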




> Confusingly, there are two different collation standards when treating German umlauts: One for lists of names and one for everything else.

Yes, this alone is confusing. My point is that it is even more complicated because, strictly speaking, the standard applies only to umlauts and not to letters with a diaeresis. Unicode offers a way to treat these two cases differently, using the combining grapheme joiner (CGJ).

>> The CGJ can also be used in German, for example, to distinguish in sorting between “ü” in the meaning of u-umlaut, which is the more common case and often sorted like <u,e>, and “ü” in the meaning u-diaeresis, which is comparatively rare and sorted like “u” with a secondary key weight. [1, page 850]
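
To make the quoted passage concrete, here is a toy sketch in Python. It is not the Unicode Collation Algorithm: `toy_primary_key` is a hypothetical helper that treats a vowel directly followed by U+0308 as an umlaut, and treats the same pair separated by CGJ (U+034F) as a diaeresis with no primary weight. A real implementation would presumably express this as a tailoring in a collation library.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

def toy_primary_key(text: str) -> str:
    """Hypothetical primary key: umlaut -> <vowel, e>, diaeresis ignored."""
    s = unicodedata.normalize("NFD", text.lower())
    # A vowel directly followed by the combining diaeresis is read as an umlaut.
    for vowel in "aou":
        s = s.replace(vowel + DIAERESIS, vowel + "e")
    # What is left (CGJ and any mark it shielded) gets no primary weight here.
    return "".join(ch for ch in s if ch != CGJ and not unicodedata.combining(ch))

umlaut = "M\u00fcller"                       # ü meaning u-umlaut
diaeresis = "Mu" + CGJ + DIAERESIS + "ller"  # ü meaning u-diaeresis, marked with CGJ

print(toy_primary_key(umlaut))     # mueller
print(toy_primary_key(diaeresis))  # muller
# CGJ also stops normalization from fusing the pair into a precomposed ü.
print("\u00fc" in unicodedata.normalize("NFC", diaeresis))  # False
```

Both strings render the same, so the distinction lives only in the encoded text.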

> When reciting the alphabet German school kids don't add the umlauts (at least I didn't – I don't think that has changed in the last decades) and if you ask someone how many letters the alphabet has they will answer "26" without hesitation while in Sweden they are treated as distinct characters of the alphabet in every sense.

You have a point here, and maybe the English Wikipedia isn't so wrong in listing them as special characters. Being special doesn't make A umlaut a funky A though. It is still a letter in its own right.

[1] Unicode Standard 10.0: http://www.unicode.org/versions/Unicode10.0.0/UnicodeStandar...


For what it's worth, Dutch has "ij", which could be a letter, or a diphthong, or two unrelated letters (in words borrowed from French, for example), and could also be written as "ij", or as "ÿ", or even "y".

There is no dedicated key for it on a computer keyboard, but you are supposed to remember that it's a unit when capitalizing, as in IJmuiden.

https://en.wikipedia.org/wiki/IJ_(digraph)
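
A small Python sketch of why this trips up software: the built-in `str.capitalize` only uppercases the first character. `capitalize_dutch` is a hypothetical helper, and it naively assumes that every leading "ij" is the digraph, which is wrong for loanwords where the two letters are unrelated.

```python
def capitalize_dutch(word: str) -> str:
    """Capitalize a word, treating a leading 'ij' as a single unit."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    # Also covers the single-character ligature U+0133, which uppercases to U+0132.
    return word[:1].upper() + word[1:]

print("ijmuiden".capitalize())        # Ijmuiden
print(capitalize_dutch("ijmuiden"))   # IJmuiden
print(capitalize_dutch("amsterdam"))  # Amsterdam
```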


> When reciting the alphabet German school kids don't add the umlauts (at least I didn't – I don't think that has changed in the last decades) and if you ask someone how many letters the alphabet has they will answer "26" without hesitation while in Sweden they are treated as distinct characters of the alphabet in every sense.

Meanwhile, the Swedish alphabet has had 29 letters since 2006. Before 2006, "W" wasn't considered a letter of the alphabet, since it was just a double "V" and only existed in German and English loanwords and names.

So before that, this would have been the correct sort order of some names: "Valter, William, Vincent" (with "William" compared as if it were spelled "Villiam"), but these days they would probably be sorted like "Valter, Vincent, William".
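
The effect of folding "W" into "V" can be sketched in Python. This is a simplification: it assumes the order a to z followed by å, ä, ö, handles only those letters, and the key function names are invented here. The sample names are picked so that the two orders differ.

```python
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {letter: index for index, letter in enumerate(SWEDISH_ORDER)}

def modern_key(name: str) -> tuple:
    """W is a letter of its own, between V and X."""
    return tuple(RANK[ch] for ch in name.lower())

def legacy_key(name: str) -> tuple:
    """W is compared as if it were V."""
    return tuple(RANK[ch] for ch in name.lower().replace("w", "v"))

names = ["Vincent", "William", "Valter"]
print(sorted(names, key=legacy_key))  # ['Valter', 'William', 'Vincent']
print(sorted(names, key=modern_key))  # ['Valter', 'Vincent', 'William']
```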



