> German has 4 (7 if you consider cases) non-ASCII characters: äüöß(and upper-case umlauts). All of these are unique, well-defined codepoints.
German does not consider "ä", "ö" and "ü" letters. Our alphabet has 26 letters none of which are the ones you mentioned. In fact, if you go back in History it becomes even clearer that those letters used to be ligatures in writing.
They still are collated as the basic letters the represent, even if they sound different. That we use the uncomposed representation in Unicode usually, is merely a historical artifact because of iso-8859-1 and others, not because it logically makes sense.
When you used an old typewriter you usually did not have those keys either, you composed them.
I'm confused by your use of 'our' and 'we'. It seems you're trying to write from the general point of view of a German, answering .. a German?
Are umlauts letters? Yes. [1] [2] Maybe not the best source, but please provide a better one if you disagree so that I can actually understand where you're coming from.
I understand - I hope? - composition. And I tend to agree that it shouldn't matter much if the input just works. If I press a key labeled ü and that letter shows up on the screen, I shouldn't really care if that is one codepoint or a composition of two (or more).
I do think that the history you mention is an indicator that supports the author's argument. There IS a codepoint for ü (painful to type..). For 'legacy reasons' perhaps. And it feels to me that non-ASCII characters - for legacy reasons or whatever - have better support than the ones he is complaining about, if they originate in western Europe/in my home country.
(basically I searched for old typewriter models, 'Adler Schreibmaschinen' results in lots of hits like that). Note the separate umlaut keys. And these are typewriters from .. the 60s? Maybe?)
I am not entirely sure if Germans count umlauts as distinct characters or modified versions of the base character. And maybe it is not so important; they still do deserve their own code points.
Note BTW that in e.g. Swedish and German alphabets, there are some overlapping non-ASCII characters (ä, ö) and some that are distinct to each language (å, ü). It is important that the Swedish ä and German ä are rendered to the same code point and same representation in files; this way I can use a computer localised for Swedish and type German text. Only when I need to type ü I need to compose it from ¨ and u, while ä and ö are right on the keyboard.
The German alphabetical order supports the idea that umlauts are not so distinct from their bases: it is
AÄBCDEFGHIJKLMNOÖPQRSßTUÜVWXYZ
while the Swedish/Finnish one is
ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ
This has the obvious impacts on sorting order.
BTW, traditionally Swedish/Finnish did not distinguish between V and W in sorting, thus a correct sorting order would be
Vasa
Westerlund
Vinberg
Vårdö
- the W drops right in the middle, it's just an older way to write V. And Vå... is at the end of section V, while Va... is at the start.
German has valid transcriptions to their base alphabet for those, e.g "Schreoder" is a valid way to write "Schröder".
ß, however, is a separate character that is not listed in the german alphabet, especially because some subgroups don't use it. (e.g. swiss german doesn't have it)
1) To avoid confusing readers that don't know German or are used to umlauts: The correct transcription is base-vowel+e (i.e. ö turns to oe - the example given is therefor wrong. Probably just a typo, but still)
2) These transcriptions are lossy. If you see 'oe' in a word, you cannot (always) pronounce it as umlaut. The second e just might indicate that the o in oe is long.
3) ß is a character in the alphabet, as far as I'm aware and as far as the mighty Wikipedia is concerned, as I pointed out above. If you have better sources that claim something else, please share those (I .. am a native speaker, but no language expert. So I'm genuinely curious why you'd think that this letter isn't part of the alphabet).
Fun fact: I once had to revise all the documentation for a project, because the (huge, state-owned) Swiss customer refused perfectly valid German, stating "We don't have that letter here, we don't use it: Remove it".
1) It's a typo, yes. Thanks!
2) Well, they are lossy in the sense that pronunciation is context-sensitive. The number of cases where you actually turn the word into another word is very small: http://snowball.tartarus.org/algorithms/german2/stemmer.html has a discussion.
3) You are right, I'm wrong. ß, ä, ö, ü are considered part of the alphabet. It's not tought in school, though (at least not in mine).
Thanks a lot for making the effort and fact-checking better then I did there :).
Yes, that transcription approach is familiar; here the result of German-Swedish-Finnish equivalency of "ä" is sometimes not so good.
For instance, in skiing competitions, the start lists are for some reason made with transcriptions to ASCII. It's quite okay that Schröder becomes Schroeder, but it is less desirable that Söderström becomes Soederstroem and quite infuriating that Hämäläinen becomes Haemaelaeinen. We'd like it to be Hamalainen, just drop the dots.
German does not consider "ä", "ö" and "ü" letters. Our alphabet has 26 letters none of which are the ones you mentioned. In fact, if you go back in History it becomes even clearer that those letters used to be ligatures in writing.
They still are collated as the basic letters the represent, even if they sound different. That we use the uncomposed representation in Unicode usually, is merely a historical artifact because of iso-8859-1 and others, not because it logically makes sense.
When you used an old typewriter you usually did not have those keys either, you composed them.