Now let's take the lower case of "BAFFLE" - should we get "baffle" or should the string class/function/wtfe attempt to recognize that a ligature can replace "ffl" and return to us "baﬄe"? More generally, should the string library ever attempt to replace letters with ligatures? Should this be yet another option?
And as I type this, another issue manifests: the spelling correction can't even recognize "baﬄe" as a properly spelled word; it highlights the 'ba' and ignores the rest.
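For what it's worth, here is a minimal sketch of how this plays out in Python 3, using only the built-in `str` methods and the standard `unicodedata` module. The variable names are just for illustration. Plain lowercasing never introduces the ligature, while compatibility normalization (NFKC) and uppercasing both expand it back into three separate letters.

```python
import unicodedata

# "ba" + U+FB04 LATIN SMALL LIGATURE FFL + "e": four code points, not six.
LIGATURE_WORD = "ba\ufb04e"

# Lowercasing works code point by code point and never creates a ligature.
lowered = "BAFFLE".lower()

# NFKC applies compatibility decompositions, which expands the ligature to "ffl".
expanded = unicodedata.normalize("NFKC", LIGATURE_WORD)

# There is no uppercase ffl ligature, so the full case mapping expands it as well.
uppered = LIGATURE_WORD.upper()

print(lowered, len(lowered))              # baffle 6
print(len(LIGATURE_WORD))                 # 4
print(expanded == lowered)                # True
print(uppered)                            # BAFFLE
```

So at least in this one library the answer to "should lowercasing produce the ligature?" is no: going from letters to a ligature is left to the font and layout engine, and only the reverse direction is exposed as a string operation.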
Uppercasing and lowercasing are inherently lossy. E.g. the German ß becomes SS when uppercased, yet there is no way to know whether SS should be lowercased to ss or ß again. That's a reason why case conversions should be used, if at all, only as display transformations. The same goes for ligatures, and even as display transformations they shouldn't always be applied automatically, depending on the language. E.g. in German ligatures cannot span syllables, and few layout engines can detect that.
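The lossiness is easy to see with a round trip. This is a small sketch assuming Python 3's built-in case mappings; `round_trip` is a made-up helper name, not anything from a library.

```python
def round_trip(text: str) -> str:
    """Uppercase, then lowercase again (hypothetical helper for illustration)."""
    return text.upper().lower()

original = "Straße"

shouted = original.upper()       # ß has no single-letter default uppercase, so it becomes "SS"
restored = round_trip(original)  # lowercasing "SS" can only ever give "ss"

print(shouted)                         # STRASSE
print(restored)                        # strasse
print(restored == original.lower())    # False: the ß is gone for good
```

Nothing in "STRASSE" records that the double S came from a ß, which is why the transformation cannot be undone without a dictionary.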
I feel like I should learn German just so that I would be able to comment on the ß issue every time a Unicode thread pops up. From my uninformed point of view it is not really clear if ß should really be handled as a separate character/grapheme, or just as a ligature in the rendering phase and stored as 'ss'. Or even if current-day orthography should be held so sacrosanct that it shouldn't be changed to save a significant amount of collective effort.
> or just as a ligature in the rendering phase and stored as 'ss'.
Probably.
> to save a significant amount of collective effort
I've seen this kind of suggestion a number of times on HN, and I find it highly amusing. When confronted with a difficult challenge in representing the world on a computer, apparently the answer is to instead change the world.
OK, but then how are you going to handle hundreds of years of legacy texts?
In German, 'ß' is definitely not just a ligature of 'ss'.
Consider 'Masse' (mass) vs. 'Maße' (dimensions).
Uppercasing these words will necessarily produce ambiguity.
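To make the Masse/Maße collision concrete, here is a short sketch assuming Python 3's `str.upper` and `str.casefold`. Both operations map the two different words onto the same string.

```python
mass = "Masse"        # mass
dimensions = "Maße"   # dimensions

# Uppercasing turns ß into "SS", so both words end up identical.
same_when_upper = mass.upper() == dimensions.upper()

# casefold() is meant for caseless matching and also maps ß to "ss".
same_when_folded = mass.casefold() == dimensions.casefold()

print(mass.upper(), dimensions.upper())   # MASSE MASSE
print(same_when_upper, same_when_folded)  # True True
print(mass == dimensions)                 # False
```

A case-insensitive search built on either operation will therefore treat the two words as the same, which is fine for matching but not for storing text.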
It would be equally tempting, and wrong, to treat the German characters 'ä', 'ö' and 'ü' as ligatures of 'ae', 'oe' and 'ue'. They're pronounced the same, and the two-letter forms commonly occur as substitutions in informal writing, but the two-letter forms also occur in proper names, where it would be incorrect to replace them with the umlauted letters. However, if you want to sort German strings in phone-book order (DIN 5007-2), 'ä', 'ö' and 'ü' sort as 'ae', 'oe' and 'ue'; dictionary order (DIN 5007-1) sorts them with the base vowels instead.
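The sorting rule where 'ä' sorts as 'ae' can be approximated with a sort key that expands the letters only for comparison and leaves the stored strings alone. This is a rough sketch, not a real collation implementation: `EXPANSIONS` and `phonebook_key` are names invented here, and a production system would more likely use a proper collation library.

```python
import unicodedata

# Hypothetical expansion table, used only to build sort keys.
EXPANSIONS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def phonebook_key(word: str) -> str:
    """Build a comparison key; the original string is never modified."""
    # NFC first, so a decomposed "a" + combining diaeresis is also recognized.
    composed = unicodedata.normalize("NFC", word)
    return composed.lower().translate(EXPANSIONS)

words = ["Zebra", "Ärger", "Apfel", "Affe"]

by_code_point = sorted(words)                    # raw code point order
by_expansion = sorted(words, key=phonebook_key)  # 'Ä' compared as 'ae'

print(by_code_point)  # ['Affe', 'Apfel', 'Zebra', 'Ärger']
print(by_expansion)   # ['Ärger', 'Affe', 'Apfel', 'Zebra']
```

The point of using a key function is that the expansion stays a comparison detail: "Ärger" is still stored and displayed with its umlaut, so a name like "Mueller" is never silently rewritten to "Müller" or vice versa.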
The point is, while it may have started out as a ligature (of either ſs or ſz, no one really knows for sure), it has long become a letter in its own right. You cannot treat it like a display-only ligature without throwing away information, e.g. the difference between Maße (measurements) and Masse (mass). People in Switzerland made a conscious decision not to use ß anymore, but that's not the case in other countries where the language is used.
As "ß" vs. "ss" changes pronunciation of preceding vowels, I can't see how it could be anything other than its own letter.
* "Fuß" ("foot") roughly rhymes with "loose."
* "Fluss" ("river") roughly rhymes with… um, nothing I can think of. It has the vowel sound of "look" and "book," at least as pronounced in the American Northeast.
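Since the orthographic reform of 1996, which made the choice between "ß" and "ss" follow vowel length (the old spelling "Fluß" became "Fluss"), this has become a big deal.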