
This document is good, but it doesn't mention ligatures. German's "ß" is a problem, and it is not obvious how Go handles it.

In JavaScript:

     "ß".toUpperCase().length !== "ß".length;
Does weiss == weiß ?
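
To make the length mismatch concrete, here is a minimal sketch of what a JavaScript engine does with this character. The variable names are just for illustration.

```javascript
// "ß" (U+00DF) has no single-character uppercase in the default mapping,
// so toUpperCase() expands it to two characters.
const lower = "ß";
const upper = lower.toUpperCase();

console.log(upper);                         // "SS"
console.log(lower.length, upper.length);    // 1 2
console.log(upper.length !== lower.length); // true
```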



1) It is a ligature only in the historical sense, so in practice it isn't one;

2) Ligatures (e.g. ﬃ, U+FB03, for ffi) are deprecated in Unicode;

3) weiss ≠ weiß in any sense

Edit: 4) x.toUpperCase().length ≢ x.length in general; upcasing can change the length;

5) length in JS (and in 100000 other languages) counts code points at best (in JS it actually counts UTF-16 code units), so it's useless here
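
A small sketch of points 4 and 5, assuming a JavaScript engine: `length` reports UTF-16 code units, which matches neither code points nor what a reader would call characters. The sample strings are my own choice.

```javascript
const astral = "\u{1F600}";  // one code point outside the BMP (an emoji)
const combined = "e\u0301";  // "e" plus a combining acute accent

const units = astral.length;            // UTF-16 code units
const codePoints = [...astral].length;  // the string iterator walks code points

console.log(units);                       // 2
console.log(codePoints);                  // 1
console.log(combined.length);             // 2, although it renders as one character
console.log("ß".toUpperCase().length);    // 2, upcasing changed the length
```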


>Does weiss == weiß ?

Yes and no. The Swiss would write the former; other German-speaking (German-writing) countries would write the latter. The former is incorrect in Germany (after ie, au, eu, ... you must not write ss, unless it's a name, such as the city Neuss).

The upper case of weiß would be WEISS. But it's hard to tell from the upper case WEISS whether the lower case is weiss or weiß. (This is why one should never write people's names in bibliographies in small caps.)


Well, toUpperCase() is kind of a broken API. It should be something like "weiß".toUpperCase("de-DE"), to distinguish it from "weiß".toUpperCase("de-CH").
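
For what it's worth, JavaScript does have a locale-taking variant, `String.prototype.toLocaleUpperCase`. A minimal sketch follows. As far as I know neither German locale has its own uppercase rule for ß, so both give the same result. The Turkish line is only there to show a locale where the argument can matter. Its output depends on the engine's Intl support, so I leave it unstated.

```javascript
const word = "weiß";

// Both German locales use the default Unicode mapping for ß.
const de = word.toLocaleUpperCase("de-DE");
const ch = word.toLocaleUpperCase("de-CH");
console.log(de, ch);    // WEISS WEISS
console.log(de === ch); // true

// Turkish has locale-specific rules for "i"; result not asserted here.
console.log("i".toLocaleUpperCase("tr"));
```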


You can write the upper case of weiß as WEIß. It is mandatory for taxes and other documents and recommended by the Post.

Technically, Unicode has a capital sharp s since 5.1.0, so we could write WEIẞ.
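
A quick sketch of how the capital sharp s behaves under JavaScript's default case mappings (constant names are mine). The mapping is one-way: ẞ lowercases to ß, but ß does not uppercase to ẞ.

```javascript
const capitalSharpS = "\u1E9E"; // ẞ LATIN CAPITAL LETTER SHARP S
const smallSharpS = "\u00DF";   // ß LATIN SMALL LETTER SHARP S

console.log(capitalSharpS.toLowerCase() === smallSharpS);   // true
console.log(smallSharpS.toUpperCase() === capitalSharpS);   // false, it gives "SS"
console.log(capitalSharpS.toUpperCase() === capitalSharpS); // true, already upper case
```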


Yes, you can do that. But that's evil and ugly (mixing uppercase and lowercase letters that way). I know it has to be done sometimes.

And I am glad that U+1E9E (LATIN CAPITAL LETTER SHARP S) is not an official part of German orthography.


>Does weiss == weiß ?

You need a case folding function/method to check for this.

For example, in Perl, see the fc function - http://perldoc.perl.org/functions/fc.html

  fc("weiss") eq fc("weiß");   # true
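
JavaScript has no built-in counterpart to fc as far as I know. A rough sketch of the same comparison is below. `roughFold` is a hypothetical helper of mine that approximates folding by upper-casing and then lower-casing. It is not the Unicode CaseFolding algorithm and will differ from it for some characters.

```javascript
// Hypothetical helper: approximate case folding via upper-then-lower.
// Upper-casing expands "ß" to "SS"; lower-casing then yields "ss".
function roughFold(s) {
  return s.toUpperCase().toLowerCase();
}

console.log(roughFold("weiß"));                        // "weiss"
console.log(roughFold("weiss") === roughFold("weiß")); // true
console.log(roughFold("WEISS") === roughFold("weiß")); // true
console.log(roughFold("weis") === roughFold("weiß"));  // false
```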


For a normal ligature, if http://golang.org/src/pkg/unicode/letter_test.go?h=ToLower is anything to go by, then no to your question, but yes to your code, just not in the way you think it works. Which is to say, strings.ToUpper("\u0133") appears to produce "\u0132" as a result.

But \u00DF appears to be a special case, as there's no uppercase for it. If I had to guess, I'd say it should return \u00DF. I mean, if I uppercase "+", do I expect something else back? Doubtful.
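
For comparison only, here is the same trio in JavaScript rather than Go. I have not checked what Go's strings.ToUpper does with \u00DF, so this says nothing about Go. It only shows that an engine applying the full Unicode mappings does not hand \u00DF back unchanged.

```javascript
const ij = "\u0133".toUpperCase();     // ĳ has a one-to-one uppercase, Ĳ
const plus = "+".toUpperCase();        // caseless characters pass through
const sharpS = "\u00DF".toUpperCase(); // ß expands instead of passing through

console.log(ij === "\u0132"); // true
console.log(plus === "+");    // true
console.log(sharpS);          // "SS"
```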


Unicode tells me:

• Special Casing: Lowercase: 00DF [ ß ] Uppercase: 0053 0053 [ S S ] Titlecase: 0053 0073 [ S s ]

• NamesList: = Eszett • German • uppercase is "SS" • in origin a ligature of 017F and 0073 → (greek small letter beta - 03B2) → (latin capital letter sharp s - 1E9E)

(The claim "in origin a ligature of 017F and 0073" is not undisputed.)

U+1E9E (LATIN CAPITAL LETTER SHARP S ẞ) is not officially allowed in German orthography • NamesList: • lowercase is 00DF → (latin small letter sharp s - 00DF) • Designated in Unicode 5.1
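
JavaScript has no titlecase method, so the Special Casing entry above can only be used through a lookup. The sketch below assumes a hand-written table: `SPECIAL_CASING` and `titlecaseFirst` are hypothetical names, and the table holds only the one entry quoted above.

```javascript
// Hypothetical table with just the U+00DF entry from the Special Casing data.
const SPECIAL_CASING = new Map([
  ["\u00DF", { lower: "\u00DF", upper: "SS", title: "Ss" }],
]);

// Titlecase the first code point of a word and leave the rest untouched.
function titlecaseFirst(word) {
  const [first, ...rest] = [...word];
  if (first === undefined) return word; // empty string
  const special = SPECIAL_CASING.get(first);
  const head = special ? special.title : first.toUpperCase();
  return head + rest.join("");
}

console.log(titlecaseFirst("ßa"));   // "Ssa" (made-up input, to hit the table)
console.log(titlecaseFirst("weiß")); // "Weiß"
```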



