A nice synchronicity here: I was only checking Māori words today because The Guardian's cryptic crossword was set by "Pangakupu" (which means, logically enough, "crossword"). This crossword setter always includes a hidden Māori word or phrase in the puzzle.
I see you've posted about Maori stuff a couple of times. I want to congratulate you, this is really, really great. Thank you for working to preserve a language and culture! You're presenting resources that are tough to find, and that's an amazing thing.
The Māori word "Māori" can be transcribed into the International Phonetic Alphabet (IPA) as:
/ˈmaːɔɾi/
Here’s a breakdown:
/ˈ/ – indicates primary stress on the first syllable
/m/ – a voiced bilabial nasal, like the "m" in "man"
/aː/ – a long open front unrounded vowel, similar to the "a" in "father," but held longer (the macron in the spelling marks this length; in IPA it is written with ː)
/ɔ/ – an open-mid back rounded vowel, like the "o" in "thought"
/ɾ/ – a tapped or flapped "r," similar to the quick "r" sound in Spanish "pero"
/i/ – a close front unrounded vowel, like the "ee" in "see"
This transcription represents the most common pronunciation of the word "Māori."
Sort of, because in the Anglo world "aa" is "ä". Even ChatGPT thinks that it's OK to use "AA" when making a Finnish Morse generator.
In hindsight, Maaori is not so bad. Some American Indian writing systems are just pronunciation guides for Anglos (or the French). I tried to study Haida some 30 years ago, but it was too complex and miserable, because there were no actual audio clips available at that time.
ChatGPT doesn't think, and I fail to see how it is in any way relevant to the discussion.
Marking a long vowel with a macron has a long heritage, dating back to Ancient Greece at least. Yes, some other writing systems, such as Greenlandic, use a double vowel.
Finnish seems to use ä, ö and å as independent letters, rather like Swedish and Danish, unlike German, where ä, ö and ü are regarded as a, o and u with a diacritical mark. These do not seem to be symbols which mark vowel length.
I don't know Māori, but the Wikipedia page gives the alphabetical order for the language and does not list the long vowels separately, so I assume that, as with German or French, they're regarded as the standard letters with a diacritic mark added.
They are indeed standard letters with diacritics added - but macrons are the only diacritical marks used for Māori. Some people do use double vowels but it's less common than using macrons.
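To make the "standard letter plus diacritic" point concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It shows that a macronised vowel decomposes into the plain vowel followed by a combining macron (U+0304), and it converts a macron spelling to the double-vowel or unmarked spelling. The helper names `to_double_vowels` and `strip_macrons` are made up for illustration, and the sketch assumes the only diacritic in the text is the macron.

```python
import unicodedata

COMBINING_MACRON = "\u0304"

def to_double_vowels(text: str) -> str:
    """Rewrite each macronised vowel as a doubled vowel (hypothetical helper)."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch == COMBINING_MACRON and out:
            # Repeat the preceding base letter in lowercase instead of the macron.
            out.append(out[-1].lower())
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))

def strip_macrons(text: str) -> str:
    """Drop macrons entirely, leaving the plain letters (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(COMBINING_MACRON, ""))

# "\u0101" is ā and "\u014d" is ō, written as escapes to avoid encoding surprises.
print(unicodedata.normalize("NFD", "\u0101") == "a" + COMBINING_MACRON)  # True
print(to_double_vowels("M\u0101ori"))  # Maaori
print(strip_macrons("t\u014dreo"))     # toreo
```

Stripping loses the length information, while doubling keeps it, which is the trade-off between the two spellings discussed above.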
I was going to disagree with you, because most kiwis have no idea how to write the special o (myself included), so they’d end up typing toreo.nz instead.
Which as it turns out, redirects to xn--treo-l3a.nz anyway.
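For the curious, the mapping between the macron spelling and that xn-- form can be reproduced with Python's built-in codecs. This is only a sketch: the built-in `idna` codec implements the older IDNA 2003 rules, so it may not match what a given registry or browser does for every name, but it does round-trip this one.

```python
# "t\u014dreo.nz" is tōreo.nz, with ō written as the precomposed U+014D.
unicode_name = "t\u014dreo.nz"

# The "idna" codec converts each dot-separated label to its ASCII form,
# adding the "xn--" prefix to labels that contain non-ASCII characters.
ascii_name = unicode_name.encode("idna")
print(ascii_name)  # b'xn--treo-l3a.nz'

# Decoding goes the other way, back to the Unicode spelling.
round_trip = ascii_name.decode("idna")
print(round_trip == unicode_name)  # True

# The raw Punycode of a single label, without the "xn--" prefix.
raw_label = "t\u014dreo".encode("punycode")
print(raw_label)  # b'treo-l3a'
```

The ASCII letters are kept in order ("treo") and everything after the last hyphen encodes which non-ASCII character goes where.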
I’m 2y into having an iPhone, generally liking it better but autocorrect alone keeps me on the edge of switching back to Android, that’s how bad it is. It’s not even about comparing to Android, it’s just that iOS is bad. If I had a bigger phone I’d turn it off entirely but on a mini it’s juuuust useful enough that it makes me want to throw my phone against the wall _less_ than if it was off.
I honestly want to have a coffee chat with the PM in charge of autocorrect at Apple, I need to understand what the hell they are thinking!
The ō is an o with a macron. It's pretty easy to install a keyboard layout that supports it: https://kupu.maori.nz/about/macrons-keyboard-setup. Many mobile keyboards support it by default with long presses to pop up an accent/variant chooser.
> It's pretty easy to install a keyboard layout that supports it
Only if you don't need anything else from your keyboard layout. I use Dvorak and need to type Japanese, and I think either of those makes it impossible to enter macrons on Windows.
You can have multiple keyboard layouts installed, and it takes a fraction of a second to switch with a shortcut key (on Windows it's Win-space). I have a Māori keyboard layout installed on my work computer, but I only switch into it to type words with macrons, then switch back (I use ` more often and don't like having to double-press it).
Sure, but even if I switched back and forth, the keys would be in the wrong place (because you can only have a qwerty layout with macrons, right?). Somehow the model is wrong compared to Linux where I can do compose to get a macron on any "underlying" layout.
Unfortunately the macron is the one missing dead-key accent on the US "ABC" layout. It's easy enough to hit the globe key when this comes up, but it annoys me a bit that Opt-y is ¥, and Shift-Opt-Y is Á, which is a duplicate: Opt-e-A will also produce it. I'd be happier if Opt-y was the macron dead key and Shift-Opt-Y took over for ¥: I can go a year without needing the Yen symbol, but it makes sense to have it. I don't think the English layout needs two ways to type Á though, it's excessive.
Slightly off-topic, but it would be nice if HN interpreted punycode in link descriptions. Especially given that the links go through a redirect, which means that the browser status bar sees them as part of the query and not the domain, so the browser's own interpretation of punycode never gets applied.
Seeing the Punycode link is actually a security feature, because it means you aren't tricked into visiting, say, pple-06g.com (apple with a Cyrillic a).
There are conventions around that. https://chromium.googlesource.com/chromium/src/+/main/docs/i... Generally, if all the characters are from one script, then it is decoded. There are lots of exceptions detailed there, but it's harder to make a homoglyph attack work using only characters from one script to impersonate another.
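The "all characters from one script" idea can be sketched roughly in Python. To be clear, this is not Chromium's algorithm: the standard library has no Unicode script property, so this sketch guesses the script from the first word of each character's Unicode name, which is a crude heuristic. The function names `scripts_in_label` and `looks_single_script` are hypothetical.

```python
import unicodedata

def scripts_in_label(label: str) -> set[str]:
    """Guess the scripts used by the letters in a label.

    Assumption: the first word of the Unicode character name (e.g. "LATIN",
    "CYRILLIC") stands in for the script. Digits and hyphens are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }

def looks_single_script(label: str) -> bool:
    """True if every letter in the label appears to come from one script."""
    return len(scripts_in_label(label)) <= 1

# Plain ASCII, and Latin with a macron, both count as a single script.
print(looks_single_script("apple"))        # True
print(looks_single_script("t\u014dreo"))   # True

# "\u0430" is CYRILLIC SMALL LETTER A, so this label mixes two scripts.
print(looks_single_script("\u0430pple"))   # False
```

A display rule built on a check like this would show the first two labels as Unicode and fall back to the xn-- form for the third. It does nothing about lookalikes within a single script, which is part of why the real rules have so many extra steps.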
That's not a convention, it's a specification for how Google Chrome does it.
And it's not even a full specification. Several of its 13 steps link to other documents that need to be read to implement the spec fully. Step 12 refers to a list of "dangerous patterns" which appears only to exist in the Chromium source. Step 5 refers vaguely to "any characters used in an unusual way".
It's not OK to say that because Chromium does it, it's some internet standard that random website maintainers should implement.
I think you're ignoring the conversation. There is a lot of discussion to be had, and we don't have to say that decoding punycode is a security risk and simply do without. I also said "conventions" specifically to avoid meaning that these are hard-and-fast rules. And Firefox does something pretty similar. https://wiki.mozilla.org/IDN_Display_Algorithm#Algorithm
Someone always says this when a punycode link shows up.
I'm glad they don't. What you see? That's the link. It's what the browser sends, it's what DNS resolves: it's the link. Displaying it as Unicode is just a display option, and it's one which opens up all manner of mischief through confusables.
It's a hacker culture choice, and it's one I appreciate.
It is! So kind of you to notice. Perhaps you could also notice that English is the language used on Hacker News.
I'm quite sure a website centered in a different cultural landscape might choose a different convention. Good for them, I say.
If URLs start being Unicode, and not an ASCII encoding which is sometimes displayed as Unicode, that would be a different story. But that's not how things are.