Tō Reo – A Māori Spellchecker (xn--treo-l3a.nz)
170 points by firstbabylonian 17 days ago | 43 comments



A nice synchronicity here, I was only checking Māori words today because The Guardian's cryptic crossword was set by "Pangakupu" (which means, logically enough, "crossword"). This crossword setter always includes a hidden Māori word or phrase in the puzzle.


I see you've posted about Maori stuff a couple of times. I want to congratulate you, this is really, really great. Thank you for working to preserve a language and culture! You're presenting resources that are tough to find, and that's an amazing thing.


Wow neat! There's a great collection of Māori-made technology for te reo Māori. I'm thinking also of Te Hiku Media's work building a Māori speech recognition system: https://blogs.nvidia.com/blog/te-hiku-media-maori-speech-ai/


My favourite is PAHU! https://pahu.maori.nz/ - a lorem ipsum generator in te reo Māori.


https://www.maoridictionary.co.nz/ This is the dictionary I use most often


I get "Sorry, something went wrong. If this error persists, contact us." every time I type something.


Thanks — there was a cookie-related bug, which should now be resolved.


I can't type anything in the text area on Firefox. Works in Chrome (macOS).


Yeah, that element should be a 'textarea' instead of a 'div', or at least 'contenteditable' should be set to true.


Also doesn't work in Firefox on Windows but does in Chrome.


Thanks, Firefox support is in the works.


This is very nice and important. We need more tools for small languages.


Is it true that Maaori has been mangled by Ænglish spelling? In all other languages a long vowel is just two vowels, not some stupid umlaut on top.


Yes, says ChatGPT:

The Māori word "Māori" can be transcribed into the International Phonetic Alphabet (IPA) as:

/ˈmaːɔɾi/

Here’s a breakdown:

  /ˈ/ – indicates primary stress on the first syllable
  /m/ – a voiced bilabial nasal, like the "m" in "man"
  /aː/ – a long open front unrounded vowel, similar to the "a" in "father," but held longer (the macron indicates length)
  /ɔ/ – a mid-open back rounded vowel, like the "o" in "thought"
  /ɾ/ – a tapped or flapped "r," similar to the quick "r" sound in Spanish "pero"
  /i/ – a close front unrounded vowel, like the "ee" in "see"
  This transcription represents the most common pronunciation of the word "Māori."


It's certainly not an umlaut. Nor yet is it a trema, which is what you probably mean. It's a macron, which is commonly used to mark long vowels.

Sort of, because in the Anglo world "aa" is "ä". Even ChatGPT thinks it is OK to use "AA" when making a Finnish Morse generator.

In hindsight Maaori is not so bad. Some American Indian writing systems are just pronunciation guides for Anglos (or French). I tried to study Haida some 30 years ago, but it was too complex and miserable, because there were no actual audio clips available at that time.


ChatGPT doesn't think, and I fail to see how it is in any way relevant to the discussion.

Marking a long vowel with a macron has a long heritage, dating back to Ancient Greece at least. Yes, some other writing systems, such as Greenlandic, use a double vowel.

Finnish seems to use ä, ö and å as independent letters, rather like Swedish and Danish, unlike German, where ä, ö and ü are regarded as a, o and u with a diacritical mark. These do not seem to be symbols which mark vowel length.

I don't know Māori, but the Wikipedia page gives the alphabetical order for the language and does not list the long vowels separately, so I assume that, as with German or French, they're regarded as the standard letters with a diacritic mark added.


They are indeed standard letters with diacritics added - but macrons are the only diacritical marks used for Māori. Some people do use double vowels but it's less common than using macrons.
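
To make the "standard letter plus diacritic" point concrete, here is a minimal Python sketch using only the standard unicodedata module. It shows that a macron vowel decomposes into the plain vowel followed by U+0304 (COMBINING MACRON). The helper name strip_macrons is made up for illustration and is not taken from any Māori tooling.

```python
import unicodedata

COMBINING_MACRON = "\u0304"

def strip_macrons(text: str) -> str:
    """Return text with macrons removed, e.g. for macron-insensitive lookup."""
    # NFD splits each precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    without = "".join(ch for ch in decomposed if ch != COMBINING_MACRON)
    # Recompose anything else that was split apart.
    return unicodedata.normalize("NFC", without)

if __name__ == "__main__":
    parts = unicodedata.normalize("NFD", "ā")
    # Prints the Unicode names of the two code points behind "ā".
    print([unicodedata.name(ch) for ch in parts])
    print(strip_macrons("Tō Reo Māori"))
```

Stripping macrons loses the vowel-length information, so this is only reasonable for things like forgiving search, not for spelling.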

excellent use of a Punycode domain


I was going to disagree with you, because most kiwis have no idea how to write the special o (myself included), so they’d end up typing toreo.nz instead.

Which as it turns out, redirects to xn--treo-l3a.nz anyway.

Nice!


> kiwis have no idea how to write the special o (myself included)

I’m in New Zealand too. I work in MRI and have to type ‘TE’ (echo time) regularly, as well as the Māori word ‘te’.

Whatever secret sauce Apple sprinkles into iOS is actually malignant and it takes about 3 edits to type te/TE whenever I try.


Yeah, Apple's autocorrect implementation is shockingly bad. Android is much better in this regard.


I’m 2y into having an iPhone, generally liking it better but autocorrect alone keeps me on the edge of switching back to android, that’s how bad it is. It’s not even about comparing to android, it’s just that iOS is bad. If I had a bigger phone I’d turn it off entirely but on a mini it’s juuuust useful enough that it makes me want to throw my phone against the wall _less_ than if it was off.

I honestly want to have a coffee chat with the PM in charge of autocorrect at Apple; I need to understand what the hell they are thinking!


I type what I want then add a sacrificial letter at the end, then delete it, then carry on.

- I just tested and this is the way, but it still took me a couple of tries due to it screwing with the capitalisation.


The ō is an o with a macron. It's pretty easy to install a keyboard layout that supports it: https://kupu.maori.nz/about/macrons-keyboard-setup. Many mobile keyboards support it by default with long presses to pop up an accent/variant chooser.


> It's pretty easy to install a keyboard layout that supports it

Only if you don't need anything else from your keyboard layout. I use Dvorak and need to type Japanese, and I think either of those makes it impossible to enter macrons on Windows.


You can have multiple keyboard layouts installed, and it takes a fraction of a second to switch with a shortcut key (on Windows it's Win-space). I have a Māori keyboard layout installed on my work computer, but I only switch into it to type words with macrons, then switch back (I use ` more often and don't like having to double-press it).


Sure, but even if I switched back and forth, the keys would be in the wrong place (because you can only have a qwerty layout with macrons, right?). Somehow the model is wrong compared to Linux where I can do compose to get a macron on any "underlying" layout.

I'm a fan of macOS for making it really easy to type vowels with umlauts, macrons, etc.


Unfortunately the macron is the one missing dead-key accent on the US "ABC" layout. It's easy enough to hit the globe key when this comes up, but it annoys me a bit that Opt-y is ¥, and Shift-Opt-Y is Á, which is a duplicate: Opt-e-A will also produce it. I'd be happier if Opt-y was the macron dead key and Shift-Opt-Y took over for ¥: I can go a year without needing the Yen symbol, but it makes sense to have it. I don't think the English layout needs two ways to type Á though, it's excessive.


I just hold down the vowel, then hit 9 for the macron.


a proper ő is also missing unfortunately


Awesome work, love to see the effort on the technical front of bringing a language into broader use!


Is the source code of this somewhere?


Slightly off-topic, but it would be nice if HN interpreted punycode in link descriptions. Especially given that the links go through a redirect, which means that the browser status bar sees them as part of the query and not the domain, so the browser's own interpretation of punycode never gets applied.
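
For what it's worth, the decoding itself is small. A minimal sketch in Python, assuming the built-in "idna" codec is good enough (it implements the older IDNA 2003 rules, so treat it as an approximation of what current browsers do); to_unicode and to_ascii are hypothetical helper names.

```python
def to_unicode(hostname: str) -> str:
    """Decode an ASCII hostname with xn-- labels into its Unicode form."""
    return hostname.encode("ascii").decode("idna")

def to_ascii(hostname: str) -> str:
    """Encode a Unicode hostname into the ASCII form that DNS resolves."""
    return hostname.encode("idna").decode("ascii")

if __name__ == "__main__":
    print(to_unicode("xn--treo-l3a.nz"))
    print(to_ascii("tōreo.nz"))
```

This only covers the conversion, not the question of whether it is safe to display the result.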


Seeing the Punycode link is actually a security feature, because it means you aren't tricked into visiting, say, pple-06g.com (apple with a Cyrillic a).


There are conventions around that. https://chromium.googlesource.com/chromium/src/+/main/docs/i... Generally, if all the characters are from one script, then it is decoded. There are lots of exceptions detailed there, but it's harder to make a homoglyph attack work using only characters from one script to impersonate another.
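
A rough sketch of the "one script per label" idea in Python, with loud caveats: this is not Chromium's algorithm, and Python's standard library has no Unicode script property, so the first word of each character's Unicode name is used as a crude stand-in for its script. The function names scripts_in and looks_single_script are invented for this example.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Approximate the set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Assumption: the first word of the name ("LATIN", "CYRILLIC", ...)
            # is close enough to the script for a demonstration.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_single_script(ascii_hostname: str) -> bool:
    """True if every decoded label draws its letters from a single script."""
    for label in ascii_hostname.split("."):
        decoded = label.encode("ascii").decode("idna")
        if len(scripts_in(decoded)) > 1:
            return False
    return True

if __name__ == "__main__":
    spoof = "\u0430pple.com".encode("idna").decode("ascii")  # Cyrillic a + "pple"
    print(looks_single_script("xn--treo-l3a.nz"))
    print(looks_single_script(spoof))
```

A check like this would wrongly reject languages that legitimately mix scripts (Japanese, for example), and it does nothing about whole-label lookalikes written entirely in one script, which is part of why the real rules have so many exceptions.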


That's not a convention, it's a specification for how Google Chrome does it.

And it's not even a full specification. Several of its 13 steps link to other documents that need to be read to implement the spec fully. Step 12 refers to a list of "dangerous patterns" which appears only to exist in the Chromium source. Step 5 refers vaguely to "any characters used in an unusual way".

It's not OK to say that because Chromium does it, it's some internet standard that random website maintainers should implement.


I think you're ignoring the conversation. There is a lot of discussion to be had, and we don't have to say that decoding punycode is a security risk and simply do without. I also said "conventions" specifically to avoid meaning that these are hard-and-fast rules. And Firefox does something pretty similar. https://wiki.mozilla.org/IDN_Display_Algorithm#Algorithm


You can easily write a Tampermonkey userscript for that. As HN doesn't update the CSS that often, it should be quite a low-maintenance solution.


Someone always says this when a punycode link shows up.

I'm glad they don't. What you see? That's the link. It's what the browser sends, it's what DNS resolves: it's the link. Displaying it as Unicode is just a display option, and it's one which opens up all manner of mischief through confusables.

It's a hacker culture choice, and it's one I appreciate.


On the other hand, that's a rather anglo-centric viewpoint.


It is! So kind of you to notice. Perhaps you could also notice that English is the language used on Hacker News.

I'm quite sure a website centered in a different cultural landscape might choose a different convention. Good for them, I say.

If URLs start being Unicode, and not an ASCII encoding which is sometimes displayed as Unicode, that would be a different story. But that's not how things are.



