
Somewhat off topic: any chance you know why Google doesn't have an explicitly Cantonese model for translation?



Not a Googler, so I can only guess. It seems like Google did try to treat Cantonese as a Chinese variant in the past, but eventually dropped that approach, probably because they realised the two are too different.

I know Google is actively working on the Cantonese version of Google Assistant, though not sure when it'll be officially released.


It is a variant of Chinese, though. Chinese is a language family, not a single language, and it includes Mandarin, Cantonese, Hakka, and others.


Whatever it is, Cantonese differs from Mandarin in pronunciation, vocabulary and even grammar, which means it takes a non-trivial amount of work to adapt a language model designed for one to the other.

Source: I'm a native speaker of one and fully fluent in the other.


AFAIK Google needs a multilingual corpus. So if Cantonese is mostly written using Chinese characters, the corpus will be in Chinese characters.

And if written Cantonese is mostly informal (conversation, shop signs), that text will rarely be multilingual, so the approach that has worked for most languages wouldn't work here.

And it surely wouldn't work for a completely different, lossy orthography without independent training.



