
afaik Google needs a multilingual corpus. so if Cantonese is mostly written using Chinese characters, the corpus will be in Chinese characters.

and if written Cantonese is mostly informal (conversations, shop signs), it will rarely appear alongside the same text in other languages. so the approach that has worked for most languages wouldn't work here.

and it surely wouldn't work for a completely different, lossy orthography - without independent training.



