
I read the article and found it quite lacking. Why on Earth would you force your LLM to translate sentence by sentence? It defeats the whole point of an LLM, which is to use a large context to drive generation. I used DeepL a lot in the past and had a recurring problem when translating computer-related texts from French into English. In French, "chaine" in a computer science context is mostly translated as "string". However, when translating with DeepL (or Google Translate), since the model would not take previous sentences into account, the system would lose the computing context and translate "chaine" as "chain", which of course was usually wrong.

But the funniest part was when I wanted to translate "jeûner" into Greek. In French, "jeûner" means "to fast", in the sense of not eating. However, Google translated "jeûner" as "gregoria" in Greek, which means fast in the sense of speed... It went through English, translating "jeûner" into "fast" and then "fast" into "gregoria"...




I'm one of the authors on the paper. Actually, sentence-by-sentence translation is important in a machine translation system because in many cases users will only provide single sentences. We also test document-level translation in Section 5, and find large improvements (but it isn't the focus of our paper).



