DeepL's LLM Outperforms Google Translate, ChatGPT-4, and Microsoft (deepl.com)
43 points by geox 3 months ago | 18 comments



> DeepL’s models are trained on the highest-quality linguistic data, with an unwavering focus on excellence. Combining world-class AI with the expertise of thousands of hand-picked language specialists who “tutor” the model, DeepL consistently provides best-in-class translation.

> In fact, recent blind tests show that language experts routinely prefer DeepL's translations. According to the data, DeepL’s translation is preferred: 1.3x more often than Google Translate, 1.7x more often than ChatGPT-4, 2.3x more often than Microsoft

There's no way that's true, because GPT-4 is vastly superior to Google Translate in general -- so if Google Translate is "preferred by experts" over GPT-4, to that extent, this would imply that the test they're using is somehow contrived or unrepresentative of the average use case.
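
For reference, here is a rough sketch of what those multipliers would mean as head-to-head shares. It assumes "preferred k times more often" means a k:1 split of preferences with no ties, which the quoted text does not actually specify; `preference_share` is just an illustrative helper name.

```python
def preference_share(ratio: float) -> float:
    """Share of comparisons won, assuming a ratio:1 split and no ties (an assumption)."""
    return ratio / (ratio + 1.0)

# Multipliers quoted from the DeepL page; every comparison is DeepL vs. one competitor.
claims = {"Google Translate": 1.3, "ChatGPT-4": 1.7, "Microsoft": 2.3}

for name, ratio in claims.items():
    print(f"DeepL preferred over {name}: {preference_share(ratio):.1%}")
# DeepL preferred over Google Translate: 56.5%
# DeepL preferred over ChatGPT-4: 63.0%
# DeepL preferred over Microsoft: 69.7%
```

Under that reading, none of these numbers is a direct Google Translate vs. GPT-4 comparison; each one only measures DeepL against a single competitor.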


It’s true. DeepL is based in part on the Linguee data, which has long been the highest-quality database of real-world language use. It excels at translating the actual meaning rather than word for word, and it isn't impaired by also having to be a general chat model with a conservative system prompt and fine-tuning, as GPT-4 is.


> GPT-4 is vastly superior to Google Translate in general

You mean in translation? It might depend on the chosen language pair but for my use cases DeepL > Google Translate > ChatGPT4


Yeah, in translation.

In English-German translation, Google Translate will make at least 1-2 errors per document, and it's sometimes worse than that. GPT-4 generally produces error-free documents.

In English-Japanese translation, Google Translate's results are very poor, and GPT-4, though not perfect, is much better across the board.

Another thing is that Google Translate only produces word-for-word translations. GPT-4 can be told to modulate its results, like "translate this in an informal, casual voice," or "translate this in sparse and exact language for a corporate legal memo." Or even, "translate this into [specific dialect]."


Averaged paired preference is not necessarily transitive.
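
A toy illustration of the point, using made-up data: three hypothetical systems A, B, C and three hypothetical evaluators, each with a consistent ranking of their own. Averaged over evaluators, the pairwise preferences form a cycle, so "X beats Y" and "Y beats Z" tells you nothing certain about X vs. Z.

```python
# Hypothetical data: each list is one evaluator's ranking, best first.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def preference_rate(x, y, rankings):
    """Fraction of evaluators who rank system x above system y."""
    wins = sum(1 for r in rankings if r.index(x) < r.index(y))
    return wins / len(rankings)

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"{x} preferred over {y}: {preference_rate(x, y, rankings):.0%}")
# A preferred over B: 67%
# B preferred over C: 67%
# C preferred over A: 67%
```

Every individual ranking here is transitive; only the aggregate is not.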


Presumably Google have not updated their Translate product to use the latest LLM technology - I gave it some Russian text and it produced a very poor translation that was literal garbage in parts. When I fed the same text to Perplexity I got an excellent translation.


While it's very good, and I do use Linguee frequently, being an LLM it also confabulates. Invent a word in the source language, and it will try very hard to invent a translated word from parts of the source. Use it, but don't blindly trust it.


Yeah exactly, and surely it would also vary depending on the languages being translated from/to. I don't see how you can make claims like that and not link to the studies backing them up. This is so disingenuous.


I've used DeepL as an attempt to de-Google my use of Google Translate.

Even before LLMs, it was pretty good.

The Google Translate UI is still better for Chinese, since it provides the Pinyin underneath both the target and the source language.

The quality of translation using the traditional DeepL varies; sometimes it is better than Google Translate, other times it is worse. They are both quite terrible for Chinese, since they provide literal translation: The product is essentially someone speaking English with Chinese words, disregarding natural expression, nuance, and distinction between spoken and written language.

I would still recommend DeepL for these reasons:

  1) If you pay them money, they say they will respect your privacy to some degree.
  2) They allow for API integration, so you can embed them into your tools and apps.
  3) Google Translate + DeepL (when they disagree) give a better result than either one alone.
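
A minimal sketch of point 3, assuming you already have both translations as plain strings (no API calls are shown here). The `needs_review` helper and its threshold are hypothetical; it uses Python's standard `difflib` as a crude character-level similarity measure, which is only a stand-in for actually reading both outputs.

```python
from difflib import SequenceMatcher

def needs_review(translation_a: str, translation_b: str, threshold: float = 0.8) -> bool:
    """Flag a pair of translations for manual review when they disagree a lot.

    The 0.8 threshold is an arbitrary illustrative value, not a tuned one.
    """
    similarity = SequenceMatcher(None, translation_a, translation_b).ratio()
    return similarity < threshold

# Outputs would be pasted in from each engine; these strings are placeholders.
google_output = "It is the third game in the series."
deepl_output = "It is the third game in the series."
print(needs_review(google_output, deepl_output))  # identical strings are not flagged
```

Where the two engines disagree heavily, that is the sentence worth checking by hand or with a third tool.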

I've used LLMs (GPT, Claude) to express succinct statements in other languages.

This has worked amazingly well. I look forward to trying DeepL's LLMs.


> The quality of translation using the traditional DeepL varies; sometimes it is better than Google Translate, other times it is worse. They are both quite terrible for Chinese, since they provide literal translation: The product is essentially someone speaking English with Chinese words, disregarding natural expression, nuance, and distinction between spoken and written language.

There's no machine translation that can churn out natural-sounding results from Chinese->English, but I find Google Translate better than DeepL for certain niches, because DeepL seems to have been fed nothing but traditional literature, while Google has a certain awareness of many niche topics and can accurately translate made-up names from video games, for example.

Here's a paragraph from the Chinese Wikipedia on a video game:

>《崩坏3》[b]是中国大陆游戏开发商米哈游开发的的手机3D角色扮演动作游戏。《崩坏》系列的第3作,沿用了前作《崩坏学园2》角色[4]。故事背景、剧情和世界观与《崩坏学园2》有所不同。讲述了女主角琪亚娜·卡斯兰娜和她的朋友们的冒险[5]。为ACT类型游戏

Google Translate:

>Honkai Impact 3[b] is a 3D role-playing action game for mobile phones developed by Chinese game developer MiHoYo. It is the third game in the Honkai series and uses the characters from the previous game, Honkai Gakuen 2[4]. The story background, plot and world view are different from Honkai Gakuen 2. It tells the story of the adventures of the heroine Kiana Kaslana and her friends[5]. It is an ACT game.

DeepL:

>Crackdown 3[b] is a 3D role-playing action game for cell phones developed by mainland Chinese game developer Miha Tour. The 3rd installment of the Crackdown series follows the characters of its predecessor, Crackdown Academy 2[4]. The setting, plot, and worldview are different from that of Wipeout 2. It follows the adventures of the female protagonist Kiana Kaslana and her friends[5]. For the ACT genre of games

Crackdown, Wipeout, Miha Tour are all wrong in this context.

And here's the free ChatGPT (3.5?):

>'Honkai Impact 3rd' is a mobile 3D action role-playing game developed by miHoYo, a game developer from Mainland China. It is the third installment in the 'Honkai' series, continuing the characters from the previous game 'Honkai Gakuen 2'. The story background, plot, and world view are different from 'Honkai Gakuen 2'. It tells the adventures of the heroine Kiana Kaslana and her friends. It is an action-type game.

Since ChatGPT is more than a tool for translation, here's the result from asking it to rewrite the paragraph into more natural-sounding English:

> 'Honkai Impact 3rd' is a 3D action role-playing mobile game developed by miHoYo, a game developer from Mainland China. It is the third installment in the 'Honkai' series, featuring characters from the previous game, 'Honkai Gakuen 2'. The story, plot, and world are different from 'Honkai Gakuen 2'. The game follows the adventures of the heroine Kiana Kaslana and her friends. It belongs to the action game genre

I'd consider ChatGPT a winning tool overall, were it not for the occasional (rare in practice, but it happening even once is enough to make me angry) "I can't do that, Dave" from the AI safety bull. Google Translate doesn't attempt to filter speech. I can only speak from trying the free chat version of GPT. I have no idea as to whether the API is prone to this nonsense or not.


I use DeepL regularly and it's been far better at translation and context awareness than anything else. The fact that they look to be applying LLMs in a targeted way for their business model and product is, to me at least, a great use of LLMs.


A while back I tried using GPT-4 for translation. When it did the translation, it worked great, but there were many times when the model would unpredictably refuse to translate the text. So I had to have another model check whether the translation had actually been done, which was a giant waste of time. I ended up just using Google Translate.


I think it’s notable that they left out Claude, which in my experience has been the best general purpose hosted LLM to date. Given how well known Claude is in the LLM space, it makes me wonder if Claude is actually better than the DeepL specialized LLM at translation tasks.


Claude suffered a bit for me in Italian, not in translation directly but when I asked it for generic tasks such as "reproduce this graph in vanilla JavaScript and html".

That was a few weeks ago, however, and I didn't test it for direct translations.


It is not fully clear from the article, but it seems like their new model is enabled only for Japanese-English, Chinese-English, German-English and only on paid DeepL Pro at the moment.


I pretty much use this exclusively, except for documents, which Google generously translates for free.


Misleading title. It outperforms those models for translation tasks.


I guess it's because of the title character limit. In that case the title should be edited, as not everybody is able to guess the context.



