
Comparing this to IBM Watson and Google Translate:

Chinese Input: 古荡塘苗路华星现代产业园E座正门

LibreTranslate: Ordinary gate of the modern industrial plant of the Hyong Chung Chung

IBM Watson: Ancient Slut Pond Miao Luhua Star Modern Industrial Park E Zhengmen

Google Translate: Main entrance of Block E, Huaxing Modern Industrial Park, Miao Road, Gudangtang

None is accurate, but it is nice to have options.




This is a really interesting example. I can't read Chinese but I'm assuming the input is an address or location?

Such domain knowledge would be vital for producing a good translation and sanity-checking the output, but a straight sequence-to-sequence machine translation would not capture that context. It looks like that is what is happening with the first two translations, while Google's may have actually realized it's an address (but maybe not: as you say, the answer is wrong, so maybe their ML is just better).

Your example highlights the point that naked ML models can only ever be so good, and that it's really as part of a system that they can be truly effective. For translation, you can imagine some combination of a classifier or NER step that identifies an address, a translation model, and an English model that checks whether the answer is sensible.
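To make the "system, not a naked model" idea concrete, here is a minimal sketch of that combination. Everything in it is an assumption for illustration: the regex stands in for a real classifier or NER step, the English check is a crude keyword test rather than a language model, and `translate` is whatever translation function you plug in.

```python
import re
from typing import Callable, Tuple

# Assumption: a crude stand-in for a classifier/NER step. It only looks for
# common Chinese address markers (province, city, district, county, road,
# street, lane, number, block).
ADDRESS_MARKERS = re.compile(r"[省市区县路街巷号座]")

# Assumption: a crude stand-in for an English model that judges the output.
ROAD_WORDS = re.compile(r"\b(Road|Street|Avenue|Lane)\b")


def looks_like_address(text: str) -> bool:
    """Return True if the text contains at least two address markers."""
    return len(ADDRESS_MARKERS.findall(text)) >= 2


def plausible_english_address(text: str) -> bool:
    """Return True if the English output names a road-like unit."""
    return ROAD_WORDS.search(text) is not None


def translate_with_checks(
    text: str, translate: Callable[[str], str]
) -> Tuple[str, bool]:
    """Translate text; return the output and whether it passed the check.

    Inputs that do not look like addresses are passed through unchecked.
    """
    output = translate(text)
    if looks_like_address(text):
        return output, plausible_english_address(output)
    return output, True
```

With the outputs quoted above, the LibreTranslate result would be flagged (no road word), while the Google result would pass. A real system would need something far better than keyword matching, but the shape is the same.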


What is the correct translation, the one a bilingual human might give?


Google's, I think. It's an address referring to the main gate of an area.

Actually it's pretty darn accurate from what I can tell. Maybe my Chinese just isn't very good, but I don't think I can do much better.


Yeah it's almost perfect, the only inaccuracy is that the district is just "Gudang" and "tang" is part of the road name (i.e. "Tangmiao Road").

If you check the location on Google Maps you can see it's Tangmiao Road and Gudang District (the "District" part is not present in the address): https://www.google.com/maps/place/Huaxing+Modern+Industrial+...

Without context it's a good attempt. Only with deeper cultural knowledge would someone be able to guess that "Miao" on its own is too short for a street name and that it's likely "Tangmiao" together. Without knowing the place, though, I think it would also be reasonable to guess that the whole thing is the road name, like "Gudangtangmiao Road".

I guess using Google Maps to validate Google Translate is kinda questionable so here's the same location on Bing Maps: https://www.bing.com/maps?osid=d3dfc1f0-f8ed-48cd-898d-0a765...
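The ambiguity here is purely about where to cut the characters before "Road". A small sketch that enumerates the candidate cuts, assuming the romanized syllables from this thread (gu, dang, tang, miao); `candidate_splits` is a made-up helper, not anything a translation service exposes:

```python
from typing import List, Tuple


def candidate_splits(syllables: List[str]) -> List[Tuple[str, str]]:
    """List every (area, road) reading of the syllables before "Road".

    The first entry treats the whole string as the road name (empty area).
    """
    candidates = []
    for i in range(len(syllables)):
        area = "".join(syllables[:i]).capitalize()
        road = "".join(syllables[i:]).capitalize() + " Road"
        candidates.append((area, road))
    return candidates


for area, road in candidate_splits(["gu", "dang", "tang", "miao"]):
    print(repr(area), "|", road)
```

Nothing in the string itself says which cut is right. Picking "Gudang" plus "Tangmiao Road" takes outside knowledge, such as a gazetteer or the map lookup above.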


DeepL does get the segmentation correct:

"Main entrance of Block E of Huaxing Modern Industrial Park, Gudang Tangmiao Road"

https://www.deepl.com/translator#zh/en/%E5%8F%A4%E8%8D%A1%E5...


Actually DeepL seems to be the best of the bunch. Here is another test:

Input: 北京市西城区地安门西大街49号

DeepL: No.49, Di'anmen West Street, Xicheng District, Beijing

Google: 49 Di'anmen West Street, Xicheng District, Beijing

Watson: No. 49 Avenue West Main Street in Xicheng District, Beijing

Libre: 49th Anniversary Street, Westtown, Beijing

(Libre is the worst for all address-type input, which is what I'm interested in. Shameless plug: I'm building a Geocoder from China at https://geocode.xyz/CN . I've tested over 3k addresses so far.)


DeepL seems to have a bit of an issue with uncommon street suffixes in Chinese. For example: 江苏省苏州市姑苏区东中市374号槔桥头

Google: Bridge Head, No. 374, Dongzhong City, Gusu District, Suzhou City, Jiangsu Province

DeepL: Pulley Bridge, No. 374, Dongzhong City, Gusu District, Suzhou, Jiangsu Province

Libre: Cambridge No. 374 in the eastern part of the city of Jiangsu, province of Jiangsu

Watson: No. 374 Bridge Head in East China, Suzhou, Suzhou, Jiangsu Province.

You can kind of see which service goes for the literal more than the interpretive. It might be a bit unfair to use an address that is more than just the literal street address, although in actual speech this address is at an intersection with two bridges, and it's enough of a local landmark that the addition would make sense. Apart from the uncertainty over the proper name of the bridge, that addition doesn't actually throw Google or DeepL off. It's the 市 suffix for a street, not unique to this city but certainly rare, that gets all of the services. Libre at least gives it a try. Watson just pretends it doesn't exist and, for whatever reason, translates the old name of the city to the new name of the city, which obviously now encompasses a much greater area and exists in a different context. DeepL seems to have figured out that you aren't really supposed to have two cities in one address and tries to rectify that in spite of the literal.

I would imagine that a human translator would use the entirety of the street name and add "street" in English to the end. It's definitely interesting to see how these services handle somewhat nonstandard and much older address patterns that don't necessarily originate in Mandarin and frequently relate to local landmarks that no longer exist, all of which requires some contextual work beyond your standard post-1949 street naming, which tends to be fairly standardized both thematically and in form.
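For what it's worth, the "two cities in one address" tell can be written down as a rough heuristic: a 市 token that comes after the district (区) and sits directly before a house number is probably a street name, not a city. The sketch below is only that guess, with a made-up function name, and says nothing about how any of these services work:

```python
import re
from typing import Optional, Tuple

# Assumption: a 市-suffixed token between the district (区) and a house
# number (digits + 号) is a street name rather than a second city.
SHI_STREET = re.compile(r"区([^省市区县]+市)(\d+)号")


def find_shi_street(address: str) -> Optional[Tuple[str, str]]:
    """Return (street, house_number) if the 市-as-street pattern matches."""
    match = SHI_STREET.search(address)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(find_shi_street("江苏省苏州市姑苏区东中市374号槔桥头"))
print(find_shi_street("北京市西城区地安门西大街49号"))
```

For the Suzhou address this picks out 东中市 with number 374, and it finds nothing in the Beijing one. A translator could then keep the whole street name and append "Street" instead of rendering 市 as "City".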


DeepL is the best in 99% of cases. I have no idea how they managed to get so much better than the other players in the translation game.


It's better than Watson, shrug.


I haven't seen a machine translation as hilariously bad as Watson's in years.


You should have included Bing's. Their Chinese translation is actually more developed than I had thought before someone showed me.



