
Machine translation now generates fluent text (at least for en-ja), but my complaint is that it sometimes drops a negation word that is crucial to the meaning. IMO, not missing a negation is more important than being fluent.



I fully agree with this. As a data scientist, I have always thought of this as a "natural" consequence of one of the main (if not _the_ main) metrics used to evaluate machine translation algorithms, which is BLEU: https://en.wikipedia.org/wiki/BLEU

According to this metric, if you have a moderately long sentence like "I am not the person who said the president should be reelected" and your translation missed the "not", you would still get a score of 11/12 ~ 92%. And, as far as I know, word order doesn't even matter, so "I am the person who said the president should not be reelected", while wrong, would get a perfect score.

Of course these are rather artificial examples, and in general machine translation algorithms and their evaluation work because it's "easier" to create an algorithm that gets the right translation than one that, unintentionally, fools the metric systematically. Nevertheless, if the research community used a metric that punished this kind of mistake more strongly, I suspect that over time a few new algorithms could come up that improve on this specific point.

Alas, I don't know of any such metric (nor would I know how to design one, of course, otherwise I'd publish it ;-) ).


BLEU is not what is being optimized here. There are plenty of alternative scoring metrics; WMT runs a competition on them every year.

It is also just false that BLEU does not care about order.
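
For anyone who wants to check the two example sentences from upthread, here is a minimal sketch of sentence-level BLEU with a single reference and no smoothing. The function names (`ngrams`, `modified_precision`, `bleu`) are my own, and real evaluations are normally corpus-level and use an established implementation, so treat the numbers as illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Geometric mean of 1..max_n precisions times the brevity penalty (unsmoothed)."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)


reference = "I am not the person who said the president should be reelected".split()
dropped = "I am the person who said the president should be reelected".split()
moved = "I am the person who said the president should not be reelected".split()

for name, cand in [("dropped 'not'", dropped), ("moved 'not'", moved)]:
    p1 = modified_precision(cand, reference, 1)
    p2 = modified_precision(cand, reference, 2)
    print(f"{name}: unigram={p1:.3f} bigram={p2:.3f} BLEU={bleu(cand, reference):.3f}")
```

Unigram precision is 1.0 for both candidates, which is the part that ignores order. The bigram precision is 9/10 when "not" is dropped and 8/11 when it is moved, so the higher-order n-grams do penalize the reordering and neither candidate gets a perfect score.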


I tried using en-ja on Google Translate for: "fall through the cracks".

It's an extremely common idiom and I used it in a proper context. It fails, horribly. I don't know why I even bother checking on Google Translate. It basically fails every single time to create natural Japanese.

Our teachers could tell in a second if it was made by Google Translate.

Just don't rely on it. Ever.



