This is a fun project. Even without seeing the original script next to it, it's fun to try to reverse engineer what each text box is meant to mean. Surprisingly enough considering how supposedly sophisticated the machine translation is, a lot of these are the types of basic mistakes that you would expect from a beginner or a really bad computer translation.
I suspect they translated each text box separately, instead of joining multiple text boxes into one string, because I frequently get much better translations than this from DeepL. Japanese is a language that is often difficult to understand without context, so the smaller the chunks of text are, the worse a computer will do.
Stuff like:
"Your brother is probably a man. Probably an adult." is just a bad translation of でしょう.
"Put the bomb in the ring in his hand" The second part is an overly literal translation of 手に入る, and the ring was probably called 爆弾の指輪 or something.
"Very!" is probably just 全く
etc
These are things any intermediate learner would know, and yet AI models still struggle with them.
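To make the "join the text boxes" idea concrete, here is a minimal Python sketch. It is not how the project did it; the function name `join_text_boxes` and the `max_chars` limit are made up for illustration. It only groups consecutive text boxes into larger newline-joined chunks, so that whatever translator you use sees more context per request.

```python
from typing import Iterable, List


def join_text_boxes(boxes: Iterable[str], max_chars: int = 200) -> List[str]:
    """Group consecutive text boxes into newline-joined chunks.

    Hypothetical helper: max_chars is an arbitrary illustrative limit,
    not a constraint of any particular translation service.
    """
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for box in boxes:
        # +1 accounts for the newline joining this box to the previous one
        extra = len(box) + (1 if current else 0)
        if current and length + extra > max_chars:
            # Adding this box would exceed the limit, so close the chunk
            chunks.append("\n".join(current))
            current, length = [], 0
            extra = len(box)
        current.append(box)
        length += extra
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk would then be sent to the translator as a single string. Mapping the translated output back onto individual text boxes is a separate problem that this sketch doesn't attempt.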
The original text is also completely in kana. This confuses AI models that mostly understand normal text with kanji. They don't really "know" that 手に入る and てにはいる are the same thing.
Eg, with DeepL:
ダークエルフなら ほくとうのしまの どうくつに すんでるよ! => If you're a Dark Elf, you'll find me in the caves of the Far East!
ダークエルフなら北東の島の洞窟に住んでるよ! => If you're a Dark Elf, you live in a cave on the northeast island!
DeepL also seems to be quite sensitive to the placement of spaces.
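If you want to test the spacing sensitivity yourself, one option is to generate a space-free variant of each line and compare the two translations. A small Python sketch, assuming the game text uses ASCII or full-width (U+3000) spaces purely as visual word separators; `strip_kana_spacing` is a made-up name, and I'm not claiming either variant reliably translates better:

```python
import re

# Matches runs of ASCII spaces and full-width (ideographic) spaces
_SPACES = re.compile(r"[ \u3000]+")


def strip_kana_spacing(text: str) -> str:
    """Remove the spaces inserted between words in all-kana game text.

    Assumption: the input is Japanese-only, so no meaningful spaces
    (e.g. between Latin words) are lost.
    """
    return _SPACES.sub("", text)


spaced = "ダークエルフなら ほくとうのしまの どうくつに すんでるよ!"
unspaced = strip_kana_spacing(spaced)
```

You would then feed both `spaced` and `unspaced` to the translator and see how much the output changes.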
So that those of us who don't speak Japanese can get a feel for what you're saying - what do the original Japanese texts mean (i.e. could you give us a better translation)?
でしょう roughly means "don't you agree" or "I agree", I didn't play the FF games so I don't have enough context to translate its original meaning here.
手に入る literally translates to "put into hand", but it means "acquire" in the context of opening a chest and finding a potion. Japanese games say "potion put into hand" and English translations say "You received a potion".
全く literally means "truly" or "very", but a lot of the time people use it to mean "seriously" in a sarcastic sense. Like, "seriously!?" or "jeez!". A beginner would think "very", but somebody with experience would know they're just complaining in a way that doesn't elicit a response.
re: でしょう, without seeing the original Japanese, my guess is that a more natural English translation for your first example would be "Your brother is supposed to be a man, right? He's supposed to be an adult, right?" or something more along those lines.
Mato's translation comparison doesn't cover this line, so I can't easily link to the full context, but this is a young girl (Rydia) scolding an older man (Edward/Gilbert) for being too overcome with grief to take action. In the original US translation, the dialogue is "Crybaby! You are a man! You are a grown-up! You are not the only one who has lost loved ones!" The more literal translation would be "Aren't you a man?" but the meaning is the same.
"Brother" seems to be coming into it because Rydia is addressing Edward as "o-nii-chan". Pretty classic example of something that you couldn't translate accurately without knowing who the characters are.
"o-nii-chan" is a great example of translation difficulty because it's very natural in Japanese to refer to your older brother as "o-nii-chan" and it sounds like bad exposition writing in English.
And in this case it's just being used to address an older but not elderly man, in a way that's totally normal in Japanese but would be utterly bizarre in English.
A little off topic, but 15 years ago maybe I saw a project that taught a player a foreign language by starting a game in their native language and gradually substituting localized strings of the target language. They applied the technique to The Sims and to Grim Fandango. Still seems like a good idea to me.
I assume that this would work better for games like Sims or SimCity, since generally you're always clicking the same things and reading the same text - for e.g. Grim Fandango, you'll only be reading the same text again when you replay the game, which takes a while.
Version 2 of the translation is also pretty bad, but in a different way. This version has logically-constructed sentences, but the actual phrase/word/term choices themselves are often far worse than Version 1. “Proper Japanese” sentences translate poorly compared to Version 1. Overall, this translation is much more playable than Version 1.
The neural network result looks better, but is less accurate, yet much more playable? That's an odd conclusion...
There was a pre-pub paper on automated translation of manga. I wonder if the same approach could be used in games, assuming you can find a big enough corpus of well-translated games?
I think the point of this project and this submission is to illustrate how garbage machine translations end up when sentences are too short or missing context.
I don't understand how some of the translations can be SO BAD. E.g. the one that mentions "fitness room" and "Thailand"; What was the original Japanese sentence for that one?
Looks like it got the dark elf and "I live in" parts, but "fitness room in Thailand" is probably what is commonly referred to as "neural network hallucination."
AFAIK the premise of NNMT is that it doesn't try to do much in the way of parsing, but tries to "learn" the association between phrases in the two languages by matching them in the training data, so if it sees "fitness room" and "Thailand" and that happened to somehow align with an occurrence of ほくとうのしまのどうくつ, that's what it will think they translate to.
You can see some interesting "probing of Google Translate's guts" here:
Who knows how it came up with that craziness.
The original translates to something along the lines of "Dark elves live in the caves on the islands in the north east".
Maybe it took "north east" to mean Thailand, and instead of separating しま and の it took しまの (Shimano) as one block of text, like the bike company, hence "fitness". And どうくつ is "cave", which is in a way a type of room. But that is one hell of a tortured translation.
GPT-3 will naively do much worse than DeepL at translation tasks; but may do pretty well (maybe better, though I don’t know that anyone’s rigorously compared them) if you contextualize the question carefully, e.g. make it a fill-in-the-blank line after a bunch of pre-made translation pairs, or even a “resolve this blank in the middle of the story” question with both prefixed and suffixed example translation.
But GPT-3 still hasn’t seen “much” non-English text (they only intentionally feed it mostly-English-language sources for now) so it’s definitely not as good at translation as a GPT-4 that had seen equal amounts of other-language corpus would be.
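For anyone wondering what "a fill-in-the-blank line after a bunch of pre-made translation pairs" looks like in practice, here is a rough Python sketch that only builds the prompt text. The function name and the "Japanese:"/"English:" labels are my own choices, the example pairs are loose translations taken from this thread rather than official ones, and actually sending the prompt to a model is left out.

```python
from typing import List, Tuple


def build_few_shot_prompt(pairs: List[Tuple[str, str]], source_line: str) -> str:
    """Build a few-shot translation prompt ending in a blank to fill in.

    Hypothetical format: the labels are arbitrary; any consistent
    pattern the model can continue would work the same way.
    """
    lines: List[str] = []
    for japanese, english in pairs:
        lines.append(f"Japanese: {japanese}")
        lines.append(f"English: {english}")
        lines.append("")  # blank line between examples
    # The final entry leaves the English side empty for the model to complete
    lines.append(f"Japanese: {source_line}")
    lines.append("English:")
    return "\n".join(lines)


# Illustrative pairs only, based on translations suggested in this thread
examples = [
    ("ダークエルフなら北東の島の洞窟に住んでるよ!",
     "Dark elves live in the caves on the islands in the north east!"),
    ("全く", "Jeez!"),
]
prompt = build_few_shot_prompt(examples, "てにはいる")
```

The model is then asked to continue the text after the last "English:". Using earlier lines from the same scene as the example pairs would be one way to give it the context the thread says is missing.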