Seems like the paragraph matching is still pretty buggy.
For example, I chose French as the language to get better at and selected "Tour du monde en 80 jours" as the text. The very first paragraph of the French text is:
> En l'année 1872, la maison portant le numéro 7 de Saville-row, Burlington Gardens -- maison dans laquelle Sheridan mourut en 1814 --, était habitée par Phileas Fogg, esq., l'un des membres les plus singuliers et les plus remarqués du Reform-Club de Londres, bien qu'il semblât prendre à tâche de ne rien faire qui pût attirer l'attention.
and the English translation is
> Mr. Phileas Fogg lived, in 1872, at No. 7, Saville Row, Burlington Gardens, the house in which Sheridan died in 1814. He was one of the most noticeable members of the Reform Club, though he seemed always to avoid attracting attention; an enigmatical personage, about whom little was known, except that he was a polished man of the world.
Notice that there's an extra sentence at the end describing him as an enigmatical personage, which is not in the French paragraph. That part actually belongs to the next French paragraph, and from there on the matching goes basically out of whack. The second paragraph of the English is
> People said that he resembled Byron--at least that his head was Byronic; but he was a bearded, tranquil Byron, who might live on a thousand years without growing old.
But the third paragraph of the French is
> On disait qu'il ressemblait à Byron -- par la tête, car il était irréprochable quant aux pieds --, mais un Byron à moustaches et à favoris, un Byron impassible, qui aurait vécu mille ans sans vieillir. Anglais, à coup sûr, Phileas Fogg n'était peut-être pas Londonner.
They don't say where they took the translations from or who created them. Given your examples, I assume the whole thing is based on Gutenberg or some similar source, not on machine translations. If humans made these translations, deviations like the ones you describe are to be expected. Literary translations are often more of a re-creation or interpretation of the original work.
Yes, the problem is that most translations aren't literal one-to-one translations. Literal translations are usually hard to read, so some translators use "dynamic equivalence" while others paraphrase heavily. Unfortunately, this can make automatic linguistic matching difficult and unreliable. For instance, there are two translations of the Nordic classic Kristin Lavransdatter that read quite differently, and there are people who will argue passionately about which one is best.
Statistical machine translation is trained on curated parallel texts, but I believe it draws on multiple corpora, so the resulting translation is some sort of average. I wonder whether matching against just one translation produces less reliable results.
Yeah, we took those translations from a pre-matched corpus of books. We did later develop a system for automatically matching translations, but unfortunately we didn't get to use it.
We do have a tool that does real paragraph matching based on the Gale–Church alignment algorithm and offline dictionaries. On top of that, we have an additional manual process to make sure the alignment is correct. However, many of the books were pre-matched, and we didn't align them using our tool.
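For anyone curious how length-based matching like Gale–Church can recover from the kind of drift described above: it is a dynamic program over "beads" (1-1, 1-0, 0-1, 2-1, 1-2, 2-2 groupings of paragraphs or sentences), where each bead is scored by how plausible the character-length ratio is plus a prior for the bead type. Below is a minimal Python sketch, assuming character counts as lengths and the parameters from the original paper (c = 1, s² = 6.8). It only covers the length-based part; the dictionary step the tool above also uses isn't shown, and the names `align` and `bead_cost` are hypothetical, not taken from that tool.

```python
import math

# Bead types (source units, target units) and their prior probabilities,
# as given by Gale & Church (1993).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
C = 1.0   # expected target characters per source character
S2 = 6.8  # variance of that ratio per character


def bead_cost(len_src, len_tgt, bead):
    """Negative log probability of aligning len_src chars with len_tgt chars."""
    # Use the mean of both lengths (a common variant) so 0-1 beads don't
    # divide by zero; the floor keeps empty segments finite.
    mean = max((len_src + len_tgt / C) / 2.0, 1e-9)
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-tailed probability of a deviation at least this large: 2 * (1 - Phi(|delta|)).
    p_delta = max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300)
    return -math.log(PRIORS[bead]) - math.log(p_delta)


def align(src_lens, tgt_lens):
    """Return the cheapest list of (source indices, target indices) beads."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is already final.
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in PRIORS:
                if di > i or dj > j or cost[i - di][j - dj] == inf:
                    continue
                c = cost[i - di][j - dj] + bead_cost(
                    sum(src_lens[i - di:i]), sum(tgt_lens[j - dj:j]), (di, dj)
                )
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = (di, dj)
    # Walk the back-pointers from the end to recover the beads.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


# Two short source paragraphs correspond to one longer target paragraph.
print(align([100, 50, 80], [150, 80]))
# [([0, 1], [0]), ([2], [1])]
```

Instead of pairing paragraphs by index and shifting every later pair by one, the 2-1 bead absorbs the mismatch locally. In practice you'd pass something like `[len(p) for p in french_paragraphs]`, and running it at sentence level inside paragraphs tends to handle partial overlaps like the Verne example better than paragraph level alone.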