As a native Swedish speaker, I was curious to see how well the auto-translator would handle this rather niche financial text.
The wording "The value of words corresponding to" in the second sentence confused me, but I first chalked it up to me not understanding finance-speak in English well enough.
Then I realized hovering shows the original text, and investigated. The original Swedish text has a typo! It says "orden" (="the words") where it should say "ordern" (="the [purchase] order").
Well, for a translation about finance it is (very!) surprising that it seems to have simply transformed "kronor" (="crowns", the everyday name for the SEK, the Swedish currency) into "dollars". It's not as if the two currencies are 1:1, or even close to it. Very strange.
Also there was a Swedish word that was just copied ("anullerades", meaning more or less "cancelled", "voided" or something along those lines) into the English text.
I can confirm the translating-names-of-currencies issue, but something even worse is that the names of languages can also be similarly converted.
For example, the word "Eesti" will often get Google-Translated to "English" rather than "Estonian".
This means that a film at my local cinema that my web browser assures me is in "English" will in fact be in Estonian. And an interview with a Russian saying that he doesn't speak Estonian gets translated so that he appears to say that he doesn't speak English.
The product designers special-cased language names, doing extra work to produce what will almost always be the wrong result.
(And what they can do to place names is often patently ridiculous. For example, "Peterburi tee" should either be left alone or maybe translated to "St Petersburg Road" but actually somehow becomes "Hertford Road". And the ZIP + City name "13415 Tallinn" becomes "thirteen thousand four hundred and fifteen Tallinn".)
> The product designers special-cased language names, doing extra work to produce what will almost always be the wrong result.
They absolutely did not do this. It's an artefact of statistical translation. In the corpus there are a lot of English documents saying "This document is in English", whose translated versions in Afrikaans (because I know Afrikaans) say "Hierdie dokument is in Afrikaans". Thus the translator learns that "hierdie" is Afrikaans for "this", that "dokument" is "document", ..., and that "English" is "Afrikaans".
The street name issue probably comes from an organisation whose Estonian office is in Peterburi tee and whose English office is in Hertford Road.
> The product designers special-cased language names, doing extra work to produce what will almost always be the wrong result.
Are you sure about that? It seems quite likely to me that the word for "English" simply tends to appear in the same contexts (n-grams etc.) as the word for "Estonian". For example, the sentence "I speak English" would be common in English, while "I speak Estonian" would be common in Estonian, so it might associate the two words.
The machine learning behind Google's translations uses a corpus that consists of texts that have already been translated into many different languages. But if some of those documents start off with, say, "this is the <language> version of this press release" that's exactly the sort of thing that would confuse an algorithm that can't distinguish between translation and localization. Pure conjecture of course. I'm just not sure if anything is being special-cased.
One of the things Google Translate is horrible about is translating stuff like kroner into dollars, SEK into USD, etc. Granted, there are cases where translating kroner into dollars is OK, but I'm guessing it's the wrong translation most of the time.
Since it detects that the value is qualified by the currency and localizes the unit in front of the value, I could understand that behavior if it also converted the cash value at the current exchange rate (which would arguably be somewhat less wrong), but it does not [0]. Weirder still, it changes the unit to DKK for 100 [1] and to $ for 1000 [0]. It just makes no sense.
It makes no sense to the casual user, but you can understand why it does it once you learn that Google Translate uses statistical methods to learn. Google feeds Translate with articles and pages that have already been translated by a human, and the program learns the translations of words and sentences from that. The problem arises when the two documents it's taught with differ slightly. With translations, this often happens with currencies, country names (there are lots of examples of Translate screwing those up too) and numbers.
Are there common expressions that are equivalent in English and Swedish with the respective currencies?
E.g. stuff like "dollar to dollar", "bet someone dollars to doughnuts" ?
Anyway, plenty of text would probably match word-wise ("X bought Y for $AMOUNT $UNIT") in financial news, so mapping "dollar" onto "kronor" seems a reasonable error.
For probably the same reason, google also translates hungarian "1000 forint" to english "1000 HUF" going from the full word to a quasi-acronym for "HUngarian Forint".
I would guess "annulled" to be the most direct translation of "anullerades" ("annulled" carries the same meaning as "voided" or "invalidated" but tends to be used more in legal contexts).
Just an FYI, there is no way to tell how much of that article is machine translated since Google Translate allows anyone with a Google account to "Contribute to the translation".
Though I imagine it is mostly machine translated and it requires several agreeing "contributions" before it accepts them as accurate.
Alright, but it has its faults - it missed the word "error":
"Instead, it is about a parsing [error] incurred in exchange system due to a technical error"
The funny thing about translations (or automated understanding of text snippets) is that it gets easier when you know the subject domain, since that removes a lot of ambiguity in the individual expressions. So it might be feasible for a translator to make a good guess about the topic, but I don't know whether Google Translate attempts that.
Actually, I find you should always use signed ints.
With signed ints, you can notice that '2-3' is below 0 and act accordingly. If you do '2-3' with unsigned ints, there is no real way to tell that something went wrong (other than looking for suspiciously large numbers).
Of course, particularly in a financial system, you should probably use an integer type which throws/aborts if an out-of-bounds error occurs.
How about using a programming language or number library that properly handles integer promotion, or signals a condition when a number goes out of bounds? Obviously it's slower than unchecked processor integers, but as this story shows, in some circumstances it's well worth it.
No. Either the value started as a signed value and was then converted to an unsigned one (something compiler warnings or lint would have flagged), or it was unsigned all along. Either way the same result would have occurred.
Slightly better is 64-bit signed. There can always be overflow, though. The reason these values are PODs is that ticker data moves around at very high speed.
That's quite a controversial statement. Using unsigned ints may just clutter code without providing any safety. The value of 4e9 is still invalid, even though it's positive. As you say, checking your inputs is important, and in this case values can be too large, and should be rejected.
Using "unsigned ints" for values that will never be negative allows you to perform one validation test instead of two. Use a typedef to avoid writing "unsigned" everywhere and you end up with less clutter in the code (for humans) and in the binary (for machines).
Before we jump into how the code might have been wrong, there are other issues that can impact mission-critical applications like this that have little to do with the high-level code. In financial applications it's somewhat common to build the hardware to tolerate SEUs. For example, I know of one application where two identical digital subsystems are used, given the exact same input, and the outputs are compared. If the outputs ever differ a logic error is assumed and operation halted.
Logic errors are rare, but not that rare. At sea level you can expect roughly one bit flip per 4 GB of memory every 24 hours. Often it lands in an invalid line or gets overwritten anyway, but for applications like this, where one logic error can cost you your shirt, hardware-level error checking (ECC, SEU detection and correction, etc.) is worth applying.