I remember reading an interview with a British author, it might have been Neil Gaiman. He was just about to get his first book published in the US, and the American publisher contacted him and asked if it was OK if they changed a couple of British words to American, like "flat" to "apartment". Not wanting to risk the publishing deal, he said sure. A couple of months later he got the first edition of the US version back and found lines like:
- "Someone’s done a lot of find and replaces -- NEVER a good idea in galleys. Dave Langford put something in Ansible recently about how on the galleys of my novel Neverwhere someone Found-and-Replaced all the flats to apartments. People said things apartmently, and believed the world was apartment."
- "None of these were quite that bad – they were subtler..."
- "F’rinstance: All instances of the word round have become around. Fine for walking around the lake, less helpful for the around glasses, the around holes in the ice; blonde has uniformely become blond, and so blonder has become blondr; for ever has become, universally, forever, and for everything thus became foreverything, and we also got foreveryone, forevery time and so on. Each had to be found and caught."
- "Little things – the icelandic þú became bú, which won’t bother anyone who isn’t Icelandic. Blowjob had inexplicably become blow job again. (I think a blowjob is a unit of sexual currency, whereas a blow job is something you can get -- or indeed, give --instead of a wrist job, a sleeve job or a window job.) And once again every damn comma gets scrutinised. And I changed an Advertise to an advertize which was nice of me."
My South African sister in-law who recently immigrated to America said that her black peers in grade school would sometimes refer to themselves as African American.
It's not too surprising in America, but I know a person who would refer to any Black person as African American in Europe. In most cases, neither party would have set foot in America in their lives.
For good or bad, in many underdeveloped places where black people are rare, they are intuitively perceived as Americans (which stereotypically means rich, cool, free and capable of teaching you great English) and treated with additional respect because of this.
I had a South African friend in middle and high school. They fled the country when it became popular to put a tire around a white person's neck, fill it with gasoline, and light it. Mandela's wife promoted doing this.
They gained American citizenship. She became my African American friend. Everyone told me I was terrible for talking that way.
Heard my daughter using the n-word for her school mates (they ain't black) - they all listened to rap music and obviously our schooling system don't cover American history i.r.o slavery.
Had to explain to her why it would not be a good idea to do that in the US.
I never believed in the idea of embracing an offensive word and somehow "taking out the sting" like rappers did.
We have a similar word - the k word for blacks - it is fortunately banned and if you used it on social media you clearly identify as a racist.
I'm a little confused - are you talking about Black South Africans? Hmm.
I foolishly expected that people like Elon Musk could be truthfully called "African-American", but I did a survey of the definitions, and most all of them agreed that an African-American must be Black (and I'm not sure how that is recursively defined, or whether it has to do with race, ethnicity, or simply color of skin: it must be the latter if it has any meaning separate from "African"!)
Typically, African-American means black AND, unintuitively, American for generations IME. Maybe it shouldn't mean that but that's kind of assumed where I live at least. For first or second generation Americans you might use the country (eg Nigerian-American). For other races from Africa you'd also rarely hear them referred to as African American.
It's a bit weird from the outside tho cause if you're a white south-african and you hear the phrase used in America you'd be like "oh, I should refer to myself as an African-American" although unlike Americans, people generally only refer to themselves as being from the place once they have citizenship (and would therefore have learnt the intricacies of the American phrase) and even then many people choose not to.
Of course, part of the issue here is that there isn't really a good shorthand term for "the group of American Black people of diverse African origin whose cultures were oppressed and destroyed by the white majority for centuries, leading to a mélange culture intentionally constructed out of the remaining scraps".
Re>> "Someone’s done a lot of find and replaces -- NEVER a good idea in galleys.
Ah, the Scunthorpe problem. [0] It's never a good idea, but the immature 12 y/o in me always giggles at the examples.
One of my favorites is the word "ass" being replaced by "butt", resulting in this gem [1]...
"'Prohibition on buttbuttination' states, 'No employee of the United States government shall engage in, or conspire to engage in, political buttbuttination.'"
> In February 2004 in Scotland, Craig Cockburn reported that he was unable to use his surname ... In 2010, he had a similar problem registering on the BBC website
So the BBC was allowed to register their domain without issue, but Craig couldn't register his account with the BBC. That's Dick.
One positive outcome though is the invention of a collection of excellent words: foreverything, foreveryone and forevery. I don't know what they mean, but I like them.
Some of those could be avoided by using a search regex such as "\bblonde\b". I believe a sophisticated enough regex could solve all of these, if it were a central library used and maintained by many.
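To make the word-boundary idea concrete, here is a minimal Python sketch. The helper names `replace_substring` and `replace_whole_word` and the sample sentences are made up for illustration; only `re.sub`, `re.escape` and the `\b` anchor are standard library features. It shows what `\b` fixes and, I think, also why a regex alone probably can't solve all of these:

```python
import re

def replace_substring(text, old, new):
    """Naive find-and-replace: matches `old` anywhere, even inside other words."""
    return text.replace(old, new)

def replace_whole_word(text, old, new):
    """Replace `old` only where it stands alone as a word, using \\b boundaries."""
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function replacement avoids backslash interpretation in `new`.
    return re.sub(pattern, lambda match: new, text)

sample = "She was blonde, but her sister was blonder."
print(replace_substring(sample, "blonde", "blond"))
# She was blond, but her sister was blondr.
print(replace_whole_word(sample, "blonde", "blond"))
# She was blond, but her sister was blonder.

# Word boundaries cannot judge meaning: both uses of "round" are whole words.
ambiguous = "He walked round the lake wearing round glasses."
print(replace_whole_word(ambiguous, "round", "around"))
# He walked around the lake wearing around glasses.
```

So the boundary version saves "blonder", but "around glasses" still slips through, because telling the two senses of "round" apart takes context rather than pattern matching.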
Maybe people can just deal with books having regional differences in the same way authors do. Is it so scary for F. Scott Fitzgerald to write about the color of the light, but Rowling to talk about the colours of the school houses?
This. Stop dumbing things down (like my browser's dictionary that apparently doesn't know the word "dumbing"). Let the readers find and delight in the differences of diction and usage.
I think the sort of people who read books can probably cope. It sounds like the publishers are making allowances for the sort of people who don’t do much reading anyway.
- can’t hear the differences in spelling between American and British English
- are listening to someone read. That person probably is able to read either American or British English.
That said, some words really are different and could cause confusion/disconnect so probably are potential candidates for being changed even in audio books.
- the one he gave “apartment” vs “flat” is a good example. I think British people would all be fine with “apartment” whereas I think “flat” might confuse some Americans who hadn’t heard that usage.
- “pavement” means the road surface in the US vs it means the pedestrian path on the side of the road in the UK (that in the US you would call a “sidewalk”). “Sidewalk” wouldn’t confuse a British person but it’s not a word a British person would use. “Pavement” meaning the road surface does confuse British people.
I'm not having that though - it's all one way in favour of keeping Americanisms because we're familiar enough with them to understand them. No thanks.
On the flipside, I was delighted to hear stories of US kids confusing their parents with British English because they'd watched so much Peppa Pig during lockdown. Very good.
I basically agree and wouldn't want things to be changed in general. That being said, the choice of words in a novel should generally be governed by the artistic intent of the author rather than the desire to teach people stuff.
But here's an example that's worth thinking through. This movie https://www.imdb.com/title/tt0110428/?ref_=fn_al_tt_1 was called "The Madness of King George". It's based on a play called "The Madness of George III". They changed the name when producing the movie because of the difference between Britain and the US.
In Britain pretty much everyone would know that George III refers to King George III. However, when they first started discussing this project in the US, a common reaction was "I haven't seen 'The Madness of George' 1 and 2". Now the difference between the titles wasn't that important, and clearly this was just an impediment to understanding with no benefit.
Now you could say "Well the people could go on wikipedia and find out that there was a King George and a King George II and then this guy, George III" and that's true but the problem is negative self-selection. People who don't know that don't realise that's what's at play here and so won't do that search so nobody learns anything.
It's not particularly reassuring that "George III" (spoken aloud as "George the Third"), and "George 3" (spoken aloud as "George Three") are apparently thought to be interchangeable in the US.
This! One of the things that used to happen is you'd pick up a book, and encounter a word or phrase you don't know or understand, and you'd go look it up. This is easier than ever with the internet.
Don't know why someone would "rent a flat?" You go look the phrase up and discover that it's British usage. Confusion over... because you learned. Reading Moby Dick and don't know what "scraggy scoria" is? You look the words up and learn they're the perfect fit for the landscape Melville is painting.
My favourite anecdote is when an english bloke goes to the U.S. and asks if he can "bum a fag" from someone, meaning "ask for a free cigarette" in British.
That example isn't scary, of course. The -or/-our spelling difference is only the simplest example of what's different between American and British English. I think it's pretty clear to someone who's never even encountered the other spelling to infer what the word means.
But there are difficult examples, even in Rowling. She uses "revision" in a way that American readers don't know ("studying") and might have a hard time piecing together from context. It's not like the reader will be prompted to open a dictionary, since "revision" is a common word in AE. It just makes for a poor reader experience.
That's the most ridiculous thing I've come across in a while. It absolutely doesn't make for a poor reader experience, and that's some US Defaultism if I've ever heard it. And I'm neither British nor American! US writing is basically never localised to Australian English, why in the world must the converse be true?
> Righto, mate, gimme a shout as Ishmael. Few donkey's years ago—don't get your knickers in a twist 'bout when exactly—findin' me wallet as dry as a dead dingo's donger, and not a bloody thing worth a squiz on the land, reckon I'd chuck a U-ey on life, put a bit of water under me bridge. Wanted a stickybeak at the wet half of this great wide world, didn't I?
Do you imagine people without any dialect of English as a native language read books and articles on the internet with a dictionary at hand or do you think they somehow cope with the unknown and manage with context alone?
As someone who reads fantasy fiction a lot, honestly I'd prefer if I read everything in American English. The biggest thing that gets me is e.g. "defence" instead of "defense" and other places where British English uses a 'c' instead of an 's'. It seriously diminishes the immersion for me, even if I've read probably 50+ books with the version I'm not used to by now.
It's never stopped me from reading a book before, but it does diminish my enjoyment quite a bit. For nonfiction, I don't care, I'm reading actively regardless. But when reading passively for enjoyment, it's noticeable and irritating.
For fantaſy, eſpecially, I diſagree. When I'm in charge, the long-ſ will be mandatory in any ſtory involving magick, to better ſhew the hiſtorical baſis of the mythology (unleſs, of courſe, a ſkinwalker were to be involved, in which caſe American ſpellings are indeed preferred)
Incredible. I sent that to Google translate and it translated fairly well.
(For those who can't read it - the text is English transliterated into Hebrew script, not actually Hebrew. I'm impressed that Google managed to make sense of it)
- с in тектс is missed, it should be текстс or better теѯтс;
- ѵ is basically Greek upsilon which appeared in English mostly as u [auto] or y [system], therefore шѵд would be better written as шоѵд where /oѵ/ in old Cyrillic is pronounced as /u/ as in Greek, or just шꙋд;
No, never was a thing. Early Cyrillic had a bunch of unique letters that are long gone, but it never had "þ" in there.
> The "ѵ" is interesting too
It's from Greek "Y" (upsilon) and it - depending on the place - could've meant either /i/ or /v/ sound (and /u/ when in "оѵ" digraph, so parent comment has it wrong): https://en.wikipedia.org/wiki/Izhitsa
I have to say that the long-s looks quite horrible in Verdana, and probably most other sans-serif fonts. That makes sense, since most sans-serif typefaces significantly postdate the abandonment of the long-s.
Roman typefaces with a long-s are a very 17th-18th century thing. A blackletter (gothic) typeface gets you closer, but all of that is still not very medieval. What you really want is a meticulous manuscript in Carolingian Minuscule[1] or Uncial script[2], complete with killer rabbits[3] and knights fighting snails[4].
Upvoting because this is seriously interesting. I can see old or Tolkien English being a hurdle, so I relate to what you’re saying. But small changes in diction tend to draw me into the world, by othering it from the familiar.
I think it might be partially because I've done & currently do some copyediting work. So I'm pretty trained to see any grammar or spelling mistake. To be clear, it's not the difference in language that bothers me, just words that are spelled differently from American English. I literally can't not see them.
I've found this jarring only in one case: the dub of Mary and the Witch's Flower, where the setting was an emphatically English-style public school, the lead character's VA had (or was using) a strong Home Counties accent, but the lines were full of US-only terms like "Sophomore". It just felt horribly fake (even in a fantasy story!) because you'd never see those things together in real life.
People knowing enough about regex replace to use word boundaries are most likely aware just how dangerous bulk replacing is, and will either be super careful, or just won't do it like that.
Troublingly, "blonde" isn't even a British vs. American English thing -- it's a gendered descriptor, of which we have relatively few remaining in English. Blonde is the feminine form, blond is the masculine/neuter form. Whether you want to preserve gendered adjectives is a different question entirely, but the definitions are rather clear, having carried over from French.
Surely, an LLM could handle this, if it were to get fed the entire book in chunks. "Change all mentions of the word 'flat' in its connotation as the British word for 'apartment', to say 'apartment'. Make no other changes to the manuscript." Then have it double-check its work afterwards. "Does every instance of the word 'apartment' in this text make sense? Does every instance of the word 'flat' in this text make sense in American English?"
and then run this prompt on each pair of words that you want it to do. When it's done, run a quick diffcheck on it & check its work to learn of any gotchas.
Surely the effort to manage LLMs is wasted on keeping a publisher from having to do the job they’re paid for.
Are we really going to let big corp automate away their role in the social contract? Ostensibly Random House needs to provide real output not just be a brand that farms out all the work to AI, and thus captures value on the fiat ledger.
The number of rent seeker non-contributors is too damn high. We cannot base society off dying peoples hallucinations about how the world works anymore. This is bonkers.
I never asked to exist. Why am I constrained to beliefs like Bill Gates and the like are divine in contemporary ways?
Why the hell does it have to be AI? A regex would be faster to run, test, and diff check. Even better, instead of paying a programmer to waste your time, just pay an actual editor to proofread it.
That said, I found myself doing little text mangling tasks with GPT-4 instead of appropriate command line tools, because fuck if I remember the flags to sort and unique, and it's much faster to just describe what I want and have the LLM take a crack at it.
This is why, I think, you'll see a lot of simple(ish) tasks being done by LLMs, at couple orders of magnitude more compute cost - it's just that much more convenient.
EDIT: nevermind. That was about just doing regex replace.
But if you want to do it correctly in an automated way, LLMs are indeed the best tool for the job in the general case, because they understand how language works, in a way that you can't really formalize in code. No, feeding the complete definition of English grammar to the computer won't help, because a) it probably doesn't exist, and b) even if it does, it's merely a suggestion - natural language is a living thing, and is not bound by fixed rulesets.
But the LLM already "understands" the changes that need to be made. A regex to figure out if "flat" means "apartment" or not? Consider the text (which I just made up & I'm really not a fiction writer so I apologize for it being terrible):
"Oh, I live near there too! House or flat?"
Stanley hesitated. These questions were getting more and more personal. Was this her idea of casual conversation? Or was she trying to get to know him personally? Well, he thought, what could go wrong if I treat this like a conversation. "Flat," he said. And then asked a question of his own. "What kind of pop do you like?"
How are you going to decide if "flat" is talking about non-fizzy pop/soda, or a dwelling space with a regex? It's a lot closer to the word "pop."
If you're going to use an ML model, you might as well use an existing language translation model rather than asking ChatGPT or the like to convert words one by one. You'd probably get better results treating American English and British English as entirely different languages rather than assuming one is the same as the other but with a handful of different words.
instead of feeding it into a black box and hoping it does the right thing, how about just paying someone to manually find each instance and only replace it if appropriate. it's tedious and grueling but assuming every other word isn't "flat" should only take a few hours.
You must be American if you think it's that easy. I think Americans don't realise how Brits end up learning both our versions of words and theirs too because so much TV is American.
It's a shame that part-of-speech analysis tools aren't more widely available or widely used. That could at least reduce the damage done on "flat" -> "apartment", even when only considering entire words.
> One general point. A thing I have had said to me over and over again whenever I’ve done public appearances and readings and so on in the States is this: Please don’t let anyone Americanise it! We like it the way it is!
> There are some changes in the script that simply don’t make sense. Arthur Dent is English, the setting is England, and has been in every single manifestation of The Hitchhiker's Guide to the Galaxy ever. The ‘Horse and Groom’ pub that Arthur and Ford go to is an English pub, the ‘pounds’ they pay with are English (but make it twenty pounds rather than five – inflation). So why suddenly ‘Newark’ instead of ‘Rickmansworth’? And ‘Bloomingdales’ instead of ‘Marks & Spencer’? The fact that Rickmansworth is not within the continental United States doesn’t mean that it doesn’t exist! American audiences do not need to feel disturbed by the notion that places do exist outside the US or that people might suddenly refer to them in works of fiction.
This hampered me when playing the HHGTTG text adventure as a youth. My memory is fuzzy on the specifics, but one of the early things needed is to relieve a headache or similar, and a search of Arthur's home would reveal an "analgesic" eventually. Unfortunately you wouldn't be able to do something else until you consumed the analgesic, and eventually you'd lose the game because the home was demolished while you're in it.
I did not know at the time what an analgesic was, despite having a fairly broad vocabulary. If anything, I assumed it was some dirty adult toy; certainly not a thing to be consumed as a remedy for pain.
It was frustrating because I searched high and low for aspirin, Tylenol, pain-killer, etc. but never got past that section until I had a chance to search the Internet for a solution - many years after I really cared to play a text adventure again.
Sometimes it turns out great, though. Wasn't there a scene with an award for "best use of the word 'fuck'" that was changed to "best use of the word 'Belgium'"? That was an improvement.
"So why suddenly ‘Newark’ instead of ‘Rickmansworth’? And ‘Bloomingdales’ instead of ‘Marks & Spencer’?"
Newark (on Trent) is in Notts. Rickmansworth is pronounced "Rick-uth" (not really but it should be) and Bloomingdales would be pronounced something like "Blimmin'dolls", if it ever showed itself over here.
The Adams quotes are from some time ago and we have all passed a lot of water since then. The entirety of en_* are now routinely bombarded with everyone else's TV/stream output in various guises. I might be a Brit but I am now (not really) intimately familiar with Australian and New Zealand police and "Border Force" etiquette and more. I can even hold my own when confronted with a particularly tricky issue in say Nunavut with the RCMPs. I am of course an expert in Texan rangering thanks to a bloke called Walker.
You say tomato and I say fuck that: viva la difference!
-I'll see you and raise by some movie or other with Walter Matthau in it which I saw on the Hallmark Channel a few years ago - in a scene, he asks a kid if he knows Lincoln's Gettysburg Address.
The Norwegian subtitle? 'Do you know where in Gettysburg Lincoln lived?'
Oh, and of course, from Star Wars - 'Luke, this is your father's light sabre' was translated as 'Luke, this is your father's lightweight sabre'
(Someone took a copy of Revenge of the Sith that had been translated from English to Chinese and back, and then re-recorded that dialogue and put it back over the original footage.)
In the Finnish subs for The Royal Tenenbaums DVD, a character who says in English “There’s a dent in the car. There’s one here, too” gets translated as “There’s a dentist in that car. There is a dentist, too.”
A few years after I saw this, I entered the film translation business myself. Generally for anything Hollywood or otherwise big-budget, you can watch a copy of the whole film yourself to understand the context, and you can bill the client for the time spent doing that. Therefore, I tend to suspect that such cases of mistranslation are laziness or a company with an incompetent workflow.
Another classic is a Simpsons episode where Homer shouts „Isotopes rules!“ which in the German dub turned into „Isotopenspielregeln!“ (rules of the game Isotopes)
Translations of foreign media that make the story about something else entirely, are a well-established genre. For example, one of Woody Allen’s earliest films.[0] Granted, this wasn’t done “subtly” at all.
I've often seen TV series translations (both as subtitles and as scripts for later dubbing) done by handing over the text to be translated - without any access to the show itself. So the translator has zero context for what is on the screen that they're talking about. For things similar to that Star Wars scene, the translator would have no way of telling whether that "light sabre" is bright or lightweight.
I mean... The first one wouldn't be confusing, if it was plainly called Lincoln's Gettysburg Speech.
So a Norwegian interpreter - who has no obligation to know of Lincoln whatsoever - can read it and reasonably interpret it as a request for an address in Gettysburg, for Mr Lincoln.
> who has no obligation to know of Lincoln whatsoever
Part of being a decent translator is having a knowledge of the common cultural references of the source-language’s country/countries. Films very frequently play on local history or previous films or literature, and you are expected to be able to deal with that.
I suspect it was an early attempt at machine translation, or perhaps that the translators are paid so badly there is no incentive to pause even for a moment to evaluate if the translation makes sense in context.
The problem is what you don't know you don't know. If your understanding is lacking, how do you know you're not watching a movie veering into the absurd?
The weirdest number translation I saw was on a package of spaghetti. The cooking time was 8-10 minutes in English, and 10-12 minutes in Spanish. Note that they were both in Arabic numerals, not spelled out. Why do I have to cook my spaghetti longer if I speak Spanish??
I've got a pack of rice that says, in Danish, put in X amount of water and cook until dry; in Swedish it says put in 2*X amount of water, cook Y minutes and strain the rice. Double confusion: why different instructions, and who strains water from the rice?
For food to be kept in the refrigerator, the Danish, Swedish, and Finnish instructions differ by 1 degree centigrade each. Don't remember the values off the top of my head, something like 8, 7, 6 respectively.
There's a handy rule of thumb for this: is the number of seconds per minute (60 aka 2*2*3*5) a weird, random-sounding number that's way too easy to make clean subdivisions out of[0], rather than a nice, math-hostile power of ten? Then it's probably already in American units, unless it's British.
After careful double blind studies, it was determined that if you directed impatient spaniards to cook for 10-12 minutes, they cooked it for 8-10 minutes?
Exactly 1,000 feet is 304.8 meters, but "or about 300 meters" would have sufficed given the context. "About 1000 feet" already implies not being exactly 1000 feet.
Of course, exactly 6 ft is 182.88 cm, but the precision is unnecessary there too. 183 cm if it was right on the 6 ft mark.
Precision might be an overshoot, but providing SI measurements alongside the obsolete units (which sound better to the NA ear) is quite nice. Most of the weird units offer no straightforward or memorable ratio to multiply by to get a regular one, so people outside the bubble cannot translate them into anything meaningful.
My dad wrote the gardening column in the local newspaper, and a yearly gardening book. When NZ went metric they updated the book, replacing things like "plant seeds an inch apart" with "plant seeds 2.54cm apart" .....
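The arithmetic behind these over-precise conversions is easy to sketch. Below is a rough Python illustration, assuming the caller decides how many significant figures the original number really carried; the helper `round_sig` is made up for this example, while the two conversion factors are the standard exact definitions:

```python
import math

FEET_TO_METERS = 0.3048  # exact by definition
INCHES_TO_CM = 2.54      # exact by definition

def round_sig(value, sig_figs):
    """Round `value` to `sig_figs` significant figures."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig_figs - 1 - exponent)

# "About 1000 feet" carries roughly one significant figure.
print(round_sig(1000 * FEET_TO_METERS, 1))   # 300.0
# A height of 6 ft 0 in, kept to the nearest centimetre.
print(round_sig(6 * 12 * INCHES_TO_CM, 3))   # 183.0
# "An inch apart" in a gardening book is hardly a precise figure.
print(round_sig(1 * INCHES_TO_CM, 1))        # 3.0
```

The hard part isn't the rounding, it's that the number of meaningful digits has to come from a human reading the sentence, which is exactly what a blind conversion skips.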
There's several browser extensions which will do "metrication" for you. I leave one on just for the occasional amusement (although I'm actually more comfortable with US units), and it recently did that to the title of this article: https://news.ycombinator.com/item?id=35539595
To add to the many find and replace issues that have shown up, at one point TSR's style guide for D&D material said that "wizard" had to be used to the exclusion of "mage". So when proofs of a book arrived where the authors had used "mage" they did a find and replace job.
Resulting in such lines as "The tower can take up to 200 points of dawizard before it is destroyed. Dawizard sustained is cumulative." and references to a spell called "Silent Iwizard".
There once was a company in Russia who tried to make their own wikipedia — but of course, they didn't have any resources to build their own content, so in their world, Europe in the middle ages was terrorized by mighty Nordic warriors called encyclongs (w and v are the same letter in Russian language).
If you replace (in cyrillic script) viki[-pedia] to encyclo[-pedia] (because you can't do a whole word replace, since in inflectional languages like Russian every word has many forms with different endings, so you must replace the beginnings of these words to catch all the inflected wordforms) then vikings become encyclongs.
Also dawizard and wizardnta (when a tabletop RPG company tried changing "mage" to "wizard"), "amDanielan" (when an Eric was renamed to a Daniel) and a company that was "formerly in the red, but now in the African american."
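For the curious, these mishaps are trivial to reproduce. Here is a small Python sketch using plain `str.replace`; the inputs are my guesses at the original wording (for instance, I'm assuming "Silent Iwizard" started life as "Silent Image"), and `embedded_hits` is a hypothetical helper for auditing a replacement before running it:

```python
import re

# Reproducing the reported results with a blind substring replace.
print("200 points of damage".replace("mage", "wizard"))   # 200 points of dawizard
print("Silent Image".replace("mage", "wizard"))           # Silent Iwizard
print("magenta".replace("mage", "wizard"))                # wizardnta
print("american".replace("eric", "Daniel"))               # amDanielan
print("now in the black".replace("black", "African american"))
# now in the African american

def embedded_hits(text, old):
    """Return words that contain `old` as a fragment rather than as the whole word."""
    words = re.findall(r"[A-Za-z]+", text)
    target = old.lower()
    return [w for w in words if target in w.lower() and w.lower() != target]

print(embedded_hits("The mage dealt damage with Silent Image", "mage"))
# ['damage', 'Image']
print(embedded_hits("formerly in the red, but now in the black", "black"))
# []
```

Note that the last check comes back empty: "black" is replaced as a whole word, so no amount of substring auditing would have caught that one.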
I feel like folks in editing or similar roles put their "problem detection" hat on too tightly.
I get lots of advice like that regarding code and UI and the suggestions about perspective problems are often absurd. Nobody has been confused by the thing yet and they're concerned that someone "might" be confused and not able to figure it out themselves so they change some words or UI and ... I kid you not more often than not the solution is the thing that trips up users.
In the above example, maybe if the reader doesn't know what a "flat" is, maybe they'll just look it up or understand by context and they'll be ok?
I don't think they're concerned so much by the people who don't know what a flat is, but by the people who do. There's more than a handful of bookreaders out there who are very protective of any difference between their national form of English and some other national form of English, who will get upset if a local publisher uses the foreign form.
That's exceptionally lazy. When coding, unless the false positive rate is exceptionally low, I just find (without replace) and go through them by hand. How many "flat"s can there be in 1 book? C'mon. You might have to read 20 sentences, oh no.
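The "find without replace" approach can be sketched in a few lines of Python. This is only an illustration: the function name `occurrences_in_context`, the `width` parameter and the sample text are invented here, and it simply lists each whole-word hit with some surrounding text so a person can decide case by case:

```python
import re

def occurrences_in_context(text, word, width=30):
    """Yield (offset, snippet) for each whole-word match of `word`; the text is never modified."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        yield match.start(), text[start:end].replace("\n", " ")

manuscript = "He rented a flat in London. His cola had gone flat. She said it flatly."
for offset, snippet in occurrences_in_context(manuscript, "flat"):
    print(offset, "...", snippet, "...")
```

On this sample it reports two hits (the dwelling and the cola) and skips "flatly"; deciding which of the two to change is left to the reader of the list.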
I developed KeenWrite[1] to make using variables in documents trivial[2]. My editor has no search and replace function. For my sci-fi novel, there's a variable {{location.protagonist.tertiary.Type}} that has the value "Bavarian Village". I could change this to "apartment" and every instance throughout the document prose and reference diagrams would update automatically and contextually. My typical example for explaining the use case is changing the name "May" to "June", but flat/apartment is hilarious.
That definitely must've been early. I would've thought Gaiman of all people knows how easily English humour can get lost in translation. Often even when not changing the text
the irony being that many readers probably attributed this to neil's sense of humor. i would have. "he said apartmently? what? aaah ... now i get it - haha - good one!"
A couple of years ago, Turkey/Türkiye had a campaign to get people to use Türkiye in English. At that time I flew Turkish airlines, which had a promotional video about this, and with all mentions of Turkey in safety cards, magazines and the seat back screen changed to Türkiye. Then when browsing for a tv show to watch on my seat back screen, I came across an episode of Everybody Loves Raymond with a description something like “on thanksgiving day, Raymond burns the Türkiye“
That's the story of my life, unless you speak French you will probably pronounce my name wrong, no matter how many times I repeat it and help people pronounce it, I don't get mad nor think it's rude, it's kind of amusing actually. Once I realize they can't handle it I just tell people to call me by my last name which is very easy to say in English. There are too many interesting things in life to get hung up on a petty detail like a name.
Hey if you write your name out as LAST First (which I see many French people do) then you might not even have to ask people to call you by your first name.
Most of the time (and all of the time with gendered pronouns), you don't use it to address me. You're using it to talk about me. And then, it's really between you two what you call me, isn't it? Of course I would be happier knowing you didn't refer to me in a rude manner, or as something I'm not, but I believe in privacy too, so it's really your business.
Turkey isn't rude, and it's usually understood from context that we don't refer to the bird (the Turkish government also push Türkiye on countries where the native word has no bird connotation). We used to translate all names, and that's understandable because names are often unpronounceable or otherwise violate grammatical rules if you just blindly drop them in a different language. I think it should be fine to use "Turkey" when talking to another English-speaker.
It's a bit more complicated than that. It is also quite rude of you, were you not to accept that other people write using other letters and have other abilities in terms of what phonemes they can pronounce.
I see a distinction between mapping a name more or less faithfully to the sounds and spelling of a language and coming up with a completely different name. For example Brazil in English is not the same as Brasil but is fairly close and fits the language. Whereas IDK where Germany came from.
I'm not a fan of this modern idea that you should get to dictate how others refer to you. E.g. I don't think it makes sense to refer to China as Middle Country for anyone not from China.
Sometimes, sometimes not. Everybody changed Peking to Beijing when China requested that. But living in Netherland, I shake my head at all the languages insisting on pluralising the name of my country. And that's not even addressing "Dutch".
I'm 100% opposed to telling people how to speak their language. I don't even like telling people who use English as a lingua franca how to use it in a culturally "correct" way.
Normally, I'm outspoken against most "domestic" proposals to change words, since the rationale and implementation are usually very poor.
But as an American, I've decided that Turkiye without the ü is more desirable than Turkey. It resolves an annoyingly ambiguous search term, doesn't change anything about how it's pronounced, reads the exact same way, and it's trivial to switch how I write it. It really is a superior design with a painless transition.
no one's gonna type the "ü". this new name will either not get adopted at all or get adopted in wrong ways, like people using turkiye, adding more to the search fragmentation.
> TRT World explained the decision in an article earlier this year, saying Googling “Turkey” brings up “a muddled set of images, articles, and dictionary definitions that conflate the country with Meleagris – otherwise known as the turkey, a large bird native to North America – which is famous for being served on Christmas menus or Thanksgiving dinners.”
Ha, so yeah it seems like that's definitely a part of it.
Idk what's happening to the Windows/file explorer teams, this is embarrassing.
Another issue I've had in newest file explorer, (probably a bug, unless they're trying to get rid of ".txt" files), is this one also reported on Twitter by "MittringMartin":
> Idk what's happening to the Windows/file explorer teams
I'd say whatever it is, it's been happening for a looong time, considering they've still never fixed the glaring bug in previous versions where the down arrow takes you to the second file in the list. Or where you go to save a file (save as), and you put your cursor at the end of the file name to rename just part of it, the cursor randomly-but-not-always jumps to the beginning of the file.
I really don't think this is a bug. The first element is focused when you start, but not selected. Pressing down moves the focus down, and selects. If you want to select the focused element, press 'Space'.
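The focus-versus-selection model described above can be sketched as a tiny state machine. This only models the behaviour as the comment describes it, not Explorer's actual code; the `ListView` class and its method names are hypothetical.

```python
class ListView:
    """Toy model: the focused item and the selected items are separate state."""

    def __init__(self, items):
        self.items = list(items)
        self.focus = 0         # the focus starts on the first item
        self.selected = set()  # but nothing is selected yet

    def press_down(self):
        """Move the focus down one row and select the newly focused row."""
        if not self.items:
            return
        self.focus = min(self.focus + 1, len(self.items) - 1)
        self.selected = {self.focus}

    def press_space(self):
        """Select whatever currently has the focus."""
        if self.items:
            self.selected = {self.focus}

    def selection(self):
        return [self.items[i] for i in sorted(self.selected)]


view = ListView(["a.txt", "b.txt", "c.txt"])
view.press_down()
print(view.selection())  # ['b.txt'], which is the "skips the first file" effect
```

Pressing Space first instead of Down selects `a.txt`, which is the behaviour the comment points to.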
This collection-/tableview technique is known by maybe 0.1% of users and used by much less. You can also alt-arrow(?) to move the cursor around and select more elements with space. Never seen anyone use that. Instead of clinging onto this cursor nonsense they’d better finally make it work as (no selection, arrows select next/prev, [modifier+]mouse selects as usual) and get rid of a cursor. And make a proper tree view. Everyone does it this way except explorer.
What percent of users navigate explorer with a keyboard at all? I'd hate to lose keyboard selection over this. Ctrl and shift do different things and I use them both.
Every once in a while, navigating with the keyboard in Explorer will cause the window thread to hang up in a busy loop and I have to kill it. I have no idea if this is an Explorer bug or caused by other stuff.
7-zip's GUI is basically an incomplete WinRar clone, and WinRar has a couple more features. Most notably recovery records to be able to recover from mild bit rot, but it can also preserve more NTFS attributes of your files (saving alternate streams, security records, turning hard links into links) and can optionally respect the Archive attribute of files (archiving only files with the attribute set, clearing the attribute after, optionally deleting them).
It's a very solid archival program. I wouldn't recommend that everyone runs out and buys it, and I use 7zip more than Winrar, but it does have some advantages, especially on Windows.
Then call it slow development, not stopped. A stable a year ago and a beta a month ago is more active than most software out there, for which development literally stopped.
I've been using WinRAR for multiple decades and it works well. I'm not about to rearchive all my stuff with a new format so of course I still use it regularly. If I were a new computer user I might choose something else as I certainly experimented with various tools back in 2000. WinAce comes to mind as a decent competitor of the time.
Oddly enough, I ran into an odd case a few months ago with a zip file which required me to use winrar.
In windows, when you extract a zip file that contains japanese characters, they get 'garbled', which can cause problems if you need to maintain the directory and file names. I tried with 7-Zip as well with the same outcome.
I found a fix on stackoverflow [1] which mentioned using winrar, as it had an option to change the name encoding for archived file names. Using that I was able to extract the zip and the files and directories maintained their original japanese names.
infozip also has problems with japanese characters. I ended up solving the problem with the python zip module. considerably more awkward to use but it offers a lot of control over the extraction process.
I did not look at it very closely so I don't know what exactly infozip was getting incorrect in my case. but I did find this interesting bug report from 2012. Apparently a lot of encoders are sloppy about the spec and will leave header fields zeroed rather than set them. and if infozip reads a header that says the zipfile was created by DOS (a zero) it believes it and extracts it using a DOS-compatible encoding.
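For anyone hitting the same garbled-names problem, here is a rough sketch of the Python route using the standard `zipfile` module. It is not the commenter's script. The function names and the default `shift_jis` encoding are assumptions. It relies on the fact that `zipfile` decodes a name as CP437 when the entry's UTF-8 flag is not set, so the original bytes can be recovered by re-encoding to CP437.

```python
import zipfile
from pathlib import Path

UTF8_FLAG = 0x800  # general-purpose bit 11: the entry name is stored as UTF-8


def recover_name(name, encoding):
    """Undo zipfile's default CP437 decoding, then decode with `encoding`."""
    return name.encode("cp437").decode(encoding)


def extract_with_encoding(zip_path, dest, encoding="shift_jis"):
    """Extract an archive whose unflagged entry names use a legacy encoding."""
    dest = Path(dest).resolve()
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = info.filename
            if not info.flag_bits & UTF8_FLAG:
                name = recover_name(name, encoding)
            target = (dest / name).resolve()
            # Refuse entries that would escape the destination (Python 3.9+).
            if not target.is_relative_to(dest):
                raise ValueError(f"unsafe path in archive: {name!r}")
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
```

On Python 3.11 and later, `zipfile.ZipFile(path, metadata_encoding="shift_jis")` does the name decoding for you when reading. The sketch above ignores edge cases, for example legacy-encoded names whose trail bytes look like a backslash, which can be mangled on Windows.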
- "The name "zip" (meaning "move at high speed") was suggested by [Phil] Katz's friend, Robert Mahoney.[4] They wanted to imply that their product would be faster than ARC and other compression formats of the time.[4]"
The term ZIP is an acronym for Zone Improvement Plan; it was chosen to suggest that the mail travels more efficiently and quickly (zipping along) when senders use the code in the postal address.
And for anyone who hasn’t heard of him, look up “mr zip”! The smithsonian has some great material. I’m hoping to create an online archive (hah) of mr zip materials at https://mr.zip sometime soon
I think the zipper iconography was used by pretty early versions of pkzip, so it was definitely on the mind of the format's creator. Of course the more common implementation WinZip (these days from Corel but I assume it was acquired, not sure of the history) went the C-clamp route, which I think it shared with WinRar.
Like other commentors, I always thought it was an allusion to zipping up some sort of storage or container, like a suitcase - zipping would still be useful even if it didn't compress, as it's a way to send a folder as a single file.
I could've sworn it was named for the authors - took a while to find with nothing more to go on, but I was thinking of Lempel-Ziv (and I suppose Ziv became ZIP, sorry Ziv). I must be using a lossy compression for memory.
No, it's a kilozip, like kilometer (= 1024 meters). A gigazip (not gibizip) would be 1'073'741'824 zips (1024^3 = 2^30).
Also it should be Gzip, not gzip, although it's not ambiguous since the reciprocal prefix is nano-, and in general there are no 2^-X0 prefixes that start with g- like there is for Mega-/milli-.
When I worked at Microsoft as a localization lead, we had actual human beings that would look over this stuff before calling machine translation good (of course, we had actual human translators back then, too). But that was a long time ago, back when Microsoft had human software testers, too.
And what options are available when I right click on a .zip/.postcode file? Say, perhaps I wished to uncompress it?
I think the worst example of this overzealous localization was when some person translated the VB(A) reserved keywords to the appropriate language (I want to say it was german or swedish). Happened in the 1990s.
Your programming language reserved keywords changing depending on your selected language was a big facepalm.
Searching for this problem on the web doesn't produce any valid relevant results, so the web has forgotten or maybe the horror has been plastered over.
VBA is still localized in many languages, I saw macros in italian.
It's not just VBA keywords either, even excel functions are localized. e.g. this is from microsoft's docs[0]
So obnoxious, although apparently in newer versions you can modify what language you want.
I remember seeing the dictionary to translate between German and English function names[1], one time I googled for "Excel German translation" and Google proudly showed the Translate interface pre-filled with "English: excel / German: übertreffen"
The feature itself isn't that crazy – it's even a good thing, especially considering that Excel (and to a lesser degree, VBA) were designed for ordinary people and that in the 80s/90s English was less common than it is today.
Where it went wrong is that as soon as you choose a language you were stuck with that. It should have been a display option.
The missing part of that story is probably the file is a .xlsm (or even an older .xls with macros) and therefore has VBA scripts, which are not saved in a language-independent way.
Probably someone manipulating formulas with VBA and improperly using the FormulaLocal property. Formulas are stored in a language-neutral way and it shouldn't be a problem under normal circumstances.
There was a famous translation mistake back in the days of IE 5 for Mac or so where in one of the settings screens they translated the country code for Norway "no" as "nej" (literally No) in Swedish
I get that in the latest version of the Panasonic aircon apps. They have the term "No." (with a dot, as the column header for the column "number" in a table). So in Swedish the app says
Nej.
1
2
i18n is HARD. But it's not THIS step that is hard. This is day one of i18n school. You can't translate from english terms. Just stop translating things from english, translate from a neutral key!
Since translation is given less and less thought, the underlying system is also becoming more basic and less context-aware.
I don't remember exact occurrences but I've noticed quite a few similar problems on macOS that are just the result of words being translated out of context.
On a popular forum software I used to do support for, "flag" (for moderation) was translated in French as "drapeau" for example. Which is technically correct I suppose, drapeau does mean flag, but the wavy kind of flag that countries like to use as symbols. Nothing else. And it's not a verb.
In this case, there is simply the string "zip" somewhere that is used both in some address forms and for the file type, and the translator just happened to translate the first occurrence probably without even knowing that it was also used in another context as well. Or without any means to ask the dev team to separate them into two terms. Which might not even be possible if they used too simplistic a translation system.
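One common way to let a translator separate the two uses is to attach a context to each source string, in the spirit of gettext's `msgctxt`. A minimal sketch follows. The catalog contents, the context names and the `translate` helper are all invented for illustration, and nothing here claims to reflect how Windows actually stores its strings.

```python
# Hypothetical en-GB catalog keyed by (context, source text).
CATALOG_EN_GB = {
    ("address form", "ZIP"): "Postcode",
    ("file type", "ZIP"): "ZIP",
}


def translate(context, text, catalog):
    """Look up a string together with its context; fall back to the source text."""
    return catalog.get((context, text), text)


print(translate("address form", "ZIP", CATALOG_EN_GB))  # Postcode
print(translate("file type", "ZIP", CATALOG_EN_GB))     # ZIP
```

With the context in the key, the same English word yields two independent entries, so translating one cannot silently change the other.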
Other localizations of Amazon are just as hilarious. And to understand the product listings, you have to know (pretty complete) English, because the translations only make any sense if you know the English description and can reverse the translation in your head back to English.
So, you have to understand 2 languages to understand these listings.
The programmers and the translators are different people who have probably never met each other.
Translators usually just get to see the string they are translating, and if they're lucky a 1 sentence explanation written by a programmer.
With no context, a "Zip file" might indeed mean a "postcode file" - perhaps it is a file for storing all the zip codes, and now it will be used for storing all the postcodes?
To be fair, translation systems genuinely are a pain, and it's easy to make a mistake either as a developer or as a translator who has thousands of strings to translate.
It doesn't help that a lot of developers probably never translated an application; when I added translations the first thing I did was translate it myself, but that's not really feasible for larger systems and/or with people who don't speak a second language well enough to actually do translations.
They can't work on a per-word basis; translation only works for long enough/large enough texts and images. Words ("zip") and partial sentences ("expand zip") are garbage in. Out comes "postcodes" and "elaborate addresses" or whatever the PRNG chooses.
We had an almost identical bug, also at Microsoft, when I was working on Hotmail (it may have been Outlook.com by then, I can't remember): POP, the email protocol, was localized into UK English as DAD.
I find POP->DAD a little odd, because I've never (as an American) heard anyone say 'pop' referring to their father. 'pops'? yes. Now, POP->SODA, would not surprise me in the least.
“Pop” gets used in the UK sometimes but I’ve only heard it in northern England. Never heard Soda anywhere in the UK though, apart from the specific case of cream soda.
In Aus I use Pop to refer to my Grandad who was British. I do think it was to help with confusion with which grandparent I was referring to, was it my dad's dad or my mum's dad
It's an older generation thing. See Pop Tate in Archie comics [1], for example; the soda fountain guy. Guessing it's probably derived from Papa, like Pa, Paw (hillbillies), etc.
Vaguely related, I’ve noticed before manuals in which tech writers provide expansions of acronyms - but they don’t realise that in the context of that manual the acronym means something different.
I remember one z/OS manual saying AIX stood for “Advanced Interactive eXecutive”-which is true if we are talking about the Unix, but this manual was talking about a VSAM AIX (Alternative IndeX) - a secondary index for the VSAM flat file database system. Another example was USS being wrongly expanded as “UNIX System Services” when in the context it was actually a reference to VTAM’s “Unformatted System Services” (the part of VTAM which handles the initial LOGIN command, I think “Unformatted” because it is plain EBCDIC not 3270)
In Windows, there is an account flag called “UF_MNS_LOGON_ACCOUNT”. A lot of people claim “MNS” stands for “Majority Node Set” - even one MS doc - see MNSLogonAccount in https://learn.microsoft.com/en-us/powershell/module/activedi... - however, while it is true that’s what the acronym stands for in the context of Windows clustering, I’m pretty sure in the context of Windows user accounts it actually means “Microsoft Netware Services”-this flag was used by Microsoft’s 1990s era software which enabled Windows NT to pretend to be a Netware server, and the unrelated Windows clustering “MNS” has never used that account flag. “Microsoft Netware Services” was renamed to “Microsoft Services for Netware” (MSN-but not that MSN!-hence MSFN or SNW), no doubt for trademark law reasons, but the name of this flag got stuck with the original acronym. I don’t think it actually does anything unless you have that old stuff installed, which probably doesn’t work on newer versions of Windows. I suppose that being an effectively disused flag, there is nothing stopping people from stealing it for their own purposes, and probably someone out there has.
One of our sites uses the word "hopper", and I was once told that when a client asks what it means, our UK people tell them it's an American term, and our US people tell them it's a British term.
That’s funny. It’s definitely an actively used term in the U.K. though I can imagine people not knowing it. Outside of heavy industry I’d associate it with centuries-old flour mills.
I’ve found that often when my fellow Brits complain of a word being American it’s actually just an old English word we stopped using.
The very first thing we do when we create a new e-commerce website is to change "cart" to "basket" everywhere we can find it. Without that - as our manager says - we wouldn't sell anything on the UK market
because most people don't care that much, i've lived in the uk my whole life but my english is extremely americanized due to my use of the internet growing up
Another British English oddity is with Mac OS. Classic Mac OS had a British version until circa Mac OS 8, which called the Trash the Wastebasket. But the icon wasn't changed, so it was still visually a dustbin. British English returned at some point with recent Mac OS X versions, but now that the icon really is a wastebasket, they call it the Bin.
(If anyone has this British English Windows 11, is it still the Recycle Bin, or the more British Recycling Bin?)
Recycling Bin is the normal name in US English too. I'm in the US and when I search google for "Recycle Bin" (https://www.google.com/search?q=recycle+bin), the results (in order) are
Dustbin is also UK English, and I (Canadian) don’t actually know what “visually a dustbin” looks like. In Japanese/English spoken in Japan, “dustbox” is also a term sometimes used for what I would call a small garbage can.
Do they also localize the folder path? It’s “./Trash” in US english
A dustbin is I believe called a trashcan in North America, ie: it’s what the Classic Mac OS Trash / “Wastebasket” icon shows. A (usually) metal cylindrical container for rubbish with a lid, usually kept outside. Nowadays in the UK mostly replaced by wheelie bins…
It's still internally .Trash, as that isn't user visible I suspect it's the same for all localisations of Mac OS.
On Windows, I think it has always been "Recycle Bin" in any English dialect since the beginning. I guess that leaves a question I never thought of: what do the British call a recycling bin?
I've honestly just given up using British / Commonwealth English localisations in software. You run into constant little bugs and annoyances, and it's not like I can't decipher the US English version.
Please always remember to let your users change regional settings separately and individually for currency, measures, first day of the week, etc.
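A small sketch of what that advice can look like in a settings model: each regional preference is its own field rather than something derived from a single locale. The `RegionalSettings` class, its field names and its defaults are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RegionalSettings:
    """Each preference is stored independently of the UI language."""
    language: str = "en-US"
    currency: str = "USD"
    measurement_system: str = "imperial"
    first_day_of_week: str = "sunday"


defaults = RegionalSettings()

# A user who wants US English text but European conventions elsewhere.
mine = replace(
    defaults,
    currency="EUR",
    measurement_system="metric",
    first_day_of_week="monday",
)
print(mine)
```

The locale can still supply the defaults; the point is only that overriding one field does not drag the others along.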
ZIP Code was a service mark owned by the USPS. The generic term is a postal code. I've personally seen more and more software use the term postal code, especially anything that handles international addresses.
If you're American (a group generally seen as incredibly parochial, especially by the British), then that may be the case. If you're British, then the colonies really could be any of 75% of the world.
As a Brit, I think it's a tad unfair on the yanks to see them as that parochial. As a test, ask any Brit moaning about the NHS and railing against the US private system (these complaints often come together from people who would never consider themselves parochial) whether the health services of Germany, France or the Netherlands are private. They won't have the first clue.
Just shows you, all that consumption of worldly and international news by worldly and international types remains strangely parochial, and that's just one example.
There's quite a few things that are considered "Americanisms" now, but are actually British. After the American revolution, British English continued to change, but many old-timey Britishisms remained frozen in American English.
There's also two forms of "British" English, with the less common being Oxford English, which uses ‑ize instead of -ise for most words https://en.wikipedia.org/wiki/Oxford_spelling
Sure. "Soccer" is one. It started in England as college slang. "Soc" short for "association" from "football association", and "er" added as a jocular formation. That guy is a soccer. Rugby players were similarly called ruggers.
Also, more vague but fascinating, is that the American "southern accent" (quotes because there isn't a single southern accent, but most people think of a specific one when they think "southern accent") is largely a "British accent" (quotes for the same reason), but British circa the 1700s.
This is actually a fascinating subject, well worth looking into if you're interested in English language history. Also interesting is that the division between American and British English has been growing softer over time, and more and more modern Britishisms are being used regularly by Americans (and vice versa).
The Southern accent in the US did not emerge until after the Civil War, replacing a diverse array of local accents. It came from the Appalachian South which was settled by the Scotch-Irish and Germans, none of whom had English accents.
It's quite funny and depressing, how one of the youngest and wealthiest countries on Earth has already fundamentally forgotten huge chunks of its history - after barely 250 years. Ten generations are clearly enough to lose a lot of data.
Whenever I hear the "British accent of the 1700s" I just don't really believe it. There isn't a British accent now. There are many many different accents, all wildly different.
Right, as I mentioned. Same with American or even Southern American accents. There are a wide variety of distinct ones on both sides of the pond.
The various accents do tend to have features in common, though, so you can hear that the various southern accents are part of the same family, and similarly with the various British accents. There are, of course, exceptions to this as well. This bit of linguistic history is really referring to a fairly specific American accent and a fairly specific British accent.
Rhoticity is the big one I know. Parts of Britain started dropping the R before 1776, but it became more widespread after that. The US port cities (most notably Boston, but it was also a class difference) had enough contact with Britain that they dropped the R too, whereas the rest of the US kept it.
I don't think anyone would consider rhoticity as especially American considering several dialects across the UK, Ireland & Canada are rhotic. It is just another way in which English varies globally.
True, and I didn't mean to imply it was. I was just thinking of things Americans say because of the British that the British no longer do, wasn't thinking America-exclusive things.
"Tire" and "curb" were once the normal spellings in the UK; "kerb" is an innovation whereas "tyre" is either an innovation that is coincidentally the same as an archaic spelling, or the restoration/repopularisation of an archaic spelling. The spelling "kerb" upsets me whenever I see it because it's clearly referring to the curvature of the kerb, but fortunately I almost never see it.
Likewise, spellings like "programme" are deliberate changes to mimic the French spelling. These have been rather more successful than they have any right to be, but some have completely failed (like "gramme") and a lot of people still use the older spelling.
-ize, also, used to be the standard spelling with -ise an alternative also found in the UK. In this case, it's clear that an understanding that -ize is used in the US and -ise is used in the UK became an understanding that -ize is the US spelling and -ise is the UK spelling which raised its currency. But I think they both remain in use in the UK (-ise has more-or-less chased out -ize in Australia though).
-or spellings like "honor" and "color" were once much more common in places where they are now rarely seen and vice versa. To an extent they follow the same story as -ize/-ise, with the US standardisation of one chasing its use out in the UK. In Australia, -or was much more common (than now) when the main power one needed to distinguish oneself from was the UK, but now that the main power one needs to distinguish oneself from is the US, -our has chased it out except in the name of the Labor party (because the paperwork was filed by someone who happened to prefer the shorter spelling in a time when both were current) and some uses of "honor" that are literally etched in stone. (The last general use was until about the year 2000, by "The Age", a Melbourne newspaper which used -or as its house style, but by then it was seen to be improperly American and they switched to -our.)
Generally, spellings and spelling variations remain open and subject to gradual change in all English-writing countries.
Gotten: “English speakers in North America preserved gotten as the past participle of got. Outside of North America, the shortened version [got] became standard.”
Modern French has had little influence on English. As I understand it, the -our forms are evolutions from the Norman French spellings that introduced these Latinate words into English after the Conquest. In the Renaissance, as many words made their way into English directly from Latin, there was something of a desire to Latinize the spellings of these words to their original -or forms. This had some traction in Britain (eg, horror, tremor, governor) but really took root in America.
My favorite is when google translate translates the Chinese word for product/item/thing as "baby." You shop for babies, babies go on sale, here, have a coupon for $2 off any baby! If it doesn't work out there's always the "return defective baby" button!
This reminded me of the song "All that she wants" by Ace of Base. The mistranslation of the word "baby" makes the song sound like it's about a woman who wants more kids, rather than a woman who wants a lover.
Another very ironic example of that is this recent Reddit post [1], where the name "Nasser" was censored in-game to become "N***er", making it look much, much worse.
Probably an issue during localization, where someone saw "zip" and assumed it meant "zip code" and so "translated" it to "postcode". This is a perfect example of why it's important to also supply the context instead of just the raw strings you need translated!
It also brings to mind a translation issue from my home country, Wales, where road signs must be bilingual (English and Welsh). A request was sent to a Welsh translation service asking for the translation for a specific phrase. The signwriters received the response, completed the sign and it was then erected. The problem: it was an out-of-office auto-reply! https://www.snopes.com/fact-check/mistranslated-welsh-traffi...
I don't think you understand how localization works. They have a localization file and they send them off to a translation service. The translation service goes through the file and translates the individual strings (or string fragments).
Either the translators made a mistake, and thought it was referring to a ZIP (regardless of capitalization) and translated accordingly, or a developer used the wrong key when assembling the string references - i.e. he used the equivalent of (this is pseudocode as I don't know how they handle localizations):
localize("CompressToArchive", localize("Zip"), localize("File")) - i.e. with a reference to localization of "Zip" (or ZIP, or zip - the dev likely just searched for a string that matched what he wanted to localize)
instead of
localize("CompressToArchive", "Zip", localize("File")) - i.e. with a string of "Zip"
where the strings are defined as:
CompressToArchive: Compress to %1.%2 (same for us and uk)
Zip: ZIP (us) or postcode (uk)
File: File (same for us and uk)
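The pseudocode above can be turned into a small runnable Python sketch that reproduces the suspected bug. The string tables mirror the comment's example, but the placeholder syntax is adapted to Python's `str.format` with a space between the two slots. The `localize`, `buggy_label` and `fixed_label` names are hypothetical, and none of this is claimed to be how Windows assembles its menu text.

```python
# Hypothetical string tables based on the example in the comment.
STRINGS = {
    "en-US": {
        "CompressToArchive": "Compress to {0} {1}",
        "Zip": "ZIP",
        "File": "File",
    },
    "en-GB": {
        "CompressToArchive": "Compress to {0} {1}",
        "Zip": "postcode",  # correct for address forms, wrong for file types
        "File": "File",
    },
}


def localize(locale, key, *args):
    """Look up a string by key and fill in its positional placeholders."""
    return STRINGS[locale][key].format(*args)


def buggy_label(locale):
    # Reuses the address-form "Zip" entry for the archive format name.
    return localize(locale, "CompressToArchive",
                    localize(locale, "Zip"), localize(locale, "File"))


def fixed_label(locale):
    # Treats the format name as a literal that is never translated.
    return localize(locale, "CompressToArchive", "ZIP", localize(locale, "File"))


print(buggy_label("en-GB"))  # Compress to postcode File
print(fixed_label("en-GB"))  # Compress to ZIP File
```

Note that the buggy version looks fine in en-US, which is one reason a mistake like this can survive testing done only in the source language.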
This kind of mistake isn't that amazing, it's actually very common in translation because translators often don't get enough context - they just see a string with no comments.
Yes? People have been joking about Microsoft's buggy software for longer than some of my friends have been alive. Backwards-compatibility is a hell of a drug; combine it with "every time my computer changes, it's really hard to figure out how to use it, so I don't want to change OS", and it's unstoppable.
Yes, there was a reorg in 2014 where they eliminated the role of 'software developer in test', the justification was something something testing as a separate function is slowing us down and we want to put out more releases. Or something. Supposedly testing responsibilities didn't go away, they were just moved to the software developers, but my lived experience is quality went down, so I'm convinced that testing isn't happening as much. Certainly windows mobile 10 had much less polish than any windows phone >= 7.5 release. Of course, design changes I don't like can be hard to separate from some of the things I think QA would have found.
It's the right thing to do, for Microsoft. Their users aren't going to leave them, no matter what, so why waste money on QA staff and testing? It's better for the company's profit margin to just let the users deal with those problems, and the users agree since they're happy to keep using MS products and paying for them.
Reminds me of the time I used my French bank card in a ticket machine in the UK.
The machine detected it was a French card and switched the interface to French (fine).
It then asked me to "entrez votre broche" as a prompt for my PIN.
In the context of electronics "pin" (as in on a component) translates to French as "broche" but it makes no sense for a pincode!
The scope and scale of translation work has always seemed daunting to me.
I can’t imagine a way to make it any more efficient, without sacrificing significant accuracy by missing context, than to do manual translation of everything.
I can’t imagine, every time I added a button to a UI, an entire team of localization translators mobilizing to make sure the right context came through in every language the UI supported. Not to mention the tools that support that workflow of passing context to translators, compiling the translations, and binding them all into a data structure I can use to populate my UI.
And that’s just for language. Iconography and colors have their own localizations.
Screenshots are all you ever need. Just give a screenshot or terminal output to a bilingual SME and have it double checked. It’s not like translators must be given sweat and blood of engineers just for their motivations.
A screenshot of just the dropdown menu could have been generated in CI and could have easily prevented the “postcode file”, so long as the translator was presented that image along with the string and recognized that Windows dropdown.
How does that even happen? I mean for localizing software - any software - you need some kind of Lookup key for each resource and then you can do a lookup for that key and a language, to get the resource (An image, a text string, whatever). So the function is (key, language) -> resource. E.g. ("filetype:zip", "en-US") -> "Zip file". Since any term can be used in multiple places and words can have multiple meanings, you can't localize software as a mapping from one language to another. Especially not english.
You can add context this way, but programmers often forget, translators don't look for context, and seemingly no one reviews the final product in other languages.
That's exactly my point: using gettext and hoping to hard-code one language (typically English) as the "key" language and then sending it to translators who will try to map context-less strings to a different language is just not a good way of localizing software. I think the key design flaw is that it's the best solution to an unviable problem: the idea of adding i18n to software as a simple transform from one language to another (i.e. an afterthought).
The proper solution, again, is using keys that provide the context. A special syntax for these neutral strings (e.g. prefix, uppercase, whatever) will quickly show where a translation is missing. Translating is then a mapping from keys to English, or keys to French and so on, and never English -> French, even with gettext. Instead of "Zip file" you'd have to hard-code ":filetype_zip".
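Roughly what that could look like, as a sketch rather than a real gettext setup. The catalogs, the `:address_postal_code` key and the French string are assumptions for illustration; only `:filetype_zip` is from the comment.

```python
# Hypothetical per-language catalogs, all keyed by neutral ":"-prefixed keys.
CATALOGS = {
    "en": {":filetype_zip": "Zip file", ":address_postal_code": "Postal code"},
    "fr": {":filetype_zip": "Fichier zip"},  # ":address_postal_code" not translated yet
}


def tr(key, lang):
    """Translate a neutral key; fall back to the raw key so gaps are visible in the UI."""
    return CATALOGS.get(lang, {}).get(key, key)


def missing_keys(lang, reference="en"):
    """List keys present in the reference catalog but absent from `lang`."""
    return sorted(set(CATALOGS[reference]) - set(CATALOGS.get(lang, {})))
```

Because the fallback is the key itself, an untranslated label shows up on screen as `:address_postal_code` instead of silently appearing in English, and `missing_keys("fr")` could be run in CI to list the gaps.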
I have worked with volunteer translations for Valve using the now defunct Steam Translation Server (and, very rarely nowadays, Crowdin). When I discussed problems like these with other people from the community, the mods agreed that this is a lack-of-context issue.
Then, I started playing World of Warcraft and realized how bad the localization is. I mean, the translation isn't bad per se, I really appreciate that they tried to adapt the game to my country's culture and that is totally awesome.
But seeing Blizzard pay a ton of money to people who don't even play the game to verify whether their translation is correct or makes sense is just astonishing to me. When Blizz released the new UI for WoW, there was an option called "snap" which meant "snap to grid". The amazing translator team, having no context and not even the decency to check the context in game, translated this option as "estalo", which, in Portuguese, means "snap (sound)", as in a finger snap.
Another example: when they added minimap sizing to the UI editor, there was an option that allows the name of the zone to be below the minimap. The brilliant translator team translated it into Portuguese as "abaixo cabeçalho" (literally "under header") or something equally silly, where you could plainly see that they put no effort into the translation.
PlayStation used the equivalent of "Store/archive/stock 20%" on the PSN Store for years in Norway instead of "Save 20%", because they translated Save without realizing that we have different words for saving money and saving things.
Yup. Every graphic designer I know around the world uses Photoshop in English, not in their native language.
Every Photoshop user knows the difference between the image and the canvas, between layers and channels and levels. They learn what dodge and burn mean.
But nobody can even guess what arbitrary terms a translator will use for layers and canvases and channels and levels, or what terms printers used for dodging and burning back in the day.
Not to mention most Photoshop tutorials you find are in English as well.
And to someone who hasn't done touch-up work on film, the terms "dodge" and "burn" have no obvious connection to lightening or darkening an area, so even native English speakers need to learn what the terms mean.
Russian translations tend to be quite robust even for open source software - large user base who are used to translation being there and of acceptable quality.
MS was the first mover here by having a robust translation of Windows 95 from the start (and yeah, they did translate the start button).
Open source translations tend to be better in general. Especially for smaller languages like Estonian. MS uses pretty horrible direct machine translations here that usually don't make sense.
Russian seems to be better than even English at grabbing words wholesale from elsewhere. Maybe that makes translations easier? Probably not, but it's a fun thought.
My favorite example of that was a sign at an airport. Marking a spot off for taxis and rideshares. The latter was spelled out phonetically in Cyrillic. I was a little surprised they didn't use the Russian words for "ride" and "share".
A big issue, already mentioned in this thread, is when ignorant people start to reimport words from other languages (English these days), and that catches on to some extent, even though educated people have been using a proper term for decades or centuries.
You may end up having multiple conflicting ways to describe the same thing or even a person. A few days ago I saw Hades transliterated as Гадеc in some web comic translation, even though he has usually been called Аид for many centuries already.
Most people aren’t proficient enough in English. And even if they are, many would prefer to see their native language, just like you would prefer to see a nice GUI instead of MS-DOS.
(Though I personally find most Polish translations of software abysmal, especially the translations into Microsoft Polish, and use US/UK English everywhere.)
I haven't worked with MS products in ages, especially not in German. But I sure remember the gruesome "Schaltfläche" which unfortunately became the industry's standard translation for "button". And please don't forget about the "Eingabegebietsschemaleiste".
These are localization issues. If the GUI had been done in a language other than US English from the start, it wouldn't be an issue. Of course sloppy work is bad, but that doesn't mean it’s inherently bad.
There are so many other ways this can happen that it's really not surprising it does. Like: started 20 years ago with the UI toolkit in English and a junior dev, hence fixed size, then suddenly got a growing customer base in another language and slapped on a translation, only to realize it would be a pretty large effort to track down and change each label to resize. Or even just: things adjusting size can simply be a big no depending on the software (and you'd use truncation+tooltip), because one doesn't want to waste space and/or wants to keep things aligned so it's easier to learn where everything is. Your browser tabs probably don't adjust size to text, just to name one thing.
A lot of UIs just can't deal with a box being resized, so they instead adjust the size of the text, but for languages like Arabic and Japanese this can easily make text unreadable, because characters often rely on finer details to be understood.
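For what it's worth, the truncation+tooltip approach is tiny to sketch. This assumes a hypothetical `fit_label` helper and counts characters rather than rendered pixel width, which a real toolkit would have to measure.

```python
def fit_label(text, max_chars):
    """Return (visible_text, tooltip) for a fixed-width label.

    Assumption: width is approximated by character count, and max_chars >= 1.
    The tooltip is None when the text fits, otherwise it carries the full text.
    """
    if len(text) <= max_chars:
        return text, None
    # Reserve one character for the ellipsis so the label never grows.
    return text[: max_chars - 1] + "…", text
```

So a short English label passes through untouched, while a longer translation is cut to the same width and the full string stays reachable on hover.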
I was going to say since the invention of ASCII in 1963, but there were encodings before it and arguably you could say since the invention of Morse code in 1844.
There are keys for locks (Schlüssel) and keys on keyboards (Tasten). Guess which word VMware uses for its special keys menu in the German translation. :-)
All software is buggy. By translating it more bugs will be introduced. So I try to avoid using any translated software.
I remember in Windows 95 (that's been a while) a little binary, probably ipconfig.exe, was broken in the Finnish version we had on our home computer. I copied the binary from the English version we had at work and it worked.
Windows localizations have been getting worse and worse over the past decade and a half. And now apparently they are further degrading to automated translations? Who could possibly think this is a good idea?
>I like a clean install and ripping out all the non-english languages takes a while. I also trawl through programs and extensions and get rid of the unnecessarily installed [for me] languages.
Additionally, any OS images - backgrounds, MS logos etc. - get reduced to 10*10 pixels. I don't need eight versions of the same wallpaper. And as for themes...
What's very funny to me is that there is a very important "file" called the postcode address file in the UK, which is Royal Mail's mapping of postcodes to addresses.
I especially object to the case of zip-code being used internationally, because this isn’t a case of a sensible alternative usage, but rather one very specific US term being interpreted as universal.
Please, for internationalisation's sake, use the generic term ‘postal code’ in English, then regionalise as appropriate (e.g. Post Code in the UK, Eircode in Ireland, Postal Index Number in India, Zip code in the US, etc.)
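In code, that is about as simple as regionalisation gets. A sketch using only the terms listed above; the function name and the use of two-letter region codes are my assumptions.

```python
# Region-specific names for the field, taken from the examples above.
POSTAL_CODE_LABELS = {
    "GB": "Post Code",
    "IE": "Eircode",
    "IN": "Postal Index Number",
    "US": "Zip code",
}

GENERIC_LABEL = "Postal code"


def postal_code_label(region):
    """Return the regional label, falling back to the generic English term."""
    return POSTAL_CODE_LABELS.get(region.upper(), GENERIC_LABEL)
```

The point of the fallback is that an unlisted region gets the neutral "Postal code" rather than inheriting the US term.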
Not really? Windows 95 pre-IE4 has no HTML stuff in the shell, so the core Control Panel UI and applets dating back to then are native. The main HTML bits in Explorer of the era once it was glued in was the waste of space side bars. I guess some newer applets might be HTML, but the main ones weren’t.
If only that was on the cards. MS really needs something to give them a good kick up the arse so they stop pumping out crummy software. Sadly I don't see it happening any time soon.
Fix all the bugs they introduced by trying to rewrite stuff that worked fine like the taskbar, stop changing functionality that works like the taskbar, remove the telemetry and need for an MS account, and give back users control over updates.
Windows is a goldmine of horrible translations. Since the introduction of the Windows Fax & Scan tool, the Dutch translation has used "Zoeken" for the button that starts the scan. This translates back to "Search". In Dutch we actually just call it scan, just like in English.
On the two Windows 11 machines that I have access to, one Pro and one Home, and both set to UK English, the file compression option says "Compress to Zip file". I've never seen "postcode file" in the Windows UI.
It's odd that apparently nobody on HN bothered to personally check this story, except you. (Is there a lesson somewhere for the coming wave of AI fake news?)
I wonder if we could get a sane Windows experience if ReactOS was a little bit more advanced in its driver support. Microsoft's incentives with Windows are clearly not aligned with the customers'.
I've seen a lot of translations breaking lately also in Windows 10 - I guess someone is just eager to rework a lot of things that have been working perfectly to get a raise.
This is the most 2023 answer possible but I wonder if LLM translation will help handle this sort of situation better.
An LLM first translates to tokens in more of a "concept space", then translates to other languages. So it would translate "zip file" to something completely different than "zip code", and translate to UK English correctly, or at least better I would hope.
They may not solve all problems, but maybe this one?
There's like two main ways this kind of mistake happens.
One is a programmer using the wrong string ID. Probably a mistake when searching a translation file/db.
The second is missing context. The word "zip" has multiple meanings in software, so you need more information to disambiguate when translating. "Zip code" and "zip file" are probably sufficient, but you could imagine them also having a templated translation for file types like "{} file".
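To make the second case concrete, here is a toy sketch of how a templated "{} file" string plus a context-free lookup could go wrong. Everything here (the table, the context names, the first-match rule) is hypothetical and says nothing about how any real product's pipeline works.

```python
# Hypothetical en-US -> en-GB table keyed by (context, source text).
EN_GB = {
    ("postal address", "zip"): "postcode",
    ("archive format", "zip"): "zip",
}


def translate(text, context=None):
    """Translate `text`; without a context, take the first entry that matches."""
    if context is not None:
        return EN_GB.get((context, text), text)
    for (_, source), target in EN_GB.items():
        if source == text:
            return target
    return text


def file_type_label(kind, context=None):
    """Fill the templated string "{} file" with the translated file type."""
    return "{} file".format(translate(kind, context))
```

Called as `file_type_label("zip")`, the context-free path picks the postal sense and yields "postcode file"; passing `context="archive format"` yields "zip file". The fix is the extra argument, not a smarter translator.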
I could see an unspecialized LLM translation working well for whole sentences or paragraphs, because that kind of text isn't really likely to run into either of these problems. And if you have the context you need to avoid the second problem for small snippets, then human translation is also pretty unlikely to fail.
So, such an LLM would still have to be informed that this is referring to a zip compressed archive, not a postcode file, whatever such a file might be. This is basically the same thing as LLM hallucination, in that it comes from a lack of constraints in latent space, if I dare use that buzzword.
Without context, a “zip” or “zip file” still can be anything. Could even be “zip [this] file” as a command.
This is, in fact, the task that transformers (the technology currently branded as "LLMs") do well. It was the task they were invented for: https://arxiv.org/abs/1706.03762
Why would you even localize an American operating system for the UK?
Most of the concepts in an OS are named by Americans.
If I were using a UK-made OS, I'd happily live with "dustbin" rather than a "trash can". Unless you have your head inserted in your ass, you know some UK words if you're North American and vice versa.
Well, my take here is that the English language should be renamed to "American", and Brits should be forbidden to speak it, because they make it sound ridiculous.
Honestly, speaking as a foreigner who has been learning English one way or another for half of his life, I cannot even attempt to speak in a British accent without feeling like I'm doing stand-up comedy. How can Brits speak with a straight face?
"He looked out over the apartment landscape"
and
"'Come with me', he said apartmently"