Abstract Wikipedia is, in my opinion, entirely wasted work. Translation is free and instant for web pages. I've lived for 6 years in different countries where I don't speak the local language (and I'm also not a native English speaker), and you can get all the information you need by translating. This works totally fine already today with Google Translate on top of pages.
And the pages that are in fact missing from "the other language wikis" are local myths and detailed local history, things that wouldn't even be in the English Wikipedia or in the "abstract" version in the first place.
And those pages are also very often quite incorrect, and you don't know where.
I think the general idea of a "universal language" Wikipedia, that gets flawlessly rendered into local languages, is laudable.
But I don't think anybody would ever edit in it directly. What I want to see instead is that when somebody edits Wikipedia to add a new sentence, the system attempts to translate it into the "universal language" and prompts you to select among the ambiguities.
E.g. if you wrote:
I saw someone on the hill with a telescope.
It would ask you to confirm which of the following was intended:
[ ] "with a telescope" modifies "I saw"
[ ] "with a telescope" modifies "someone on the hill"
It would be a real dream to have translated outputs that were guaranteed to be correct, because the intermediate representation was correct, because the translation from someone's native language into that intermediate representation was verified in this way.
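Just to make the idea concrete, here is a minimal sketch of what such an edit-time disambiguation prompt could look like. It is purely hypothetical and not based on any actual Abstract Wikipedia tooling: the parse representation and function names are invented, and a real system would run a proper parser instead of the hard-coded stub.

    # Hypothetical sketch: ask the editor to resolve a prepositional-phrase
    # attachment ambiguity before storing the sentence in an abstract form.
    from dataclasses import dataclass

    @dataclass
    class Reading:
        description: str   # human-readable gloss of the reading
        attachment: str    # which constituent "with a telescope" modifies

    def candidate_readings(sentence: str) -> list[Reading]:
        # A real system would run a parser here; this stub just returns the
        # two readings of the example sentence above.
        return [
            Reading('"with a telescope" modifies "I saw"', "verb phrase"),
            Reading('"with a telescope" modifies "someone on the hill"', "noun phrase"),
        ]

    def confirm_reading(sentence: str) -> Reading:
        readings = candidate_readings(sentence)
        if len(readings) == 1:
            return readings[0]
        print(f"Your sentence is ambiguous: {sentence!r}")
        for i, reading in enumerate(readings, start=1):
            print(f"  [{i}] {reading.description}")
        choice = int(input("Which reading did you intend? "))
        return readings[choice - 1]

    chosen = confirm_reading("I saw someone on the hill with a telescope.")
    print("Storing abstract representation with attachment to the", chosen.attachment)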
I would still invest those resources into documenting more knowledge that currently doesn't exist online in its original languages and immediately translating it to English. For better or for worse, English is the "abstract" representation of language online, and there's so much absent material that worrying about another universal format seems pointless.
It's not either/or. Different groups of people can do different things at once. And of the two things you're comparing, one requires expert technical/engineering work and the other requires expert archivists and translators. They're totally different groups.
> This works totally fine already today with Google Translate on top of pages.
How would anyone even know? By definition, if someone is using Google Translate, he already doesn't know the language, so how can he judge the quality of the results?
My company spends millions on professional translators because products like Google Translate are so bad for anything beyond the most basic uses.
This is wrong on two counts: 1) translation is not the same as abstraction, and 2) having the world's encyclopedia translated by an advertising company is not exactly everybody's idea of how things should be organized.
Of course, wrong criticism doesn't mean the project is a success (I think it's been going for a few years now). The documentation in particular does not highlight what this infrastructure is good for.
Denny Vrandečić (the lead developer of Wikifunctions, former project manager of Wikidata in Germany, co-developer of Semantic Wikipedia, and former member of the Wikimedia Foundation Board of Trustees) also helped develop Google's Knowledge Graph from 2013 to 2020. None of this is hidden; it's even in his Wikipedia article.[1]
The "having the world's encyclopedia translated by an advertising company" ship sailed years ago. All of these projects are supported, directly and indirectly, by exactly that motivation. The ultimate goal of commercial enterprises is to take zero-cost volunteer projects like Wikipedia and OpenStreetMap and make them cheaper for enterprises to associate user input with compatible monetization. It's now just a bonus side-effect, rather than their mission, that any public good comes from these projects.
"translated by an advertising company" is akin to "Tor was funded by the US government" - it's basically organizational ad hominem.
Google's translations are fine and high quality, and they don't yet (and won't in the foreseeable future) inject ad copy into the translations, as they do for POIs on e.g. Google Maps.
That's apples and oranges, though. Tor is out of the US military's control at this point (+/- your tinfoil-hat level), whereas Google Translate was created by and is owned solely by Google. I'm not saying GP is fully correct, but context is important.
I personally think using transformers for, well, transforming input into another language is going to be a great approach once hardware catches up for local offline use at a reasonable speed and hallucinations are minimized.
Corporate entities come and go. They bait-and-switch at will, as they ultimately answer only to legal obligations and, in particular, shareholders. It would be odd to overlay such a liability and uncertainty on top of Wikipedia.
While abstraction is not the same as translation, if the Wikipedia community specifically wants a translation service that is more tightly integrated into the platform, IMHO it should be a fully open-source project.
My point is that translating after the fact, by the end user, solves the problem. Now you can use Google Translate for free; later you can use your own LLM. Abstracting the knowledge away is wasted work. We already have it in a definitive source language (English for most things, local languages for local things).
This abstract Wikipedia sounds like Esperanto to me.
Translation solves the immediate problem of giving human users a glimpse of Wikipedia's knowledge base, but that knowledge is still strictly wrapped in textual data. It remains a content black box that, e.g., an LLM would not make more transparent.
Abstraction builds a mathematical representation. It's a new product, and it opens up new use cases that have nothing to do with translation. It may on occasion be more factually correct than a translation, or may be used in conjunction with translation, but it is potentially a far more flexible and versatile technology.
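As a toy illustration of what "abstraction, then rendering" could mean in practice: a single language-independent record rendered into different languages by per-language functions. This is not Abstract Wikipedia's actual data model; the structures and renderers below are invented, though the Wikidata IDs are real.

    # Hypothetical sketch: one language-independent record, rendered by
    # per-language functions. Q615 (Lionel Messi), Q937857 (association
    # football player) and Q414 (Argentina) are real Wikidata items; the
    # data model and renderers are made up for illustration.
    ABSTRACT = {"subject": "Q615", "occupation": "Q937857", "country": "Q414"}

    LABELS = {
        "en": {"Q615": "Lionel Messi", "Q937857": "association football player", "Q414": "Argentina"},
        "de": {"Q615": "Lionel Messi", "Q937857": "Fußballspieler", "Q414": "Argentinien"},
    }

    def render(abstract: dict, lang: str) -> str:
        labels = LABELS[lang]
        subject, job, country = (labels[abstract[k]] for k in ("subject", "occupation", "country"))
        if lang == "en":
            return f"{subject} is an {job} from {country}."
        if lang == "de":
            return f"{subject} ist ein {job} aus {country}."
        raise ValueError(f"no renderer for {lang}")

    print(render(ABSTRACT, "en"))  # Lionel Messi is an association football player from Argentina.
    print(render(ABSTRACT, "de"))  # Lionel Messi ist ein Fußballspieler aus Argentinien.

The hard part is exactly what these toy renderers gloss over: articles, inflection, and agreement in each language, which, as I understand it, is roughly the kind of function Wikifunctions is meant to host.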
The challenge is really matching ambition and vision with resources and execution. Especially if it is to attract volunteers to crowdsource the enormous task, it needs a very clear and attractive onboarding ramp. The somewhat related Wikidata / Wikibase projects seem to have a reasonable fan base, so there is precedent.
Similar to abstracting maps and geography into GIS data and getting things like geographic proximity and POI-type filtering with lower overhead than creating a category tree for place articles in Wikipedia.
For instance, Wikipedia right now relies quite heavily on manual tagging (authored categories) for classifying related subjects. If you want a list of all notable association footballers, the best way to get one is to go to Category:Association football players. But then you're stuck with a very human, flawed, and often in-flux attempt to reach a consensus definition of that, and the list remains out of reach. (Hell, American players are categorized as "soccer players" under the same tree, confounding things like search, because that's the kind of thing Americans do.)
With abstraction, you get classification for much less, and the consensus problem moves from an arbitrary, authored category tree to a much narrower space. If an article is about a footballer, the abstract data for that subject contains occupation Q937857 (association football player). The dialect and language don't matter: a footballer is a footballer. If you just want a list of footballers, you can get one without even going near things like SPARQL: https://www.wikidata.org/w/index.php?title=Special:WhatLinks...
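And if you do want the list programmatically, the same classification is one query away. Below is a minimal sketch against the public Wikidata SPARQL endpoint; P106 (occupation) and Q937857 are the real property and item, but the query shape, the LIMIT, and the User-Agent string are just illustrative.

    # Sketch: list a few items whose occupation (P106) is association
    # football player (Q937857), via the public Wikidata SPARQL endpoint.
    import requests

    QUERY = """
    SELECT ?player ?playerLabel WHERE {
      ?player wdt:P106 wd:Q937857 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 10
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "footballer-list-example/0.1 (illustrative)"},
        timeout=30,
    )
    resp.raise_for_status()
    for row in resp.json()["results"]["bindings"]:
        print(row["player"]["value"], row["playerLabel"]["value"])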
You might well be right. Furthermore, English is on its way to becoming the universal language everyone speaks. You are, however, wrong to compare AW to translators, which are probabilistic algorithms, whereas AW is intended to be as exact as Wolfram Alpha. AW should also be able to use Wikidata to generate unique articles that do not exist even in English.
BTW, translation tech is not as good as you paint it here. I regularly translate my English blog posts to Slovak, and every post requires 20-30 corrections. DeepL is marginally better than Google Translate. GPT-4 cannot even get word inflection right, an embarrassing failure for such a large model.
> And the pages that are in fact missing from "the other language wikis" are local myths and detailed local history, things that wouldn't even be in the English Wikipedia or in the "abstract" version in the first place.