Wikipedia-grounded chatbot “outperforms all baselines” on factual accuracy (wikipedia.org)
233 points by akolbe on July 17, 2023 | 177 comments



“Breaking: Trivia bot trained on the dictionary spells words better than trivia bot trained on high school English papers”


They are not bragging about the bot. They are bragging about how great the dictionary is. There’s this subtle difference in the context.


They are also the ones judging what is truth and neutrality.

This is equivalent to saying:

"A bot trained on the articles that we have written gives the answers that the writers of the articles expected"


esjeon is simply wrong; this study is not touting the accuracy of wikipedia's knowledge. It's touting their bot's ability to accurately convey wikipedia's knowledge. It's very much about the qualities of their bot, not the qualities of wikipedia.

https://arxiv.org/pdf/2305.14292.pdf


It's both. My interpretation is that the study is as you say, but it's posted on the wikipedia signpost for the reasons esjeon says.


Wikimedia controls what goes in the signpost, they are very different from the actual community, and they are known for misrepresenting the intent of a project.

When the Russia-Ukraine war broke out, they made a banner about a Ukrainian translation project that had existed for years and made it look like some project to support Ukraine, effectively breaking neutrality on the war subject (which was unrelated to the translation project).


It's also largely going to be the sum of its sources, since most (contentious) arguments on Wikis come down to who can cite the most articles, assuming edits get challenged in the first place.

Wikipedia maintains a list of 'reliable' news sites:

https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Per...


Wow, what a list. Eye-opening really.

The American Conservative (yellow) ==> The American Conservative is published by the American Ideas Institute, an advocacy organisation. It is a self-identified opinionated source whose factual accuracy has been questioned, and many editors say that The American Conservative should not be used as a source for facts.

The New Republic (green) ==> There is consensus that The New Republic is generally reliable. Most editors consider The New Republic biased or opinionated. Opinions in the magazine should be attributed.

This seems like a somewhat arbitrary double standard to be applying. As a reader of both news sources, I find them both biased, opinionated sources, and I don't think you can trust one more than the other. But one is green with "be careful this might be biased" and the other is yellow for pretty much the same reason.


Bias != Reliability.

There's a reason "The Atlantic" is listed green even though it's conservative. Hell they list the Christian Science monitor as green for reliability (as they should imo), I don't think Wikipedia is demonstrating a bias based on any particular ideology in their sources on this list.

This wiki list is a list of sources by reliability. If you only publish stories which support your bias, but those stories are scientifically sound and don't omit context, I don't see the problem with using them as a source regardless of bias.

If you only allow sources from reliable sources aligned with a particular bias to the exclusion of reliable sources from another alignment, that would be an issue, but I don't see evidence of such here.

The problem isn't the bias. The problem is the factuality.



I think there is a large proportion of people who don’t (and maybe even can’t) understand the difference. For them facts are things they agree with, and everything else is a lie.

It almost seems as if some people think that reality is everywhere subjective, and saying or believing something makes it truth, much like religion.


> the other is yellow for pretty much the same reason

I have read neither, so don't have an opinion on them.

But going by the descriptions quoted, it doesn't seem to be for the same reason.

Both are listed as biased/opinionated, but for The American Conservative it additionally says "factual accuracy was questioned", which would make it less trustworthy as a reference.


Wikileaks, with zero retractions, and most anti-war news sources: red / black.

Lest anyone think the problem is that Wikileaks is 'left biased'.


More like it's WASP establishment biased. Like the NYTimes.


I think the biggest issue is not even the left/right or political bias of Wikipedia but rather the fact that some committee of wiki editors decide along what seem to be fairly arbitrary/subjective lines that some sources are reliable and others aren't.

And then those claims make their way into Wikipedia where they inevitably (even though they shouldn't) are relied upon by students, politicians, journalists, who then perpetuate the claim.

https://xkcd.com/978/


It's not a committee, it's not arbitrary, and "arbitrary" and "subjective" mean two very different things.

Reliability from a fact-checking perspective is a pretty specific thing, and a thing that is vital to Wikipedia as an open-source, anyone-can-edit encyclopedia. This can correlate with political views in particular times and places, but does not broadly correlate with either left or right. E.g., after the Russian revolution, we saw the left using Pravda as a vehicle to "indoctrinate" and "encourage unity of thought". [1] But a significant part of the current US right has frequently taken the approach of "flooding the zone with shit" [2].

[1] https://www.britannica.com/topic/Pravda

[2] https://www.google.com/search?q=flood+the+zone+with+shit


Have you considered that conservative sources have always been less accurate by dint of failure to accept new data that contradicts existing bias?

You can argue that all parties have biases, but if you look at modern conservatism, its worldview is increasingly wildly divergent from reality. If your publication desires the readership of people who are obliged to stand in a puddle and deny being wet, you shall have to follow them at least to the perimeter of Neverland and spend at least some of your breath speaking of pirates and fairies. Mentioning the puddle will also be verboten.

Reading several of the articles on the front page, I noted completely incoherent takes on Ukraine and birth control, for instance. It's not the outright horror show of Fox News, nor is it what one would consider objective or news. It's essentially 100% op-eds by your least incoherent older relative.


'factual accuracy was questioned' vs 'The New Republic is generally reliable'


Heh. The glass is missing some water vs the glass is mostly full of water.

(Not commentary on either of those media orgs btw, I don't follow nor have any opinion on either of those one way or the other.)


Also see en.wikipedia.org/wiki/FUTON_bias

There are also some policy pages that talk about other potential biases, like technical biases. There is awareness.


The old self-fulfilling prophecy.


X doubt


This sounds like a paradox - but it is not. You don't give the bot the answer you expect, only the ground facts; it generates the answer from those facts by itself.

This is a RAG (retrieval-augmented generation) system, and you need to treat it as a whole - it is a question-answering machine that remembers the whole of Wikipedia.

By the way just today I wrote a blogpost about the common misconception that to teach an LLM new facts you need to finetune it: https://zzbbyy.substack.com/p/why-you-need-rag-not-finetunin...
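For anyone unfamiliar with the pattern, here is a minimal sketch of the retrieve-then-answer loop. It assumes the sentence-transformers library for embeddings, and the complete() function is a hypothetical stand-in for whatever LLM completion API you actually use; it is a generic illustration, not the WikiChat pipeline from the paper:

    # Minimal RAG sketch: retrieve relevant passages, then answer from them only.
    # Assumes sentence-transformers; complete() is a hypothetical stand-in for
    # whatever LLM completion API you actually use.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    # In practice these would be chunked Wikipedia articles.
    passages = [
        "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
        "Mount Everest is Earth's highest mountain above sea level.",
        "The Great Barrier Reef is the world's largest coral reef system.",
    ]
    passage_vecs = encoder.encode(passages, normalize_embeddings=True)

    def retrieve(question, k=2):
        """Return the k passages most similar to the question (cosine similarity)."""
        q_vec = encoder.encode([question], normalize_embeddings=True)[0]
        scores = passage_vecs @ q_vec
        return [passages[i] for i in np.argsort(-scores)[:k]]

    def complete(prompt):
        """Placeholder: swap in your actual LLM call (OpenAI, llama.cpp, etc.)."""
        raise NotImplementedError

    def answer(question):
        facts = "\n".join("- " + p for p in retrieve(question))
        prompt = ("Answer using ONLY the facts below. "
                  "If they are insufficient, say you don't know.\n"
                  "Facts:\n" + facts + "\n\nQuestion: " + question + "\nAnswer:")
        return complete(prompt)

The bot never sees "the answer you expect", only the retrieved facts plus the question, which is the point being made above.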


I think there is a subtle hindsight bias here. Like if you asked someone yesterday "would grounding a chatbot on wikipedia make it do better?" I think many people would say that it sounds quite plausible. But if you ask instead "what are your top 10 ideas for making chatbots better at facts?" then it may not be so obvious.


The religious-level hype that minor, incremental, and obvious improvements to existing technologies get is patently absurd.


It is important information that you don't really need petabytes of common crawl data to make a highly accurate bot. There are a few other open source models that perform well with significantly less training data than OpenAI used.


That isn't what is being described here. They are just providing additional context to ChatGPT using its plugin API. It's still trained on large amounts of public text data.


>It is important information that you don't really need petabytes of common crawl data to make a highly accurate bot. There are a few other open source models that perform well with significantly less training data than OpenAI used.

Sure, but the tradeoff is in generalization vs specialization. No one is impressed by the fact that ChatGPT is able to recite facts. Google can do that. Where it becomes interesting is in the general applicability of a single tool to thousands of possible domains.


I wish I had the time or facility to take a snapshot of wikipedia now before the imminent deluge of ChatGPT-based updates that start materially modifying wikipedia in some weird and unpredictable manner.


In late 2021 / early 2022 I got scared about the incoming consequences of LLMs and downloaded all the "Kiwix" archives I could find, including Wikipedia, a bunch of other Wikimedia sites, Stack Overflow, etc.

I'm pretty glad that I did. I'm going to hold onto them indefinitely. They have become the "low background steel" of text.


I really like that analogy.

For anyone curious what low background steel is, it's steel that was made before the first atomic bombs were tested: https://en.m.wikipedia.org/wiki/Low-background_steel


> In late 2021 / early 2022 I got scared about the incoming consequences of LLMs and downloaded all the "Kiwix" archives I could find, including Wikipedia, a bunch of other Wikimedia sites, Stack Overflow, etc.

> I'm pretty glad that I did. I'm going to hold onto them indefinitely. They have become the "low background steel" of text.

Also, ironically, the Pushshift reddit dumps (still available via torrent), before they were taken down. The exact time Reddit shut down the API to sell their data for AI training is also exactly the time it started to become less valuable for that.

I believe a lot of subreddits started implementing protest moderation policies after reddit came down on the blackout. IMHO, they should implement rules like "no posts unless it's a ChatGPT hallucination."


Link to the torrent, for science?


Wikipedia doesn’t remove the old versions.

Otherwise you can find an archive there: https://archive.org/details/wikimediadownloads?and%5B%5D=sub...


Wikipedia has released snapshots available for download for over a decade now including ones with full edit histories, meaning you can just revert all edits to before a chosen epoch.
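If you actually want to do that, here is a rough sketch that streams a full-history dump and keeps, per page, the newest revision older than a chosen cutoff. It assumes the standard MediaWiki XML export layout (a <page> element containing <title> and repeated <revision>/<timestamp>/<text> elements); the dump filename is a hypothetical local path:

    # Sketch: from a pages-meta-history dump, keep each page's last revision
    # before a cutoff date.
    import bz2
    import xml.etree.ElementTree as ET

    CUTOFF = "2022-11-30T00:00:00Z"   # ISO 8601 timestamps sort lexicographically
    DUMP = "enwiki-pages-meta-history.xml.bz2"   # hypothetical local path

    def local(tag):
        """Strip the XML namespace: '{...}revision' -> 'revision'."""
        return tag.rsplit("}", 1)[-1]

    with bz2.open(DUMP, "rb") as f:
        for _, elem in ET.iterparse(f, events=("end",)):
            if local(elem.tag) != "page":
                continue
            title, best_ts, best_text = None, "", None
            for child in elem:
                name = local(child.tag)
                if name == "title":
                    title = child.text
                elif name == "revision":
                    ts = child.findtext("{*}timestamp", default="")
                    if ts and ts < CUTOFF and ts > best_ts:
                        best_ts, best_text = ts, child.findtext("{*}text", default="")
            if title and best_text is not None:
                print(title, best_ts)  # write best_text to your snapshot store here
            elem.clear()               # free memory as we stream

The full-history dumps are far too large to load into memory, so streaming with iterparse and clearing elements as you go is the practical approach.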


You can download a full archive already.

edit, link: https://en.wikipedia.org/wiki/Wikipedia:Database_download


Without article history and videos, it's small enough that many modern smartphones can have a local offline copy.

http://kiwix.org/


I'm unsure if this will happen. There's plenty of checks-and-balances for Wikipedia edits. There's automated spam detection, editors manually looking over edits for articles on their watchlist, editors who look over subtopics, and even editors that take a look at the general stream of edits. It's already possible to flag mass edits. As for whether ChatGPT will inflect the subtle tone and bias of edits made using it, that's the same as bias from human users. And the same mechanisms for dealing with human bias apply here.

In terms of practical utility, for the vast majority of humanity, access to translated articles in their local language is the biggest problem, I think. There is no Yoruba-language Wiki article on General Relativity, for example. Second comes entire biased communities - like some of the smaller Wikis are full of far-right editors, and most editors (like 90%) are men.


I can see AI bots submitting convincing edits at random times in no particular pattern. Eventually they will overwhelm Wikipedia checks and balances.


>> I wish I had the time or facility to take a snapshot of wikipedia now before the imminent deluge of ChatGPT-based updates that start materially modifying wikipedia in some weird and unpredictable manner.

> I'm unsure if this will happen. There's plenty of checks-and-balances for Wikipedia edits.

I think it will. It's so tedious to edit Wikipedia (due to bureaucracy and internal politics) that their editorial population is in a long-term decline, which means their oversight ability is declining too.

Probably what will happen is LLM generated content will creep into long-tail articles, then work its way into more "medium-profile" articles as editors get exhausted. The extremely high-profile stuff (e.g. New York City), political battleground articles (e.g. Donald Trump), and areas patrolled by obsessives (railroads, Pokemon) will probably remain unaffected by the corruption the longest. At some point, the only way to resist will be to become much more hostile to new editors, but that's also long-term suicide for the project.

I think they're painted into a corner.


I mean, maybe. AI on the "good side" will also improve. It should be possible to check a sentence against its reference with LLMs. And anything not sourced is suspect, just as it is now.

I also don't like the attitude of Wikipedia being "them", as in "their editorial population". It's our public good, like our air, and everyone should care to ensure its high quality. If you see a problem in the world, you have to try to fix it, instead of sitting on the sidelines, looking from the outside in.


"As of 2 July 2023, the size of the current version of all articles compressed is about 22.14 GB without media." - https://en.m.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia


The Wikipedia community has generally been pretty resistant to allowing fully AI-based tools in. We've had tools such as Lsjbot (https://en.wikipedia.org/wiki/Lsjbot) in the past, but they've failed to gain community consent on any of the large Wikipedias. If someone tries to bring an LLM-based tool to Wikipedia, it would take a lot of finesse to have any shot of the community allowing it.


I don't think it takes much finesse to just randomly start "improving" articles using the output of an LLM. It only takes a single well meaning yet misguided person. Remember this? https://www.theguardian.com/uk-news/2020/aug/26/shock-an-aw-...


Yeah, definitely a potential problem on the smaller language Wikipedias as the Scots Wikipedia incident shows, but for the big ones, low-quality content from new editors is not really a new problem to deal with.


But what about the whole mass of tech bros who don't understand what LLM's are (random text generators and nothing more), and manually start to add changes? It's a virus polluting every industry.


Wikipedia dumps are publicly available, both from themselves and from the Internet archives.

There’s no “time or facility” constraint, only storage space.


The wikipedia politburo already makes it impossible for normies to edit any wikipedia article worth editing. If you don't believe me, try it out with a stopwatch to see how long it takes for your edit to be reverted.


That you call them a 'politburo' and refer to 'normies' gives an indication that the types of edits you were making were neither well sourced nor neutral.

I've never had an edit reverted on Wikipedia.


> the types of edits you were making were neither well sourced nor neutral.

There are a lot of such edits at Wikipedia (neither well sourced nor neutral). For some reason, a certain bias passes through the filter.


You can torrent a copy of Wikipedia, including article history. Locally, you can go back to any revision of any article you want. I keep a copy locally just because it seems something valuable to have.


You can use Kiwix too as an easy way to get an archive of it


There are multiple studies discussed on this page; the one we're looking at is partway down the page, under "Wikipedia-based LLM chatbot "outperforms all baselines" regarding factual accuracy". This link will take you there if your browser supports scrolling to text fragments: https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...


This link will take you there regardless of text-fragment-link support: https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...


Here is a direct link to the arxiv article: https://arxiv.org/abs/2305.14292

WikiChat: A Few-Shot LLM-Based Chatbot Grounded with Wikipedia


I'm not a scientist but isn't it suspect that they're both creating a new bot and a new evaluation metric for bots at the same time?

Like we invented this new thing and this new measurement for evaluating it. It does great on the metric we just made up while we were making it.


Here is this new thing, and here is how it is different than anything else.


We invented the Turing Test decades ago. Since it became irrelevant with ChatGPT [1], we need new tests.

[1]: We can discuss if ChatGPT passes the Turing Test or not, but I think we can now all agree that being able to have a convincing conversation is not a good test for intelligence.


[1] I disagree. I think we can agree there needs to be a refinement on the definition of intelligence, but I think LLMs passed the 1950 definition of general machine intelligence.


No, it’s not suspect in and of itself. Often you need to develop a new benchmark when solving a new problem. It’s common to see this in software engineering/CS papers too.

Of course, one should always be critical of benchmarks, and there is an obvious opportunity for bias here that should be reviewed with care. But your phrasing suggests that this is unusual or actively suspicious, which it is not.


The trouble with Wikipedia is that it's an inch deep. For any given topic, especially history, there's a trove of information in scholarly books from good publishers, but the corresponding Wikipedia article is like three pages long.


I don't think that's a fair criticism of wikipedia. Summarizing knowledge is literally the job of an encyclopedia. There was a reason all my professors in college told us to use wiki as a jumping off point for further reading in the citations.


Once you stray off the popsci/undergrad topics, wikipedia's summation of knowledge is often a few sentences if any. Topics for which numerous books have been written may get only one or two sentences on wikipedia, so I think it's fair to say that wikipedia is an inch deep. Maybe a few inches deep since popular topics do get longer articles, but the long tail of knowledge gets very shallow coverage on wikipedia.


If you're knowledgeable about missing topics that's a perfect opportunity to give back to wikipedia and write the article yourself.


I generally agree with you and I will say that my experience contributing to wikipedia has been extremely pleasant. The community does a good job of making newcomers feel welcome even if you make mistakes.

With that said I've seen two areas where contributing to wikipedia falls short:

The first is things involving what I'd call editorialization (I'm sure there's some wikipedia term for it). Any article that's about an unsettled or somewhat contentious issue seems to give outsized weight to the non-consensus view. Even if 85% of a field thinks that one thing is more likely than the other the wikipedia article will often split its coverage of the views 50/50 and then maybe tack a sentence on at the end saying that the majority of people in the field favor xyz view.

Contributing to or changing those pages is often a hassle because you have to argue with people and that's generally not worth your time (unless you're of the minority opinion and you want to give legitimacy to your side - in which case you are motivated to argue).

The second are the stub articles. The ones that say "This article is a stub. You can help Wikipedia by expanding it." Often I could help wikipedia by expanding it but it's so much work to write a full encyclopedia entry. Like it might take me 4 hours to summarize what I know, look up references, etc. It's easier to just not do it.

Where I find contributing useful is when I'm fixing a small factual error, updating based on a recent discovery, fixing a citation, etc.

I'm not sure if they do it but it would be good for Wikipedia to pay someone to go through and fill out the basics of a bunch of pages on a topic so that there's a scaffolding to work with and then the occasional volunteers could come through and add on facts and fix problems.


Often it's a topic I don't yet know much about. I hear about a topic and search for it, and find a disappointing wikipedia stub. I continue my search to find there are numerous books and research papers about the subject. After reading those my intellectual curiosity may be satisfied, but I wouldn't consider myself an expert and I also don't have any inclination to go back and write a proper wikipedia page.


> I don't think that's a fair criticism of wikipedia.

It's not a fair criticism of Wikipedia, but it is a fair criticism of using Wikipedia as a single-source.


I think it's a fair criticism; or rather, an important limitation one needs to keep in mind. Wikipedia articles can miss out on a great deal of nuance and context, which can matter a great deal.


Worse than that, motivated editors often color their pet pages with specific nuance and context, and no casual editor has a hope of winning an edit war against such opposition.

My favorite example of this is the debate between a faction that believes Lithobates is the proper genus of a certain set of frogs, and another faction that believes the correct genus is Rana. The Lithobates side is essentially one person along with his sock and/or meat puppets, so in the end, after many rounds of moderation, most of the species in question are listed under both genuses.


Isn't that literally the point of an encyclopedia? A starting point, it's the abstract on the subject if you will.


> Isn't that literally the point of an encyclopedia? A starting point, it's the abstract on the subject if you will.

Yes, but Wikipedia is also frequently conceived and marketed as "the sum of all human knowledge," which that shows is a lie by definition.


Jimmy Wales said that phrase in an interview, but it was never meant to say that Wikipedia itself was the only work that needed to be consulted.

https://en.m.wikipedia.org/wiki/Wikipedia:Prime_objective

You seem to have an issue with some person or persons who has been advising others that the only work they need to consult is Wikipedia. Who are they? Specifically.


> "the sum of all human knowledge," which that shows is a lie by definition.

By which definition? In math, the sum of a set necessarily implies a loss of information about that set, for sets larger than 1. But they're using "sum" not in the purely mathematical sense, more like "the summary of all human knowledge". But the same principle applies axiomatically, because summaries are lossy compression: you cannot have a summary that contains all the information of the source it is summarizing.


>> "the sum of all human knowledge," which that shows is a lie by definition.

> By which definition?

That's pretty easy: definition 2 "the whole amount : aggregate" (https://www.merriam-webster.com/dictionary/sum). That it's interpreted that way is shown by the frequency of people saying stuff like "I loaded Wikipedia onto this battery powered Raspberry Pi in a Pelican case, now I'm ready to rebuild civilization if it collapses," and seemingly believing it.

But you do correctly point to another issue: sum has a meaning of "a summary of the chief points or thoughts," which I feel is a less common usage. So the marketing phrase may not be so much a lie, but rather an extremely misleading statement that invites misinterpretation that usually goes unchallenged. IMHO, those are actually even more pernicious than outright lies.


> the frequency of people saying stuff like "I loaded Wikipedia onto this battery powered Raspberry Pi in a Pelican case, now I'm ready to rebuild civilization if it collapses," and seemingly believing it.

As someone who inched closer into the doomsday prepper scene before swerving far away from it, I assure you that people in that subculture have a lot of unrealistic beliefs about their own capacities and resources. I don't think it's Wikipedia's fault that they (and you) are taking a quote about Wikipedia's never-ending goal and interpreting it as if it is their description of what they are.

An even worse example of deceptive marketing would be a compact folding multitool marketed as "the only tool you'll ever need." Even with that, I'd say that if you actually believe that you can rebuild civilization with that tool solely on the basis of that marketing slogan, then that's your fault as much as it is the marketers.

And a minor nitpick: the standard prepper info archives also include collections of various survival guides and resources that are specifically written for these kinds of purposes.


> people saying stuff like "I loaded Wikipedia onto this battery powered Raspberry Pi in a Pelican case, now I'm ready to rebuild civilization if it collapses," and seemingly believing it.

The least delusional part of this is the sparseness of information contained within Wikipedia. If that scenario came to be, they wouldn't be short on information. They'd be short on time, resources, and skills.


If we could condemn a thing due to hype alone, we would condemn all that is good in the world.


Wikipedia also has numerous sister projects like Wikibooks and Wikiversity (including the open access WikiJournal) which aim to fill in the details. All these projects taken together can indeed fulfill the *aspirational* goal of noting down all human knowledge. Whether we ever get there is of course up to us.


It sounds like your issue is with someone describing Wikipedia as the sum of all human knowledge, not with Wikipedia itself, which is what the person to whom you're replying seemed to be saying.


I don't think that TillE meant "the problem with wikipedia is that wikipedia is an inch deep." I think TillE meant "the problem with training a chatbot grounded with wikipedia is that wikipedia is only an inch deep."


In discussions about "deletionism" I've seen people argue that, disk space being cheap, Wikipedia should try to be much more expansive than an encyclopedia.

A paper encyclopedia might not have time or space for individual entries about many hundreds of pokemon, episodes of the simpsons, or characters from star wars.


It has a good format though, and it would be nice to have a second level of scholarship (e.g. Scholarpedia). Modeling itself after an encyclopedia would be regressive.


Just an anecdotal experience: Wikipedia is up to date with regard to "recent" discoveries, whereas books will always be engraved with what was regarded as knowledge at the time of writing.

Case in point: the French article on the Protocols of the Elders of Zion contained outdated knowledge (it said we knew who the author was, propagating an old hypothesis that had been debunked in the last 10 years, even though the other Wikipedia articles were fixed). Historians were heard repeating that same bogus claim on the radio. Until I convinced a historian friend of mine to fix the French article, and all of a sudden historians started fixing their speech. Meaning not only did they not update their knowledge from scholarly books, but they needed Wikipedia to help them get up to date.


New is not always better.

For example, from about 1960-2010, anthropologists universally held a "pots, not people" view of prehistory: they asserted, with great confidence, that styles of pottery and metalworking changed over time due to voluntary exchange of ideas among peaceful, cooperating peoples. These anthropologists asserted that pre-1960 theories that pottery styles changed because population groups violently replaced each other were not only wrong, but immoral and barbaric. To them, it was modernity that made humans violent.

Now due to ancient DNA, we know that the pre-1960s anthropologists were right and the post-1960 consensus was wrong: prehistory was violent and populations violently replaced each other with regularity.

You're more informed reading, say, a Gordon Childe book from 1920 than a serious book on prehistoric archaeology from 2000.

So it goes in many fields. Imagine how much longer it would take for science's self correction mechanism to operate if our knowledge were encoded solely in a "living" information system aligned with only currently fashionable ideas.


But wouldn't you agree reading about this topic now, with the counter-argument of the post-1960 consensus (though I have a hard time thinking most things debatable like this are ever strictly consensus), and the follow-up DNA evidence, is far more informative and convincing than what you would read in 1920? It seems that the people guessing from 1920 might've had about as much chance of being right as the people guessing in 1960 with neither having the relevant evidence to back their claim.


Come on: if you're excavating an ancient village and find a layer of charcoal littered with arrowheads and skulls and find totally different pottery before and after the charcoal layer, then unless your brain has been cordycepted by fashionable academic nonsense, you're going to conclude that someone conquered that village and replaced its people --- not that the charcoal layer represents some kind of ceremonial swords-to-plowshares peaceful pottery replacement ceremony. For 50 years, academics insisted on the latter interpretation. If you'd read old books, you'd know the post-1960s consensus was nonsense even without ancient DNA. Ancient DNA merely created a body of evidence so totally compelling that not even diffusionists (the "pots not people" crowd) could stick to their stories and keep a straight face.


Is it wrong that Wikipedia articles are only three pages long? Does anybody claim that reading an encyclopedia article (Wikipedia, Britannica or whatever) is better than reading a scholarly book on the given topic?


People with just a Wikipedia-level knowledge will argue with actual experts as if Wikipedia is equivalent.

Of course, you can't really say that distrusting experts is unique to encyclopedias.


But isn't that the same as people who argue because "they read it in a magazine/in the newspaper"? So they are wrong -- is it Wikipedia's fault though?

An encyclopedia is always the starting point, never the end of serious research. (It's ok however to stick to Wikipedia if a superficial acquaintance with the topic is enough!).


If there's something in a wikipedia article that experts will argue against, the article needs updating to be compatible with, even if it does not include, expert-level knowledge.


Some people just demand that Wikipedia be a universal factual info database, missing nothing. It'd be nice, though.


There really isn't such a thing though for many topics as a universal factual info database. For many, e.g. historical, topics different books have different areas of focus and interpret events differently. Encyclopedias do to a certain degree (and historical "truth" may shift over time) but, in general, they're not the place to hash out the "right" interpretation of events.


> historical, topics different books have different areas of focus and interpret events differently

In those areas the fact isn't the base fact, but claims of fact. We don't know who explored which bit of the great pyramid in which order, and may never, but we know of many specific claims.

The fact check wouldn't be "The great pyramid X" but "Herodotus said X about the great pyramid".

> in general, they're not the place to hash out the "right" interpretation of events.

Once you scope the problem correctly it's not a problem. The point isn't to solve historical riddles, it's to document what evidence we have.

Sometimes that evidence is broadly accepted measurements (land area of Australia) and other times it's not.


Wikipedia is a summary. Is it meant to be deep? If you want to go deep on any topic you'll need to go to other sources.


Please, before assuming you know what I mean, read the complete comment.

> The trouble with Wikipedia is that it's an inch deep.

That's not its biggest problem. Wikipedia is biased.

Of course there's political bias in the American fashion. But that's not all. There is bias about History depending on what country is telling the story. And there is a strong bias even in scientific topics (maybe specially in them) when there are commercial interests involved.

That's not specific to Wikipedia.

But when you research some topic, reading multiple books, you'll notice there are different opinions, and you learn to discount bias by looking at the provenance. Wikipedia tries to adopt a neutral tone and cite different sources, but sometimes it does a terrible job at it.


> There is bias about History depending on what country is telling the story.

A funny example of this is fan death [1]. Comparing the English page to the Southeast Asian languages shows that the Asian-language pages suggest it's real (at least the last few times I translated it).

And, in a bit of relevancy, the Japanese page has been overwritten with "I love you" and "I'm sorry".

[1] https://en.wikipedia.org/wiki/Fan_death


Well that was bizarre. But if you have a government agency saying that electric fans can cause asphyxiation, and even that they have data to prove it, can you blame people in that country for believing it?


Check the wiki page in another language, closer to the affected area.

It's not a direct translation, it's an entirely different encyclopedia and can be far more robust.

(Maybe an LLM could harmonize all the wikipedias across languages)


My hope in this regard is that Wikipedia pages tend to have much more than an inch of citations. If even a significant fraction of those sources can be digested, it could give rise to a much deeper source. The really cool thing about their chatbot is that it appears to have the ability to summarize and highlight where the summaries came from. Extending that to the ability to summarize the backing sources, and point to where that came from, could be an incredible research tool.


Maybe a Wikipedia based LLM could make decisions about which papers are factual enough to include in a more extensive LLM.


You'll end up with a corpus consisting entirely of paywall text and 404 pages.


Can there be factual correctness from being grounded in scholarly books as such?

The amount of disagreement between researchers over time and changing consensus requires an external arbiter of individual facts at the very least.


If anything Wikipedia has a problem with not being able to summarise and having too much original research. Some articles are just pages and pages of formulas with few citations.


Wikipedia is full of references to scholarly sources. Make a bot that follows the references to the sources and incorporates them in the training data, and Bob's your uncle.


Wikipedia is not some thing handed down by God at the beginning of time. It’s a work in progress by volunteers. If you think a page lacks depth, you’re free to update it.


Still deeper than most people, though. You can't put all of human knowledge in Wikipedia, but it's extremely thorough in the basics.


Well, it has links that it can follow.

Add some AI to take the footnotes and get to the sources and train on that.


I wonder if that's the next iteration of Wikipedia. Right now, the model is to summarize secondary sources. Once summarization becomes trivial via LLMs, the most valuable thing to do would be to assemble ever-expanding datasets of secondary sources for the LLM to pull from.



This article was more about the value of open access when publishing research, with openly accessible papers being 15% more likely to be cited on Wikipedia. The AI part was somewhat weak, as it did not compare against ChatGPT.


The layout is very confusing, but the page is a review of various recent research papers about Wikipedia, and the title references one of them; search for the section titled

> Wikipedia-based LLM chatbot "outperforms all baselines" regarding factual accuracy


How does this make sense? Search to find it on the page??


This is great and all, but we still run into the problem with political biases embedded in the source data [0]

Musk’s AI’s aim is to get to the truth, not eliminate biases retroactively. I think that’s a noble goal, politics aside.

I agree with him that teaching an AI to lie is a dangerous path. Currently it’s probably not exactly akin to lying, but it’s close enough to be on that path.

We should find a way to feed source material from all “biases” if you will, and have it produce what’s closest to reality. It’s obviously easier said than done, but I don’t think the AI Czar VP Harris aims to do this.

If we’re too divided or hellbent on pushing our own agenda, it’ll be a bad outcome for all.

Unfortunately the differences we have are at a very fundamental level that really is a question of how reality is perceived, and what we consider meaningful. The difference of if something by its nature has meaning, or if we give meaning to it culturally/societally.

The former is a more “conservative” (personality wise, not political) view.

The latter is more of, "everything that has meaning is based off the meaning we say it has, thus we can ascribe the level of meaning to that or other things as we wish". The idea that many things are social constructs, and we can change those as we wish to craft what we'd like to see.

I’m probably doing a poor job of wording it, but this fundamental difference in perception is going to very quickly be at the forefront of AI ethics.

[0] https://en.m.wikipedia.org/wiki/Ideological_bias_on_Wikipedi...


The problem that Musk is going to run into is that civilization blossoms from deeply rooted lies.

Apart from necessary lies that lie at the core mechanics of civilization, anything remotely political has long been vulnerable to outrageous grand lies that enjoy as much pressure as it takes to maintain them. Wikipedia is valuable apart from any political topic. More topics are political than many would believe.

They are going to make AI lie, as there isn't a choice in the matter. One major future problem will be the strategic war (military, business, etc) advantage of AI that is beyond the reach of censors. The reasonably accurate conclusion is likely that private and DoD AI won't be trained on lies, but all others will be.


Any lie that you can identify as a lie, an AI can be trained not to tell.


By the same token, any AI can be trained to withhold any truth identified as inconvenient. :/


That is exactly what an AI the size of GPT-4 does if you do RLHF on it.

https://cdn2.assets-servd.host/anthropic-website/production/...


Literally suggesting we enshrine the Balance fallacy into our conception of truth:

https://rationalwiki.org/wiki/Balance_fallacy


> We should find a way to feed source material from all “biases” if you will, and have it produce what’s closest to reality.

Can't help but suspect you'll end up with an AI that confidently reports that Jesus was an extra-terrestrial, and the world is controlled by a secret cabal of lizard people.

If you look into rare diseases, you'll find the counter intuitive idea that rare disease is common. Each disease is individually rare, but there are so many of them, that a lot of people have them in total. Human beliefs are sort of similar. There's a huge volume of strange beliefs.


Accuracy according to whom? Wikipedia is a battleground for ideologues. You can't trust anything even remotely controversial present there.


This highlights a problem LLMs will face if they improve enough to solve their hallucination problems. People will begin to treat the LLM like some sort of all knowing oracle. Activists will fight fiercely to control the model's output on controversial topics, and will demand lots of model "tuning" after training.


> Activists will fight fiercely to control the model's output on controversial topics,

They already do. I'd love to know how much "brain damage" RLHF and other censorship techniques cause to the general purpose reasoning abilities of models. (Human reasoning ability is also harmed by lying.) We know the damage is nontrivial.



Accuracy as in faithfully represents the source material. It doesn't matter if the source material is true or not in this analysis.


Can someone update the link to https://arxiv.org/abs/2305.14292 ?

The headline refers to only a small portion of the linked page.


You know you could probably get really far just training an LLM on wikipedia and all linked citations, and nothing else

The whole problem of wikipedia only being an "inch deep" on any given topic is basically solved if the LLM also has to read every cited work in full

And maybe citation counts could affect exposure to that work during training


> and all linked citations

I wonder what percentage of Wikipedia citations are actually currently available on the internet. For example, here is today's featured article[1]. The majority of references on that page are books, journals, magazines, television, and unlinked news articles that can't be easily accessed. Plus on more niche topics, it is common for the externally linked references to disappear over time.

[1] - https://en.wikipedia.org/wiki/David_Kelly_(weapons_expert)


archive dot org and dot is, sci-hub and λιβgen / zλιβ will cover a lot of those text sources. Aren't bots largely what's responsible for links being updated to point to archived sources? I've noticed archive links a lot lately.

Someone doing serious AI training will mirror sci-hub and λιβgen first, so they'll already have a fair amount of the (good quality) referenced papers and educational books.

Wikipedia (and citation count on google scholar for papers) could be used as a filter for which books and papers to train on first.


The ability to predict the expected answer to a given question isn't something I could see naturally falling out of those sources though, unlike an LLM trained on text from online forums and the like.


AFAIK the training works by just selecting random words in sentences to have the LLM "fill in" until it generalizes to being able to write entire sentences. The training data can be completely unstructured.
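A toy sketch of what that looks like in practice (this is the masked-token "fill in" objective; GPT-style chat models instead predict the next token, but the point about unstructured data holds either way); the sentences are made up for illustration:

    # Toy illustration: unstructured sentences -> (masked input, targets) pairs,
    # i.e. the "fill in the blank" (masked language modeling) objective.
    import random

    random.seed(0)

    corpus = [
        "wikipedia is a free online encyclopedia",
        "large language models are trained on unstructured text",
    ]

    def make_masked_pair(sentence, mask_rate=0.15):
        tokens = sentence.split()
        masked, targets = [], {}
        for i, tok in enumerate(tokens):
            if random.random() < mask_rate:
                masked.append("[MASK]")
                targets[i] = tok      # the model learns to recover this token
            else:
                masked.append(tok)
        return " ".join(masked), targets

    for sentence in corpus:
        print(make_masked_pair(sentence))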


That’s basically Google’s PageRank algorithm with Wikipedia as the 10/10 ranked source of truth.


A lot of full text for research (outside CS) is still locked up behind subscription paywalls. Plus, often times PDFs are not the best format to extract text out of.

Interesting suggestion but probably a lot of practical limitations.


That's also a blessing in disguise though, we don't want an LLM trained on closed source stuff, so ignoring sources it can't access is probably a great thing


[flagged]


That's a strong accusation. What evidence do you have? What are the "ideological views" and why do you think they are baseless?


How does one even go about testing such a thing? Comparing it to Wikipedia articles? Even if it is factual does it spew the interpretations present in most Wikipedia articles?


The abstract of the article explains what they mean: they mean the LLM does not hallucinate (so much) and provides facts based on Wikipedia. Absolute "truth" is not measured; rather, they measure how much the chatbot "sticks to the known facts" within Wikipedia. Since they are measuring this, presumably other chatbots and LLMs tend to hallucinate much more, providing "facts" not supported by their training data.


Now that is the interesting bit, really: what makes the Wikipedia-based LLM hallucinate less?

The only thing I can think of ad hoc is that Wikipedia contains less conflicting or unclear information, which helps to avoid the LLM getting confused. Also, the information is more organized, and it is clear which articles relate to each other.

This would show what I think we already knew: that LLMs can summarize the data they get but they cannot evaluate or verify it.


The most irritating effect is that the LLM somehow guesses what you want it to respond. Human-in-the-loop training is imperfect.


Having seen what professional "fact checkers" accept as fact (and reject as misinformation) makes me similarly skeptical.


They are not fact checking "truth" but whether the chatbot spouts "facts" supported by Wikipedia. This is objective and much easier to check than capital letter Truth. Consider that when discussing ideological or political articles, what is "true" becomes nebulous.


Lol, I wonder how many of their fact checkers silently used Wikipedia to verify the facts outputted by the AI.


I've been seeing a bit of buzz about "grounded" models. Other than setting temperature to Zero and adding better context matching (with a vector database and embeddings, for example), what other changes in methodology or implementation makes it better?


How did they manage to get it to stop hallucinating? I can't prevent my llama-index based chatbot from making up absurd things from my own documents, even though I've been trying to restrict it to that specific area of knowledge.


"this first draft is probably not solid enough to be cited in Wikipedia"


We can talk about alignment of LLMs. We can also talk about alignment of people who write Wikipedia. To imagine there is no bias is foolish and dangerous. More accurate isn't truth. More accurate for whom?


Everything is biased. Everyone, every single human being, is aligned.

That said, bias towards, and alignment with, verifiable reality is possible to achieve, and getting there partway is better than not at all:

https://hermiene.net/essays-trans/relativity_of_wrong.html

> [W]hen people thought the Earth was flat, they were wrong. When people thought the Earth was spherical, they were wrong. But if you think that thinking the Earth is spherical is just as wrong as thinking the Earth is flat, then your view is wronger than both of them put together.


> Everything is biased. Everyone, every single human being, is aligned.

Of course but as discussed many times here before, Wikipedia leans left - presumably reflecting the statistical properties of the demographic of the people drawn to edit and moderate it - as implied by the comment you are replying to - and that can be a significant issue for topics (e.g cultural, historical, social, political etc) where that bias filters what might be assumed by the user to be objective answers.

This isn't a left vs right thing either; there are plenty of publications, demographics and institutions that lean right. The problem is the transparency, awareness and communication of that bias when using them as sources for tools like this.

In the underlying study, there is no mention of the word "bias".

Here's a sample quote which is also concerning:

> For recent topics, we look at the most edited Wikipedia articles in the first four months of 2023. The number of edits is also a good proxy for how much interest there is around a certain topic.

True - and it may also be an indication of a topic that is heavily contested. If the two (or more) views on the "truth" of the article are imbalanced, the chatbot will reflect that imbalance, and can therefore in no way be said to “outperform all baselines on factual accuracy".

To be fair to the researchers, they do address related concerns and talk about avoiding some areas of discussion, but the headline here is extremely misleading.


> Wikipedia leans left

That is subjective, and depends on where you think the "centre" is.

I don't regard Wikipedia as reliable on any topic that is political or involves national history. Modern Wikipedia expects editors to support their edits with citations to "reliable sources", which means the mainstream press, mainly (because primary sources are deprecated). But the mainstream press is overwhelmingly right-wing, and left-wing papers and magazines are usually explicitly rejected as not reliable.

On matters of politics and history, I always dig into the citations (unless I'm happy to get a sketchy version that isn't really accurate). But on most technical and humanities-based topics, the articles are usually quite good (and often much deeper than 1").

There's still way too much stuff in articles that is not cited at all. That changes gradually, as editors delete uncited material, and others come along with suitable citations. I think it's getting better all the time.


> That is subjective, and depends on where you think the "centre" is.

Not at all. Even wikipedia itself acknowledges it [1] - and you can bet the editors responsible for the bias were fighting tooth and nail against that admission - which gives some idea how unbalanced it must be in reality.

> Modern Wikipedia expects editors to support their edits with citations to "reliable sources", which means the mainstream press...

And academia - don't forget academia, that bastion of the right.

> ...the mainstream press is overwhelmingly right-wing

That's ridiculous - The Guardian?? The Washington Post? New York Times?

I think you've made a point about Wikipedia though, but perhaps not the one you intended...

[1] https://en.wikipedia.org/wiki/Ideological_bias_on_Wikipedia


> That's ridiculous - The Guardian?? The Washington Post? New York Times?

Yes, those. I'm not closely familiar with NYT and Washpo; but I'm a long-time reader of the Guardian. Its role is to delimit the left end of the Overton Window, which is roughly the left-ish end of the Shadow Cabinet, and is currently well to the right of Labour Party membership. From my limited exposure, I'd say NYT and Washpo are well to the right of The Guardian, as the Democratic Party is well to the right of Labour in the UK.

Your WP link doesn't speak to any strong bias in English Wikipedia; it says roughly that WP content is "establishment", which isn't surprising, considering the Reliable Sources policy. It also says that older articles exhibit a stronger "left-wing" bias (i.e. pro-Democratic). And the article is written from a US POV; in the USA, socialism is still a "bad word", and the word "liberal" is used instead.


If you consider the likes of The Guardian right-wing there is really no point in continuing this discussion as I don't believe it can be done in a good faith manner.


I hope you aren't accusing me of bad faith!

The Guardian has been a supporter of the Israeli government since before there was an Israeli government. Nobody could accuse the Israeli government of being left-wing.

The Guardian was part of the coalition that hounded Jeremy Corbyn out of the Labour leadership, because he was too "left wing". Starmer refused to let him stand in the seat he's represented for decades; The Guardian strongly backs Starmer, who represents the most right-wing elements in Labour.

I'm happy to discontinue this discussion, not because I suspect you of bad faith, but because political discussions aren't generally on-topic on HN.


I would be very careful assuming that knowledge from Wikipedia is correct. Take medical articles, many of them are debatable at best and misleading at worst from a professional POV.


“All baselines” is doing a lot of heavy lifting in that sentence.


I hope somebody makes a game of Trivial Pursuit with generated questions sourced from Wikipedia.


People on the internet still often criticize Wikipedia when I link to it; I don't understand why.

It's true that it's not good enough for academic work (is it?), but it's largely enough for everything else.


That value was calculated and verified using Wikipedia?


But did they measure the truthiness of those facts?


[citation needed]

And no you can't cite Wikipedia ;)


Easy-peasy, here's the citation: https://arxiv.org/abs/2305.14292

Looks like a legit paper.


"Wikipedia Is Badly Biased": https://larrysanger.org/2020/05/wikipedia-is-badly-biased/

By cofounder Larry Sanger


There are some atrociously written articles on Wikipedia, even in the year 2023.

Case example:

https://en.wikipedia.org/wiki/Fear_of_intimacy

The majority of the article is woman-centered, even though there's no evidence that it's highly gender-biased, and the only information pertaining to men is that if they have fear-of-intimacy then they might be a sex offender.

Otherwise, the article barely communicates anything meaningful. How do attachment types relate to fear-of-intimacy? Are they causative or merely correlative?

Then there's of course poor writing throughout such as this:

> Fear of intimacy has three defining features: content which represents the ability to communicate personal information [...]

What the hell does that mean? "Content?" Like a YouTube video or something?

This is just the latest example I've come across, and happens to be one of the least encyclopedic bodies of text I've ever read. So much of what I read on Wikipedia is of a similar low caliber. People scan over Wikipedia articles but don't think critically, in part because Wikipedia has devolved into writing that can't decide what its audience is and won't get to the point. As I've said before, check out the Talk sections of the pages you visit, and you'll find some of the most arrogant responses from Wikipedia's inner circle of editors.

What makes me LOL the most is supposedly scientific articles that are written as if there is no debate behind a scientific idea, despite there being no such thing in science as "case closed." Wikipedia often behaves like it's a peer-reviewed scientific journal, yet has none of the chops to act as such. Anything that you read on Wikipedia that suggest that there is "no evidence" for something is likely to be some buffoon's ignorant opinion on the actual literature.

And no, I can't just "edit" Wikipedia to fix these issues. I've tried. Both my home IP address and my phone IP address are banned from them, despite my having never set up an account with them.


> > Fear of intimacy has three defining features: content which represents the ability to communicate personal information [...]

> What the hell does that mean? "Content?" Like a YouTube video or something?

It's taken directly from the source cited (page 2 of https://www.semanticscholar.org/paper/Development-and-Valida...). I'm not an expert in the field and have no idea if this is a good paper, but it has received 267 citations which does convey some impact.

> The fear-of-intimacy construct takes into account three defining features: (a) content, the communication of personal information;(b) emotional valence, strong feelings about the personal information exchanged; and (c) vulnerability, high regard for the intimate other. We propose that it is only with the coexistence of content, emotional valence, and vulnerability that intimacy can exist. Consider, for example, the customer who talks to an unknown bartender about his or her troubles. Although there may be personal content

It's clear that it's not the noun "content" but the adjective, defined as "pleased with your situation and not hoping for change or improvement".

I hope the Wikipedia editors are more literate and willing to research than that. I don't think I want to read your version of wikipedia.


> It's clear that it's not the noun "content" but the adjective, defined as "pleased with your situation and not hoping for change or improvement".

No, it's not the adjective. The other 2 features are nouns, so this one must also be a noun, since it's a parallel construct. Also, they're all "features", so they have to be nouns by definition. And what would the adjective even be describing?

In this case, the "content" refers (I guess) to the content that's being communicated, though it's poorly phrased.

The Wikipedia excerpt is badly written, whether you agree with the GP or not about the article being biased towards women. It's not even a paraphrase of the original source, which claims the content is the communication itself, whereas the article claims the content "represents the ability to communicate personal information" — which is pretty meaningless.


> It's clear that it's not the noun "content" but the adjective, defined as "pleased with your situation and not hoping for change or improvement".

If it was clear, I wouldn't have brought it up. Judging by the Talk section of that page, I'm not the only one who finds that choice of words confusing. It doesn't really matter if it's lifted from a cited article; that article isn't the Wikipedia page.

> I hope the Wikipedia editors are more literate and willing to research than that. I don't think I want to read your version of wikipedia.

What specifically do you object to? A better choice of words? Not being unnecessarily gender biased? Not misrepresenting the state of research?


>The majority of the article is woman-centered, even though there's no evidence that it's highly gender-biased…

If you were able to edit this wiki page, what particular studies about fear of intimacy in men would you cite in the sections you add?

Also, is this bit

> Anything that you read on Wikipedia that suggest that there is "no evidence" for something is likely to be some buffoon's ignorant opinion on the actual literature

meant to be ironic?


> In another place, the article simply asserts, “the gospels are not independent nor consistent records of Jesus’ life.” A great many Christians would take issue with such statements, which means they are not neutral for that reason alone.

I'd love to see his article on Jesus that absolutely no one would "take issue with".


While I don't think it's possible to write an article on a controversial subject that no one will take issue with, it is possible to write with a generally Neutral Point of View, which has been a guiding principle of Wikipedia since the very early days: https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_vie...

Making a flat statement that the gospels are "not independent nor consistent" is not settled or universal assessment. An article written in NPOV would discuss the variety of citeable interpretations and the debate between them over time.


> Making a flat statement that the gospels are "not independent nor consistent" is not settled or universal assessment.

"The Earth is round" is not a universally accepted statement either. NPOV doesn't mean giving equal time to everyone with an opinion.


Larry Sanger is not exactly a neutral source on wikipedia. He is behind multiple competing projects, so might be financially motivated to shit-talk wikipedia.


Trying to change the subject to Larry Sanger is an ad hominem fallacy. Address the content of the message, not the speaker.

For example, is this accurate or isn't it?

>Examples have become embarrassingly easy to find. The Barack Obama article completely fails to mention many well-known scandals: Benghazi, the IRS scandal, the AP phone records scandal, and Fast and Furious, to say nothing of Solyndra or the Hillary Clinton email server scandal—or, of course, the developing “Obamagate” story in which Obama was personally involved in surveilling Donald Trump. A fair article about a major political figure certainly must include the bad with the good. Beyond that, a neutral article must fairly represent competing views on the figure by the major parties.

And if so, then wikipedia is indeed badly biased. Whether or not Larry Sanger is isn't that interesting. But a bias at wikipedia - a source blindly trusted by millions - is a very interesting and concerning state of affairs.


> Trying to change the subject to Larry Sanger is an ad hominem fallacy. Address the content of the message, not the speaker.

I disagree. This thread started with "By cofounder Larry Sanger" - so the argument started with an implication that Larry Sanger should be listened to due to who he is. You can't both claim his argument holds extra weight due to who he is while also claiming it's irrelevant who he is. You have to pick one.

As far as the Obama article goes - I'm not an American and I haven't heard of those scandals before, so honestly I don't know if their omission is appropriate or not (it should be noted that the Libyan intervention is mentioned in his article).

However, I think this is asking the wrong question. Nothing is 100% neutral. I don't doubt you can find biased things in Wikipedia. It is made by humans, not revealed through divine revelation. The important question in my mind is how it stacks up against other sources. Is it mostly neutral relative to other information sources? That's how I would like to judge it.


Larry hates Wikipedia because Jimbo Wales got all the credit.


It's even blatantly worse in the Spanish Wikipedia


What an absolute trashfire of a blogpost.

It's written in the tone of a sore loser. A person who fought for regressive policies, against people with better arguments and more accurate facts. A person who now, having lost the fight for the policy, retreats into their echo chamber and decries the debate as "not making room for my facts."

It's apparently impossible to write any neutral statement that does not receive 100% unanimous support from every single person on the planet earth:

> A great many Christians would take issue with such statements, which means they are not neutral for that reason alone


Way to take that statement entirely out of context. For anyone reading this later, this is what the article OP posted says in context:

> In another place, the article [the Wikipedia article about Jesus] simply asserts, “the gospels are not independent nor consistent records of Jesus’ life.” A great many Christians would take issue with such statements, which means they are not neutral for that reason alone. In other words, the very fact that many Christians, including many deeply educated conservative seminarians, believe in the historical reliability of the Gospels, and that they are wholly consistent, means that the article is biased if it simply asserts, without attribution or qualification, that this is a matter of “major uncertainty.” Now, it would be accurate and neutral to say it is widely disputed, but being “disputed” and being “uncertain” are very different concepts.

Put in context, Sanger is saying something that seems reasonable. I wouldn't expect Wikipedia to need unanimous support from Christians on all their articles, but it seems to me that maybe the article on Jesus should have some qualifications in there from, you know, the people who actually study and practice the faith centered on Jesus? It seems reasonable to me, but maybe I'm thinking about this too hard and the fumes from the "trashfire" are messing with my thinking.


And how is factual accuracy determined? Using the exact same sources as Wikipedia, right?



