
The trouble with Wikipedia is that it's an inch deep. For any given topic, especially history, there's a trove of information in scholarly books from good publishers, but the corresponding Wikipedia article is like three pages long.



I don't think that's a fair criticism of wikipedia. Summarizing knowledge is literally the job of an encyclopedia. There was a reason all my professors in college told us to use wiki as a jumping off point for further reading in the citations.


Once you stray off the popsci/undergrad topics, wikipedia's summation of knowledge is often a few sentences if any. Topics for which numerous books have been written may get only one or two sentences on wikipedia, so I think it's fair to say that wikipedia is an inch deep. Maybe a few inches deep since popular topics do get longer articles, but the long tail of knowledge gets very shallow coverage on wikipedia.


If you're knowledgeable about missing topics that's a perfect opportunity to give back to wikipedia and write the article yourself.


I generally agree with you and I will say that my experience contributing to wikipedia has been extremely pleasant. The community does a good job of making newcomers feel welcome even if you make mistakes.

With that said I've seen two areas where contributing to wikipedia falls short:

The first is things involving what I'd call editorialization (I'm sure there's some wikipedia term for it). Any article that's about an unsettled or somewhat contentious issue seems to give outsized weight to the non-consensus view. Even if 85% of a field thinks that one thing is more likely than the other the wikipedia article will often split its coverage of the views 50/50 and then maybe tack a sentence on at the end saying that the majority of people in the field favor xyz view.

Contributing to or changing those pages is often a hassle because you have to argue with people and that's generally not worth your time (unless you're of the minority opinion and you want to give legitimacy to your side - in which case you are motivated to argue).

The second are the stub articles. The ones that say "This article is a stub. You can help Wikipedia by expanding it." Often I could help wikipedia by expanding it but it's so much work to write a full encyclopedia entry. Like it might take me 4 hours to summarize what I know, look up references, etc. It's easier to just not do it.

Where I find contributing useful is when I'm fixing a small factual error, updating based on a recent discovery, fixing a citation, etc.

I'm not sure if they do it but it would be good for Wikipedia to pay someone to go through and fill out the basics of a bunch of pages on a topic so that there's a scaffolding to work with and then the occasional volunteers could come through and add on facts and fix problems.


Often it's a topic I don't yet know much about. I hear about a topic and search for it, and find a disappointing wikipedia stub. I continue my search to find there are numerous books and research papers about the subject. After reading those my intellectual curiosity may be satisfied, but I wouldn't consider myself an expert and I also don't have any inclination to go back and write a proper wikipedia page.


> I don't think that's a fair criticism of wikipedia.

It's not a fair criticism of Wikipedia, but it is a fair criticism of using Wikipedia as a single source.


I think it's a fair criticism; or rather, an important limitation one needs to keep in mind. Wikipedia articles can miss out on a great deal of nuance and context, which can matter a great deal.


Worse than that, motivated editors often color their pet pages with specific nuance and context, and no casual editor has a hope of winning an edit war against such opposition.

My favorite example of this is the debate between a faction that believes Lithobates is the proper genus of a certain set of frogs, and another faction that believes the correct genus is Rana. The Lithobates side is essentially one person along with his sock and/or meat puppets, so in the end, after many rounds of moderation, most of the species in question are listed under both genera.


Isn't that literally the point of an encyclopedia? A starting point, it's the abstract on the subject if you will.


> Isn't that literally the point of an encyclopedia? A starting point, it's the abstract on the subject if you will.

Yes, but Wikipedia is also frequently conceived and marketed as "the sum of all human knowledge," which that shows is a lie by definition.


Jimmy Wales said that phrase in an interview, but it was never meant to say that Wikipedia itself was the only work that needed to be consulted.

https://en.m.wikipedia.org/wiki/Wikipedia:Prime_objective

You seem to have an issue with some person or persons who has been advising others that the only work they need to consult is Wikipedia. Who are they? Specifically.


> "the sum of all human knowledge," which that shows is a lie by definition.

By which definition? In math, the sum of a set necessarily implies a loss of information about that set, for sets larger than 1. But they're using "sum" not in the purely mathematical sense, more like "the summary of all human knowledge". But the same principle applies axiomatically, because summaries are lossy compression: you cannot have a summary that contains all the information of the source it is summarizing.


>> "the sum of all human knowledge," which that shows is a lie by definition.

> By which definition?

That's pretty easy: definition 2 "the whole amount : aggregate" (https://www.merriam-webster.com/dictionary/sum). That it's interpreted that way is shown by the frequency of people saying stuff like "I loaded Wikipedia onto this battery powered Raspberry Pi in a Pelican case, now I'm ready to rebuild civilization if it collapses," and seemingly believing it.

But you do correctly point to another issue: sum has a meaning of "a summary of the chief points or thoughts," which I feel is a less common usage. So the marketing phrase may not be so much a lie, but rather an extremely misleading statement that invites misinterpretation that usually goes unchallenged. IMHO, those are actually even more pernicious than outright lies.


> the frequency of people saying stuff like "I loaded Wikipedia onto this battery powered Raspberry Pi in a Pelican case, now I'm ready to rebuild civilization if it collapses," and seemingly believing it.

As someone who inched closer into the doomsday prepper scene before swerving far away from it, I assure you that people in that subculture have a lot of unrealistic beliefs about their own capacities and resources. I don't think it's Wikipedia's fault that they (and you) are taking a quote about Wikipedia's never-ending goal and interpreting it as if it is their description of what they are.

An even worse example of deceptive marketing would be a compact folding multitool marketed as "the only tool you'll ever need." Even with that, I'd say that if you actually believe that you can rebuild civilization with that tool solely on the basis of that marketing slogan, then that's your fault as much as it is the marketers'.

And a minor nitpick: the standard prepper info archives also include collections of various survival guides and resources that are specifically written for these kinds of purposes.


> people saying stuff like "I loaded Wikipedia onto this battery powered Raspberry Pi in a Pelican case, now I'm ready to rebuild civilization if it collapses," and seemingly believing it.

The least delusional part of this is the sparseness of information contained within Wikipedia. If that scenario came to be, they wouldn't be short on information. They'd be short on time, resources, and skills.


If we could condemn a thing due to hype alone, we would condemn all that is good in the world.


Wikipedia also has numerous sister projects like Wikibooks and Wikiversity (including open access WikiJournal) which aim to fill in the details. All these projects taken together can indeed fulfill the *aspirational* goal of noting down all human knowledge. Whether we ever get there is of course up to us.


It sounds like your issue is with someone describing Wikipedia as the sum of all human knowledge, not with Wikipedia itself, which is what the person to whom you're replying seemed to be saying.


I don't think that TillE meant "the problem with wikipedia is that wikipedia is an inch deep." I think TillE meant "the problem with training a chatbot grounded with wikipedia is that wikipedia is only an inch deep."


In discussions about "deletionism" I've seen people argue that, disk space being cheap, Wikipedia should try to be much more expansive than an encyclopedia.

A paper encyclopedia might not have time or space for individual entries about many hundreds of pokemon, episodes of the simpsons, or characters from star wars.


It has a good format, though, and it would be nice to have a second level of scholarship (e.g. Scholarpedia). Modeling itself after an encyclopedia would be regressive.


Just an anecdotal experience: Wikipedia is up to date with regard to "recent" discoveries, whereas books will always be engraved with what was regarded as knowledge at the time of writing.

Case in point: the French article on the Protocols of the Elders of Zion contained outdated knowledge (it said we knew who the author was, propagating an old hypothesis that had been debunked in the last 10 years, even though the other Wikipedia articles were fixed). Historians were heard repeating that same bogus claim on the radio, until I convinced a historian friend of mine to fix the French article, and all of a sudden historians started fixing their speech. Meaning not only did they not update their knowledge from scholarly books, but they needed Wikipedia to help them get up to date.


New is not always better.

For example, from about 1960-2010, anthropologists universally held a "pots, not people" view of prehistory: they asserted, with great confidence, that styles of pottery and metalworking changed over time due to voluntary exchange of ideas among peaceful, cooperating peoples. These anthropologists asserted that pre-1960 theories that pottery styles changed because population groups violently replaced each other were not only wrong, but immoral and barbaric. To them, it was modernity that made humans violent.

Now due to ancient DNA, we know that the pre-1960s anthropologists were right and the post-1960 consensus was wrong: prehistory was violent and populations violently replaced each other with regularity.

You're more informed reading, say, a Gordon Childe book from the 1920s than a serious book on prehistoric archaeology from 2000.

So it goes in many fields. Imagine how much longer it would take for science's self correction mechanism to operate if our knowledge were encoded solely in a "living" information system aligned with only currently fashionable ideas.


But wouldn't you agree reading about this topic now, with the counter-argument of the post-1960 consensus (though I have a hard time thinking most things debatable like this are ever strictly consensus), and the follow-up DNA evidence, is far more informative and convincing than what you would read in 1920? It seems that the people guessing from 1920 might've had about as much chance of being right as the people guessing in 1960 with neither having the relevant evidence to back their claim.


Come on: if you're excavating an ancient village and find a layer of charcoal littered with arrowheads and skulls and find totally different pottery before and after the charcoal layer, then unless your brain has been cordycepted by fashionable academic nonsense, you're going to conclude that someone conquered that village and replaced its people --- not that the charcoal layer represents some kind of ceremonial swords-to-plowshares peaceful pottery replacement ceremony. For 50 years, academics insisted on the latter interpretation. If you'd read old books, you'd know the post-1960s consensus was nonsense even without ancient DNA. Ancient DNA merely created a body of evidence so totally compelling that not even diffusionists (the "pots not people" crowd) could stick to their stories and keep a straight face.


Is it wrong that Wikipedia articles are only three pages long? Does anybody claim that reading an encyclopedia article (Wikipedia, Britannica or whatever) is better than reading a scholarly book on the given topic?


People with just a Wikipedia-level knowledge will argue with actual experts as if Wikipedia is equivalent.

Of course, you can't really say that distrusting experts is unique to encyclopedias.


But isn't that the same as people who argue because "they read it in a magazine/in the newspaper"? So they are wrong -- is it Wikipedia's fault though?

An encyclopedia is always the starting point, never the end of serious research. (It's ok however to stick to Wikipedia if a superficial acquaintance with the topic is enough!).


If there's something in a wikipedia article that experts will argue against, the article needs updating to be compatible with, even if it does not include, expert-level knowledge.


Some people just demand that Wikipedia be a universal factual info database, missing nothing. It'd be nice, though.


There really isn't such a thing though for many topics as a universal factual info database. For many, e.g. historical, topics different books have different areas of focus and interpret events differently. Encyclopedias do to a certain degree (and historical "truth" may shift over time) but, in general, they're not the place to hash out the "right" interpretation of events.


> historical, topics different books have different areas of focus and interpret events differently

In those areas the fact isn't the base fact, but claims of fact. We don't know who explored which bit of the great pyramid in which order, and may never, but we know of many specific claims.

The fact check wouldn't be "The great pyramid X" but "Herodotus said X about the great pyramid".

> in general, they're not the place to hash out the "right" interpretation of events.

Once you scope the problem correctly it's not a problem. The point isn't to solve historical riddles, it's to document what evidence we have.

Sometimes that evidence is broadly accepted measurements (land area of Australia) and other times it's not.


Wikipedia is a summary. Is it meant to be deep? If you want to go deep on any topic you'll need to go to other sources.


Please, before assuming you know what I mean, read the complete comment.

> The trouble with Wikipedia is that it's an inch deep.

That's not its biggest problem. Wikipedia is biased.

Of course there's political bias in the American fashion. But that's not all. There is bias about History depending on what country is telling the story. And there is a strong bias even in scientific topics (maybe especially in them) when there are commercial interests involved.

That's not specific to Wikipedia.

But when you research some topic, reading multiple books, you'll notice there are different opinions, and you learn to discount bias by looking at the provenance. Wikipedia tries to adopt a neutral tone and cite different sources, but sometimes it does a terrible job at it.


> There is bias about History depending on what country is telling the story.

A funny example of this is fan death [1]. Comparing the English page to the southeast Asian language pages shows that the Asian-language pages suggest that it's real (at least the last few times I translated them).

And, in a bit of relevancy, the Japanese page has been overwritten with "I love you" and "I'm sorry".

[1] https://en.wikipedia.org/wiki/Fan_death


Well that was bizarre. But if you have a government agency saying that electric fans can cause asphyxiation, and even that they have data to prove it, can you blame people in that country for believing it?


Check the wiki page in another language, closer to the affected area.

It's not a direct translation, it's an entirely different encyclopedia and can be far more robust.

(Maybe an LLM could harmonize all the wikipedias across languages)


My hope in this regard is that Wikipedia pages tend to have much more than an inch of citations. If even a significant fraction of those sources can be digested, it could give rise to a much deeper source. The really cool thing about their chatbot is that it appears to have the ability to summarize and highlight where the summaries came from. Extending that to the ability to summarize the backing sources, and point to where that came from, could be an incredible research tool.


Maybe a Wikipedia based LLM could make decisions about which papers are factual enough to include in a more extensive LLM.


You'll end up with a corpus consisting entirely of paywall text and 404 pages.


Can there be factual correctness from being grounded in scholarly books as such?

The amount of disagreement between researchers over time and changing consensus requires an external arbiter of individual facts at the very least.


If anything Wikipedia has a problem with not being able to summarise and having too much original research. Some articles are just pages and pages of formulas with few citations.


Wikipedia is full of references to scholarly sources. Make a bot that follows the references to the sources and incorporates them in the training data, and Bob's your uncle.


Wikipedia is not some thing handed down by God at the beginning of time. It’s a work in progress by volunteers. If you think a page lacks depth, you’re free to update it.


Still deeper than most people, though. You can't put all of human knowledge in Wikipedia, but it's extremely thorough in the basics.


Well, it has links that it can follow.

Add some AI to take the footnotes and get to the sources and train on that.


I wonder if that's the next iteration of Wikipedia. Right now, the model is to summarize secondary sources. Once summarization becomes trivial via LLMs, the most valuable thing to do would be to assemble ever-expanding datasets of secondary sources for the LLM to pull from.



