Hacker News

6 million pages! A quick look at their distribution:

3.6M of these pages got less than 10 views during December. 4.3M pages got less than 100 views each. 2.3M got less than 1000 views. That's the long tail. On the other side 680K pages got more than 1,000 views, 111K pages got more than 10k views, only 6k pages got more than 100k views, and 90 pages were able to gather more than 1M views (Star Wars, The Mandalorian, The Witcher...).

In summary: The top 7.2% of Wikipedia pages earn 87% of all the monthly views.

Yesterday I posted a deeper analysis on how these pages get their daily views:

- https://towardsdatascience.com/interactive-the-top-2019-wiki...

Apparently the most popular pages are all related to movies (Avengers, Joker, ...), series (Ted Bundy, Chernobyl, Game of Thrones), and deaths.




Whenever I see the staggering number of Wikipedia articles, it reminds me of my own recent experience with creating an article. To centralize some of the knowledge about a now-niche set of keyboards (manufactured by what used to be one of the FAANGs of its time), I set out to create a page. After I added some data that I thought was useful and interesting, the page got deleted as it was not "notable" enough, i.e. lacking a sufficient number of external references. These keyboards did exist and are still used to this day, but they are not as popular (and therefore not as widely commented on or described) as Apple's keyboards (which do have their own page on Wikipedia), so the line was drawn there. In the end I found it funny and also interesting to see how Wikipedia works. If the editors were not so strictly bound to the set of rules, there would probably be tens of millions of articles/pages now.

Another experience I had was watching a video on YouTube from Japan where a guy was brewing coffee, which reminded me that the brand of coffee equipment used in the video is sometimes sold in coffee shops in Europe. I wanted to know a little bit more about the company so I went to Wikipedia, only to find that the English page had been deleted because there was little information about the company on the English-speaking part of the Internet. This does not say anything about the company (which seems to be hugely popular in Japan and in certain social circles in Europe); rather, it demonstrates the limit of the model Wikipedia uses for curating articles.


There is a struggle between "deletionist" and "inclusionist" factions within Wikipedia. One side tends to find fault (many times real faults, mind you) and err on the side of deletion; the other prefers to err on the side of inclusion in Wikipedia.

All agree they want to keep some standard of quality in Wikipedia; they just disagree on the method and threshold for deletion.

Just be aware Wikipedia is not a single hive-mind :)


I find all of this really hilarious, especially considering that a lot of Wikipedia is about trivial stuff like entertainment (films, sports, anime, whatnot), but whenever someone tries to document something technical in the wrong way ... NO SOUP FOR YOU!

I find it really really rude and disrespectful to just trample over other volunteers' work like this. I'm all for deleting crap, but outright deletion instead of offering suggestions first and then being super-defensive when asked "hey, what's up?" Yeah nah, that doesn't fly. It's like closing people's PRs on GitHub with just a single line of "doesn't fit code standards", "doesn't work on Windows", or a link to a lengthy page, without first offering suggestions on how to fix the patch.

Guess everyone is too busy fighting over the capitalisation of movie titles or whether or not the Stanley Kubrick article should have an infobox to offer meaningful suggestions to new users *shrug*.

On Wikipedia, the biggest asshole with the most persistence wins. They've deluded themselves into thinking this is "meritocracy".


I guess you're an inclusionist then. A lot of people agree with you. (Note some in the deletionist camp want to delete many of the articles about trivial stuff!).

I find Wikipedia articles on technical/science subjects tend to be higher quality (with some exceptions of course), those on history tend to be so-so, those about current political matters tend to be outrageously one-sided, and those on pop culture matters are often hilarious -- an article on a subject related to anime can be a lot longer and more detailed than one about real history!

Back in the olden days I remember there used to be a Wikipedia article about vampires, written from the point of view of someone in the "vampire community" who wrote about them as if "vampirism" was a "lifestyle", and when it got edited/deleted of course these people threw a fit. But this was a clear case of garbage that doesn't belong in a general encyclopedia.

Contributing to Wikipedia, especially on topics where there is controversy, is very frustrating. Like you said, the person with the most time or clout with the online community tends to "win". Having lots of rules of the form WP:<this or that> thrown at you in order to overwhelm you and make you give up gets very tiresome. This is why I don't contribute; I simply don't have the time for it.

Still find Wikipedia immensely useful though. I'm glad it exists.


> I guess you're an inclusionist then.

Well, not really. I'm mostly a "treat people's time with respect"-ist. The biggest problem I have is not so much with deleting as such, but rather that for some this seems to be the first tool they use once they see something that they think is not a good fit. Not everyone in the deletionist camp is like that, but there are quite a lot of them and you will run into these people sooner rather than later.

I've saved more than a bit of content by just copy/pasting the deleted content somewhere else (instead of a separate article), or just by Googling for a few extra references.


These numbers don’t add up... do the 4.3M pages that got less than 100 include the 3.6M that got less than 10? I should imagine so, otherwise that’s 7.9M pages. But then you say only 2.3M got less than 1000 views? That obviously can’t include the pages with less than 100 or less than 10, but if you add them then you’re at 6.6M..??


I was wondering the same!

Not every page is an article, and I'm counting page views from their logs.

https://en.m.wikipedia.org/wiki/Wikipedia:What_is_an_article...

Even then you can see in my query that I have removed most of their "special" pages.


I think it's a typo: 5.3M, not 2.3M, have less than 1000 views. That makes sense with 600k over 1000 views and 6M total.
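The arithmetic behind that guess can be checked quickly. This is only a sketch using the rounded figures quoted at the top of the thread (6M pages in total, 680K with more than 1,000 views); the variable and function names are made up for illustration, and it assumes the "under 1,000" and "over 1,000" buckets are meant to cover every page.

```python
# Rounded figures quoted in the thread, in pages.
total_pages = 6_000_000
over_1000 = 680_000      # pages with more than 1,000 views
as_posted = 2_300_000    # "2.3M got less than 1000 views"
suspected = 5_300_000    # the correction suggested above


def gap(under_1000: int) -> int:
    """Pages left unaccounted for if the two buckets should cover everything."""
    return total_pages - (under_1000 + over_1000)


print(gap(as_posted))   # 3020000 pages missing with the figure as posted
print(gap(suspected))   # 20000, small enough to be rounding in the quoted numbers
```

With 5.3M the two buckets nearly sum to the total, which is consistent with the typo theory but does not prove it.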


And here you can see the most visited pages for every week: https://en.wikipedia.org/w/index.php?title=Category:Top_25_R...


Wow that's fascinating, I'd never really thought about page distribution.

I have a good feeling that the majority of those views are due to Google providing the answer for almost anything that can be found on Wikipedia in an "instant search result" box. It has a Wikipedia link beneath it to read more, and I guess many people do.


Also makes me wonder how many pages are not viewed because the instant search result was sufficiently informative.


The 80/20 rule applies to many of these sorts of things. In the case of Wikipedia, it's 90/10.

https://en.wikipedia.org/wiki/Pareto_principle


That's really not too shabby for a long tail though. What's the % that had zero views in December? Do they account for bots?


Sometimes I worry about projects like Wikipedia and the Archive. When will they run out of storage space and money? When will it become too big to handle?


That will never happen for Wikipedia. Their content is tiny in disk space and not growing very fast. They are also drowning in money, despite what their donation begging messages claim.

The Internet Archive, on the other hand, is drowning in data and pinching pennies. Still, they're doing okay, and the constant decrease in per-gigabyte HDD costs is allowing them to continue for the time being. They absolutely need and deserve your donations, however.

If HDD price/gig stopped going down, however, they might be in a lot of trouble.


You're saying they rely on the decreasing prices, rather than low prices? How's that?


The amount of data is growing superlinearly while their resources (probably) aren't, so a dropping price for storage is currently allowing them to keep up (if just barely so).


As a mathematical curiosity: an exponentially decreasing price means that the total expense to store a Gigabyte forever is finite.

E.g. a 10% decrease a year means that storing forever costs approximately ten times the first year's storage cost. https://en.wikipedia.org/wiki/Geometric_series#Sum has the math.
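To make the "ten times" figure concrete, here is a minimal sketch of that geometric series. It assumes the price drops by exactly the same fraction every year and that the cost is paid once per year; the function names are hypothetical.

```python
def total_cost(first_year: float, yearly_drop: float, years: int) -> float:
    """Sum of yearly storage costs when the price falls by yearly_drop each year."""
    ratio = 1.0 - yearly_drop
    return sum(first_year * ratio**k for k in range(years))


def limit(first_year: float, yearly_drop: float) -> float:
    """Closed form of the infinite geometric series a / (1 - r), with r = 1 - yearly_drop."""
    return first_year / yearly_drop


print(total_cost(1.0, 0.10, 50))    # partial sum after 50 years, already close to the limit
print(total_cost(1.0, 0.10, 500))   # practically indistinguishable from the limit
print(limit(1.0, 0.10))             # ten times the first year's cost
```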


A stream of perpetual future payments does not need to decrease exponentially (or decrease at all) to have finite present value. A contract to receive $1/year forever (which you can resell or leave as inheritance etc.) is worth 1/r, where r is the risk-free rate of return at the time the contract is sold. At 1% interest a $1 perpetuity is worth $100: the amount of money you’d have to put in a bank today to expect to get $1 in interest per year forever.
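The 1/r figure can be sanity-checked by discounting each payment and adding them up. A rough sketch, assuming a constant rate r and payments at the end of each year (the helper names are invented for this example):

```python
def perpetuity_value(payment: float, rate: float) -> float:
    """Closed-form present value of a level perpetuity: payment / rate."""
    return payment / rate


def discounted_sum(payment: float, rate: float, years: int) -> float:
    """Present value of the first `years` payments, each discounted back to today."""
    return sum(payment / (1.0 + rate) ** t for t in range(1, years + 1))


print(perpetuity_value(1.0, 0.01))        # the $100 from the comment above
print(discounted_sum(1.0, 0.01, 5_000))   # the truncated sum lands very close to it
```

Note that the discount factor 1/(1+r)^t is itself an exponential decay, so the flat $1 payments become a geometric series once they are expressed in today's money.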


Yes, the exponential decrease is sufficient, but not necessary.

Of course, you are putting an exponential decrease into the present value calculation.

To give a truly non-exponential example: summing up 1/n^2 also only gives you a finite value.
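For the 1/n^2 example, the partial sums can be watched creeping toward a fixed value. A small sketch; the limit pi^2/6 is the classic Basel problem result, and `partial_sum` is just a name picked here:

```python
import math


def partial_sum(terms: int) -> float:
    """Sum of 1/n^2 for n = 1 .. terms."""
    return sum(1.0 / n**2 for n in range(1, terms + 1))


for terms in (10, 1_000, 100_000):
    print(terms, partial_sum(terms))

print(math.pi**2 / 6)   # the value the partial sums approach, about 1.645
```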


Wikipedia is drowning in money?

Their audited financials:

- Cash in hand: $100m, Annual expenses: $91m - https://upload.wikimedia.org/wikipedia/foundation/3/31/Wikim...

Internet archive:

- Net assets: $2m, Functional expenses: $18m - https://projects.propublica.org/nonprofits/organizations/943...


Yes, but look at how those $91M break down. Almost all "actual" expenses (including hosting) went down from last year, but the salaries continued to grow by several million, as they do every single year. I love Wikipedia, but I've stopped donating for that reason.


How much should an organization spend in salaries?

Snapchat spends 1,649m in operating expenses. Wikimedia 46m in salaries.

Now, are these dollars invested paying off for our society? That's for all of us to decide.

- https://www.nasdaq.com/market-activity/stocks/snap/financial..., https://upload.wikimedia.org/wikipedia/foundation/3/31/Wikim...


The problem with Wikimedia salaries is that they often are allocated to some projects / roles that don't have much to do with the principal activity (Wikipedia). Although I recognize that it is important to fund / subsidize stuff that won't otherwise be included, I wish their final goal was more concentrated on the wellbeing of Wikipedia, rather than other minor projects.



Actually Wikipedia has too much money. If anything we should stop giving it money for some time.


Yep.

https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...

> In 2005, Wikipedia co-founder and Wikimedia Foundation founder Jimmy Wales told a TED audience: So, we're doing around 1.4 billion page views monthly. So, it's really gotten to be a huge thing. And everything is managed by the volunteers and the total monthly cost for our bandwidth is about US$5,000, and that's essentially our main cost. We could actually do without the employee … We actually hired Brion because he was working part-time for two years and full-time at Wikipedia so we actually hired him so he could get a life and go to the movies sometimes.

2005: monthly bandwidth cost of about $5000 with 1.4 billion monthly page views

For 2018:

https://wikimediafoundation.org/about/2018-annual-report/

https://wikimediafoundation.org/about/2018-annual-report/fin...

https://meta.wikimedia.org/wiki/Wikimedia_Foundation_salarie...

15 billion page views

$145,850,778 in assets

$10,901,208 in liabilities

$134,949,570 in net assets

$104,505,783 in revenue ($97,748,964 from donations)

$81,442,265 in expenses ($38,597,407 salary, $2,342,130 tech costs, others)

I am not that educated on their operations, but it seems a bit crazy to me that only 2.3 million out of all expenses goes to server costs. And the 39 million in salaries all goes to their employees, none of whom do the actual content contribution.

Going from 1.4 billion page views at $5000 to 15 billion page views at 82 million is crazy imo.
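One way to put these figures on the same footing is cost per million page views. The sketch below makes two assumptions the comment does not state: that the 15 billion figure is page views per month (like the 2005 number), and that the 2018 dollar amounts are annual totals. The 2005 figure covers bandwidth only, so it is compared against both the 2018 tech costs and the 2018 total expenses.

```python
MONTHS_PER_YEAR = 12


def per_million_views(monthly_cost: float, monthly_views: float) -> float:
    """Dollars spent per one million page views."""
    return monthly_cost / monthly_views * 1_000_000


# 2005: figures from the Jimmy Wales quote (monthly bandwidth, monthly views).
bandwidth_2005 = per_million_views(5_000, 1.4e9)

# 2018: annual figures from the comment, ASSUMED to go with 15e9 views per month.
tech_2018 = per_million_views(2_342_130 / MONTHS_PER_YEAR, 15e9)
total_2018 = per_million_views(81_442_265 / MONTHS_PER_YEAR, 15e9)

print(f"2005 bandwidth only: ${bandwidth_2005:.2f} per million views")
print(f"2018 tech costs:     ${tech_2018:.2f} per million views")
print(f"2018 all expenses:   ${total_2018:.2f} per million views")
```

Under those assumptions the gap comes almost entirely from non-hosting expenses rather than from serving pages, which is the point being made above.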


The WMF does quite a few things besides paying for the hosting costs of Wikipedia. As one example, their latest project is Wikidata, a general-purpose, language-independent knowledge base which already gets more edits over time than the English Wikipedia does. Given that developing such a generally-useful knowledge base has been a "holy grail" of AI for literally decades and that Wikidata is freely available at zero cost right now, those donations might already be more than paying for themselves.


There was a behind-the-scenes talk at 36c3 in Dec 2019 where a few Wikipedia Admins show what the infrastructure looks like and what they actually do: Infrastructure of Wikipedia https://media.ccc.de/v/36c3-73-infrastructure-of-wikipedia


I compare the value I get from them to the $$$ I sometimes send them, and I am content.


I wonder what the correlation is between inbound links from other articles and views, though.


Ehm, isn't it super obvious that this, too, follows the Pareto distribution?


Do we have a power-law distribution here? (They are rare)


Isn't Pareto a power law distribution?


This is unremarkable since this is just Zipf's law: https://en.m.wikipedia.org/wiki/Zipf%27s_law
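As a rough check of how far plain Zipf gets you: if views were exactly proportional to 1/rank, the share going to the top pages is a ratio of harmonic numbers. The sketch below assumes an exponent of exactly 1 and the thread's rounded 6M page count; nobody in the thread actually fitted an exponent, so treat it as an illustration only.

```python
import math

EULER_GAMMA = 0.5772156649


def harmonic(n: int) -> float:
    """Approximate H_n = 1 + 1/2 + ... + 1/n as ln(n) + gamma + 1/(2n)."""
    return math.log(n) + EULER_GAMMA + 1.0 / (2 * n)


def top_share(total_pages: int, top_fraction: float) -> float:
    """Share of all views going to the top pages if views are proportional to 1/rank."""
    top_pages = round(total_pages * top_fraction)
    return harmonic(top_pages) / harmonic(total_pages)


share = top_share(6_000_000, 0.072)
print(f"top 7.2% of pages get about {share:.0%} of views under pure Zipf")
```

The result lands in the low-to-mid 80s percent, in the same ballpark as the 87% quoted at the top of the thread.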


> 90 pages were able to gather more than 1M views (Star Wars, The Mandalorian, The Witcher...).

So they could cut their network costs by a lot if they just removed pages about current popular culture, which have the highest view rate and a cultural value of exactly zero.

Edit: I'm being sarcastic and a bit bitter, the comments below all make good points. However I find it sad, though unavoidable, that the highest page views are for highly marketed entertainment franchises that have a slim chance to withstand the test of time.


> a cultural value of exactly zero

I hope you recognise that this is only your opinion. The beauty of a crowd-sourced encyclopedia is that you don't get to define what other people find interesting or of cultural value. The (utopian) mission of Wikipedia is to encompass the whole of human knowledge, and that includes a lot of niche topics which might be interesting only to a small number of people, but it is not limited to that. I love that Wikipedia has a lot of info on "current" (i.e. popular right now) topics, as I get to know what I want to know while avoiding the gazillion ads (even though I blackhole all of them) and click-baity titles from the rest of the internet.


Their network cost is negligible. Overall, hosting costs them less than $3 million; they actually spend more on travel & conferences [0].

[0]: https://upload.wikimedia.org/wikipedia/foundation/3/31/Wikim...


This is dangerously dismissive. Cultural value is subjective, and in any case these articles provide an in-road to other articles, giving the opportunity to expand people’s knowledge further.

I would, however, support Wikipedia having ads on articles in such pop culture categories (and those alone, of course, to avoid conflict of interest where factual accuracy is more vital). This would have to be carefully managed by the Wikimedia Foundation, with marking articles for monetisation handled only by a specific role, but if done carefully and in a privacy- and user-respecting manner (e.g. Carbon ads [0]) it could potentially subsidise the academic content without damaging Wikipedia’s brand.

[0] https://www.carbonads.net


That's a terrible idea in my opinion, any advertising at all on Wikipedia would be the thin edge of a very unpleasant wedge. Wikipedia is many people's first stop for knowledge, we really don't want people being able to buy eyeball space next to what's supposed to be unbiased content.

Additionally, expanding advertising generally is a bad move when exposure to advertising has been shown to be negatively correlated to quality of life and contentedness. We need to be transitioning away from advertising as a primary driver of the online economy, not fuelling it further. Advertising should only be employed when there's no better option in my opinion, not the first thing you reach for monetising a site. We should see advertising like fossil fuels, a regrettable necessity that we should aim to phase out.


As soon as you open the barn door for having ads at all, the floodgates will open. Also a pointless concession given Wikipedia has far more money than they need.

If Wikipedia ever puts ads up because of the bureaucratic WMF cancer exceeding donations, that's the day Wikipedia dies.


There is little money in creating a piece of artwork that lasts 300yrs, like a church or an exquisite painting for the producer. Companies can't survive on that, all the money goes to magic pop art and maybe some heroic efforts. After that you're outside the system. To quote a fading movie, you either die a hero or live long enough to become a villain (dead genius).

You probably know it and it feels worth saying anyway. I yearn for the enduring masterpiece that transcends simply 'changing the game' and is instead appreciated for its own being for hundreds of years.


Why do they have a cultural value of exactly zero?


More to the point, what even is "cultural value?" After Googling it, I found that it doesn't even seem to be a well defined phrase, much less a quantifiable concept.

Wikipedia uses notability[1] to decide which articles to include. Notability is defined as significant coverage in 3rd party sources. Popular movies are "notable" because other people write about them.

[1]: https://en.wikipedia.org/wiki/Wikipedia:Notability


There's too much in art that relies on unmeasured effects like the impact on the audience's consciousness and the execution on structure and intention to be bothered with numbers beyond box office or critic reviews. You could measure something related to those concepts and you would be capturing a fourth of available judgement for cultural impact.

Culture is a game we have been playing for thousands of years. Art pieces with an observable influence on other artists, the public audience or any human decision making along with an ability to survive the wear and tear of existing in the world as an artwork is a good place to start summarizing an artwork's cultural value.

A grand cathedral like St. Peter's stands as an example of high quality baroque architecture and of the value of the church as an institution. It remains intact after a large number of years, not having been physically deconstructed. Its influence on other artists is common to this day; it was recently used as a baroque example of maximalism on HN. The site attracts visitors and has yet to be rivalled in its unique baroque grandeur.

Those influences are quantifiable, and kind of comparable to a flash in a pan like a Hollywood film that gets most of its attention and influence in the first few months, then sits in an institutional back pocket whilst other artists denigrate it so that new films can occupy the spotlight, and the cycle continues. These films are sometimes used as reference for making new films, and the industry puts out two similar styles of film at the same time, reducing uniqueness. Otherwise they are generally left to sit in DVD/Blu-ray jewel cases or Disney vaults as they fall out of favour.

You can rank these impacts and have conversations over which interpretive labels they fit into and conclude some are more valuable than others.


You don't think those views result in a stronger brand and more views on pages with a higher cultural value?



