
Here's a comparison of all the versions of Wikipedia

http://meta.wikimedia.org/wiki/List_of_Wikipedias#All_Wikipe...

Take a look at the "depth" column. Dutch and Swedish used to be your average Wikipedias, with article counts somewhere around 500k. They both decided that it was more important to have a higher article count, so they now employ bots to create new articles with the bare minimum of data (a sentence or two, pulled from other language Wikipedias). It usually becomes more extreme when there's some sort of milestone ahead. There are even worse offenders, for example Waray-Waray.

I'm genuinely wondering whether Wikipedia should have a policy of deleting all such articles and disabling their creation. I don't think it's in the spirit of an encyclopedia.




>Dutch and Swedish used to be your average Wikipedias, with article counts somewhere around 500k. They both decided that it was more important to have a higher article count, so they now employ bots to create new articles with the bare minimum of data (a sentence or two, pulled from other language Wikipedias).

Interesting! The German Wikipedia, in comparison, is doing the exact opposite and has very strict relevance criteria for new articles.

http://translate.google.com/translate?sl=de&tl=en&js=n&prev=...

http://translate.google.com/translate?hl=en&sl=de&tl=en&u=ht...


> I'm genuinely wondering whether Wikipedia should have a policy of deleting all such articles and disabling their creation. I don't think it's in the spirit of an encyclopedia.

Hopefully, it will soon be possible to generate such minimal articles automatically from the language-independent, structured data on Wikidata. At that point, small Wikipedias would only need to create the stylesheets and pull the data from Wikidata instead of being populated with bot-created articles; only the relevant work would remain (to be done on Wikidata in a language-independent way).
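
To make the idea concrete, here is a minimal sketch of rendering a stub sentence from one language-independent record plus per-language templates. The record layout, field names, and templates are all hypothetical and are not Wikidata's actual data model or API; the example values (Abitanti, Slovenia, 12 inhabitants) are the ones mentioned elsewhere in this thread.

```python
# Hypothetical language-independent record; the field names are made up
# for illustration and do not reflect Wikidata's real schema.
RECORD = {"name": "Abitanti", "country": "SI", "population": 12}

# Per-language pieces: only these need to be written for each Wikipedia.
COUNTRY_NAMES = {
    "en": {"SI": "Slovenia"},
    "nl": {"SI": "Slovenië"},
}
TEMPLATES = {
    "en": "{name} is a settlement in {country} with {population} inhabitants.",
    "nl": "{name} is een plaats in {country} met {population} inwoners.",
}


def render_stub(record, lang):
    """Fill the template for `lang` with values from the shared record."""
    return TEMPLATES[lang].format(
        name=record["name"],
        country=COUNTRY_NAMES[lang][record["country"]],
        population=record["population"],
    )


for lang in ("en", "nl"):
    print(lang, render_stub(RECORD, lang))
```

With this split, updating the population once in the shared record would change the rendered sentence in every language, and no stub text has to be stored per wiki.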


I still think it's good because even if it doesn't provide good info, it links you to a list of articles in other languages that do. Provided you speak more than one language, it's very useful.


Around 2006 my (German) high school teachers started to check Wikipedia articles for all essay topics they assigned. But they never checked the English or French articles. Good for me!


I remember my high school teacher's first introduction to an online encyclopedia article - we were tasked with some writing, and one of my friends had this new "Encarta" software (that was 1994 or so).

We did our homework in 15 minutes, proceeded to goof off all afternoon, and blew away the teacher (we were intelligent enough to slightly modify it).


What's the incentive of artificially inflating the article count? Is it just a stupid race to be "on top"?


Once the article is created, it can appear in Google. Once it's in Google, it's more likely to have visitors and some of them might improve the article.

I have no idea whether this was the rationale or even whether it's a good idea, but there might be more to their strategy than article count.


Partially it's just a stupid race; see e.g. the page on Russian Wikipedia [1] that is entirely dedicated to discussing the "Wikipedias race": celebrating wins over other wikis, coordinating bot article creation, blaming competitors for "unfair" low-quality bot article uploads, etc. On the other hand, batch uploads are not completely useless -- batch-uploaded article stubs have consistent style, depth and quality (something that's hard to get with crowd-sourced articles), and with script-created articles it's possible to get exhaustive, consistent coverage of boring topics like rivers, insects or villages.

[1] http://ru.wikipedia.org/wiki/%D0%9E%D0%B1%D1%81%D1%83%D0%B6%... (title -- "Race")


This has worsened the "random article" feature on Swedish Wikipedia by a lot. Try your luck, I bet you'll get a stub article about some insect:

http://sv.wikipedia.org/wiki/Special:Slumpsida

Regarding your suggestion: this is, after all, a decision made by the Swedish/Dutch Wikipedia community. I'm not familiar with Wikipedia's hierarchy, but I'm not sure that this is unencyclopaediac (?) enough for an outside intervention to be a good idea.


Looking at how they calculate "depth", I would say it's just as likely measuring edit-wars. It's not evident that it's measuring quality and I don't think it should be used as a target or marker for quality.
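
For reference, a rough sketch of the depth formula as I understand it from the Meta page, which is an assumption on my part and worth checking there: depth = (edits / articles) × (non-articles / articles) × (1 − articles / total pages). The function name and the numbers below are made up for illustration only.

```python
def depth(edits, articles, total_pages):
    """Depth as I understand Meta's definition (an assumption, see above).

    non-articles = total pages minus articles (talk pages, redirects, ...)
    stub ratio   = articles / total pages
    """
    non_articles = total_pages - articles
    stub_ratio = articles / total_pages
    return (edits / articles) * (non_articles / articles) * (1 - stub_ratio)


# Two hypothetical wikis with identical article and page counts; the second
# simply has twice as many edits (e.g. from reverts going back and forth).
calm = depth(edits=1000, articles=100, total_pages=200)
edit_war = depth(edits=2000, articles=100, total_pages=200)
print(calm, edit_war)
```

Under that formula, doubling the edit count doubles the depth even though the content is the same size, which is why it reads as a measure of activity rather than of quality.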

I often find myself clicking on the Dutch translation link - it regularly has more concise, useful data.


>I often find myself clicking on the Dutch translation link - it regularly has more concise, useful data.

For content that you care about, perhaps. Not for the content that they use to raise the article count number. The latter is mostly made up of one to two sentence articles scraped from foreign Wikipedias. Oftentimes they're articles about towns in remote countries.

Here's an example of such an article created by a bot:

http://nl.wikipedia.org/wiki/Abitanti


Same type of data all over English Wikipedia, e.g.: https://en.wikipedia.org/wiki/Reeve,_Wisconsin

The existence of those pages has no disadvantage and does not detract from the quality of the encyclopedia (if correct) - Wikipedia itself points out that it is not a paper encyclopedia and there is no limit to the amount of content.

I do not understand the disadvantage of listing Abitanti and providing its location within Slovenia.


My point was regarding the usage of bots to scrape that content and adding it to a specific Wikipedia in an attempt to boost the article count, as has been happening on certain Wikipedias. The article that you linked to was created by an actual user.

For example, there were tens of thousands of articles created by bots on the Dutch Wikipedia, within a day, around the time when it was about to surpass the German Wikipedia. I don't consider that to be something appropriate for an encyclopedia. It's really difficult to find any alternative explanation for such acts, other than "we wanted to be ahead of that other Wikipedia in article count".

>and there is no limit to the amount of content.

WP:Stub would make it seem that it's at least not endorsed and that there's an expectation of having such articles expanded. But these don't get marked as stub, because there's no expectation of them having more content, just a mere increase on the article counter.


One huge problem with deleting or eliminating geographic records is that sooner or later something will happen there and it's "a lot of work" to reinstate, especially if it was deleted by the deletionist jerks.

For example, a couple years back a dude went nuts and shot several northern WI hunters for no apparent reason. Not in Reeve but somewhere up there. Screwing around with wiki to make it harder to use and contain less information (why?) merely makes it harder to add actual real news when it later happens.

Deleting today creates a pointless load dragging down the future when it inevitably becomes notorious. Boring individual human beings might fade into obscurity, but geographic locales will inevitably "someday" be front page news for some crazy reason or another. Reeve WI will someday have its name up in lights. Maybe not today, maybe not this century...


Maybe it's just not the kind of content I expect to find in Wikipedia, and as such is just click-bait since it's likely to be high in Google search results?

If I want to know the location of Abitanti within Slovenia I'm more likely to turn towards some mapping website rather than Wikipedia, where I'd expect a more detailed description of the city's history and other relevant information.


Many Wikipedia articles have geolocations, so you're only one click away from an OSM map of the location.

And any Dutch travellers in that region will get an article about local towns suggested on their smartphones.

Finally, the Wikidata project should soon (if not already) allow changes to data like population to propagate from one language page to all the different versions.

In short, don't think small with Wikipedia: it can be better than any encyclopaedia in existence, possibly better than many can even imagine.


In this case, Abitanti is a town (if you can call it that!) of 12 inhabitants. It's quite likely that there's no significant recorded history to it outside the heads of the dozen people living there.


Maybe it doesn't have its place in an encyclopedia then...



