"Users apparently dislike this workflow so much that they don’t bother contributing at all: significantly less people are editing Wikipedia than did a few years ago."
To be fair, I think this exaggerates the role of the editing UI -- if the editing UI was that bad, then people wouldn't have contributed in the first place. The more common narrative for reduced participation is the growth of cultural issues which make it less rewarding to participate (e.g. http://www.technologyreview.com/featuredstory/520446/the-dec... ).
(Edit: Not trying to be negative -- the suggestions in the article sound cool anyway.)
I was also somewhat skeptical of the author's claim that the barrier to entry for Wikipedia editors was the editor itself. I've read a few papers concluding that Wikipedia contributions are declining because of the community pushing out people who don't understand the many rules that have been developed[1], or in other cases because editors have gotten into turf wars over articles they feel some sense of ownership over[2]. Meanwhile, I haven't read anything concluding that the editing environment was a strong deterrent (although I haven't been looking, so I could be willfully blind).
Edit: more articles identifying interpersonal friction as the leading factor deterring would-be editors [3,4]. At this point I'm suspicious of the author's due diligence in studying the problem, making the proposed solution more dubious.
[1] Suh, Bongwon, et al. "The singularity is not near: slowing growth of Wikipedia." Proceedings of the 5th International Symposium on Wikis and Open Collaboration. ACM, 2009.
[2] Thom-Santelli, Jennifer, Dan R. Cosley, and Geri Gay. "What's mine is mine: territoriality in collaborative authoring." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2009.
[3] Kittur, Aniket, and Robert E. Kraut. "Harnessing the wisdom of crowds in wikipedia: quality through coordination." Proceedings of the 2008 ACM conference on Computer supported cooperative work. ACM, 2008.
[4] Halfaker, Aaron, Aniket Kittur, and John Riedl. "Don't bite the newbies: how reverts affect the quantity and quality of Wikipedia work." Proceedings of the 7th international symposium on wikis and open collaboration. ACM, 2011.
> editors have gotten into turf wars over articles they feel some sense of ownership over
That's exactly what killed my one and only attempt to substantially improve an article (not just typo-fixing and stuff):
The two guys with a lot of edits on that page couldn't stand someone else editing "their" article.
Sure, they tried to bring up superficial disagreements about the content, but that could have been resolved trivially. It was very obvious that they didn't want someone else on their lawn.
Unless Wikipedia fixes exactly that ownership problem, specifically, I doubt they can improve their attraction and retention rate of contributors.
And that is a social problem. I bet it will never be fixed.
If you go in and blam an entire page, no doubt the original authors will cry foul; it probably took them a fair amount of time, even if what they wrote wasn't particularly good. It's also fairly rude without at least giving them a chance to improve sections.
Incremental editing will also give you a far stronger case when they get arsey, particularly if your inclusions are better than what was there already.
The key is trying to work with the grumpy people and reassure them that you're not trying to take their article away from them.
> The key is trying to work with the grumpy people and reassure them that you're not trying to take their article away from them.
In my view, that right there is the problem with Wikipedia. Just because someone puts up content on a "commons" site does not give them sole ownership or even a moral claim to resist changes proposed by those who come after them.
Why should the average Wikipedia user not have the chance to clearly see the differences of opinion (in terms of value of information) right on the article page itself? The answer is most likely because the Wikipedia UI is not designed for lucid presentation of such differences.
Wikipedia's (subconscious?) striving to be an authoritative source of information is in direct conflict with the very nature of information (and human knowledge), which is non-self-consistent and ever evolving.
It would be technically tricky to do, though I guess you could have something like those little speech bubble symbols that they have alongside the articles on Medium, which would give access to alternatives - and possibly allow voting/commenting on them in a Hacker News style.
I tried to improve a very small section, about six or seven sentences. It was clearly wrong and I had lots of citations, but they preferred a wrong "my" article to a better "our" article.
After the second or third edit, incorporating their feedback, someone called it "vandalism" and reverted everything.
This hasn't been my experience at all; I guess it depends on which areas you're contributing to. I've contributed to CS and art pages with good success. I stay clear of opinion pages and contentious topics.
"It's also fairly rude without at least giving them a chance to improve sections."
To me, this only reinforces the problem of ownership. If my first approach to "collaborative editing" is "give the original writers a chance to improve" rather than "collaborate" and "edit", then it only strengthens the perception or "reality" that the article belongs to them.
There are certainly ways to make original authors justifiably upset; Wikipedia is definitely a place where kindness pays dividends. While probably a factor in some cases, though, I don't think a lack of socialization among new users is sufficient to explain what's going on.
I used to see more great collaborative experiences; now I see even incrementalism and conciliatory pleas get rebuffed.
I wouldn't think much of my (anecdotal) experiences, but with all these articles claiming the same thing, I'm willing to accept this might be a wider trend.
As you say, dealing with the UI is fine. Dealing with the bureaucrats isn't. Having your account banned and your new article marked for spam by a very abrasive person, when trying to add legitimate content, rather than being supported through the maze of rules, as happened to a colleague the other day. That is a problem which stops meaningful content being added.
For anybody who does an occasional 5-10 edits per year, the admins leave no moment unused to point out that you're doing something wrong (most likely based on Guideline CA 13, where CA stands for cryptic abbreviation₁), and in a lot of cases you're left to the good will of a single person.
The whole appeal system needs to be streamlined; it can't be that there are multiple entry points for a deleted article solely because it was deleted for different reasons.
It's the bureaucrats that drive people away; compared to Wikipedia, the German tax forms/appeals are outright user-friendly by now.
₁ I see no good reason why abbreviations should not be expanded by default on discussion pages; it would make them look way less intimidating to the casual user.
I put up another GitLab article for you - the previous one was deleted for lack of links indicating notability. I linked a couple of TheNextWeb articles that will hopefully do. It's kind of hard work, this stuff, finding references and the like. (I'm not an admin, just an occasional contributor.)
Thinking about those kinds of issues, I think they could improve the friendliness of the process by sending helpful / apologetic emails along the lines of "Dear Mr X, Sorry we had to take down your article because we need ... blah blah... If you can find the references just click this and do this. Best, Wikipedia team." Rather than the current situation, where the article just disappears and there are somewhat cryptic comments under the talk tab. I don't think it would be terribly hard to implement.
Thanks for your efforts (though you might have violated some policy, don't ask me). I (User:Diskurs) already added external sources for review here:
I don't understand where Wikipedia's fixation (other than the name and origin) on handling everything as a wiki page comes from, when any kind of ticketing system might be way more transparent.
> I don't understand where Wikipedia's fixation (other than the name and origin) on handling everything as a wiki page comes from
I think that's where a lot of problems come from, and I think it's a tech issue: they built a wiki system, used it for everything, and it's hard to change now that it's ingrained. My guess with your appeal and the links is that no one looked at it, because it's not like an email system where someone gets it in their inbox; someone has to check on that page, and maybe Bob thinks Joe is doing it, or vice versa, perhaps.
I think false positives on spam are a significant problem for Wikipedia, but false negatives are also a significant problem (maybe a bigger one), which is part of what makes it tough. If you hit "random article" a bunch of times, look for biographies on currently living people of relatively low levels of importance. I would estimate about 20% of those are clearly puff pieces, either written by a PR agency, a university/company publicity department, the person themselves, someone they know well, etc. It's not the end of the world since those articles also tend to be not that frequently read (articles on high-profile people are much harder for a PR agency to control the content of), but it does make me take Wikipedia somewhat less seriously when such a large proportion of random pages I land on are obviously written by a PR hack, using the kind of diction you expect to find in a press release or an official university bio, rather than language you would expect to find in a respectable encyclopedia.
While bios are the biggest area, there's a bunch of other areas where I've learned not to pay attention to Wikipedia because it's too spammy: anything to do with tourism (full of booster and hotel spam), any company except very high-profile ones, many non-profit organizations, etc. It's good when writing about people/things that no longer exist, though; the articles about 19th-century French companies don't read like they were written by a PR agency.
This is nothing new, something similar happened to me 9 years ago.
I had just started contributing to Wikipedia, a couple of hours daily. After about one week, an admin decided one of my article titles was typographically wrong. After a couple of silent reverts from him, I went to his personal discussion page to ask why, and he brandished some arcane internal rules. I checked and he was wrong, so I told him so, but he refused to acknowledge this, and suddenly I was faced with a bunch of people picking on me. I was followed everywhere I contributed, and all my edits were reverted by apparent fans of said admin, who also showed up to gang-vote against me in every attempt at resolving the conflict.
So I went to the public Wikipedia space to ask for help and discovered said admin had silently modified the internal global Wikipedia policy to reflect his view; it was all downhill from there. I started receiving threats, and at this point I was spending four hours daily on Wikipedia but barely contributing a thing to the content anymore. After a bunch of ineffective bans that I worked around, all my contributions were deleted on the grounds that I had no right to add content while banned.
Then came the messages from people experiencing the same situation who supported me because I stood up. I got tipped off that a bunch of admins were discussing me on a secret page, and escalated the issue until it reached Jimbo Wales himself, who dismissed the whole thing, saying he fully trusted his admins.
Then I discovered that other people had gone through the same process before me. I wrote an article on the importance of newcomers for a healthy Wikipedia in the long term, the bias of the power structure, and a weakness of Wikipedia that would allow a few individuals to ruin the whole project. For this I received a permanent ban from all Wikimedia projects and a request from the top of the power structure to delete all my contributions to Wikipedia side projects; Wikitravel just deleted everything, no questions asked.
So I did the same thing others did before me: I replaced a key article of Wikipedia with this very story of what happened to me, then reverted my change, so the story would live hidden in a page history inside Wikipedia. Then I stopped contributing to Wikipedia and did something worthy of my time instead.
Back then, Wikipedia was already a totally different thing from what it pretended to be, and IMHO the problems Wikipedia has cannot be solved with a better editor.
Thanks. I am sitting down with Wikimedia Sweden next week to take a serious look at how we can add some of our rather good content without tripping all the wrong triggers. If it doesn't work I'll ping you. No need to agonise over it until this doesn't work, but appreciate the offer.
The basic reason that Wikipedia editing is down is that most of the job is done. The goal was to produce a reasonably comprehensive encyclopedia. Mission accomplished. Most of the important articles were written years ago. Take a look at new articles being created. They are mostly about very minor subjects.
This was typical of the paper encyclopedias. The initial creation of the Encyclopaedia Britannica was a huge job. The periodic maintenance update was a far smaller one.
> Mission accomplished. Most of the important articles were written years ago.
I would contend that should be leading to more edits. Few people have the depth of knowledge or expertise to contribute to a 'US Foreign Policy' article, but masses can contribute to an article about their home town.
For example, the article on my town doesn't even explain its unusual name. At least half of its residents could make that particular edit... but have no incentive to do so.
> At least half of its residents could make that particular edit
What proportion could/would actually cite a source for the edit, though? It's likely not that hard to find a source: the local library probably has some local-history books that explain the source of the name. But in my experience the proportion of people willing to open a book before making an edit is smallish. There are some local "edit-a-thons" trying to improve that, though. I've never been to one myself, but from what I can tell they tend to be organized at libraries or universities, and the organizers have already pulled out some relevant reference materials, so people can more easily start making cited improvements to articles about their local area.
Most non-English versions of Wikipedia have dramatically less content than the English language version, and even the more common language versions like Spanish seem to be significantly less accurate. Make whatever claims you want about what role we should play in English hegemony (blah blah blah), but that's one feature Wikipedia affords where the job is far from done.
That's true, but also a different situation with different data: many of the non-English versions do not have a similar decline in edits. The total size of the English Wikipedia seems to be slowly plateauing, but many others still have approximately linear growth.
Sociological issues among established Wikipedia editors do contribute to lower retention of desirable new editors, but like Animats I suspect Wikipedia's comprehensive coverage is a significant factor (perhaps more so). It's simply less appealing to do curatorial work than to make large additions to major articles like "Genetics" or "France".
Which factor contributes more to Wikipedia's active editors drop-off, sociological issues or encyclopedic completeness? Has anyone looked at that? I'd be interested to see an analysis of how the trends in the notorious graph at http://commons.wikimedia.org/wiki/File:Enwp_retention_vs_act... correlate with word count and reference count in important articles.
I used to think this was just a cultural shift, now though I'm wondering if it's not something more fundamental.
It might be an unavoidable symptom of shifting priorities from addition to revision.
When I've worked on joint drafts, having many collaborators tackle a blank page is incredible. Our ability to rapidly churn out rough but readable pages is always stronger than I expect. Each new edit, even sentences in broken English, puts ideas on the page and contributes value, so more collaborators are almost always for the better.
There always comes a turning point during drafting, though, where additional collaborators suddenly become a liability. You always want a few different eyes for editing. But when you have enough collaborators polishing a mostly finished product, it will kill that piece (or add months to drafting).
My most optimistic read: maybe this is ok. Polishing is harder, not everyone is good at it. Maybe we should expect more edits to be rejected in articles that are 95% of where they need to be.
Pessimistically, though: Maybe each new collaborator just increases the risk of bikeshedding and petty squabbles. A couple bad edits create presumptions in editors, driving them to revert even high quality additions as their heuristic subconsciously shifts from "new stuff useful" to "new stuff harmful." The most stubborn or fanatical will drive out reasonable high quality editors, leading to a quality death spiral.
"Cultural issues" is the understatement of the year. Cultural catastrophe is what it is.
Wikipedia would be greatly improved if they just issued a lifetime ban to anyone with more than 100 edits. They are all small-minded despots whose only intention is carving out a little fiefdom of pages and "protecting"/manipulating them by any means possible. Usually with a political agenda in mind.
The comments on that article in MIT Technology Review seem to be a golden example of the problem with Wikipedia. A bunch of people with axes to grind saying "the problem with Wikipedia is that I didn't prevail in this or that dispute, and my brilliance isn't recognized there."
There just aren't incentives to contribute to Wikipedia aside from ego; it's too hard to do it out of a sense of communitarian sharing. And it's that same ego that makes the bureaucracy so awful.
I didn't mean to say it was the sole reason; just the reason relevant to my post. There certainly are other reasons and they are larger problems than the editor.
However, Wikipedia feels that the editor is a big enough problem that they have spent a lot of time and money on their Visual Editor. That time and money could have been better spent.
I sympathize with the sentiment expressed by the OP here, but his conclusions -- especially that the VisualEditor project is not worth undertaking -- are, I believe, very much off base.
The Wikimedia Foundation did a great deal of research, including professional usability testing, and reached the (I believe fairly self-evident!) conclusion that a visual editing environment was sorely needed. You can go look up videos of regular people attempting to contribute to Wikipedia if you don't believe me[1].
The difficulty of the task, the particularities of hardcore Wikipedia editors, and the importance of preserving the record that Wikipedia's edits represent makes the project have a very real amount of essential complexity. If you think this is easy then you have not done much work with complex rich text editing on the web[2]. People in this thread suggesting Wikipedia use an off-the-shelf OSS rich text editor don't understand the requirements of the problem.
The team at the Wikimedia Foundation has done an exceptional job given the task at hand, especially given their relatively small amount of funding (relative to other sites Wikipedia's size). The reasons the VisualEditor isn't enabled by default have to do with getting from a 97% solution to a 99.9% solution[3], and based on the work I've seen out of the Wikimedia engineering team, I'm sure they will get there.
"Users apparently dislike this workflow so much that they don’t bother contributing at all: significantly less people are editing Wikipedia than did a few years ago."
This isn't really feasible, as MediaWiki's markup is far too hard to parse[0]. You can't just write an IDE for wikitext in a nice language; you'd need to instrument MediaWiki itself somehow to get fine-grained information.
This is the bidirectional conversion engine between wikitext and HTML+RDFa that powers VisualEditor and several other tools. It tracks source range to DOM structure correspondence as proposed in the post. At this point, the IDE is basically a user interface and performance problem. The conversion is readily available through a REST interface, but on the largest articles parsing from modified wikitext to HTML can take around 10 seconds. Most of that time is spent in the expansion of the myriad of citation templates that we like people to add. It is possible to speed this up to something more usable for an IDE, but it's not trivial.
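The source-range-to-DOM correspondence mentioned above can be pictured with a toy sketch. This is purely illustrative (the node IDs, ranges, and lookup function are made up, not Parsoid's data model), but it shows the kind of table an IDE would need to highlight the wikitext span behind a clicked element:

```python
# Toy correspondence table: each DOM node is tagged with the character
# range of the wikitext that produced it (roughly analogous to the
# offsets Parsoid records per node). All values here are invented.

NODES = [
    ("p1", 0, 40),    # a paragraph spanning wikitext chars 0..40
    ("b1", 8, 19),    # a '''bold''' span nested inside it
    ("t1", 41, 120),  # a template transclusion
]

def node_at_source_offset(nodes, offset):
    """Return the innermost (narrowest) node covering `offset`, or None."""
    best = None
    for node_id, start, end in nodes:
        if start <= offset < end and (best is None or end - start < best[2] - best[1]):
            best = (node_id, start, end)
    return best

# Clicking at wikitext offset 10 should select the nested bold span,
# not the enclosing paragraph:
print(node_at_source_offset(NODES, 10))
```

The inverse direction (DOM node to source range) falls out of the same table, which is what makes click-to-highlight work both ways.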
Hello! OP here. I'm totally ignorant of Parsoid but I respectfully suggest that bidirectional lossless conversion is not possible in general.
First, it relies on the function Wikitext->HTML being injective. But isn't it trivial to create two different Wikitexts that compile to the same HTML? Whitespace is just the start of this story.
Second, apparently the template language is Turing-complete. Let's say I write a prime sieve in order to generate a page that lists the first 100 prime numbers. What would it then mean to edit "31" to change it to "30"?
(With apologies for not yet having read the things you kindly linked to.)
> First, it relies on the function Wikitext->HTML being injective. But isn't it trivial to create two different Wikitexts that compile to the same HTML? Whitespace is just the start of this story.
Yes, and Parsoid works around this by preserving some metadata about wikitext in HTML (such as information about whitespace around syntax elements) and, since this preservation isn't perfect, only reserializing HTML→wikitext where the content was changed during editing.
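The selective-serialization strategy described above can be sketched in miniature. This is an illustration of the approach, not Parsoid's actual code; the node table and the "re-serialized" replacement text are invented for the example:

```python
# Sketch of selective serialization: nodes the editor did not touch are
# emitted by copying their original wikitext span verbatim (so whitespace
# quirks survive the round-trip exactly), and only edited nodes are
# re-serialized -- here the replacement text is simply supplied directly.

ORIGINAL = "Intro text. '''bold'''  trailing   spaces."

# (start, end, modified, new_text): ranges into ORIGINAL, in document order
NODES = [
    (0, 12, False, None),           # untouched: copy source exactly
    (12, 22, True, "''italic''"),   # edited: re-serialize this node
    (22, 42, False, None),          # untouched: odd spacing preserved
]

def serialize(original, nodes):
    return "".join(new if modified else original[start:end]
                   for start, end, modified, new in nodes)

print(serialize(ORIGINAL, NODES))
```

If no node is marked modified, the output is byte-identical to the input, which is why the imperfect metadata only matters for the parts of the page the user actually changed.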
> Second, apparently the template language is Turing-complete. Let's say I write a prime sieve in order to generate a page that lists the first 100 prime numbers. What would it then mean to edit "31" to change it to "30"?
Assuming we're talking about VisualEditor, you currently just can't do that (you can only delete the entire template inclusion and replace it with normal text, or edit template parameters).
(As a nitpick, wikitext is not Turing-complete (there is no loop or recursion construct, recursion is explicitly checked for and causes an error), you can only write complicated algorithms by manually unrolling enough loops. However, for some time now you can also write templates in Lua, which is a proper Turing-complete programming language, see <https://www.mediawiki.org/wiki/Extension:Scribunto>.)
>Second, apparently the template language is Turing-complete. Let's say I write a prime sieve in order to generate a page that lists the first 100 prime numbers
You can't. Using recursion (even primitive) is discouraged, there are even automated safeguards that make it hard (but I don't remember what they are exactly).
But I agree with you in general: I think making a visual editor that deals with templates correctly is not an easy task.
You and I know that it was just barely feasible, and if only MW had started with a parser rather than a series of regular expressions, we'd have had a visual editor in 2005 ...
Sure, it claims to do this, but I'm still a bit skeptical that this is actually compliant. What do you make of the paper I linked to? That blog post only glosses things like context-sensitivity.
This depends on the definition of compliant. By the paper's feature-based approach, Parsoid would be 'compliant' with the PHP parser.
There is more to compliance than a simple feature comparison though, most of which can only be identified by large-scale testing. Each night, we have been running tests on a sample of 160k articles from 16 languages to check our progress. In this test setup, 99.99995% of articles round-trip perfectly from wikitext to HTML and back. Currently the focus is on visual diffing to identify remaining rendering differences.
There is still a good amount of work left until Parsoid is ready to replace the PHP parser, but most of this is actually not relevant to you if all you'd like to do is extract semantic information.
Regarding the paper: The authors correctly describe some of the issues inherent in wikitext parsing. Its conclusions are however based on strong assumptions about the implementation strategy. For example, they do not consider the option of flattening a PEG parse tree back to tokens in order to implement context-sensitive and generally unbalanced parts of the syntax. Similarly, the analysis of the parsing complexity seems to assume a lack of transclusion limits.
There exists a bidirectional parser that converts between wikitext and HTML/XML DOM with RDFa. It's called Parsoid[0], and it powers VisualEditor, as well as a number of other applications.
An IDE looks like a solution for developers by developers. I'm not sure it will appeal to most people. I consider the Visual Editor to be a good solution for non-techie newcomers. They are more used to visually similar tools.
Take a look at the Wikidata project - it's designed for editing structured data.
Unfortunately I wouldn't say it is very easy to use - the Wikipedia editor is much easier. It's also completely unclear which parts of Wikidata data are used in Wikipedia, and where.
That means the Wikidata data is usually much worse than the Wikipedia data, even if it is theoretically easier for machines to use.
A specific example: try adding a timezone to a city. That's something that makes complete sense to do, and yet Wikidata gives almost no affordances for how to do it. To get even more specific, it doesn't give a suggestion on the appropriate type of data (UTC +/-? TZ names?) and it's completely opaque how to deal with summer time (probably via a refinement, but it's not clear what it should be called).
I agree it's pretty easy to add a Freebase ID to a Wikidata item. Unfortunately this isn't really that valuable, because it's easy for a machine to work it out too.
You can embed statements from a Wikidata item in the corresponding article on Wikipedia using the {{#property:…}} parser function. This is largely unused on the English Wikipedia, but some language versions already use this for their infoboxes – for example the infobox on this page is partially generated from Wikidata: https://pl.wikipedia.org/wiki/Ankara
Do you know why the English Wikipedia doesn't use it? Is it simply because the English Wikipedia information is generally better than the corresponding Wikidata information?
I believe it's simply because no one has yet put in the effort necessary to start using it. It might be a conscious decision, but if it was discussed anywhere, I completely missed that.
(English Wikipedians in general are very "territorial" in my experience, so to speak, and rarely appreciate efforts that put some piece of "their" articles beyond their "jurisdiction", like Wikidata does. Editors from other language versions are a lot more welcoming.)
> Visual Editor was rolled out in 2013. Thing is, editors hated its bugginess so much that the roll-out was reverted shortly afterwards. (…) Why did the Visual Editor fail? Because it tries to deny the basic fact that Wikipedia is a program, not a word document.
Eh, I can't really agree with this. VisualEditor rollout was indeed premature (critical features like template editing were developed literally days before it), but a year has passed and it's gone a long way since then. Haters still hate it, but personally I find it more pleasant to use than wikitext for many tasks.
This was a really interesting read, though. I still think there's value in the WYSIWYG paradigm, even if we have to fall back to "IDE mode" for infoboxes and such (right now, if you double-click a template, you're shown a key-value table to fill in; there's no live preview…). After all, most of the content of Wikipedia is text with some light formatting.
I must say that Wikipedia community has become very effective at giving partisan editors tools to target undesirable editors - a Byzantine collection of policies which admins overzealously enforce without a second thought, and without thinking of a big picture. Admins in fact are incentivized not to involve themselves, which basically gives the enforcement gun to POV-pushers which are intimately acquainted with all the glorious details of the relevant wikilese. I urge admins to think outside the policy-enforcing request-observing mode.
> The one unconventional suggestion is the idea of a correspondence between the characters of the source text and the HTML text.
Did you get a chance to check out fellow HN homepage link Paperman [1][2]? It offers exactly what you propose: Double-clicking on the results frame highlights the relevant line in the source, and vice versa. This could easily be adapted to match your suggestion.
I would echo the suspicion that the problems here are cultural and not technical, however, I think the solution is kind of technical.
Basically, in future why assume we have one definitive repository of objective fact? The notion is ridiculous. What I believe will happen is a git-ification of Wikipedia, so you can fork the whole thing, run your own when you disagree with the direction, and pull in changes from those you trust. If you're going to attempt to fix this class of problems this is the only way forward.
Interesting, I didn't know about mediawiki-mode. Something like that does sound like the right way to go for a certain class of "power users" who want an IDE: build on top of an existing IDE, with interaction to Wikipedia itself done via the MediaWiki API.
My understanding is they don't want more idiots editing, it's a 'feature' that it's tricky to edit.
I also suspect part of the decline is that it's kinda 'complete'; it's not like the old days, where it was easy to contribute something important and meaningful.
gwern had a thing or two to say about wikipedia. As I remember, he said its biggest issues were cultural and that technology issues were a red herring. I'm interested if he has anything to add.