This all seems very reasonable. Looking forward to the post-mortem.
Given their size, both on the web and in terms of employees, this is unusual for Wikimedia. They typically fly under the radar. How many times has Wikipedia ever been down?
I recall AWS, Google and Microsoft having more outages. Mind you, they are probably considerably bigger, but still, Wikimedia is doing something right.
It's not hard to have high uptime for content that is largely static. It has less to do with size than with static versus dynamic, and accurate (think bank transactions) versus fuzzy (search results).
Sure, but if the dynamic stuff was down, you just wouldn't be able to edit, but the static part would keep humming right along and never get counted as an outage.
You were kind of right - the page gets reparsed (and the frontend CDN cache cleared, along with a backend cache) anytime someone edits a Wikidata entry the Lua script uses.
> Sure, but if the dynamic stuff was down, you just wouldn't be able to edit, but the static part would keep humming right along and never get counted as an outage.
The frontend cache gets skipped if you have a session cookie (you are logged in, have been logged in recently, or made an edit while logged out). So if you edit something, subsequent views do not hit the static site, and you would notice if it was down.
Wikipedia has a second layer of cache behind that: the article HTML without the user interface, stored in memcached and the DB (in MediaWiki-speak this is referred to as the "parser cache"). However, if the site is down, that layer would typically go down too, so only the Varnish servers really have the potential to hide outages.
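A minimal sketch of the bypass rule described above, in Python. The cookie names and the `serve` helper are hypothetical and only illustrate the idea; this is not Wikimedia's actual Varnish configuration.

```python
# Hypothetical cookie names; the real names are not given in this thread.
SESSION_COOKIE_NAMES = {"session", "user_id"}


def serve(path, cookies, frontend_cache, render_from_backend):
    """Serve from the frontend cache only for requests without a session cookie."""
    has_session = any(name in SESSION_COOKIE_NAMES for name in cookies)
    if has_session:
        # Logged-in or recently-editing users skip the cache and always hit
        # the backend, so they are the ones who notice a backend outage.
        return render_from_backend(path)
    if path not in frontend_cache:
        # First anonymous view: render once, then reuse for everyone else.
        frontend_cache[path] = render_from_backend(path)
    return frontend_cache[path]
```

With this shape, an anonymous reader keeps getting the cached copy even if `render_from_backend` starts failing, while anyone carrying a session cookie sees the failure immediately.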
You can also think reads vs writes. When there are many reads but few writes it makes sense to have replicas, but if there are many writes and few reads you are better off with "sharding". You can also think RAID, where you have mirrors vs stripes. Often, though, the two concepts are combined, like in RAID 0+1 or RAID 1+0. Scaling many reads is much simpler than scaling both reads and writes, though. The holy grail of computing/databases is to build a database that can scale both reads and writes while having decent performance and latency.
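To make the replica vs shard distinction concrete, here is a toy in-memory sketch in Python. The class names are made up for illustration, and a real system would also have to deal with replication lag, failover and resharding, none of which is modelled here.

```python
import hashlib
import itertools


class ReplicatedStore:
    """Mirrors: every replica holds all data, so reads can be spread out."""

    def __init__(self, replica_count):
        self.replicas = [dict() for _ in range(replica_count)]
        self._next = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Every copy must apply every write, so writes do not scale.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads rotate across replicas, so read capacity grows with copies.
        return self.replicas[next(self._next)][key]


class ShardedStore:
    """Stripes: each key lives on exactly one shard, so writes are spread out."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash keeps a key on the same shard across processes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def write(self, key, value):
        self._shard_for(key)[key] = value

    def read(self, key):
        return self._shard_for(key)[key]
```

Combining the two (each shard having its own replicas) is the database analogue of the RAID 1+0 layout mentioned above.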
Wikipedia pages are very easy to cache, and caching them likely provides a massive benefit, so if we're talking about uptime, "static" is probably a better description of what is happening than SSR.
You are right that technically it's SSR, but that's not what's relevant here.
I don't think they are mutually exclusive. I have no idea how Wikipedia works, but I've run a lot of high-volume, relatively static sites. A simple thing that works very well is to do SSR, but serve it through a CDN like Akamai, then configure Akamai to serve a static/cached version if the backend is down. Assuming everything is working, you get a semi-dynamic SSR model, but if something goes down, the site is still served and you have no customer-facing downtime.
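The "serve the cached version if the backend is down" idea can be sketched in a few lines of Python. This is only an illustration of the fallback logic, assuming a hypothetical `fetch_from_backend` callable that raises `ConnectionError` when the origin is unreachable; a real CDN configures this declaratively rather than in application code.

```python
def fetch_with_stale_fallback(path, cache, fetch_from_backend):
    """Prefer a fresh response, but fall back to the last good copy on failure."""
    try:
        body = fetch_from_backend(path)
    except ConnectionError:
        if path in cache:
            # Backend is down: serve the stale copy instead of an error page.
            return cache[path]
        # Nothing cached for this path, so the outage is visible after all.
        raise
    # Backend is healthy: remember this response for a future outage.
    cache[path] = body
    return body
```

Only pages that were fetched at least once before the outage are covered, which is why this works best for popular, mostly-read content.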
Wikipedia basically uses Varnish as their own CDN (they have caching servers in SF, Texas, Virginia, Singapore and Amsterdam; backend servers are in Virginia with a hot backup in Texas).
Thanks for sharing. We did it that way for a site I worked on until a DDoS brought Varnish down. Then we put Akamai in front and never had a problem again. This was over a decade ago; it wasn't as easy back then to auto-scale a Varnish layer in the cloud.
Wikipedia generally tries to take the approach of doing what they can themselves and using open source wherever possible. Most of the setup is documented at https://wikitech.wikimedia.org and there is a public Puppet repo with all the server configs: https://github.com/wikimedia/puppet
That said, I think they do now use Cloudflare's BGP-based Magic Transit DDoS protection product to help against DDoS attacks.
> Wikipedia develops at a rate of over 1.9 edits per second, performed by editors from all over the world. Currently, the English Wikipedia includes 6,167,378 articles and it averages 598 new articles per day.
Note: edits sometimes affect multiple pages. In extreme cases, a single edit can affect millions of pages: the Lua script Module:arguments (which is a wiki page, editable like any other) is used on over 25 million pages.
There is generally a bit of a long-tail effect. Popular pages get edited a lot, but they also get viewed a lot. It can be expensive when everyone is viewing and editing the same page (Michael Jackson's death is a famous example that caused downtime, although changes were made afterwards to make things more robust so it wouldn't happen again).
The way they designed everything, this doesn't matter. It's still static, in that the content is not generated at access time, at least for logged-out users.
If the actual servers go down all that means is that wikipedia is read only and the caching reverse proxies that also receive a push update during modifications would just serve the last version of pages. (except anybody with login cookies, valid or not, would get 500 responses)
But everyone sees the same edited version when not logged in, which is the vast majority of users, so you can just throw a huge cache in front, which is what they do. And most edits only touch one page, so the churn is tiny relative to the cache size.
This is a much easier service to reliably engineer than something like Twitter. For SRE purposes, Wikipedia is mostly static.
Compared to something like Slack or Office 365 products it could as well be carved in stone. I'm guessing 99+ % of requests are non-authenticated, the data is easy to cache and freshness (on the timescale of minutes or hours) is almost worthless.
Even something as simple as HN probably has a much, much lower "usefulness if the service is served completely static from caches", due to upvotes and comments. If the front page and comments stayed static during both breakfast and lunch, my WFH routine would sadly be impacted...
> freshness (on the timescale of minutes or hours) is almost worthless.
On the contrary, users get very angry if stuff isn't fresh.
Someone changes the Trump article to say he is a poopy head. If that gets fixed in 2 seconds, no big deal. If that gets cached, and the edit to fix it doesn't hit the caches for a couple of hours, Wikipedia is now the top story on CNN.
Generally wikipedia caches are expected to be updated within seconds or minutes at most.
Okay, that's fair, my view was too simplistic. But still, Wikipedia could probably get away with days of 1-5 minute cache refreshes if it was required? Especially if some banner informed users about it or something.
I think my larger points still stand. In comparison, almost all other services at the scale of Wikipedia have critical almost-realtime components, and are almost useless without the possibility to authenticate users (which can't really be cached).
Not saying that the people who manage to keep Wikipedia so stable are doing an easy task, just that it's very different from almost all other things on the web.
I've sometimes heard wikipedia described as a "large scale static site plus a medium scale social network". The caching is a bit more complex than a naive static site due to churn rate and freshness requirements, but fundamentally you are right, without frontend varnish caching, wikipedia would be very different in terms of hosting requirements and scaling complexity.
I'm also wondering if the caching strategy they are using is a naive one (i.e. the cache is valid for a fixed duration, like 5 minutes) or a more active one (like Stack Overflow), with cache invalidations each time a page is modified/commented on.
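The two strategies in the question can be contrasted with a small Python sketch. Both classes are hypothetical illustrations, not how Wikipedia or Stack Overflow implement caching; `clock` is injected so the behaviour is easy to test.

```python
class TtlCache:
    """Naive strategy: an entry is reused until it is older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock):
        self._ttl = ttl_seconds
        self._clock = clock  # callable returning the current time in seconds
        self._entries = {}   # key -> (stored_at, value)

    def get(self, key, render):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = render(key)
        self._entries[key] = (now, value)
        return value


class PurgeOnEditCache:
    """Active strategy: an entry is reused until an edit explicitly purges it."""

    def __init__(self):
        self._entries = {}

    def get(self, key, render):
        if key not in self._entries:
            self._entries[key] = render(key)
        return self._entries[key]

    def purge(self, key):
        # Called by the edit path; the next read re-renders the page.
        self._entries.pop(key, None)
```

The TTL version can serve an outdated page for up to the full TTL after an edit, while the purge version is fresh right after the edit but requires the edit path to know which keys to purge.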
Right, but if the backend goes down, and you serve a stale cached version of the page that is missing the latest edit, it's fine and you have no downtime. That's what I mean by static.
The opposite of static would be an e-commerce site where you can't take transactions if the backend is down and you really don't want to oversell your inventory, so you need the inventory management system to be up for the site to "work".
Also, the average wikipedia page probably isn't edited very often.
There are enough copies of Wikipedia that if it has ever been down you could just get the same content elsewhere, so people don't usually make the same fuss about it that they would about AWS/Google/Microsoft.
AWS/Google being down for even a minute or two is a big deal though.
Prior to having a cellphone data plan (2013) I kept a copy of most Wikipedia articles on my laptop for use when traveling and being away from easy internet access.
Me? I've seen it down a couple times before, and when it was, I just Googled for another copy of the article instead of posting "Wikipedia is down! OMG the world is ending!" all over the internet as people do when AWS is down.
So that makes it anecdotal. I thought that's something others do as well. I personally don't even know where else to look for articles. I also don't trust other sources or possibly outdated mirrors.
If it's a controversial topic (e.g. history of Tibet), yes.
If it's just an article about the history of pianos or CPUs or something, the probability of misinformation is much lower, the consequences of being misinformed are much lower, and I don't usually bother. Many times I just browse Wikipedia because I want to learn about weird animals or off-the-beaten-path places on Earth or culture or something like that.
(By the way, primary sources also sometimes have their drawbacks as well; they can often be politically motivated, biased and not tell you the full story, and Wikipedia is effectively peer-reviewed for a lot of articles.)
I'm reminded that MediaWiki runs on PHP, which isn't much loved here on HN, but does power some of the busiest sites in the world, and PHP 7 and 8 have done a great job of moving the language and performance forward.
Assuming concurrency is done correctly (which FPM handles, although not transparently/as transparently as some would like) any language can be super-performant - PHP's leg up is long-term compatibility (a php 5->7 upgrade is better/easier than say a python 2->3 upgrade, at least in my experience) and syntax being easy to pick up and become proficient in.
Not really, PHP is very backwards compatible and I don't see lots of "modern" PHP (e.g. traits, etc.) in the code. I last looked at their release from a few months ago.
I do lots of PHP migration work (4 to 5, 5 to 7) and for much of it it just works.
Not much code change is necessary to use the upgraded features of the engine.
1.34 is the trailing legacy release and requires 7.2.9 or newer. 1.31 LTS, supported into 2021, requires 7.0 or later.
In releases since 1.27 introduced PHP 7 support, the backward version compatibility has been tightening up more aggressively. PHP 7 brought a ton of performance gains that MW had previously relied on HHVM for, which gave them a lot of reasons to shed PHP 5 and start turning more forward-looking.
Wikipedia is running PHP 7.2 (https://en.wikipedia.org/wiki/Special:Version), so no. That said, new features do get used when they make sense, so we are definitely not maintaining back compat with ancient versions of PHP.
(This includes a couple usages of traits, traits is just not something super common in mediawiki, but there are some cases where they are used)
Wikimedia is only provisionally on 7.2 until they finish migrating ahead of the next Mediawiki LTS release. There are a bunch of points where they talk about the benefits of requiring 7.3 here: https://phabricator.wikimedia.org/T257879
For a project as widely used as MediaWiki, it completely makes sense to iterate slowly as PHP evolves. New features are a wonderful opportunity in many cases, but without a full understanding of how they may affect a very large codebase, caution is well warranted.
I remember early in the history of E*Trade, I went to my account, and it showed someone else's name and account information. Didn't even bother reporting it (I was just a kid); just logged off and on again and withdrew everything from my account and never looked back.
I know the full analysis isn't online but I have a problem with this part...
> This was done out of an abundance of caution, after we received one (1) user report of being logged in as someone else.
This _seems_ like a knee-jerk reaction to one data point.
There could be other causes for a user to report that, like a change to the cache key used for serving a users profile giving the _appearance_ that you're logged in as someone else, even though you're not really.
Forcing everyone to re-login could potentially make the system worse, in that you're now overloading parts of the system that has to handle those logins, plus causing all kinds of cache expiry...
I guess there's more to the story and someone who knows the system deeply knew this was the right choice but just reading the reports it seems knee-jerkish.
The "view profile as another user" feature was infamous for being... really thorough. At one point you would receive chat messages directed to the person you were viewing the profile as (I believe this was before the integration of chat and messages).
My buddy told me that the U.S. marines in afghanistan once misplaced one of their fancy encrypted radios (or it might have been one of the keys to their fancy encrypted radio) so all radio communications were suspended until the next scheduled rotation.
Battery ran out? Left one behind? Throw them all away and visit the quartermasters' Idiocracy/MIC version of Costco to pull another 100 pack off the shelf at a cost to the taxpayers of mere millions. Oh and F-35B's and Blackhawk turbines? just chuck those like Kleenex. My mom worked in civil service for the Navy and $20k parts in the '70's went "missing" and "lost" all of the time. Then there's a USAF Major I know who became unpopular for suggesting proper maintenance rather than ruining/replacing turbines might be a better approach (likely some sort of maintenance/HQ personnel-vendor kickbacks, MIC FTW).
This reminds me of a thing that happened on MSN back in the day, when I suddenly started to receive random users' chat messages in one of my chats. It only lasted a few minutes and it was in all kinds of different languages. I never heard anything about it, but in retrospect it felt very serious. Imagine the same thing happening on Facebook Messenger or WhatsApp.
This is a very common 'advanced failure scenario.' I've seen it on a handful of sites; session objects and caching are difficult and sometimes overlooked during migrations.
Isn't that the same thing? Say that you forget to specify that "my_login_cookie" should vary the response from the backend. User A logs in. Now user B logs in, but because the page was cached from user A's visit, user B sees the content for user A. B is not "logged in" as A, but can still see their content.
I had a similar issue happen to myself (not on wikimedia or anything related to wikipedia.) I clicked on "login" by accident without filling in my credentials and I was logged in as either an admin user or a user called "Adam".
I’ve always wondered if this class of cache issue resulted primarily from collisions of some sort. I’ve only seen it a handful of times over many years. Others here have mentioned it with services using Varnish, for instance.
Varnish cache works by creating a hash of the request consisting of method, url and for a site where you log in, that would also include the cookie you use for identification. Your proposed collision idea is very improbable but possible.
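A sketch of the keying scheme as the comment describes it, in Python. This is an illustration only and not Varnish's actual hashing code; the `cache_key` function and its parameters are hypothetical.

```python
import hashlib


def cache_key(method, url, identity_cookie=""):
    """Build a cache key from the request parts that should vary the response."""
    material = b""
    for part in (method, url, identity_cookie):
        encoded = part.encode("utf-8")
        # Length-prefix each part so ("/ab", "c") and ("/a", "bc") cannot
        # produce the same input bytes.
        material += len(encoded).to_bytes(4, "big") + encoded
    return hashlib.sha256(material).hexdigest()
```

With a 256-bit digest an accidental collision between two different requests is astronomically unlikely. The far more plausible failure is the one described earlier in the thread: if the identifying cookie is left out of the key, two users requesting the same URL get the same key and therefore the same cached page.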
Wow, this is the first time I've seen a company pick "better safe than sorry" when it had global effects (though this isn't a large hit, it's everywhere).
I remember back in the days, I disconnected my dial-up modem and re-connected with another public IP. Upon refreshing Hotmail, I was presented with another user's mailbox.
Do you mean the Visual Editor? That's been available for several years now. Getting the back end for it running can be a bit fraught.
I wrote this: https://www.mediawiki.org/wiki/Intranet and keep it up to date every now and then. It's probably time for me to look at 1.36. Anyway, the current Parsoid based thing works fine.
Do they mean employees? My Wikipedia login lasts for 30 days, I think. I basically only log in to edit articles that require it. Otherwise I just edit anonymously.
That way, if someone is named "Billy" or "Alex", you won't be spending years mistakenly assuming they were a man while they could have been a woman.
This also has the beneficial side effect of covering trans individuals who use a birth name while having another gender identity.
Even more useful for usernames such as yours, "dionian". I do not know if it is your first name, last name or made-up username. So I have no clue what your gender is and I'd default to "they". If I knew, I could call you by the right pronouns.
It's just a slow shift away from assuming everyone on the internet is a man.
It also serves as a flag that allows people to identify you as having bought into the general set of practices which strive to change our culture to support trans and nonbinary gender ideology.
And if our culture doesn't already support trans and non-binary people (ideology? no, it's people we're talking about), then that seems like it'd be a good thing to change.
Support of people may be involved, and this is a poor forum to litigate deep problems -- but if I trusted the diversity-and-inclusion apparatus of Silicon Valley in general to support the entirety of Title VI of the Civil Rights Act, including the inconvenient parts about people of differing religions, then I might not have been predisposed to choose that word.
Even your reply highlights it. "It's about people." Notice the manner in which you _correct_ my rather anodyne words. The fault is that I did not frame the question as you would have me frame it, and looked at a part of the general phenomenon that is different from the part you think deserves to be centered.
You are welcome to your ideology, in any event, and it would be a poor world indeed where one could not act to change the world in accordance with one's conscience.
There is very much ideology around these issues. One particularly relevant article of ideology suggests that people who are not trans, and whose names would not ordinarily cause confusion, ought nevertheless adopt a practice of specifying pronouns, in service of the idea that those who are trans (or nonbinary) might not stand out when they do so.
While I agree that there are human rights issues involved in the issues generally, I would say that it is not specifically a human right to have others around you adopt this specific practice to support such a person as he, she, or they assert an identity.
You've made several substantial changes -- far beyond the scope of a clarification -- and now I'm reluctant to respond at all because who knows what it'll say next.
Not the same person but for what it's worth: the edit period is limited to a few minutes and it exists for a reason. This topic (gender identity/pronoun use) is important to approach carefully and is exactly when the edit window should be used.
It is unfortunate that there is an excess of ideology around these issues. It hinders communication by confusing questions about the implementation of an ideology with hate-filled attacks on people.
It does not help the matter that similar attacks are real and exist. On the other hand, such confusion of concerns is a common goal of many who advance their ideologies, so there is a possibility some may deem this confusion desirable.
I'm not sure I would call it that at all. Silicon Valley is not the world and, in the corporate world, this remains a very California / Silicon Valley phenomenon (present but less strong in a few key other places — New York, Seattle, Portland come to mind).
It is of course also a university-campus phenomenon (especially private universities) but those aren't really "corpo". For that matter, the Wikimedia Foundation is only "corpo" insofar as a 501(c)(3) is technically in fact a corporation — unlikely to be within the meaning of what you intended.
On that note, it is more prominent in the nonprofit sector, perhaps due to volunteers being more predisposed to activism in general. (Mozilla was a bellwether.)
Not sure exactly what you think pronouns have to do with neoliberalism? I don't really associate the likes of Thatcher, Reagan, Friedman, or Hayek with anything of the sort.
I also think globo-corpo neoliberal is a bit redundant, no?
"Liberal" and "neoliberal" have both been hijacked by progressives thoroughly enough that no one associates colloquial usage of either of those with Friedman or Hayek.
Friedman and Hayek were certainly not globo-corpo, considering that they believed in things like freedom of association, covenants, and other legal protections that work against the likes of Amazon or Wal-mart (who are primarily interested in undermining unions, using strategies to diminish worker cohesion like diversity quotas, and strategies to defang leftism like trans activism).
I hesitate to call it virtue signalling because there are other forms of social signalling that aren't about virtue. Safe spaces require effort from the members who already feel safe.
Putting (he/him) in does three things. It invites you to provide your own. It also stops someone from starting a side-argument about you assuming 'he/him' in your response instead of you/your or they/their. Which means that we can talk about what we want to talk about instead of gender politics, if you don't feel like it. Or we can if you need to.
It also makes it so that trans people won't be outed as trans for having their pronouns in their profiles. If only trans people did that, you'd automatically know who is trans and who isn't. Now that everyone is adding those, you can't say.
It's appropriate that the commenter below uses the phrase "bought in", because that conveys the general sense of mob/unthinking latching onto this movement. One which I'm quite disappointed to see companies just caving to, through little insertions of ridiculous practices like these.
Having everyone declare their gender is a little bit ridiculous, just to serve the desires of a ~1% group who are trying to gain more recognition. I am frankly surprised how people are willing to distort their behavior when they refuse to do so for other groups who are far more downtrodden in their rights, in greater percentages. I suppose somehow transsexual people just became popular for some reason.
Frankly, it's a symbol and problem of the modern liberal/democratic mind (at least at the party-level) that these problems rise to the level of national and corporate attention -- and apparently we solved all our other material needs and have time to spend on this in comparison.
You project a lot of problems onto people who provide their pronouns in their user descriptions.
Not everyone is American, and not all issues are about American politics.
I find it interesting how much emphasis has been placed on an afterthought that I edited in. A lot of really emotional reactions like yours. I described it as a beneficial side effect and suddenly the whole conversation is about "evil trans people" and "bad liberals".
Having your pronouns available is beneficial to more than just trans people. It benefits everyone, avoids mistakes and makes conversations more accurate. It also breaks the myth that there are no women online and that there are no computer-savvy women. It also brings visibility to non-binary people who are otherwise invisible.
>Preferred gender pronouns or personal gender pronouns (often abbreviated as PGP) refer to the set of third-person pronouns that an individual prefers that others use in order to identify that person's gender (or lack thereof). In English, when declaring one's preferred pronouns, a person will often state the subject and object pronouns along with the possessive adjectives—for example, "she, her, hers", "he, him, his", or "they, them, theirs"—although sometimes, only the subject and object pronouns are stated ("he, him", "she, her", "they, them").
I think it's just more clear that they are listing pronouns, especially if someone isn't familiar with the practice or in verbal speech. Also note that some people use two sets of pronouns, with equal weight, and may list them such as "she/they".
I have begun putting she/her in some of my profiles (twitter, reddit) because I still get mistaken for a man at times. To be fair, my middle name -- Michele -- can be a male name in some places. It's the Italian version of Michael, IIRC.
It is my personal policy to not correct people who misgender me in most cases, especially if that is the only thing I would be saying (I will sometimes clarify if it is part of a larger comment, but I try to be gentle about it). I would rather put that info somewhere and let them have the chance to learn of it without me having to correct someone.
Different people have different reasons for noting their pronouns. Some do it because it is trendy. Some do it because they have genuinely been misidentified in online spaces. Some do it to show themselves as allies to certain groups.
Without him saying why he did it, no one here can genuinely tell you why he chose to do so.
If you care so very little about this issue, why bother replying to me to suggest I shouldn't? If you really don't care, then you shouldn't even be reading this crap. There is plenty of other stuff to read elsewhere that isn't on this subject.
Fair point, fair point. It wasn't thought through. Sorry.
But it leads to the question of why it matters - should it? If we treat all equally, should it matter? I don't see myself as female, but neither a 'man', as I don't much relate to common cultural depictions of men, which I find distasteful. I am what I am, names won't change that so I don't care. Why do you?
BTW I know a few trans people and I suspect most of them would roll their eyes at this excessive care not to offend anyone with wrong pronouns. They expect people to get it right but it's no disaster if someone got it wrong. None of them are snowflakes. Consideration towards trans people needs more basic considerations such as not being called a freak on public transport (this happened to a tgirl I know).
> But it leads to the question of why it matters - should it?
In theory, it shouldn't. In practice, it does.
People who think I am male speak to me differently than when they know I am a woman. Since I am trying to establish an adequate income, my experience is that if people have an issue with me being a woman, it's better for that to be sorted before they interact with me, not after.
If someone is willing to meet me in person, thinks I'm male and then meets me and sees I'm a woman, that's likely to go badly. I don't want to waste my time on that, much less risk facing potential drama because of it.
The reality is it ends up mattering whether I want it to matter or not. So I try to make it the least drama I can arrange given the tools available to me. I find that a quiet heads up is better than trying to hide my gender and is also better than putting people on the spot and correcting them in "public" and in a way that will make them feel attacked for simply not knowing.
The same happened in Norway on a government page a couple of years ago. In that case it was a memcache problem that made other people see one person's tax report.
It was until 2017; only the highest earners are fully public now. You can search for anyone, but the person can see that you have checked them (the history is cleared every year).
There exist services where you can pay $5 per search to avoid appearing in that history.
Judging from the error description, deleting cookies would likely not help in this case. This sounds as if sessions are mixed up on the server side - deleting your cookie will only remove your session token, not remove the session server-side. You’d need to actually send a logout request to the server.
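The difference between dropping the cookie and logging out can be shown with a toy server-side session store in Python. `SessionStore` and its methods are hypothetical names used for illustration, not any particular framework's API.

```python
import secrets


class SessionStore:
    """Server-side mapping from opaque session tokens to usernames."""

    def __init__(self):
        self._sessions = {}  # token -> username

    def login(self, username):
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token  # sent to the browser as a cookie value

    def user_for(self, token):
        return self._sessions.get(token)

    def logout(self, token):
        # Only this call removes the session on the server.
        self._sessions.pop(token, None)
```

Deleting the cookie in the browser only makes the client forget the token. The entry in `_sessions` stays valid for anyone who still holds the token until `logout` is called or the server clears the store.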
I wish every service that has something like user sessions provided the ability to revoke other sessions of your account. Then, if the site isn't too hard to automate, you could write a program for that.
That doesn't seem difficult; just don't change the format of the session tokens.
E.g. a web application: you can substantially rewrite it without invalidating logged-in sessions.
The point here is that the logged in sessions were suspected of being unauthorized. The unauthorized sessions had to be turfed, and the clearest way of being sure that all unauthorized sessions are turfed is to delete all the sessions.
Of course, some of those with unauthorized access will also try to log in to resume that unauthorized access, but presumably there is some trap laid for that.
Maybe for the accounts suspected of having been breached, there will be a mandatory password recovery procedure or whatever. Or they will monitor for suspicious logins from different IP addresses.
You can have state sitting in (serverside) session storage that is incompatible with the new version but you don't want users to lose. So now you have to migrate it, which can end up being actual work depending on the change.
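One common way to handle this is to version the stored session data and upgrade it lazily when it is read. The Python sketch below assumes a made-up schema change (a single `draft` string in v1 becoming a `drafts` list in v2); the field names are purely illustrative.

```python
CURRENT_VERSION = 2


def migrate_session(session):
    """Return the session in the current schema, upgrading old data if needed."""
    version = session.get("version", 1)  # sessions without a version are v1
    if version == 1:
        # Hypothetical change: v1 stored one "draft", v2 stores a list.
        draft = session.get("draft")
        migrated = {k: v for k, v in session.items() if k != "draft"}
        migrated["drafts"] = [draft] if draft is not None else []
        migrated["version"] = CURRENT_VERSION
        return migrated
    # Already current: hand back the same object untouched.
    return session
```

Every schema change adds another branch like this, which is the "actual work" being described. Dropping all sessions avoids that work at the cost of logging everyone out.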
Logging people out seems like a hella low price to pay for a potential security issue. Of course, hats off if you hold your engineering organization to such a high standard for normal changes.
Depends on the type of application, 99.9% of Wikimedia users are using the site without being logged in.
If the site doesn't work without being logged in you could frustrate users and they might just use a different product instead of searching for their login after being logged in for a year or longer.