As someone who is impacted, this is obviously immensely frustrating.
Worse, outside of "we have rebuilt functionality for over 35% of the users", I haven't seen any reports from the people who have ostensibly been recovered.
Next, their published RTO is 6 hours, so obviously they must have done something that completely demolished their ability to use their standard recovery methods: https://www.atlassian.com/trust/security/data-management
Finally, there have been some hints that this is related to the decommissioning of a plugin product Atlassian recently acquired (Insight asset management) which is only really useful to large organizations. I suspect that the "0.18% impacted" number is relative to ALL users of Atlassian, including free/limited accounts, and that the percentage of large/serious organizations who are impacted (and who would have a use for an asset management product) is much higher.
RTO is so hard to state properly, even with regular testing. If someone blows away a critical database, sure, you can meet your published RTO. But what if we lose 300 of our databases and need to copy snapshots from another region? AWS limits you to 20 concurrent snapshot copies cross-region. Which of those databases should you do first? Do you know the entire dependency graph for all 1000 of your services to make the right call? Meanwhile, tick-tock, your "6 hours" is slipping away. And what if someone nukes our entire AWS account with all of our prod resources? Databases, load balancers, S3 (no such thing as snapshotting there), EC2 instances, etc, etc.
Those last two examples are very unlikely, but no company is going to say RTO = "probably 6 hours, but it could be three weeks if we get ransomwared".
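A minimal sketch of what that cross-region constraint looks like in practice, assuming boto3 credentials, hypothetical region names and snapshot IDs, and the 20-copy limit quoted above; the prioritization of which databases go first is the genuinely hard part and is just a placeholder list here:

```python
# Sketch only: throttled cross-region EBS snapshot copies with boto3.
# Assumptions: credentials are configured, regions and snapshot IDs are
# hypothetical, and the cross-region copy limit is the 20 mentioned above.
import boto3
from concurrent.futures import ThreadPoolExecutor

SOURCE_REGION = "us-east-1"       # where the snapshots live (assumption)
DEST_REGION = "us-west-2"         # DR region (assumption)
MAX_CONCURRENT_COPIES = 20        # per-region concurrency limit quoted above

ec2 = boto3.client("ec2", region_name=DEST_REGION)

def copy_one(snapshot_id: str) -> str:
    """Start one cross-region copy and block until it completes."""
    resp = ec2.copy_snapshot(
        SourceRegion=SOURCE_REGION,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id}",
    )
    new_id = resp["SnapshotId"]
    # Large snapshots can outlive the default waiter settings, hence WaiterConfig.
    ec2.get_waiter("snapshot_completed").wait(
        SnapshotIds=[new_id], WaiterConfig={"Delay": 60, "MaxAttempts": 240}
    )
    return new_id

# The ordering of this list IS the dependency-graph problem described above;
# these IDs are placeholders.
prioritized_snapshots = ["snap-0123456789abcdef0", "snap-0fedcba9876543210"]

with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_COPIES) as pool:
    copied = list(pool.map(copy_one, prioritized_snapshots))
    print(copied)
```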
For what you suggest, some combination of these things would have had to happen:
- Some employee has root access to the AWS account and uses it operationally
- Wildcard S3 permissions granted to an IAM user, including delete bucket
- Object versioning not enabled
- Cross-region replication not enabled
- No large-bucket protection
- No basic security monitoring or CloudTrail alerts set up
- No investment in full-fledged tooling for IDS and so on
If a vendor had any of these issues I don't think any customer would approve the software for use; these are not normal practices, let alone best practices.
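As a rough illustration, here is a minimal sketch of auditing two of the items in that list (object versioning and cross-region replication) with boto3; the bucket names are hypothetical, and CloudTrail alerts, IAM hygiene and IDS coverage would each need their own checks:

```python
# Sketch only: audit two of the items above (object versioning, cross-region
# replication) with boto3. Bucket names are hypothetical.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def audit_bucket(bucket: str) -> None:
    # Object versioning: GetBucketVersioning returns no Status if never enabled.
    if s3.get_bucket_versioning(Bucket=bucket).get("Status") != "Enabled":
        print(f"{bucket}: object versioning NOT enabled")
    # Cross-region replication: GetBucketReplication errors if none is configured.
    try:
        s3.get_bucket_replication(Bucket=bucket)
    except ClientError as err:
        if err.response["Error"]["Code"] == "ReplicationConfigurationNotFoundError":
            print(f"{bucket}: no replication configured")
        else:
            raise

for name in ["prod-data-bucket", "prod-attachments-bucket"]:  # hypothetical
    audit_bucket(name)
```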
Large apps have detailed playbooks on what gets turned on, how, and in what order, and most do DR drills and time those runs periodically. These are well-established workflows in any large org.
Yes, in a real-world downtime you can't have planned for every scenario; maybe you miss the target by 25%, like 2 hours more, or maybe in a very bad situation you double or even triple it, say 12-18 hours. You don't go from 6 to 600+.
The way RTO is calculated starts by looking at limits on cloud / hardware / bandwidth / machine sizes; if basic limits like cross-region concurrency are not factored in, there is no point in computing an RTO at all. And even if something like that was missed, when you spend tens of millions of dollars on AWS, AWS will work with you and relax those limits.
Missing the plan by 100x means either extremely poor planning or that they screwed something up very, very badly.
I haven't ever seen it be as perfectly done as you've described. It is always shades of gray across teams and companies. They have most of what you described, but not uniformly across the company.
e.g. versioning may be enabled, but not cross region replication because it is cost prohibitive. Someone runs a job to clean up a bucket that includes deleting old versions. They point it at the wrong bucket or wrong path in the bucket. Or a malicious user does it on purpose. Monitors and alerts really tell you after the fact that you now have a major problem.
Also limits (like cross region concurrency) may not be known about until it is time to actually do a mass scale restore. DR tests might have been done but only in isolation of one app at a time. By the time you realize your mistake you're dealing with physics. Maybe AWS can bump it a bit to help you in that particular circumstance though.
No idea what happened at Atlassian. My only point is it is very hard to get it right without a huge amount of effort.
It is hard to get it perfectly right, yes; missing by small margins, or even doubling or tripling the declared time, would be reasonable if it were just a prediction problem.
However, overshooting by something like 100x is probably not because this is hard to get 100% accurate. It looks more like the rumoured data deletion and, more importantly, not actually having functioning backups that were ever tested, plus manually reconstructing from logs and other sources.
Beyond RTO, they are not going to be able to meet RPO objectives for affected customers either; depending on how much data is lost, that is going to be pretty bad.
> Missing the plan by 100x means either extremely poor planning or that they screwed something up very, very badly.
Like most airliner accidents, this is probably an unfortunate combination of both of those things happening at the same time. My guess would be they have fairly decent planning overall but there's one (or more) small-ish areas where their planning is extremely poor - which crossed over with a screwup in a very specific fashion that laser focussed on that particular piece of poor planning. The "this can never happen" immovable object and the "You can't do that" irresistible force.
Aviation has paid with so much blood, resulting in tight regulations and internal controls, to get to the point where any current accidents are unlikely edge-case scenarios. [1]
SaaS has a lot more tolerance for failure, so my money is on something simpler, but difficult to get implemented in a large org.
---
In an ideal world this incident should impact their revenue, growth and stock price substantially.
It is unlikely to do so, because of the stickiness of enterprise customers and the lack of better alternatives. Compare that to, say, Google, Facebook or Amazon, where a minute of downtime is an immediate, quantifiable revenue loss, which is why FAANG obsess so much over how many 9s of uptime they have.
The typical management of enterprise app companies like Atlassian has no incentive to do anything beyond cursory lip service, and gets away with underinvesting in tech.
---
[1] Three years back I would have stood by that, but after the Boeing 737 MAX twin disasters and the systemic problems leading to them, I am not so sure those lessons have not been forgotten.
We are one of the customers that has been restored; access to our systems came back about 24 hours ago. We have some lingering but relatively minor issues with some plugins and the like, but there doesn't appear to be any data loss and performance is healthy.
I bet they screwed up royally, deleted some data and are down to either rebuilding it from logs, caches or other side-effects, or using data recovery software on the storage drives (which might involve third-party companies). I can't see many other reasons why this should take 2 weeks.
A few years ago I depended on a Vertica (Snowflake/ClickHouse-type) database.
When a node went down there was no hope of it ever coming back up unless you shut the other nodes down as well. While this was going on, of course, none of our ingress data was being inserted, so it built up a queue. When we turned things back on, the queue would overload Vertica again and we had to repeat the whole thing.
Fortunately for us we only stored analytics-type data on Vertica, where customers usually were only interested in the last few hours anyway. So we ended up deleting all historical data and just reprocessing it over months, occasionally prioritizing customers that complained.
I've actually done exactly that many years ago for a self-hosted Jira installation that didn't have any backups. You can bet we had backups with regular testing after that.
Honestly it doesn't sound too difficult, and it sounds like a fun scripting challenge. To me, anyway. If you ever find yourself in that situation, my email is in my profile and we'll work something out :)
In my limited experience these diffs can be missing information. I recently had to reconstruct an issue description from these email diffs after two people were editing the description at the same time, and it was not 100% accurate; several lines were missing. Going to the 'history' tab on the issue I was able to get the missing lines, however. If all you have are the emails, though, you might be out of luck.
The idea is to not have the backups stored on the same hardware or even the same type of hardware. Same hardware is obvious, but same type of hardware is listed because a manufacturing defect or a known vulnerability would put all of your backups at risk. So you want to have backups stored on 2 disparate types of storage media: HDD and tape, or cloud, etc...
Exactly. If tape was easier for me to implement in my setup, I would do it, but I'd rather just use cloud with a fast fiber connection for now. Offsite I can send data on an external drive to another physical location if needed.
I made efforts to buy HDDs from different sellers even, to avoid sequential failures from singular bad batches. That's something else I'd want to add to a "3-2-1", with regards to HDD as a form of backup or storage media.
For me it was always 2 different formats (DB dump, VM dump) because I trust (good) storage media more than backup software. For example, old Veeam backups cannot be restored with a new version, old Veeam software doesn't run on new ESXi, etc...
Yes seems I used the wrong word here. Indeed I meant media, rather than format, as personally for my backup setups it's different media I want to trust, rather than the actual backup file formats (which are easily interchangeable depending on what's used; read and write data to and from formats if needed).
My 3-2-1 comes from a personal non-professional standpoint, thus not having the extra 1-0. However I have been considering immutable offline backups, using burned DVDs or Blu-Ray discs. That's another project for another time though, for now I'm trusting paid cloud providers.
As for verification tests, hashsums are a simple solution in my opinion, but I've moved to ZFS and BTRFS to avoid having blips.
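For anyone without ZFS/BTRFS scrubbing, a minimal sketch of the hashsum approach, assuming a hypothetical backup directory layout:

```python
# Sketch only: record a SHA-256 per backup file at write time, re-verify later.
# The directory layout and file pattern are assumptions; ZFS/BTRFS scrubs do
# the equivalent at the block level.
import hashlib
import json
from pathlib import Path

BACKUP_DIR = Path("/mnt/backups")           # hypothetical
MANIFEST = BACKUP_DIR / "manifest.json"

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest() -> None:
    MANIFEST.write_text(json.dumps(
        {p.name: sha256(p) for p in BACKUP_DIR.glob("*.tar.gz")}, indent=2))

def verify() -> None:
    for name, expected in json.loads(MANIFEST.read_text()).items():
        if sha256(BACKUP_DIR / name) != expected:
            print(f"CORRUPT: {name}")
```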
As the person responsible for running Jira and Confluence on premises at my employer, I'm looking forward to the next time one of their sales droids contacts me to make us move to their cloud services (despite me stating multiple times that we are not interested)…
It's insane. Since they've all but killed off on-premise Jira and Confluence, they've been spamming me regularly trying to convince me to "upgrade" to the cloud. Eventually I gave in and replied to them asking two simple questions.
One is that we've had many bad reports from partners that Jira Cloud is incredibly slow, even when compared to the already underperforming Jira Server and I wonder what their performance guarantees are. The other one is that it's so, so pricey.
It helped, but not in the way I thought it would! There's been no reply, but also no more spam emails now!
I haven’t used Jira Cloud in any great depth, but I did play around with it as part of trialling the free plan and was amused that you could quite easily come across warnings to backup your installation and consult your system administrator before proceeding…how exactly do I do that for a cloud service? Doesn’t exactly inspire confidence.
At my last job we used Bitbucket Cloud and that was awful. Dog slow, ridiculously low threshold for being unable to render diffs, and constant incidents. We used to joke that they could make the “Bitbucket is experiencing an incident” banner a permanent fixture on the page and it would be right more often than it was wrong.
We’re still using on-prem Jira at my current job, but we just migrated away from on-prem Bitbucket to GitHub, as Bitbucket was becoming infeasible and the cloud offering is a bad joke.
But slowness isn't really the problem, the problem is that it's unpredictable.
I wait for the interface to be fully loaded, so I click on a text box and I start typing. Then FU--ING something takes the focus to some other element in the web page, and now I'm typing random shortcuts (like reassigning tickets, changing status or whatever).
It's painfully slow but the real problem is that it's unpredictable in its behaviour.
>now i'm typing random shortcuts (like reassigning tickets, changing status or whatever).
This happens to me often and it's absolutely infuriating. I'd prefer a blocking spinning wheel of death over that nonsense. It's all but ensured that I'll be looking elsewhere when choosing project tracking software in the future.
Well you’re just a user, what do you know anyhow? It’s not like paying money for a service entitles you to be able to have something which works and delivers what you paid for. In fact, if you look at the EULA I’m positive that it states you’re paying to access the almighty godlike code of Jira.
Also if you don’t like it, simply construct your own industry standard and train your users on it, maintain it, and keep the costs down!
I think software companies need to have a serious “Come to Jesus” talk with their users about who needs to control what.
My employer recently switched from on-prem to cloud. The cloud service is insanely slow, or maybe it's my aging Macbook, but every single component on the page seems to have to load separately. (It's a newer UI versus what we had on-prem).
Thankfully we haven't been impacted by this outage.
I've also heard that the cloud service is slow; you can easily check if it's your machine or the server by watching devtools -> network tab and seeing how many requests are `(waiting)`, because chances are it's Atlassian's server speed.
There’s nothing that makes me happier than the fearsome squealing noises that enterprise sales drones make when you drop the sales equivalent of a Paveway IV on their pitch.
My favourite one was running some software supply chain compliance software on itself and explaining how it was constructed on top of a CVE riddled garbage dump.
My favorite experience with a sales rep (not Atlassian related at all) was when I went to the vendor's booth at NAB (huge industry convention for those not familiar). I saw our sales rep who quickly looked down to catch her breath before greeting me. I smiled and let her know that today was her lucky day if she could just introduce me to the tech team she promised I could meet. At one point in my conversation with the tech team, I noticed that a small crowd had gathered around my conversation. I was not intending to hold public court, but I was not going to miss a chance to talk directly in person to these team members. To a non-tech person, it might have been viewed as confrontational. To a tech person to tech person, it was just direct questions being held to the fire for an actual answer vs CSR/Sales rep platitudes.
We will probably bite the sour apple (not sure if this is correct English but you’ll get the meaning even if it’s not) and switch to the data center edition which is still on prem but costs approximately twice as much.
A bit from the Boom Chicago show I saw in Amsterdam around 2004: “The Dutch expression ‘bite the sour apple’ means the same thing as ‘bite the bullet’ - Americans are obsessed with guns, and the Netherlands is full of shitty food.”
> Americans are obsessed with guns, and the Netherlands is full of shitty food.
"Bite the Bullet" is a phrase from an Englishman, Rudyard Kipling, in his first novel, "The Light That Failed". It's believed to have come from the other English idioms, "to bite the cartridge" and "chew a bullet", which date back to 1891 and at least 1796. [1]
Weird... As another commenter pointed out, the origin of 'bite the bullet' is not American at all. And 'biting the sour apple' doesn't imply an affinity for bad food (Apples are shitty? Really?). It's a metaphor about context necessitating action, not a celebration of consuming unpleasant foods.
You can probably use this 1 week (3 scheduled) outage to ask for a discount, "your cloud offering is a bucket of shit, your data center edition is too expensive, my higher-ups told me to find something else...".
You can still buy the Data Center edition, and some of us are forced to do so if we wish to continue to use Jira and Confluence. For us it's not a problem; we're heavily invested in the Atlassian suite of products, and sell consulting, so we get a significant discount. For some of our clients it's a massive problem, as Atlassian cannot promise them that data won't leave the country. In fact, it's certain that it will, because there's no AWS datacenter within our borders.
Atlassian completely ignored the large number of smaller customers who are legally forced to use an on-premise solution. With the software industry so hell-bent on SaaS, there would be a great business opportunity in creating an on-premise Jira competitor.
It is not a "contact us" pricing plan. The Data Center prices are publicly available on their website. It is more than twice as expensive as the Server offering, but still substantially cheaper than Cloud for the same number of users.
And it is the exact same software as Server with some extras enabled like support for multiple nodes, so upgrading to it is as simple as pasting in a new product key.
You never would have converted anyway though, right? Black swan events are pretty weak justifications for any decision.
In other words, this outage is not -because- it's cloud software. It's because someone, somewhere, broke something fundamental. That can (and does) happen on-prem at a much higher rate.
* It's been deleted for a week already, they estimate they might need two more weeks. Three in total.
* They claim to have "extensive backups", and hundreds of engineers working on it.
What? How? This simply doesn't go together. Why would restoring from backup take three weeks?
Either their backups aren't complete, or they need new software written for the restore, or something else doesn't add up.
I haven't administered their software yet, but from what I've learned from the sidelines, at least Jira doesn't seem to be rocket science. A database, an application server (maybe a few instances for larger sites), a bit of config, some caches. This really shouldn't take three weeks to restore.
If you permanently deleted data for selected customers from a large multitenant system, it could actually take some time to restore it - even with proper backups.
You can't just do a full recovery, as that would mess things up for the customers who were not affected (it likely takes time to notice the mistake - others have continued to use the system). You might need to write some tools to migrate the data from backups. You also really need to test everything very carefully - otherwise you might be in even deeper trouble (looking at corrupted instead of lost data).
In a large organization this kind of "manual" recovery might require people from multiple teams, as no single person knows all the areas. This adds overhead. Throwing too many people in does not help either. When you start thinking about it, a few weeks is not that long.
And JIRA is definitely not simple. It's a complicated beast, and the SaaS features combined with all the legacy likely make it even more complicated.
What a nightmare situation. Makes you wonder if some kind of 1 database per customer setup would be preferable here since you could restore only affected customers.
I was in a similar situation many years ago (different ticket software though). What we did was spin up a spare server with the backup data and script an extraction/injection tool to populate the production multitenant SaaS.
How would that require hundreds of engineers though? Would one not build a script and then just run it for each customer, or have each department build a script? I honestly have no idea; I've never been in that sort of recovery situation before.
Restore from off-site tape backup. The kind of service where you ship them a ~dozen new tapes in a lockbox each week and they ship you the oldest dozen back. It's supposed to be the "if all of your data centers happen to burn to ashes simultaneously" option. If you say "give us all of our tapes, asap" and then have some poor souls swapping them out as fast as the data can be read... it would probably take a few weeks.
In support of your point, 360MB/s is an extremely conservative estimate. I'd expect that from LTO-6, which is around ten years old, and I would certainly hope their backups are on more modern gear than that.
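For a sense of scale, a back-of-envelope calculation using that 360 MB/s figure; the data volume and drive count are pure assumptions, not Atlassian numbers:

```python
# Back-of-envelope only: how long pure tape reading takes at the 360 MB/s
# figure above. Data volume and drive count are assumptions.
TAPE_READ_MBPS = 360      # per-drive throughput quoted above
DATA_TB = 500             # hypothetical amount to restore
PARALLEL_DRIVES = 4       # hypothetical number of drives reading at once

seconds = (DATA_TB * 1_000_000) / (TAPE_READ_MBPS * PARALLEL_DRIVES)
print(f"{seconds / 86_400:.1f} days of streaming")
# ~4 days at these numbers, before tape shipping, mounting/swapping,
# and any per-tenant re-import work on top.
```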
I work for a pretty huge well-known fortune 500 company with a global presence and personally handle the tape rotations for several of our data centers, and they're all still on LTO-6 tapes
I'm not sure that's relevant. I would expect Home Depot, Berkshire Hathaway, Alphabet, and Atlassian to have wildly different priorities. Two of them are Fortune 500 companies that I think would do fine with LTO-6. Atlassian is supposedly a cloud-first organization which arguably should be pretty aggressively tiered anyway, much less tiered onto outdated tech.
My theory in my other comment is that they've deleted some data and are waiting on third-party data recovery specialists. That would explain the timescale.
Jira cloud became much more complex when they went all in on AWS. The reason for the cloud/server fork 4 or 5 years ago was so that cloud engineers could couple to a zillion AWS services without having to build back any backwards compatibility. So data stores are very much more disparate than just a PgSQL DB and redis (which is how it used to be).
Something that can make restore-from-backups harder, and that I've seen happen, is when the backup/restore systems themselves get destroyed by the same black swan event. Then you have to first recover those by doing fresh installs, and you have to have all the people on hand who know what the configurations would have been to be able to then use the backup library. Then you have to begin restoring a few target systems to check that everything is OK with the restore process, then you have to restore everything though you'll be limited by the restore system's bandwidth.
How could this happen? Well, a disgruntled employee could make it happen. It happened at Paine Webber in 2002 [0]. In that case the attacker left a time bomb in the boot process on all systems they could reach, and that included the backup/restore servers. Worse, the time bomb was in the backups themselves, so restored systems ate themselves as soon as they were booted, which slowed down the recovery process.
I remember a situation where we had a near miss with data loss (replica failed and master had a bad disk). We didn't want to put the production database under extra load by taking a live backup while it was handling all production traffic, so we restored a backup. But it was "bad". Tried the one before it, and the one before that. Apparently they were busted for over a month due to a config change. We restored a month-old backup and started applying binlogs (which thankfully we had been backing up). But that meant replaying a month of transactions into the restored database. I can't remember the details but I think we ended up replacing the bad disk, resilvering the array and live-cloning the primary before the binlogs got fully applied to the one we restored from the old backup.
Went through that once in the mid- late 90’s. Each restore and test took hours, so 3-4 attempts took 2 days - with me sleeping, crying and praying in a conference room.
My guess is they failed halfway through a major schema or api migration. If some of the services have already progressed too far, then rolling back another service to previous backup snapshot will make the two incompatible. Especially if one of the services is global and the other is per customer.
The only way out is to figure out the bugs and continue migrating forward, fixing issues as they appear one by one.
Reminder: never delete data for real as your first step. Always mark it deleted, along with a timestamp saying when. Then you can hide deleted items from everything. When a maintenance script goes haywire, you can fix the problem quickly. Have a daily job that really deletes records marked deleted after 30 days.
If that is too complicated to retrofit then have any mass cleanup script move the records to a CSV file or temporary table.
Never ever ever be in a situation where a rogue script or bad SQL WHERE clause means restoring from backups.
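A minimal sketch of that pattern (soft delete now, hard delete after a grace period), using sqlite3 so it runs standalone; table and column names are hypothetical:

```python
# Sketch only: soft delete now, hard delete after a grace period.
import sqlite3

db = sqlite3.connect("app.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS items (
        id INTEGER PRIMARY KEY,
        payload TEXT,
        deleted_at TEXT              -- NULL means the row is live
    )
""")

# "Delete" = stamp the row; every read filters on deleted_at IS NULL.
db.execute("UPDATE items SET deleted_at = datetime('now') WHERE id = ?", (42,))

# Daily job: only now is anything really gone, after the 30-day grace period.
db.execute("DELETE FROM items WHERE deleted_at < datetime('now', '-30 days')")
db.commit()
```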
I agree that data must never be "deleted and forever gone" unless you've already been very sure about it a few times.
But I would like to warn people about certain implementations of database "soft deletes" that I'm not a fan of. To be clear, I'm talking about the idea of having a "deleted" and/or a "date_deleted" column and using those columns in the WHERE clause to filter out rows that shouldn't be visible.
That pattern complicates the table structure, queries, and indexes. It increases table and index size, thus more data has to be sifted through (either table data or index data) to ensure only non-deleted entries are returned. More data to go through means slower queries. It's also really easy for people to write SQL that accidentally leaves the "deleted" column out of the WHERE clause. Then old, irrelevant data is being returned.
Accidentally deleting data that needs to be undeleted is usually rare so I don't think people should optimize for it. We should optimize for things that happen frequently.
I have dealt with the rare "Oops! I deleted important data!" by restoring from backups and it has worked fine. I think it may be too strong to say you should never be in a position to restore data from a backup. In fact, I think it's important to streamline the restore process.
For cases where we know ahead of time that we want to query deleted data I'll move deleted data to another database table that exists solely for maintaining history. For example, an ORDER table will have a DELETED_ORDER table, or an ORDER_HISTORY table. The HISTORY tables can also record data overwritten from updates.
These tables take up disk space, but never affect the structure or size of the original table and its indexes. Queries to the original table don't need to be modified to account for soft deletes.
To guarantee that things go to the delete/history tables, I'll usually put a trigger on the original table to move data over to the history tables. This way no application-specific code is needed.
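A minimal sketch of that trigger-plus-history-table approach, again using sqlite3 so it runs standalone; table names are hypothetical:

```python
# Sketch only: history table populated by a trigger, so the live table stays
# lean and no application code is involved.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);

    CREATE TABLE deleted_orders (
        id INTEGER, customer TEXT, total REAL,
        deleted_at TEXT DEFAULT (datetime('now'))
    );

    -- The database moves the row aside on delete; queries against "orders"
    -- never need a deleted flag.
    CREATE TRIGGER orders_on_delete AFTER DELETE ON orders
    BEGIN
        INSERT INTO deleted_orders (id, customer, total)
        VALUES (old.id, old.customer, old.total);
    END;
""")

db.execute("INSERT INTO orders VALUES (1, 'acme', 99.0)")
db.execute("DELETE FROM orders WHERE id = 1")
print(db.execute("SELECT * FROM deleted_orders").fetchall())
```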
> Accidentally deleting data that needs to be undeleted is usually rare so I don't think people should optimize for it.
That's very use case dependent.
We've made it easy for people to undelete data they've accidentally deleted simply because they used to do it so often and the only people who could get it back were our tech team. We're a devops org so part of our job is of course to support the systems we build, but our time is better spent on building solutions to business problems than to repeatedly providing support for issues that come up all the time. Part of building those systems is of course engineering in solutions that make it hard to screw up, and easy to unscrew when things inevitably do go wrong. No mean feat given our platform dates back over 15 years and still includes a lot of legacy from the time when tech was just a couple of people.
I suppose the object lesson here is that edge cases in one system or company can be part of core business in another so it's best not to make too many assumptions.
Agreed, soft deleting adds so much complexity to everything. And even has the potential for privacy related bugs. Like, say, accidentally forgetting to respect the deleted column in a query on a joining table that determines user permissions for some resource. Now people have access to something they had permissions revoked for. Whoops.
>I have dealt with the rare "Oops! I deleted important data!" by restoring from backups and it has worked fine.
Usually this goes along with "Oh and the other team did some important work at the same time" so you can't just restore a backup. You either tell them to deal with it or start writing custom scripts to copy out only the data you want to restore.
A more sane solution would be soft delete for x days and after that it becomes a real delete.
I guess one could make a view for each table that always includes the where deleted = false, to not bother about it in application code. Still yes, it adds complexity.
A "deleted" field type deletion is also how you get a massive fine from a GDPR agency when they find out that you're not actually deleting PII properly.
One smaller social media app I used to sysad over actually overwrote data to be purged as xX0-Deleted-oXx and similar (there were a few variants depending on data constraints). There was no "show_deleted_when" garbage.
Then weekly, a task went in and then purged rows with those non-content placeholders to completely purge that user, if a user-purge was requested.
If you're using MS SQL Server it natively supports temporal tables, which take care of this problem for you. There's also an extension for Postgres, and of course triggers.
You don't even need a bug. Just a wrong system clock.
We had a few windows laptops where something caused them to time travel to 8000 years in the future. Then, they'd slowly spend a few hours deleting every local profile, as nobody had logged in to them for 8000 years. Then, they'd do something to their time zone database and travel back 8000 years.
When they started the process, it was unstoppable. Trying to modify the system clock to something sane just caused them to depart to the future again, even if disconnected from the network. None of our users was very amused by this behavior, even if everything important was backed up.
It happened specifically to 1 type of laptop and we only had about 30 of them. So we pulled all of them out of rotation. Then covid struck, so I reformatted most of them with Debian and we gave them away for home schooling. I wonder if I managed to linuxify some kid in the process.
Yeah, the idea is that by expecting the deletion logic you can make it simpler and more rigorously tested than regularly changing business logic or application code.
If you organizationally cannot prioritize quality then nothing can help you.
I fucked up once and lost 2 hours of customer data. I was so lucky we were a small startup and had daily backups AND the backup was only 2 hours old. I would have been royally fucked. Never making that mistake again.
Always use a copy of prod on a staging server and run your queries there for testing.
It's incredibly depressing how common this is in the real world, but I can tell you from experience that this NEVER happens in Atlassian production systems.
Another tip: never enter queries directly in a production database connection with write access in the first place. (Ideally very few people even have that level of access.) Write it in your codebase, write tests for it, get it code reviewed, and run it in a dry run first and get a list of affected records before running it for real.
For anyone doing this, just keep in mind that chucking on a LIMIT 1 can give you a false sense of security. For example, say you want to drop a single row but forget the WHERE. A LIMIT 1 will return "yep, deleted one row" but it's not the one you wanted (instead, it's whatever row came up first). Better to do the operation in a transaction that you can rollback - that way, you can better validate the results of your operation before committing.
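A minimal sketch of that habit (run the destructive statement, check the affected row count, and only then commit), using sqlite3 with a hypothetical table:

```python
# Sketch only: run the destructive statement, check how many rows it touched,
# and only commit when that matches the intent.
import sqlite3

db = sqlite3.connect("app.db")
db.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, payload TEXT)")

EXPECTED_ROWS = 1
cur = db.execute("DELETE FROM items WHERE id = ?", (42,))

if cur.rowcount == EXPECTED_ROWS:
    db.commit()
else:
    print(f"touched {cur.rowcount} rows, expected {EXPECTED_ROWS}; rolling back")
    db.rollback()
```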
This is also true for people cleaning house or going minimal. Put it in a well labeled box and tuck it away somewhere. If it matters then you'll fetch it, otherwise just chuck it when the box gets in the way.
What's wrong with restoring from backups? This is one reason they exist after all. I don't think that making a mistake in delete statement is something you would do every week.
Because you lose all the work done since the issue happened. It's very rarely acceptable to just give up and do a full backup restore unless literally everything is gone. If there was just some bug that caused partial issues, you have to find some other way to fix it.
Could be that they are restoring from backups - just that restoring from those backups is very, very slow. Atlassian would not be the first where resourcing and testing a speedy disaster recovery strategy wasn't given the highest engineering priority.
> Never ever ever be in a situation where a rogue script or bad SQL WHERE clause means restoring from backups.
As a second step, restore from the backup at a set frequency. This would force orgs to automate and optimize not just the backup flow but also the restore flow. Tear down and restore entire systems from backups. Of course, doing so adds enormously to the cost, but when there's an outage, it will pay for itself.
You have a certain period of time, not ‘instantly’ (depending on the exact situation you are referring to). The script could take that into account (using a shorter period of time or the like).
What is hard deletion? You can restore rows from database files before vacuum runs. You can often restore data from disk sectors. Some people say SSD can remap sectors under your chair and you won't even know that your deleted data is there.
The law isn't a technical specification. You have to follow the spirit of the law. A soft deleted_at timestamp wouldn't be following the law in good faith. Having some data stuck in an unmapped section of an ssd would be within the spirit.
IANAL, but IMHO a soft 'deleted_at' timestamp along with a daily cron job that hard deletes everything with a deleted_at older than 24 hours would fall within the spirit of the law.
I agree that just having a deleted_at timestamp where old entries are never pruned would not be a good-faith interpretation of the law.
From what I have seen, there is no requirement for instant deletes. Even emailing a support address and having them manually delete the data is acceptable. Most places using deleted_at never clean up the data from what I have seen though.
> This data was from a deprecated service that had been moved into the core datastore of our products.
That is very interesting. This implies they are backing off, at least somewhat, from their very aggressive microservice strategy. Perhaps they feel like they have gone too far in decomposing their products.
I think sites are their account management stuff. Sounds like they deleted user accounts and not actual data. Notice that only native products are down and not acquisitions. They probably just haven’t migrated those yet.
We're mere weeks away from migrating to their cloud platform after the self-hosted rug pull. This really doesn't give me confidence in their ability to not break my stuff.
1) This outage will get their organization to prioritize work such that it never happens again.
2) This outage is representative of a dysfunctional organization that can't prioritize work correctly.
If you've been using Atlassian software for a while and are used to how they prioritize tickets then one of those options seems far more likely than the other.
I can tell you #1 never happens. It will be a temporary effect of the management green lighting the years of neglected maintenance work until everybody forgets about it and it will go back to business as usual until the next incident happens and the cycle repeats.
I will say that back when Bugzilla was it, JIRA rocked. It was amazing the new power you had and the functionality it provided.
We self-hosted JIRA from 2008ish to 2014ish (memory is fading on exact dates). By the time we decided to stop using JIRA, we fracking hated JIRA and would never return.
Since then, GitHub Issues, Trello and Clubhouse (now Shortcut) all provide less friction in day-to-day use. As an enterprise, I do believe Shortcut is your best bet.
To be quite honest, the problem is historical. We have over a decade of project plans, support tickets, change control logs, etc in our Jira instance. There's simply no painless way to export that into another product that will have approximately the same functionality and features. There's a few that come close, but all fall short of a drop-in replacement.
The only options now are the $$$$$ "datacenter" license, migrating to the dangerously unstable cloud, or not doing anything and running unsupported EOL software.
I totally understand that. But at the same time, a multi-week outage is really a sign of an org that simply does not have their shit together at all.
But the lack of transparency is the worst. Another post speculated that Atlassian has lost data, doesn't even have backups, and is re-creating it by munging their emails and diffing them to re-create history. I can't really imagine that's true - but what if it is, and Atlassian is concealing things?
> the $$$$$ "datacenter" license, migrating to the dangerously unstable cloud, or not doing anything and running unsupported EOL software
Data Center is pricier than Server, but isn't it still cheaper than Cloud? And you control your own back-ups as with Server, so Atlassian cannot lose your data.
My company is in the middle of a multi year transition from selfhosting atlassian products to using their cloud offerings, and I am sure the infrastructure team/management is very thrilled to see this news.
While our tenant was unaffected, I told my management of this issue. They just shrugged and said we could watch and eat popcorn. I was halfway expecting them to raise eyebrows.
I was kinda like....."not really the point of bringing it up."
Its worth noting we have had them just delete things within our account before. In fact one of our Senior VP's had their account just....disappear one day. We couldn't @ them in chats, tickets etc. Atlassian just shrugged and "restored" the account and said it was some issue with a stored proc on their backend or something.
I have always felt uneasy about how flippant they are in their processes. But it seems that is not shared.
I use JIRA and confluence every single day, I have to, it is everywhere - but imo it is such a horrific toolset in every way (even before this outage), I can't for the life of me figure out how it got so much market-share.
Does it have a good API for extending it with what's missing compared to the Atlassian/Microsoft/JetBrains offerings, i.e. the tight connection between issue tracking and other aspects such as builds, deployments, etc.? E.g. how pull requests are related to work items, or which work items end up in a specific build/deployment?
It launched in 2002 - well before my time but I'm guessing that for the first few years of its life there weren't many competitors.
Now it has endless competitors and I'm led to believe that it has accumulated lots of features that businesses can't live without but which the average end user never touches.
Reminds me of the time a group I worked for at a National Lab decided to call the root folder for their project “core”. I bet you can’t guess what filename the backup scripts were configured to ignore…
This is an expensive lesson I think everyone gets to learn at some point. There's no such thing as a file worth excluding from a backup, it always fucks you, some (like me) more than others. Have to buy twice as much disk? Fine, at least you know you actually have a backup
The "Edifice Complex" has been taken to be a bad portent for a company.
Obviously not universal, but if upper management has decided to spend a lot of time focusing on a marquee building, they're not focusing on the business itself.
I wish companies would stop with this "small number of customers" messaging. It always seems disingenuous and, besides, that matters for your internal estimation of business impact but means absolutely nothing to the customers affected.
I think they add that because otherwise you read about the problem, panic, and then spend hours digging through your own data to make sure it's all there. Unaffected customers like being told they're not affected.
Wow! I didn’t realize the scope and duration of this outage. This must be doing some serious damage to some of their clients (catastrophic if this does impact JIRA, Confluence, and OpsGenie broadly on a company level). Is there any report of approximately how many (or specific) companies have been affected as a result of this?
Atlassian does not care about individual customers. They are purely driven by numbers. I listened to a presentation by one of their founders a long time ago where he admitted that statistics and number management was part of their DNA. They don’t think about the customer’s name, maybe this is right, maybe not.
Meanwhile, people have been waiting since at least 2013 for Atlassian to deliver a way to automate backups for their Cloud offerings: https://jira.atlassian.com/browse/CLOUD-6498
What are the legal implications of such downtime for Atlassian? I could imagine thousands of companies being unable to manage employees, product releases, bug fixes, rollbacks and more because of this.
Self-hosted full Atlassian stack (jira, confluence, bamboo, bitbucket) for 6+ years. ~200k tickets. To be honest, it just runs with minimal issues.
Mostly downtime is just upgrades. I can remember a few times we've had to add (JVM) memory as our usage increased. Not sure what we're going to do with the discontinuation of server product line. We self-host to keep source code, etc. more than one configuration mistake (or zero-day) away from exposing it to the world.
We are in the same boat. We are looking at the cloud, but the migration tools are just a pain. I can migrate a project to the cloud from Jira, and it even gives me a report of the workflow transitions I have to manually update/change to fix, which is great.
But then, there is no way to keep it in sync. I have to blow that project away in Jira Cloud and migrate it again.
So I have to hard-cut over projects, on a system that has dozens and dozens of projects, and somehow have people figure out which ones are where. Or one really, really ugly night to cut it all over, and hope it goes well.
I'm looking for alternatives, but our team is so invested in some very, very customized workflows, it's going to be a pain.
Is there any way to migrate away from Jira, or is it full lock-in? I mean, is there any migration tool or service to Redmine, Gitlab, MantisBT, Trac or whatever?
I think one reason Atlassian was successful is that they always invested a lot of effort in building tools to migrate to their products from any of their competitors (obviously not the other way around).
Self-hosting Jira is running a big Java application that has its own directory structure and talks to a Postgres database. It really isn't hard. Even upgrades are automated using shell scripts, although you do need to manually replace some configuration files afterwards for some damn reason. Configuring Jira is complicated, but that's the same regardless of whether you self-host.
Self-hosted Jira sucked because it was shitty software, not because self-hosting has to suck. Mattermost and Jitsi are easy to self-host, to give two examples that are not complete shit.
Maybe. But you're counting on your sysadmin(s), who are also managing dozens of other things, to keep up to speed on Jira and its quirks, and apply patches and new versions as they become available without missing any steps or screwing something up.
On average, you're still probably better off having a company that knows the product also host it for you, but obviously they can make mistakes too, and the downside is that when they do it might affect all clients, not just one.
> and the downside is that when they do it might affect all clients, not just one.
This is also potentially an upside. For example when us-east-1 went down recently, customers were somewhat understanding because it was "amazon's fault" and everyone was down - it was in the news, etc. If we ran our own data center and that went down, our customers would've just said "why did you morons roll your own data center instead of just using aws?"
I worked at a company that self hosted Jira, and it was miserable then, I can’t imagine depending on the cloud. I’ll never approve Atlassian products after that experience.
We're using the cloud version and we're fine too (no outage). What's your point? Are you claiming that self-hosted is never down? Or that self-hosted is more reliable? Because I doubt that. Difference is just that when self-hosted goes down, it doesn't end up in the news.
I think there is a strange psychological trait that I have, and others may as well, where I am much more forgiving breaking my own stuff than having someone else do it.
When you host Jira yourself, the monthly subscription pays for Software + Software updates. When Atlassian hosts it for you, you're paying for Software + Software Updates + Service (hosting). When you hosted yourself, a team gets blasted for not monitoring it or updating it correctly. When Atlassian fails to do it (and charges for it) then they get the heat.
All that to say, I don't think it's a weird phenomenon, it's just you're realizing that you're paying someone else for something that's not delivered on.
People are more forgiving towards themselves and their own folks; I can understand that. I'm just thinking it's important to make decisions based on facts. Some self-hosters walk around with this "my own basement is safer than Amazon datacenters" attitude, and that's just not true (in most cases, I guess :D).
No. The basic assumption is only, that if I pay somebody money for the service and to keep things running, then sudden data loss is unacceptable (and a multi week downtime even more so).
Some companies cannot operate effectively without atlassian products, so a fuckup of that scale might just have legal consequences depending on whom it hits.
One difference might be that when you self-host, you are more sensitive of some of the risks, whereas a hosting service might be balancing that risk with a need to scale, and their appetite for risk and tradeoff considerations might be different from yours. It might be that these companies know something that their customers don't, and thus are more willing to take on risks.
In this case, it seems like the company took a risk and it did not go well. The possibility of being able to restore from backups might have been factored into this risk, but the latency of doing so might not have been.
It sounds like Atlassian is doing individual restores, each restore takes a fair amount of time, and they don't have the capacity to do all 400 simultaneously (because why would they). So you just have to wait.
If you're self-hosted, you dedicate as many people as possible/necessary to restoring your service, and it becomes their top priority.
You also have a lot more insight into the detailed inner workings of the restore, making it easier to plan against, instead of just vague "we're working on it" messages for days at a time with no clear end in sight.
If my company were part of the outage, the restoration part would only take 2 days (if we're counting all the data they have, not just Jira/Confluence), not 3 weeks.
So self-hosting may still have certain upsides, even with such an outage.
I'd disagree with this lesson. Saying "do not utilize cloud solutions" period is nuts. Google and Microsoft are way better at email hosting and delivery than your on-prem server is unless you spend a ton more money on hardware and engineers to keep it up, which is simply not worth it for many companies. Dropbox is going to have better uptime and lower TCO than your self-hosted owncloud instance.
What I will say is it's important for the customer to HAVE THEIR OWN BACKUPS. Don't rely on the vendor - that's the lesson here. If you have all your stuff in AWS back that data up someplace that's not AWS, etc.
On prem outages and data loss happens constantly. Much more than cloud hosted issues. They just affect a smaller group each time. It's like how people view the countless car crash deaths as non issue but freak out over a rare train crash.
Having your own backup works only if there are common open standards for export/import. Email or storage may have those; project management tools don't. I can't simply back up from Jira and start using Pivotal.
Even for Email or storage or any other open system, UX changes and feature differences can take a lot of time to train properly, you don't migrate from one vendor to another vendor just like that.
Edit: to clarify I would say if the data is important to you, then “the ability to back up the data” should be a requirement when selecting saas. See my other comment in this thread on ms planner.
I am very surprised that there aren't more people/companies announcing "offsite backup capabilities" for JIRA or Confluence, etc.. I did just search for scripts that can do this, I'd probably pay a few bucks for something like that at this point.
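In the meantime, a crude do-it-yourself export is possible against the Jira Cloud REST search API; this is only a sketch with a hypothetical site URL and credentials, and a real backup would also need attachments, Confluence spaces, and project/workflow configuration:

```python
# Sketch only: page issues out of a Jira Cloud site via the REST search API
# and dump them to JSON as a crude offsite export. Site URL, credentials and
# JQL are assumptions.
import json
import requests

SITE = "https://your-site.atlassian.net"      # hypothetical
AUTH = ("you@example.com", "api-token")       # Atlassian account + API token

def export_issues(jql: str = "order by created asc") -> list:
    issues, start = [], 0
    while True:
        resp = requests.get(
            f"{SITE}/rest/api/2/search",
            params={"jql": jql, "startAt": start, "maxResults": 100},
            auth=AUTH,
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()
        issues.extend(page["issues"])
        start += len(page["issues"])
        if start >= page["total"] or not page["issues"]:
            return issues

with open("jira-export.json", "w") as f:
    json.dump(export_issues(), f)
```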
If technical competency had any bearing on stock prices they should've been at 0 since long ago. Their stock price is tied to the amount of clueless/shitty companies that will still cling onto their products regardless of what happens, and I don't think this incident is going to change much.
Since Trello is part of Atlassian as well - what are good, reliable and above all lightweight alternatives for managing projects, without the "pseudo-agile" rabbit holes of functionality?
Linear seems optimized for teams that just build one product at a time. We tried it and while the app was elegant, the concepts just didn't map to our mode of multiple repos and across many different clients (some product, some consulting, etc). I don't know if that changed, but that stopped us from adopting Linear.
You can use any of the good old [issue tracking systems](https://en.wikipedia.org/wiki/Comparison_of_issue-tracking_s...). AFAIK, Trac and Redmine are quite easy to run self-hosted, as long as you don't need to handle public projects (and therefore spam management).
Realistically, though, you're more likely to be able to convince other project members to use the issue tracker of whatever forge they're comfortable with, for instance Gitlab, Gitea, Pagure or Sourcehut.
Microsoft Planner is included in most Microsoft 365 plans. Pretty much if you've got Teams, you've got Planner (and you can just add Planner as Tab in a Teams channel). At this point it has surprising feature parity with Trello.
Last time I checked (a few months ago), Planner still could not be backed up. Like, at all. If someone went in and deleted a whole bucket you can't recover it, not natively, and not with third party. So that's a big fat no from me.
I seem to recall dumping Planner to JSON easily enough. I don't know if you can easily restore directly from its JSON, but the JSON was nice enough to work on for what little I needed to do with it.
My current employer shut off Trello and forced us over to Jira and is threatening to disable Planner, so I'm "not allowed" to rely on Planner enough day-to-day so it's possible it is either better or worse than I remember it being in that department. But this Jira outage has me reevaluating, and they haven't turned off Planner yet.
Not sure why an issue affecting a tiny number of clients would crater the stock.
I've been pushing for an exit from Jira for a little while now, but this doesn't really add much ammo to that argument for me. It's like pointing at a plane crash and using it to justify the company no longer flying people anywhere.
Yeah, it's not really a good piece of advice. If anyone actually reads that bit of information, they'll just shrug and toss the resume in the bin and move to one of the other hundreds of applicants.
Worse, outside of "we have rebuilt functionality for over 35% of the users", I haven't seen any reports from the people who have ostensibly been recovered.
Next, their published RTO is 6 hours, so obviously they must have done something that completely demolished their ability to use their standard recovery methods: https://www.atlassian.com/trust/security/data-management
Finally, there have been some hints that this is related to the decommissioning of a plugin product Atlassian recently acquired (Insight asset management) which is only really useful to large organizations. I suspect that the "0.18% impacted" number is relative to ALL users of Atlassian, including free/limited accounts, and that the percentage of large/serious organizations who are impacted (and who would have a use for an asset management product), is much higher.