CrowdStrike accepting the Pwnie Award for "most epic fail" at DEF CON (twitter.com/singe)
405 points by teddyh 31 days ago | 356 comments



I appreciate that we’re finding the humour in this catastrophe but what about the question of liability? I have seen a few stories on HN of the billions lost by this event but so far not much in the way of lawsuits.

What is the situation? Are the licenses so ironclad that customers have no recourse? I could understand this in the case of consumers who might suffer minor inconvenience as their home PC is out of service for a few hours/days but it seems totally unacceptable for industries to accept this level of risk exposure.

This is one of the big reasons civil engineering is considered such a serious discipline. If a bridge collapses, there’s not only financial liability but the potential for criminal liability as well. Civil engineering students have it drilled into their heads that if they behave unethically or otherwise take unacceptable risks as an engineer they face jail time for it. Is there any path for software engineers to reach this level of accountability and norms of good practice?


> Civil engineering students have it drilled into their heads that if they behave unethically or otherwise take unacceptable risks as an engineer they face jail time for it. Is there any path for software engineers to reach this level of accountability and norms of good practice?

The problem is that with civil engineering you're designing a physical product. Nothing is ever designed to its absolute limit, and everything is built with a healthy safety margin. You calculate a bridge to carry bumper-to-bumper freight traffic, during a hurricane, when an earthquake hits - and then add 20%. Not entirely sure about whether a beam can handle it? Just size it up! Suddenly it's a lot less critical for your calculations to be exactly accurate - if you're off by 0.5% it just doesn't matter. You made a typo on the design documents? The builder will ask for clarification if you're trying to fit a 150ft beam into a 15.0ft gap. This means a bridge collapse is pretty much guaranteed to be the result of gross negligence.

Contrast that to programming. A single "<" instead of "<=" could be the difference between totally fine and billions of dollars of damages. There isn't a single programmer on Earth who could write a 100% bug-free application of nontrivial complexity. Even the seL4 microkernel - whose whole unique selling point is the fact that it has a formal correctness proof - contains bugs! Compilers and proof checkers aren't going to complain if you ask them to do something which is obviously the wrong thing but technically possible. No sane person would accept essentially unlimited liability over even the smallest mistakes.
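
To make it concrete, here's a deliberately tiny, hypothetical sketch (the names mean nothing; it just shows how one character separates "fine" from "crash"):

    def checksum(data):
        total = 0
        i = 0
        while i <= len(data):   # should be "<"; "<=" reads one element past the end
            total += data[i]    # IndexError on the final pass
            i += 1
        return total

    checksum([1, 2, 3])         # blows up instead of returning 6

A compiler or interpreter happily accepts both versions; only one of them takes down a fleet.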

If we want software engineers to have accountability, we first have to find a way to separate innocent run-of-the-mill mistakes from gross negligence - and that's going to be extremely hard to formalize.


To add onto this, the Pwnie Awards also go to people who get attacked, which is something that e.g. civil engineers certainly don't get blamed for (i.e. if a terrorist blows up their bridge).

We would need a way to draw a liability line between an incident that involves a 3rd-party attack and one that doesn't, but things like SolarWinds blur even that line, where there was blame on both sides. When does something become negligence, versus just the normal patching backlog that absolutely exists in every company?

And why are people aiming the gun already at software engineers, rather than management or Product Architects? SE's are the construction workers at the bridge site. Architects and Management are responsible for making, reviewing, and approving design choices. If they're trying to shift that responsibility to SEs by not doing e.g. SCA or code reviews, that's them trying to avoid liability.

Honestly, this reaction by the CEO is great for taking responsibility. Even if there's not legal liability, a lot of companies are still going to ditch CrowdStrike.


> the Pwnie Awards also go to people who get attacked, which is something that e.g. civil engineers certainly don't get blamed for

To be clear, this incident was not due to an attack—CrowdStrike just shot themselves in the foot with a bad update.


True, but the reason CrowdStrike has code running in a manner that is capable of bringing down the system, and the reason they push out updates all the time, is because they are in general combating attackers.

If there were no attacks, you wouldn't need such defensive measures, meaning the likelihood of a mistake causing this kind of damage would be almost nothing.


The trade is already a constant struggle with management over cutting corners and short term thinking. I’m not about to be blamed for that situation.


Do you think the situation for real engineers is different?


Look, I wasn't expecting anyone to thank me for my service when I went back to school for COBOL and saved all of your paychecks circa '97 - '99, but I'm not going to sit here and be compared to those bucket-toting girder jockeys.


Yes. Because whilst the same pressures exist, there's a small number of engineers licensed to actually sign off on a project, and they're not going to jeopardise that license for you.


Sounds like a case of real consequences for engineers working out well.


Only if you ignore downsides like drastically increased costs for most civil engineering projects.


> people who get attacked, which is something that e.g. civil engineers certainly don't get blamed for (i.e. if a terrorist blows up their bridge).

There's a really big difference though. In the physical world, an "attack" is always possible with enough physical force -- no matter how good of a lock you design, someone can still kick down the door, or cut through it, or blow it up. But with computer systems, assuming you don't have physical access, an attack is only possible as a result of a mistake on the part of the programmers. Practically speaking, there's no difference between writing an out-of-bounds array access that BSoD's millions of computers, and writing an out-of-bounds array access that opens millions of computers to a zero-day RCE, and the company should not be shielded from blame for their mistake only in the latter case because there's an "attacker" to point fingers at.

Over the past few years of seeing constant security breaches, always as the result of gross negligence on the part of a company -- and seeing those companies get away scot free because they were just innocent "victims of a cyberattack", I've become convinced that the only way executives will care to invest in security is if vulnerabilities come with bankrupt-your-company levels of liability.

Right now, the costs of a catastrophic mistake are borne by the true victims -- the innocent customer who had their data leaked or their computer crashed. Those costs should be borne by the entity who made the mistake, and had the power to avoid it by investing in code quality, validating their inputs, using memory-safe languages, testing and reviewing their code, etc.

Yes, we can't just all write bug-free code, and holding companies accountable won't just stop security vulnerabilities overnight. But there's a ton of room for improvement, and with how much we rely on computers for our daily lives now, I'd rather live in a world where corporate executives tell their teams "you need to write this software in Rust because we'll get a huge discount on our liability insurance." It won't be a perfect world, but it'd be a huge improvement over this insane wild west status quo we have right now.


> In the physical world, an "attack" is always possible with enough physical force -- no matter how good of a lock you design, someone can still kick down the door, or cut through it, or blow it up. But with computer systems, assuming you don't have physical access, an attack is only possible as a result of a mistake on the part of the programmers.

It's exactly the opposite.

In the physical world, you mostly only have to defend against small-time attackers. No bank in the world is safe from, say, an enemy army invading. The way that kind of safety gets handled is by the state itself - that's what the army is for.

In the digital world, you are constantly being attacked by the equivalent of a hundred armies, all the time. Hackers around the world, whether criminals or actual state-actors, are constantly trying to break into any system they can.

So yes, many breaches involve some kind of software issue, but it is impossible to never make any mistake. Just like no physical bank in the world would survive 1000s of teams trying to break in every single day.


> state-actors, are constantly trying to break into any system they can.

I thought state actors prefer to buy over build. Do they really need to build a botnet out of your personal computer rather than just expanding their own datacenters?


State actors breaking into systems aren't doing it to use them in a botnet...


Agreed on all counts.

> In the digital world, you are constantly being attacked by the equivalent of a hundred armies, all the time. Hackers around the world, whether criminals or actual state-actors, are constantly trying to break into any system they can.

This is why I think cyberattacks should be seen from the "victim"'s perspective as something more like a force of nature rather than a crime -- they're ubiquitous and constant, they come from all over the world, and no amount of law enforcement will completely prevent them. If you build a building that can't stand up to the rain or the wind, you're not an innocent victim of the weather, you failed to design a building for the conditions you knew would be there.

(I'm not saying that we shouldn't prosecute cyber crime, but that companies shouldn't be able to get out of liability by saying "it's the criminals' fault").

> So yes, many breaches involve some kind of software issue, but it is impossible to never make any mistake.

It's not possible to never make a mistake, no. But there's a huge spectrum between writing a SQL injection vulnerability and a complicated kernel use-after-free that becomes a zero-click RCE with an NSO-style exploit chain, and I'm much more sympathetic to the latter kind of mistake than the former.

The fact is that most exploits aren't very sophisticated -- someone used string interpolation to build an SQL query, or didn't do any bounds checking at all in their C program, or didn't update 3rd-party software on an internal server for 5 years. And for as long as these kinds of mistakes don't have consequences, there's no incentive for a company to adopt the kind of structural and procedural changes that minimize these risks.

In my ideal world, companies that follow good engineering practices, build systems that are secure by design, and get breached by a nation state actor in a "this could have happened to anyone" attack should be fine, whether through legislation or insurance. But when a company cheaps out on software and develops code in a rush, without attention to security, then they shouldn't get to socialize the costs of the inevitable breach.


> If you build a building that can't stand up to the rain or the wind, you're not an innocent victim of the weather, you failed to design a building for the conditions you knew would be there.

I genuinely have no idea how liability for civil engineering works, but the evidence of my eyes is that entire Oklahoma towns built by civil engineers get wiped off the map by tornadoes all the time. Therefore I assume either we can't design a tornado-proof building, or civil engineering gets the same cost-benefit analysis as security engineering. The acceptable cost-benefit balance is just different. But we can't be selling $10 million tornado-proof shacks, and we can't be selling $10 million bug-proof small business applications, if either is even possible.


> If you build a building that can't stand up to the rain or the wind, you're not an innocent victim of the weather, you failed to design a building for the conditions you knew would be there.

This is why I liken it to protecting from an army. Wanting to protect a building from rain is fine - rain is a constant that isn't adapting and "fighting back".

Find me a building that is able to keep its occupants safe from an invading army, and then we'll talk. It's impossible. That's what we built armies for.

> But there's a huge spectrum between writing a SQL injection vulnerability and a complicated kernel use-after-free that becomes a zero-click RCE with an NSO-style exploit chain, and I'm much more sympathetic to the latter kind of mistake than the former.

To be clear, I agree that there's a spectrum, and I wouldn't want to make it so that companies can get away with everything. But I'm not sure we have a good solution for "my company has 10k engineers, one of them five years ago set up a server and everyone forgot it exists, now it's exploitable". Not in the general case of having so many employees.

> The fact is that most exploits aren't very sophisticated -- someone used string interpolation to build an SQL query, or didn't do any bounds checking at all in their C program, or didn't update 3rd-party software on an internal server for 5 years. And for as long as these kinds of mistakes don't have consequences, there's no incentive for a company to adopt the kind of structural and procedural changes that minimize these risks.

I'm not a security researcher, but I'd guess that most exploits are even simpler - they don't even necessarily rely on software exploits, they rely on phishing, on social engineering, etc.

I've seen plenty of demos of people being able to "hack" many companies by just knowing the lingo and calling a few employees while pretending to be from IT.

This doesn't even include "exploits" like getting spies into a company, or just flat-out blackmailing employees. Do you think the systems you've worked on are secure from a criminal organization applying physical intimidation on IT personnel? (I won't go into details but I'm sure you can imagine worst-case scenarios here yourself.)

> But when a company cheaps out on software and develops code in a rush, without attention to security, then they shouldn't get to socialize the costs of the inevitable breach.

I agree, but there's a huge range between "builds software cheaply" and "builds software which is secure by default" (the second being basically impossible - find me a company that has never been breached if you think it's doable).

We want to make companies pay the cost when it incentivizes good behavior. That's sometimes the case, hence my agreeing with you for many cases.

But security is a game of weakest links, and given thousands of adversaries of various levels of strength, from script-kiddies to state actors, every company is vulnerable on some level. Which is why, in addition to making companies liable for real negligence, we have to recognize that no company is safe, even given enormous levels of effort, and the only way to truly protect them is via some state action.

The reason your bank isn't broken into isn't just that they are amazing at security - it's that if someone breaks into your bank, the state will investigate, hunt them down, arrest them and imprison them.


Show me a company that claims it's never been breached in some way, and I'll show you a company that has no clue about security, including their prior breaches.


Having such consequences would completely stop any innovation and put us into total technological stagnation.

Which would of course result in many other and arguably much worse consequences for society.


Oh, it would do worse than that.

Every country in the world would see this as their big chance to overtake the US. Russia, China, you name it.

You would have to be an idiot to start a software company in the US. High regulation, high cost of living, high taxes, high salaries, personal liability, and a market controlled by monopolies who have the resources to comply.

They’ll leave. The entire world will be offering every incentive to leave. China would offer $50K bonuses to every engineer that emigrated the next day.


I'm confused. Why would they emigrate? You just said "high salaries"?

Moreover, China is hardly low regulation. You would get there and then not be able to check your email.


This is less complicated than you think.

Civil engineering rules, safety margins and procedures have been established through the years as people died from their absence. The practice of civil engineering is arguably millennia old.

Software is too new to have the same lessons learned and enacted into law.

The problem isn't that software lacks the kind of practices and procedures that would prevent these kinds of errors (see the space shuttle code, for example); it's that we haven't formalized their application into law, and the "terms of service" that protect software makers have so far prevented case law from establishing liability when you don't use them.

Software engineering, compared to other engineering disciplines, has had a massive effect on the world in an incredibly short amount of time.


The other side of it is this. By law, a licensed civil engineer must sign off on a civil engineering project. When doing so, the engineer takes personal legal liability. But the fact that the company needs an engineer to take responsibility means that if management tries to cut too many corners, the engineer can tell them to take a hike until they are willing to do it properly.

Both sides have to go together. You have to put authority and responsibility together. In the end, we won't get better software unless programmers are given both authority AND responsibility. Right now programmers are given neither. If one programmer says no, they are just fired for another one who will say yes. Management finds one-sided disclaimers of liability to be cheaper than security. And this is not likely to change any time soon.

Unfortunately the way that these things get changed is that politicians get involved. And let me tell you, whatever solution they come up with is going to be worse for everyone than what we have now. It won't be until several rounds of disaster that there's a chance of getting an actually workable solution.


Engineering uses repeatable processes that will ensure the final product works with a safety margin. There is no way to add a safety margin to code. Engineered solutions tend to have limited complexity or parts with limited complexity that can be evaluated on their own. No one can certify that a 1M+ line codebase is free from fatal flaws no matter what the test suite says.


> There is no way to add a safety margin to code.

This is, in my opinion, an incredibly naive take.

There are currently decades of safety margin in basically all running code on every major OS and device, at every level of execution and operation. Sandboxing, user separation, kernel/userland separation, code signing (of kernels, kernel extensions/modules/drivers, regular applications), MMUs, CPU runlevels, firewalls/NAT, passwords, cryptography, stack/etc protections built into compilers, memory-safe languages, hardware-backed trusted execution, virtualization/containerization, hell even things like code review, version control, static analysis fall under this. And countless more, and more being developed and designed constantly.

The “safety margin” is simply more complex from a classic engineering perspective and still being figured out, and it will never be as simple as “just make the code 5% more safe.” It will take decades, if not longer, to reach a point where any given piece of software could be considered “very safe” like you would any given bridge. But to say that “there is no way to add a safety margin to code” is oversimplifying the issue and akin to throwing your hands up in the air in defeat. That’s not a productive attitude to improve the overall safety of this profession (although it is unfortunately very common, and its commonality is part of the reason we’re in the mess we’re in right now). As the sibling comment says, no one (reasonable) is asking for perfection here, yet. “Good enough” right now generally means not making the same mistakes that have already been made hundreds/thousands/millions of times in the last 6 decades, and working to improve the state of the art gradually over time.


Exactly.

Part of the evaluation has to be whether the disaster was due to what should have been preventable. If you're compromised by an APT, no liability. Much like a building is not supposed to stand up to dynamite. But if someone fat-fingered a configuration, you had no proper test environment as part of deployment, and hospitals and 911 systems went down because of it?

There is a legal term that should apply. That term is "criminal negligence". But that term can't apply for the simple reason that there is no generally accepted standard by which you could be considered negligent.


Except nobody is asking for perfection here. Every time these disasters happen, people reflexively respond to any hint of oversight with stuff like this. And yet, the cockups are always hilariously bad. It's not "oh, we found a 34-step buffer overflow that happens once every century", it's "we pushed an untested update to eight million computers lol oops". If folks are afraid that we can't prevent THAT, then please tell me what software they've worked on so I can never use it ever.


An Airbus A380 comprises about 4 million parts yet can be certified and operated within a safety margin.

Not that I think lines of code are equivalent to airplane parts, but we have to quantify complexity some way and you decided to use lines of code in your comment so I’m just continuing with that.

The reality is that we’re still just super early in the engineering discipline of software development. That shows up in poor abstractions (e.g. what is the correct way to measure software complexity), and it shows up in unwillingness of developers to submit themselves to standard abstractions and repeatable processes.

Everyone wants to write their own custom code at whatever level in the stack they think appropriate. This is equivalent to the days when every bridge or machine was hand-made with custom fasteners and locally sourced variable materials. Bridges and machines were less reliable back then too.

Every reliably engineered thing we can think of—bridges, airplanes, buildings, etc.—went through long periods of time when anyone could and would just slap one together in whatever innovative, fast, cheap way they wanted to try. Reliability was low, but so was accountability, and it was fast and fun. Software is largely still in that stage globally. I bet it won’t be like that forever though.


It seems to me if something is not safe and we can't make it reasonably safe, we shouldn't use it.


This is all true. But we _do_ have known best practices that reduce the impact of bugs.

Even the most trivial staged rollout would have caught this issue. And we're not talking about multi-week testing; even a few hours of testing would have been fine. Failure to do that rises to the level of gross negligence.


True but they are under time pressure to add definitions for emerging vulnerabilities.


Doctors, engineers, and lawyers aren't infinitely accountable to their equivalent of bugs. Structures still fail, patients die, and lawyers lose cases despite the reality of the crime.

But they're liable when they fuck up beyond what their industry decides is acceptable. If Crowdstrike really wasn't testing the final build of their configuration files at all, then yeah -- that's obviously negligent given the potential impact and lack of customer ability to do staged rollouts. But if a software company has a bug that wasn't caught because they can't solve the halting problem, then no professional review board should fault the license holder.

> we first have to find a way to separate innocent run-of-the-mill mistakes from gross negligence - and that's going to be extremely hard to formalize.

I think we just (oh god -- no sentence with a just is actually that easy) need to actually look at other professional licenses to learn how their processes work. Because they've managed to incorporate humans analyzing situations where you can't have perfect information into a real process.

But I don't think any of this will happen while software is still making absolute shit loads of money.


This entire comment boils down to "we can't be held accountable because it's soooo hard you guys", which isn't even convincing to me as someone in the industry and certainly won't be to someone outside it.


What a shallow dismissal of a comment that doesn’t even claim that there shouldn’t be accountability.


His dismissal is absolutely right though. Programmers have gotten way too used to waving their hands at the public and saying "gosh I know it's hard to understand but this stuff is so hard". Well no, sorry, there's not a single <= in place of a < that couldn't have been caught in a unit test.
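
For example, a minimal, hypothetical boundary test (not anyone's real code) is all it takes to catch that class of bug:

    import unittest

    def last_index(items):
        # hypothetical helper with the classic mistake: "<=" where "<" belongs
        i = 0
        while i <= len(items):
            i += 1
        return i - 1

    class BoundaryTest(unittest.TestCase):
        def test_exact_boundary(self):
            # exercising the exact length is what exposes the off-by-one
            self.assertEqual(last_index([1, 2, 3]), 2)

    if __name__ == "__main__":
        unittest.main()

The test fails today, someone fixes the comparison, and it never ships.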


You're right, in the case that it was known to be a problem. There are lots of places where the "<= or <" decision can be made, some long before some guy opens a text editor; in those cases, the unit test might not catch anything because the spec is wrong!

A major difference between software development and engineering is that the requirements must be validated and accepted by the PE as part of the engineering process, and there are legal and cultural rails that exist to make that evaluation protected, and as part of that protection more independent--which I think everyone acknowledges is an imperfect independence, but it's a lot further along than software.

To fairly impute liability to a software professional, that software professional needs to be protected from safety-conscious but profit-harmful decisions. This points to some mixture of legislation (and international legislation at that), along with collective bargaining and unionization. Which are both fine approaches by me, but they also seem to cause a lot of agita from a lot of the same folks who want more software liability.


> in those cases, the unit test might not catch anything because the spec is wrong!

That's why you have three different, independent parties design everything important thrice, and compare the results. I'm serious. If you're not convinced this is necessary, just take a look at https://ghostwriteattack.com/riscvuzz.pdf.

(Your other suggestions are also necessary, and I don't think that would be sufficient.)
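
A toy sketch of what the "thrice" idea looks like in code, N-version style, with a majority vote deciding whether to trust the answer (the three implementations here are invented and trivially small):

    from collections import Counter

    def impl_a(x):
        return x * x

    def impl_b(x):
        return x ** 2

    def impl_c(x):
        # independently written, and buggy for negative inputs
        return sum(x for _ in range(x))

    def voted_square(x):
        results = [impl_a(x), impl_b(x), impl_c(x)]
        value, count = Counter(results).most_common(1)[0]
        if count < 2:
            raise RuntimeError("implementations disagree, refusing to answer")
        return value

    print(voted_square(-3))   # 9: the two correct versions outvote the buggy one

Expensive, obviously, which is why it's reserved for the genuinely important stuff.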


I think that's a great idea, and when I've been in a leadership role I've at least tried to have important things done at least twice. ;)

And you're right, I was pretty much just outlining what might be called "a good start".


> This entire comment boils down to "we can't be held accountable because it's soooo hard you guys", which isn't even convincing to me as someone in the industry and certainly won't be to someone outside it.

When that cargo ship hit the bridge in Baltimore and people were calling for bridges to be designed to take that kind of hit, I heard a lot of "that's sooo impossible you guys" from 'real' engineers. Because it apparently is.

We can do (almost) anything, but we can't always do it for amounts people are willing to pay, where 'we' is everybody and 'willing to pay' means if you charge me what it would take to make it safe or secure, I'll redneck engineer it with none of that built in at all. People are not going to stop finding affordable ways to cross rivers or use web servers just because hard stuff is expensive.


If it's too hard for everyone to do, then yeah, it's too hard.

At the end of the day, what matters is if you can, y'know, do the thing. And people just can't.

> which isn't even convincing to me as someone in the industry

Then you're confident that you can write bulletproof software? Prove it. Thankfully, as an industry we're pretty good at compromising software even if we can't write uncompromisable software.

Since we're talking about serious liability, how about put up a multi million dollar bounty for any single bug found in a non-trivial program that you write?


> Contrast that to programming. A single "<" instead of "<=" could be the difference between totally fine and billions of dollars of damages.

Disagree. This is true purely at the coding level, yes. Anyone could make a typo.

If you're running a company that releases software with the risk exposure of crowdstrike, you better not have a release model where that typo goes straight to production. There need to be many layers of different kinds of testing. If carefully built, now there are many layers all of which have to fail for the bug to go live. You can bring the failure probability down to negligible levels with enough layers of validation.
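
To sketch just one of those layers, here's roughly what a ring-based staged rollout gate might look like; every name, fraction, and threshold here is made up for illustration, not anyone's actual pipeline:

    import time

    # hypothetical deployment rings: each ring gets the update only after the
    # previous ring has soaked and stayed healthy; fractions/thresholds invented
    RINGS = [("internal", 0.001), ("canary", 0.01), ("early", 0.10), ("broad", 1.0)]
    SOAK_SECONDS = 3600       # let crash telemetry accumulate before judging
    MAX_CRASH_RATE = 0.001    # crashes per deployed host we tolerate

    def deploy(update, fraction):
        print(f"deploying {update} to {fraction:.1%} of the fleet")

    def rollback(update):
        print(f"rolling back {update}")

    def crash_rate(ring):
        return 0.0            # placeholder for real fleet telemetry

    def staged_rollout(update):
        for ring, fraction in RINGS:
            deploy(update, fraction)
            time.sleep(SOAK_SECONDS)
            if crash_rate(ring) > MAX_CRASH_RATE:
                rollback(update)
                raise RuntimeError(f"rollout halted in ring '{ring}'")

    staged_rollout("new-content-update")

Each additional ring is one more chance for the bug to be caught before it reaches 100% of the fleet.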

> find a way to separate innocent run-of-the-mill mistakes from gross negligence - and that's going to be extremely hard to formalize.

I don't think it's that hard. Not saying it is trivial, but it is well within the capability of the industry if we just focused a little bit on quality instead of 100% on profit.

Standardize models and layers of testing coverage. If you implement them all then you're not being negligent and thus should not be liable. If you decide to skip them, liable.


> Nothing is ever designed to its absolute limit, and everything is built with a healthy safety margin. You calculate a bridge to carry bumper-to-bumper freight traffic, during a hurricane, when an earthquake hits - and then add 20%. Not entirely sure about whether a beam can handle it? Just size it up! Suddenly it's a lot less critical for your calculations to be exactly accurate

That may have been true a couple hundred years ago. It's not been true for a couple decades now, because budget became a constraint even more important than physics, and believe it or not, you will have to justify every dollar that goes into your safety margin. That's where the accuracy of modern techniques matter: the more accurate your calculations (and the more consistent inputs and processes builders employ), the less material you can use to get even closer to the designed safety margin. Accidentally making a bridge too safe means setting money on fire, and we can't have that.

That's the curse of progress. Better tools and techniques should allow us to get more value - efficiency, safety, utility - for the same effort. Unfortunately, economic pressure makes companies opt for getting the same or less[0] value for less effort. Civil engineering suffers from this just as much as software engineering does.

--

[0] - Eventually asymptotically approaching the minimum legal quality standard.


> Accidentally making a bridge too safe means setting money on fire, and we can't have that.

There's a quote I've seen various versions of: anyone can build a bridge that is safe. It takes an engineer to build a bridge that is just barely safe.


> Contrast that to programming. A single "<" instead of "<=" could be the difference between totally fine and billions of dollars of damages.

I fail to see the difference between a misplaced operator and a misplaced bolt (think Hyatt walkway collapse), both of which could have catastrophic consequences. Do you think the CAD software they use to perform the calculations is allowed to have bugs simply because it's software?

Maybe go back to entering code on punch cards if you're so fixated on the physical domain being the problem.


There's a reason we talk about the Hyatt walkway collapse but not the misplaced operator.


It could happen. People have been predicting it for years, and many think that it is only a matter of time. For a vision from 1982 of how it could happen, see: <https://books.google.com/books?id=6f8VqnZaPQwC&pg=PA167>

Consider the following scenario. We are living in 1997, and the world of office automation has finally arrived. Powerful computers that would have filled a room in 1980 now fit neatly in the bottom of drawer of every executive’s desk, which is nothing more than heavy glass plate covering an array of keyboards, screens, and color displays.

— The Network Revolution: Confessions of a Computer Scientist; Jacques Vallee, 1982


I like the analogy. What would be the equivalent of "adding safety margins" for a piece of critical code? Building three of them with different technologies and making sure they all return the same results?


did they take basic precautions like staged releases, code reviews, integration tests?

if not, then it's literally the engineer equivalent of gross negligence and they do deserve to be sued to oblivion.


Do people actually believe it when a company says something caused billions of dollars of damage? Unless you can quantify it, much like law enforcement and articulable suspicion, it's pretty useless as a metric. If you can pull something out of your ass, what does it matter?


Delta threatened to sue them for their $500M loss. Crowdstrike replied (publicly) pointing out that their contract limits Crowdstrike's liability to single digit millions.

They then gave them a list of things they would seek in discovery, such as their backup plans, failover plans, testing schedules and results, when their last backup recovery exercise was, etc.

Basically, they said, "if you sue us, we will dig so deep into your IT practices that it will be more embarrassing for you than us and show that you were at fault".


It really seems funny that Crowdstrike’s defense is basically “you should have been better prepared for us to knock all of your systems offline.”

It’s probably true, but seems like an odd stance to take from a PR perspective or a “selling other clients in the future” perspective.


In the case of Delta, their outage was much longer than everyone else because they refused help from both Crowdstrike and Microsoft. So their defense is basically "the damages could have been mitigated if you'd listened to us".


> they refused help from both Crowdstrike and Microsoft

Link?

Anyway I find it highly amusing that Delta is seeking damages from Microsoft even though Microsoft had nothing to do with it.


There are many articles about them refusing help, but here is one:

https://www.theverge.com/2024/8/6/24214371/microsoft-delta-l...


Delta's position is that Microsoft actively recommended and coordinated with CrowdStrike to the extent that they are co-responsible for outcomes. In a large enterprise like Delta, the vendors do work together in deployment and support. Yes, there's often a great deal of finger-pointing between vendors when something like this happens, but in general vendors so intimately linked have each other on speed-dial. It would not shock me to learn that Delta has email or chat threads involving CrowdStrike, Microsoft, and Delta employees working together during rollouts and upgrades, prior to this event.

As far as refusing help, why is that funny? If someone does something stupid and knocks you down, it's perfectly reasonable to distrust the help they offer, especially if that help requires giving them even more trust than what they've already burned.


Changing vendors and choosing one that's more reliable is a perfectly sensible outcome of this situation once your systems are back up and you're no longer hemorrhaging money.

During an ongoing incident, when all of your operations are down, is not the time for it though. If you think there's even a 1% chance that the help can help, you should probably take it and fix your immediate problem. You can re-evaluate your decisions and vendor choices after that.


> If someone does something stupid and knocks you down, it's perfectly reasonable to distrust the help they offer, especially if that help requires giving them even more trust than what they've already burned.

Yeah it smacks of Experian offering you a year of "free identity theft protection" after having lost your personal data in a breach.


That's kind of typical of how much companies have been allowed to externalize costs. It's never about how the company at fault should have done better, rather it typically boils down to some variant of "the free markets provided you with a choice about who you trust and it was up to you to collect and evaluate all the information available to make your choices".


That’s kinda what aws tells people when its services go down. If your backend can’t take a short outage without weeks of recovery then it’s just a matter of time.


> Delta threatened to sue them for their $500M loss. Crowdstrike replied (publicly) pointing out that their contract limits Crowdstrike's liability to single digit millions.

Delta's move seems like an attempt to assuage shareholders and help the CEO save face.

Crowdstrike shouldn't be afraid of Delta. Crowdstrike should be afraid of the insurance companies that have to pay out to those businesses that have coverage that includes events like this.

Even if the payout to a company is $10,000, a big insurance company may have hundreds or thousands of similar payouts to make. The insurance companies won't just let that go; and they know exactly what to look for, how to find it, and have the people, lawyers, and time to make it happen.

Crowdstrike will get its day of reckoning. It won't be today. And it probably won't be public. But the insurance companies will make sure it comes, and it's going to hurt.


> the insurance companies will make sure it comes, and it's going to hurt

It could be as simple as a reinsurer refusing to renew coverage if a company uses CrowdStrike.


Which would be funny, since many companies are putting up with Crowdstrike to make insurers happy.


Availability (or not) of insurance coverage is surprisingly effective in enabling or disabling various commercial ventures.

The penny dropped for me whilst reading James Burke's Connections on the exceedingly-delayed introduction of the lateen-rigged sail to Europe, largely on the basis that the syndicates which underwrote (and insured) shipping voyages wouldn't provide financing and coverage to ships so rigged.

Far more recently we have notions of redlining for both mortgage lending and insurance coverage (title, mortgage, property, casualty) in inner-city housing and retail markets. The co-inventor of packet-based switching writes of his parents' experience with this in Philadelphia:

"On the Future Computer Era: Modification of the American Character and the Role of the Engineer, or, A Little Caution in the Haste to Number" (1968)

<https://www.rand.org/pubs/papers/P3780.html> (footnote, p. 6).

Similarly, government insurance or guarantees (Medicare, SSI, flood insurance, nuclear power plants) have made high-risk prospects possible, or enabled effective services and markets, where laissez-faire approaches would break down.

I propose that similar approaches to issues such as privacy violation might be worth investigating. E.g., voiding any insurance policy over damages caused through the harmful use or unintended disclosure of private information. Much of the current surveillance-capitalism sector would instantly become toxic. The principal current barriers to this are that states themselves benefit through such surveillance, and of course the current industry is highly effective at lobbying for its continuance.


That's interesting, because the TV episode states that insurers wanted the risk of piracy spread out over many smaller ships that would be lateen-rigged. I have one of the Connections books, so I'll check to see if this is covered in it https://youtu.be/1NqRbBvujHY?si=WfysDHPLhSJkGhzd


Interesting discrepancy, yes. I'm pretty sure of my recollection of the book.

It may be that the opportunity to diversify risk (over more smaller ships) overcame the reluctance to adopt new, untested and/or foreign technology.


It doesn’t explicitly say insurers but it’s a pretty small logical leap from the wording (the timeframe is also c. 11th-12th century so could be before formal insurers)


Right.

The books and video scripts also differ amongst Burke's various series. I'll see if I can find a copy of the text to compare.


> they said, "if you sue us, we will dig so deep into your IT practices that it will be more embarrassing for you than us and show that you were at fault"

But CrowdStrike said this publicly. If they’d privately relayed it to Delta, it would have been genuine. By performatively relaying it, however, it seems they’re pre-managing optics around the expected suit.


It's an argument that hits home at any bigcorp where the execs are entertaining the thought of suing CrowdStrike. Making it public once is a lot more effective than relaying it privately a hundred times. I expect most liability to come from abroad, where parts of the contract might be annulled for not being in line with local law. But still, I don't expect it. CrowdStrike delivered the service they promised. The rest is on the customer's IT. Hand over the keys and your car may be driven.


> It’s an argument that hits home at any bigcorp where the execs are entertaining the thought of suing CrowdStrike

Maybe? Discovery is a core element of any lawsuit. It’s also a protected process: you can’t troll through confidential stuff with an intent to make it public to damage the litigant.

If anything, I could see Delta pointing to this statement to restrict what CrowdStrike accesses and how [1]. (As well as with the judge when debating what gets redacted or sealed.)

[1] https://www.fjc.gov/sites/default/files/2012/ConfidentialDis...


Thank you. Nice read. Even given a protective order to keep discovery confidential, the ensuing discussion about the clients lacking IT-policies that exacerbated this crisis is public.

Most entertaining would be the discussion where CrowdStrike would argue that based on common IT-risk criteria, you should never hand over the keys to an unaudited party not practicing common IT-risk best practices and (thus) the liability is on the organization. Talk about CrowdStrike managing risks worldwide. They are doing it right now!


Or attempting to discourage it from becoming a pile-on.


It doesn't matter, it was 100% Crowdstrike's fault. Surprised it's still worth 60 billion dollars.


Part of the problem is assuming you can pay a contract to shift your liability completely away.


Right, the risk structure presumably protects the vendor if just one customer sues, even if the amount of damages claimed is astronomical. Because vendors try to disclaim bet-the-company liability on a single contract.[1] The vendor's game is to make sure the rest of the customer base does not follow this example, because as noted in the linked article while vendors don't accept bet-the-company liability on each contract (or try not to), they do normally have some significant exposure measured in multiples of annual spend.

[1] https://www.gs2law.com/blog/current-trends-in-liability-limi...


The assumption is not only perfectly valid, it's the very reason such contracts are signed in the first place! It's what companies want to buy, and it's what IT security companies exist to sell.


Yes, I know that's what everyone wants/thinks, but you actually can't do it. Because at the end of the day, you chose the vendor. So you are still liable for all of it.


Well if MSFT knew how to write MSAs Crowdstrike would have become property of Microsoft.


Yes and no.

Crowdstrike was the executioner of this epic fail for sure, but Delta's archaic infra practices made it even worse. Both the Crowdstrike and Microsoft CEOs reached out, only to be rebuffed by Delta's own. If I were the CEO, I'd accept any help I could get while I still had the benefit of public opinion.

/tin-foil-hat-on Flat out refusal for help makes me think there are other skeletons in the closet that makes Delta look even worse /tin-foil-hat-off


> If I were the CEO, I'd accept any help I could get while I still had the benefit of public opinion

I’d reserve judgement. Delta may have been cautious about giving the arsonists a wider remit.


In this case, the fire was an accident, and the arsonists happen to be the expert firefighters, and they're very motivated to fix their mistake. They're still the experts in all stuff fire, whereas Delta is not.


Using your analogy - if MS/CS are the arsonists, then Delta are the landlords unsafely storing ammonium nitrate in their own warehouse.

Their lack of response to MS/CS isn't coming from a place of reducing potential additional problems, but from trying to shield their own inadequacies while a potential lawsuit is brewing in the background.

https://www.reuters.com/technology/microsoft-blames-delta-it...


It doesn't seem like arsonist is the right word. It implies it was intentional, which as far as I can tell there is no proof of.

I think the more accurate description would be that some firefighters were doing a controlled burn. The burn got out of control, and then you say you don't want the firefighters' help putting out the fire.


If you held the view that CrowdStrike and Microsoft were inherently to blame for the problem why would you trust them to meaningfully help? At best they're only capable of getting you right back into the same position that left you vulnerable to begin with.


Same reason why an aircraft manufacturing company would get involved in a NTSB investigation when there is an airplane crash. Just because they messed up one or more things (i.e. MCAS on MAX) doesn't mean they can't provide expertise or additional resources to at least help with the problem.

Your take also casually disregards the fact that Delta took an extraordinarily long time to recover from the problem when the other companies recovered (albeit slowly). This is the point that I'm getting at. It isn't that CS and MS aren't culpable for the outage; it's that DAL also contributed to the problem by not adequately investing in its infra.


> Same reason why an aircraft manufacturing company would get involved in a NTSB investigation when there is an airplane crash

Key difference here is that the NTSB is third party with force of law behind it. The victims in the crash – airlines and passengers – aren't rushing to the aircraft manufacturer to come fix things. Quite the opposite: the NTSB and FAA have the authority to quarantine a crash site and ensure nobody tampers with the evidence. Possible tampering with black boxes was an issue in the investigation of Air France Flight 296Q.


Being to blame is different than being actively trying to sabotage you. Many companies will be re-evaluating their relationship after this problem happened, but doing that while your systems aren't functional seems counter-productive.


Seems fair. Delta didn't privately relay their intentions.


Weirdly, we live in a society


That’s not the way legal process works. CrowdStrike might be permitted to conduct discovery, but that won’t entitle them to share what they might find with the public, embarrassing or otherwise. Business records and other sensitive information relating to parties in civil matters are frequently sealed.


I'm not sure anything else was material given that the machines were bricked and client roll-out approaches were bypassed by Crowdstrike. What client actions would have helped?

Surely someone is looking at a class action? People died. The contract can’t make that everyone else’s problem, can it?


Sure it can. If every rock climbing company in the country decides that climbing ropes are too expensive and instead decides to buy rope from the local hardware store, and that rope has a warning reading "not for use when life or valuable property is at risk", then it is 100% on those climbing companies when people die, because they were using a product in a situation that it was simply not suitable for.

The details, of course, depend on the contract and claims that Crowdstrike made. But, in the abstract, you are not responsible for making your product suitable for any use that anyone decides to use it for.

If a hospital wants to install software on their life-critical infrastructure, they are supposed to buy software that is suitable for life-critical infrastructure.


If someone's life depends on a networked Windows (or any similar OS) machine you chose to run for that purpose, you are the criminal.


Indeed. But this is how hospitals run.


I'd LOVE to see Crowdstrike do this. The last time I dealt with the specifics of this sort of validation testing for security software was a decade ago, and from what I saw in the RCA, Delta can just keep pointing out that whatever they had worked until Crowdstrike failed to understand that the number 20 and the number 21 are not the same:

The new IPC Template Type defined 21 input parameter fields, but the integration code that invoked the Content Interpreter with Channel File 291’s Template Instances supplied only 20 input values to match against. This parameter count mismatch evaded multiple layers of build validation and testing, as it was not discovered during the sensor release testing process, the Template Type (using a test Template Instance) stress testing or the first several successful deployments of IPC Template Instances in the field.

This, combined with the lack of partitioning of updates, makes me conclude they're missing table stakes WRT validation.
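
For reference, the check that would have refused to ship that mismatch is on the order of ten lines; a hypothetical sketch with invented field names, not their actual schema:

    def validate_template(template_type, template_instance):
        declared = len(template_type["parameters"])   # e.g. 21
        supplied = len(template_instance["inputs"])   # e.g. 20
        if declared != supplied:
            raise ValueError(
                f"{template_type['name']}: declares {declared} parameters "
                f"but the instance supplies only {supplied} inputs"
            )

    # the kind of mismatch described in the RCA, caught at build time:
    validate_template(
        {"name": "IPC", "parameters": list(range(21))},
        {"inputs": list(range(20))},
    )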


Wtf how do you not check for ‘quantity of arguments’ in QA testing?


They should be providing all that information regularly to auditors anyway. If they don’t have it handy, then their IT leadership should be replaced.


That's odd. One is an internal process with no obligation to an external party, and the other is the party specifically liable for any repercussions of deviating from their own SDLC process[1], a process they totally skipped themselves?

If I were Delta, I’d get other affected parties and together sue CrowdStrike and get all their dirty laundry out in the open.

[1] I haven’t checked but they used to list all their ISO certs, etc. Wonder if those get revoked for such glaring violations…


Civil suits focus in a large way on determining how much damage is each party’s fault. So Crowdstrike would be saying “Of this $500M in damages, x% was from your own shitty practices not from our mistake”. Thats why it’s all pertinent.


Correct. The legal term is “contributory negligence.”


> One is an internal process which has no obligation to an external party

Delta has obligations to their passengers and sidesteps its own screw-ups with similar contractual provisions. How much would Delta owe for not following similar IT practices? Do they now owe customers for their IT failings? Should customers now get to sue Delta for damages related to their poor IT recovery compared to other airlines?


Sure but that’d be something passengers could bring up in a suit against Delta, not someone like CS, who themselves obviously skipped their own internal SDLC and whatever other ISO certs they prominently advertised on their website.


Crowdstrike's discovery process would greatly aid in passenger or general-public suits against Delta.


I assume the argument is that if they can show negligence in their IT practices, then the $500 million in damages can't be all attributed to CrowdStrike's failure.


They might find out delta does embarrassing things like not testing out of bounds array access or does global deployments without canarying.


There is recourse, just not for normal people, as you eluded to. Companies are suing crowdstrike and will continue to, and based on the papers that crowdstrike has posted, the impacted companies are extremely likely to be successful. It seems overwhelmingly likely that the companies are going to be able to convince a judge/jury/arbiter that crowdstrike acted with gross negligence and very plainly caused both direct losses and indirect reputational harm to the companies.

I’m not sure crowdstrike will even fight it, to be honest. I would assume most of this is going to be settled out of court and we will see crowdstrike crumble in the coming years.



TIL— thanks! Now it’s time to painfully go through my slack / email history and see how many times I made this mistake :)


> not sure crowdstrike will even fight it

To my knowledge only Delta is suing and CrowdStrike is kicking and screaming about it [1].

[1] https://www.cnn.com/2024/08/05/business/crowdstrike-fires-ba...


It’s a really bad look for crowdstrike to be going down this route. Then again, I don’t think many companies are going to be adopting crowdstrike in the coming years, so I suppose their only option is to defend their stock value at any cost while the company recoils


A lot of companies have insurance on events causing them to lose sources of income. Whether that's farmers having crop insurance, big box retailers having insurance for catastrophic damage to their big box, I would assume there's something for infrastructure collapse to bring sales to $0 for the duration.

Even if everyone that was affected sued ClownStrike for 100% of their losses, it's not like ClownStrike has the revenue to cover those losses. So even if you're a fan of shutting them down, nobody recovers anything close to actual losses.

So what would you actually propose? Bug free code is pretty much impossible. Some risk is accepted by the user. Do you seriously think that software should be absolutely 100% bug free before being able to be used? How do you prove that? Of course, the follow up would be how clean is your code that you feel that's even achievable?


>Bug free code is pretty much impossible. Some risk is accepted by the user.

This wasn't your average SW bug, it was gross negligence on the part of Crowdstrike, who seem not to have heard of SW testing on actual systems or canary deployments. Big difference.

Yeah, SW bugs happen all the time, but you have to show you took some steps to prevent them, while some dev at Crowdstrike just said "whatever, it works on my machine" and pushed directly to all customer production systems on a Friday. That they didn't have any processes in place to prevent something like this is the definition of gross negligence.

That's like a surgeon not bothering to sterilize his hands and then saying "oh well, hospital infections happen all the time".


> That's like a surgeon not bothering to sterilize his hands and then saying "oh well, hospital infections happen all the time".

And hospitals and doctors have malpractice insurance. They also go through investigations, though they have their own brotherhood, and it is difficult to get other doctors to testify against one of their own. There are also stories of people writing "The other leg" in Sharpie on their good leg because of moronic mistakes like removing the left appendage instead of the right. So even doctors are not above negligence. We just have things in place for when they slip up. Why you think ClownStrike is above that is bewildering.

At the end of the day, mistakes happen. It's not like they have denied they were at fault. So I'm really not sure what you're actually wanting.


>It's not like they have denied they were at fault. So I'm really not sure what you're actually wanting.

Paying for their mistake. In money. Admitting for their mistake is one thing, paying for it is another.

If your doctor made a mistake due to his negligence that costs you, wouldn't you want compensation instead of just a hollow apology?


Want vs receive are two entirely different things. If someone did something against me out of malice, damn straight I want ________. If someone makes a mistake, owns up to it, and changes in ways to not make the same mistake again, then that's exactly the opportunity I'd hope someone would allow me if the roles were reversed. This particular company's mistake just happened to be so widespread, due to their popularity, that it seems egregious, but there have been other outages that lasted longer and did not draw this much attention. Was it an inconvenience? Yes. Was it a silly mistake in hindsight? Yes. Was it fixable? Yes. Was it malevolent? Nope. Should you lose your job for making this mistake?


The bug was egregious.

Using regexp (edit: in the kernel). (Wtf. It's a bloody language.) And not sanitizing the usage. Then using it differently than in testing. And boom.

There's people, and there's companies.

This company ought to be nuked.


Genuinely, what good does that do?

It's all well and good to write dramatic meaningless comments on social networks like Hacker News, but if your desire had actual consequences, can you honestly say that "nuking the company" is a net positive?


Is keeping CrowdStrike around a net positive?


> can you honestly say that “nuking the company” is a net positive?

Yes, of course it would be positive. In the short term, it removes one incompetent high-risk company from the industry.

But more importantly long term, it would do a lot to encourage quality in the industry if it was known that such an outcome is possible.


Well, in America we've got something called corporate personhood, and it's an odd concept. It seems like an unfair concept to me as a citizen of America.

And you know, laws are supposed to keep you feeling like you're living in a fair world, right?

So, nuke the company that caused billions of dollars in losses, millions of hours of wasted human time, and potentially loss of life, though we haven't yet had a study that identifies the people who lost their lives because of disruption to healthcare services, heart attacks due to stress, etc. Nuke them. Nuke that corporate person. Force the humans who comprise that corporation to rebuild it as a better corporation.


You should look up Arthur Andersen.


Bug-free code is impossible. Stupid, negligent bug-free code, however, is very much doable. You just can't hire anyone who happens to be able to fog a mirror to write it.


If you think this was written by a moron rather than being a breakdown in procedures, then I'd think you'd be the one who barely fogs a mirror. This is no different from the multiple times that AWS us-east-1 has gone down and taken out a large portion of the internet when they've pushed changes. Do you think AWS is hiring moronic mirror foggers who cause havoc, or is this just an example of how, even within a bureaucratic structure like AWS's, it is still possible to sidestep best-laid plans?


> Is there any path for software engineers to reach this level of accountability and norms of good practice?

Yes, time. Civil engineering has thousands of years of history. Software engineering is much newer, the foundations of our craft are still in flux. There have been, at least in my country, legislative proposals for licensure of system analysts, electronic computer programmers, data processing machine operators, and typists(!) since the late 1970s; these laws, if approved, would have set back the progress of software development in my country for several decades (for instance, one proposal would make "manipulation and operation of electronic processing devices or machines, including terminals (digital or visual)" exclusive to those licensed as "data processing machine operator").


> set back the progress

> exclusive to those licensed

Sounds to me like it just would've made a lot of money for whatever entities give out the licenses.

On the other hand, I've read speculation on here that some countries are short on entrepreneurs entirely due to the difficulty of incorporating a small business, so maybe.


Civil engineering mostly requires you to have a government-verified certificate and to work in the country your infrastructure will be deployed in.

Software engineering doesn't, and that makes criminal prosecutions that much harder. There's no path to making it happen.

Financial liability for the company in question? Sure, that's probably doable. "Piercing the corporate veil" and punishing the executives who signed off on it? Harder but not impossible. Punishing the engineer who wrote that code, and who lives in a country with no such laws? Won't happen.


> Civil engineering mostly requires you to have a government-verified certificate and to work in the country your infrastructure will be deployed in.

It's a relatively small (and sharply defined) pool of people who can be called a civil engineer.

Are we saying we want to segment software engineering (from coding) - the same way civil engineering is segmented from construction?

Otherwise we're talking about placing specialist liability upon a non-specialist group. This seems unethical.


> If a bridge collapses, there’s not only financial liability but the potential for criminal liability as well

If a bridge collapses people die. To my knowledge, nobody died or was put in mortal peril as a result of the Crowdstrike debacle.


The deaths, if any, were probably indirect. E.g. ambulances not turning up in time etc. due to paper and pen fallbacks.

With all the hospitals hit by the outage, I would be surprised if the number of patients who died is zero.


> E.g. ambulances not turning up in time etc. due to paper and pen fallbacks

Sure. Did this happen?

Why were the “emergency management downtime procedures” insufficient [1]?

[1] https://www.healthcaredive.com/news/crowdstrike-outage-hits-...


If they were equally good as the non-emergency procedures, why wouldn't we use them all the time?


> why wouldn't we use them all the time?

Because they’re more expensive. They’re not all “equally good”; they’re good enough to keep people alive. (You repurpose resources from elective and billing procedures, et cetera.)


I would expect them to be good enough to prevent "obvious" deaths-from-failed-procedures, but deliver a slightly lower quality of care, so that if out of 100 very seriously ill people 50 survived during normal operation, this would turn into e.g. 49.

All of this without the person obviously dying due to the alternative procedures - just e.g. the doctor saw the patient less often and didn't notice some condition as early as they would have under normal procedures.

Would you consider this assumption to be wrong? (I am a layperson, not familiar with how hospitals work except from being a patient.)


This betrays a lack of understanding.

What resources are you repurposing from elective procedures exactly? Your patient load hasn’t changed, and day surgical instruments and supplies are from the same pool. There’s no “well this pile of equipment is only for elective procedures”.

I’m not even sure what “billing procedures” you’d repurpose (especially in your context of “keeping people alive”).


> Your patient load hasn’t changed, and day surgical instruments and supplies are from the same pool

The outage didn’t change any of these things either.

> not even sure what “billing procedures” you’d repurpose

At Mount Sinai, billing staff were redirected to watch newborn babies. Apparently the electronic doors stopped working during the outage.


> The outage didn’t change any of these things either.

Never said that it did. I just don't think your idea of emergency downtime procedures at a hospital matches what they actually are. There's paper and offline charting, most meds can be retrieved similarly, and so on. I heard a claim (from someone here) that an ER was unable to do CPR due to the outage, which could not remotely be true. Crash carts are available and are specifically set up to not require anything else but a combination. Drugs, IV/IO access, etc.

> At Mount Sinai, billing staff were redirected to watch newborn babies.

That sounds like something I would have imagined security doing. To be clear, what they most likely meant here is in the sense of "avoiding abduction of a newborn", not any kind of access to observe and oversee neonates.


Probably because our incredibly inefficient, burdened, and splintered healthcare system barely functions as is, and they do not have the time nor resources to pause and put in place an emergency downtime operating protocol that works as well as their 15 year old windows cobweb


> because our incredibly inefficient, burdened, and splintered healthcare system barely functions as is, and they do not have the time nor resources to pause and put in place an emergency downtime operating protocol

You just responded to an article about the implementation of emergency downtime protocols by speculating, baselessly, that such protocols cannot possibly exist because your mental model of our healthcare system prohibits it. Ironically, all within the context of why software development doesn’t hold itself to the rigors of engineering.


"In Alaska, both non-emergency and 911 calls went unanswered at multiple dispatch centers for seven hours.

Some personnel were shifted to the centers that were still up and running to help with their increased load of calls, while others switched to analog phone systems, Austin McDaniel, state public safety department spokesperson, told USA TODAY in an email. McDaniel said they had a plan in place, but the situation was "certainly unique.”

Agencies in at least seven states reported temporary outages, including the St. Louis County Sheriff's Office, the Faribault Police Department in Minnesota, and 911 systems in New Hampshire, Fulton County, Indiana, and Middletown, Ohio. Reports of 911 outages across the country peaked at more than 100 on Friday just before 3 a.m., according to Downdetector.

In Noble County, Indiana, about 30 miles northwest of Fort Wayne, 911 dispatchers were forced to jot down notes by hand when the system went down in the early morning hours, according to Gabe Creech, the county's emergency management director."

https://eu.usatoday.com/story/news/nation/2024/07/19/crowdst...

I mean, even if dispatch could handle it in some sense, it was certainly a problem, one that might have increased the average time to site for ambulances or firefighters. I haven't seen any report of any direct death.


> I haven't seen any report of any direct death

Exactly. Contrast that with a bridge collapse. It isn’t a mystery or statistical exercise to deduce who died and why.


There were numerous bridge collapses without casualties. Naturally, if one company could suddenly collapse 80% of Earth's bridges, direct deaths would be assured. It's great that, for some reason, no such company exists!


> were numerous bridge collapses without casualties

In how many of those cases were criminal charges brought? (It’s not zero. But it’s more limited.)


Because emergency downtime is not supposed to be both local and global. Don't worry, your startup will not eat those risks, but neither will those customers stay once insurance rewrites the guidelines. All that can happen has already happened; it's just consequences propagating now. There's nothing we can do about that with simple blame-shifting tactics.


I have argued for years that every business should have an analogue operations guide, tested every once in a while like a fire drill, down to pre-printed carbon-copy paper forms. A Lights Out, Phones Off Business Continuity Plan would have helped American Airlines too.


Hospitals were affected too, I don't think it's that far fetched to think some people died, or at least that some could not be saved due to this incident.


> Hospitals were affected too, I don't think it's that far fetched to think some people died

Absent evidence I’d say it is.

Hospitals have emergency downtime procedures [1]. From what I can tell, the outage was stressful, not deadly.

[1] https://www.npr.org/2024/07/21/nx-s1-5046700/the-crowdstrike...


Apply additional stress to a sufficiently large system that human lives depend on, and someone, somewhere will die.


> Apply additional stress to a sufficiently large system that human lives depend on, and someone, somewhere will die

Sure. Who did?

When a bridge collapses, this isn’t a tough problem. We don’t need to reason from first principles to derive the dead bodies. That’s the difference.


Hospitals and doctor’s offices were paralyzed by the outage. Transplant organs are often delivered by couriers on commercial flights. Many pharmacies were unable to fulfill prescriptions.

It wasn’t just vacation travelers that were affected by Crowdstrike’s incompetence.


I am positive that people in hospitals died as a direct result of this incident.


> I am positive that people in hospitals died as a direct result of this incident.

I'm less positive than you, just because my experience of healthcare infosec is that all a doctor has to do is say "I cannot be slowed down or prevented from doing x or people will die" and that's the end of any process or technical controls on x.

Same with utilities. I've seen the ICS engineers say "No you cannot put a password on this console because I may need instant access to prevent a blackout / explosion" and that pretty much ends the discussion.

Often that's not even wrong. Of course when there is a security incident there'll be a kneejerk reaction to that, and of course that's why ransomware groups love healthcare, but in the meantime, those risks seem reasonable.

Which means I'm guessing Crowdstrike killed a lot of healthcare billing but not a lot of critical care systems because it got ripped off those 30 seconds after install if it was ever installed at all.


> I am positive that people in hospitals died as a direct result of this incident

Do you have clinical or hospital administration experience? A source with evidence, even circumstantial?


Yes


> Do you have clinical or hospital administration experience? A source…

>> Yes

You managed a hospital and failed to implement emergency downtime procedures? (Because that is actually criminal.) Or do you have a source?


Apropos of anything else, “emergency downtime procedures” do not guarantee the same level of care as normal operations. I’ve worked in and out of hospitals as a critical care paramedic for years.


> “emergency downtime procedures” do not guarantee the same level of care as normal operations

Agreed. It’s also plausible someone had a heart attack due to the stress of flight cancellations. Do we have any evidence of either?

The difference between a bridge collapsing and everything we’re discussing is there isn’t much of a discussion around who died and why.


Deft goalpost shifting, nice.


Are you the orangutan doctor from futurama?


The commenter said they did not believe hospitals “have the time nor resources to pause and put in place an emergency downtime operating protocol” [1]. That is a reasonable guess. It’s not something one would expect from someone with “clinical or hospital administration experience.”

It’s a glib response, but so is “yes” to a request for attribution.

[1] https://news.ycombinator.com/item?id=41217683


Small reminder that the law already has a way of deciding liability for damages, and you don't have to directly drop a bridge on someone to get in trouble.


I completely agree. When I've negotiated contracts for my workplace and we explicitly write in the contract that the vendor is responsible for XYZ, it is my understanding (confirmed by legal, multiple times) that if XYZ goes wrong, they are liable for up to the amount in the SLA; however, that isn't a cap on liability in extenuating circumstances.

If this all gets brushed away, it significantly devalues the "well we pay $VENDOR to manage our user data, it's on them if they store it incorrectly" proposition, which would absolutely cause us to renegotiate.


You aren’t showing us the specific language that you’re referring to, nor do we know what a typical CrowdStrike contract looks like. You could be talking about apples and oranges here. I’ve seen both.


I was pretty sure that someone was going to "ackshually" me here, and here we are. The specific wording doesn't matter.

I've negotiated dozens of these contracts and the value add of a vendor managing the data is liability. If they aren't liable for data mismanagement, then their managed service is only worth the infra costs + a haircut on top, and we'll renegotiate with this in mind.


> Is there any path for software engineers to reach this level of accountability and norms of good practice?

There is no reason that software couldn't be treated with the same care and respect. The only reason we don't is because the industry resists that sort of change. They want to move fast and break things while still calling themselves "engineers." Almost none of this resembles engineering.


I’m a software engineer, with a degree, and SWE does have the same ethical principles and the same engineering process, from problem definition and requirements all the way to the development lifecycle, testing, deployment, incident management, etc. None of it includes sprints and story points.

Suffice it to say most SWEs are not being hired to do actual engineering, because the industry can’t get over the fact that just because you can update and release SW instantly doesn’t mean you should.


> SWE does have the same ethical principles and the same engineering process

The lack of certification means this training isn’t reinforced to the degree it is in engineering.


Right. If the coding industry mimics the construction industry, we wind up with one position called engineer that assumes most of the liability.

The other 99.99....% of software engineers will get different titles.

All of this ignores the individuals who are most responsible for these catastrophes.

Investors and executives deliver relentless and effective pressure toward practices that maximize their profits - at the expense of all else.

They purposefully create + nurture a single point of failure and are massively rewarded for the harm that causes (while the consequences are suffered by everyone else). Thanks to the pass they reliably get, their style of leadership ends up degrading every industry it can.


> If the coding industry mimics the construction industry, we wind up with one position called engineer that assumes most of the liability

If their sign off is required, this could work. The question is whether it’s worth it, and if it is, in which contexts.


> If their sign off is required, this could work. The question is whether it’s worth it, and if it is, in which contexts.

Civil engineers' liability is tied to standards set by gov agencies/depts and industry consortia.

Standards would have to be created in software engineering - along with the associated gov & industry bodies. In civil engineering, those things grew during/from many decades of need.


To be fair, software and technology is so magically transformative that even with warranty disclaimers like “this software comes with no warranty, and we limit our liability to the purchase price”, every company in the world still lines up to buy it. Because for them it’s effectively magic, that they cannot replicate themselves.

No individual software developer, nor corporation, is foolish enough to claim their software is free of bugs, that’s why they put the risk on the customer and the customer still signs on the dotted line and accepts the risks anyway. After all, it’s still way more profitable to have the potentially-faulty software than needing an army of clerks with pen and paper instead.

Most software has to be this way or it would be exorbitantly expensive. That’s the bargain between software developers and the companies that buy the software. The customer accepts the risks and gets to pocket the large profits that the software brings (because of the software’s low cost), because that’s better than the software developer balking at the liability, no software being written at all, and an army of staff at every airport writing out boarding passes by hand. There are only a few kinds of software that aren’t this way, for example the software in aircraft or nuclear power plants. That software is correspondingly extremely expensive. Most customers that can, choose to accept the risks so they can have the larger profits.


> Software engineering, of course, presents itself as another worthy cause, but that is eyewash: if you carefully read its literature and analyse what its devotees actually do, you will discover that software engineering has accepted as its charter "How to program if you cannot.".

— Edsger Wybe Dijkstra, 1988. (EWD1036)


I'm ok with that. I don't want to keep everyone out except just those who happen to have just the right mind set. Programming is about developing software for people, and the more viewpoints are in the room, the better.

Some pieces are more important than others. Those are the bits that need to be carefully regulated, as if they were bridges. But not everything we build has lives on the line.

If that means we don't get to call ourselves "engineers", I'm good with that. We work with bits, not atoms, and we can develop our own new way of handling that.


> I don't want to keep everyone out except just those who happen to have just the right mind set.

Neither do I. Neither did Dijkstra. EWD1036, “On the cruelty of really teaching computing science”, is about education reform, to enable those who don't "happen to have just the right mind set" to fully participate in actual, effective programming.


I prefer to call it "computer programming." If the title is good enough for Ken Thompson or Don Knuth then it's good enough for me.


> If that means we don't get to call ourselves "engineers", I'm good with that.

I suspect this particular title-exaggeration is fueling this particular fire.

Going forward, I believe we need to be aware that software-controlled mechanics grew out of two disparate disciplines; it presently lacks the holistic thinking that long-integrated industries have.


Software (controls) engineers at VW during the emissions scandal went to jail, engineers at GM were held liable for the ignition switch issue (not mostly in software, but still). I expect we'll eventually see some engineers/low level managers thrown under the bus with Boeing. It definitely happens, but not as frequently as it could. That said, I definitely prefer Amazon's response to the AWS East 1 outage back in 2016 -- the engineer wasn't blamed, despite the relatively simple screw up, but the processes/procedures were fixed so that it didn't happen again in the last 8 years. Crowdstrike is a little bit gray in that regard -- people should have known how bad the practice of zero testing on config updates was, but then again, I've seen some investigation saying that the initial description wasn't fully accurate, so I'm waiting for the final community after-action report before I really pass judgement.


> I appreciate that we’re finding the humour in this catastrophe but what about the question of liability?

One of the biggest and most used pieces of software (the Linux kernel) comes with zero warranties. It can fail, and no one would be liable. Are we fine with that? Is the CS case different because it costs money? From a user perspective we don’t want software failing in the middle of an airplane landing, so whether the software comes from CS or github is of lesser importance.


How many bridges, would you say, does the average civil engineering firm deliver each year, each on only one day's notice, in response to a surprise change in requirements due to a newly developed adversarial attack?

Crowdstrike does this constantly.

You could demand the same level of assurance from software, but in exchange, you don't get to fly, because the capacity won't be there


I would find it more useful if liability here were attributed to the need to purchase such draconian tools: the certifications that require it and the C-levels who approve it. We would be better for it.


Oh Christ. Just drop it. A by all accounts legitimate security function of a product targeted at company-owned endpoints.

Please don’t devolve this conversation into you being upset about not getting admin rights on your work computer or whatever this is about.

Any (esp. larger) org would be criminally negligent to eschew using something like CrowdStrike in order to capitulate to some nerd that thinks that they have ownership over their work equipment.


Bruh, chill.

I don't give two craps about having admin rights on my work computer. Crowdstrike is bad software and a bad way to manage large deployments. They just proved it. I just think that we are also responsible for buying a solution that works that way.


On this website you are asking a population that would be responsible for this, so you will likely only get answers about how hard this is to solve, how it's not software engineers' fault, how we need to understand that software engineering is not civil engineering, how we need to be careful with this analogy, and how it's not our fault! Don't blame us when things go wrong, but also, give us all the money when things go right.

This is not the place for this question is what I’m saying.


They totally deserved it.

Those who think running a third-party closed-source Windows kernel driver (which parses files distributed from the Internet in real time) is good for security must also accept the consequences.

I'm sick of these so-called security consultants who always insist on checklists like installing a proprietary closed-source binary blob as a Linux kernel module into a system that otherwise consists mostly of free software (except for hardware drivers) and then think they did their job, and of the executives who pay a lot of money to these idiotic so-called security consultants.



> Are the licenses so ironclad that customers have no recourse?

Even on Hacker News, there was agreement that CrowdStrike screwed up, but then people also blamed IT staff, Microsoft (even after realizing it was a CrowdStrike issue), and the EU/regulators.

I imagine responsibility of each entity would need far more clarification than it does now.

If you want to define liability, there needs to be a clear line saying who is responsible for what. That doesn’t currently exist in software.

There is also the question of how people respond to risk.

Consider how sesame regulation led to most bread having sesame deliberately put into it. Industry responded by guaranteeing contamination.

Crowdstrike and endpoint security firms might respond by saying that only Windows and Mac devices can be secured. Or Microsoft may say that only their solution can provide the requisite security.


I’m interested in what those who suffered outages as a result of crowdstrike told their insurers with respect to “QA’ing production changes”

It’d be interesting to see if anyone tries to claim the outage as some sort of insurance event only to lose out because they let Crowdstrike roll updates into a highly regulated environment without testing


Probably in a decade or so after the AI crash. I have yet to see anything that comes close to “liability” for the digital realm.

US governments and businesses get hacked/infiltrated all the time by foreign adversaries yet we do not declare war. Maybe something happens in the dark or back channels. But we never know.


Engineering safety culture is built on piles of bodies and suffering unfortunately. I suspect in software the price of failure is mostly low enough that this motivation will never develop.


> but so far not much in the way of lawsuits

It hasn't been that long? The situation might be that there hasn't been sufficient time to yet gather evidence to commence lawsuits.


Your barber has more licensing requirements than a senior software engineer.

Regulations have not caught up with developers (yet).


> Is there any path for software engineers to reach this level of accountability and norms

Potentially controversial stance here, but most software engineers are not engineers. They study computer science, which doesn't include coursework on engineering ethics among other things. I would say that by design they are less prepared to make ethical decisions and take conservative approaches.

Imagine if civil engineers had EULAs for their products. "This bridge has no warranty, implied or otherwise. Cross this bridge AT YOUR OWN RISK. This bridge shall not be used for anything safety critical etc."


> Is there any path for software engineers to reach this level of accountability and norms of good practice?

Heck, no.

Civil engineering doesn’t change. Gravity is a constant. Physics are constants. If Rome wrote an engineering manual, it would still be quite valid today.

Imagine if we had standardized software engineering in 2003. Do you think the mandatory training about how to make safe ActiveX controls is going to save you? Do you think the mandatory training on how to safely embed a Java applet will protect your bank?

Software is too diverse, too inconsistent, and too rapidly changing to have any chance at standardization. Maybe in several decades when WHATWG hasn’t passed a single new spec into the browser.

(Edit: Also, it’s a fool’s errand, as there are literally hundreds of billions of lines of code running in production at this very moment. If you wrote an onerous engineering spec, there would not be enough programmers or lawyers on earth to rewrite and verify it all, even if given decades. This would result in Google, Apple, etc. basically getting grandfathered in while startups get the burden of following the rules - rules that China, India, and other countries happily won’t be enforcing.)


> Civil engineering doesn’t change. Gravity is a constant. Physics are constants.

Physics may be a constant, but materials and methods are not. There is a reason why ISO/IEC/ICC/ASTM/ANSI/ASME/ASHRAE/DIN/IEEE/etc standards have specific dates associated with them.

> If Rome wrote an engineering manual, it would still be quite valid today.

Considering many engineering standards from a few years ago are no longer valid, this is almost certainly not true.


>> If Rome wrote an engineering manual, it would still be quite valid today.

We have some ancient engineering manuals. A book I read, most likely Brotherhood of Kings, remarked that Mesopotamian engineering manuals are primarily concerned with how many bricks will be required for a given structure.

The manuals are valid today, I guess, but useless. We prefer pipelines to brick aqueducts. Our fortresses are made of different materials and need to defend us from different things.


That’s only a formality, but reality did not change, and neither did the fact that those standards would still work even if they would be slightly inferior.


Physics will still be the same when your faulty software tells an airplane to dive.


In Canada, we have software and computer engineering programs accredited by the same entity (CEAB) that does civil engineering.

My program is more out of date (Java Server Pages, VHDL) but the school can't lower the quality of their programs. Generally, the standard learning requirements aren't on technology but principles, like learning OOP or whatever else. The CEAB audits student work from all schools in Canada to make sure it meets their requirements.

The culture itself is probably the most important part of the engineering major. They don't round up. If you fail, you fail. And I had a course in 3rd year with a 30% fail rate. Everything's mandatory, so you just have to try again over the summer.

A lot of people drop out because they can't handle the pressure. But the people that stay understand that they can't skip out on stuff they aren't good at.


I've got an ABET accredited Computer Engineering degree from a US school. The only thing it got me in interviews was questions about why not CS.

I did not follow the path to become a licensed Professional Engineer, because a) there was no apparent benefit, b) to my knowledge, none of my colleagues were PEs and I don't know how I would get the necessary work certification.

Maybe there's corners of software where it's useful to become licensed, but not mine.


There is nothing saying that allowing for some standardization means we have to be stuck at 2003 levels of state of the art. And actually, yes, many engineering disciplines do change. Civil engineering brings in new construction techniques, methods for non-destructive testing, improvements to materials, and on and on, but it doesn't do so in the free-for-all manner of the coked-up software industry. It's a proper engineering discipline because there's control: testing the best way to do things and then rolling that out.

If we (meaning software 'engineers', and I tepidly include myself in that group) had half the self-control of the 'proper' disciplines when introducing insanity like the 10000th new javascript framework for reading and writing to a database, maybe things would be better because there'd be less churn. Why does it have to move so fast? Software is diverse and inconsistent and rapidly changing because 'the industry' (coked-out developers chasing the next big hit to their resume to level up) says it should be. I just don't agree that we need that amount of change to do things that amount to mutating some data. If the techniques had been held at what was cool in 2007 until the next thing could be evaluated and trained on, while the knowledge and process around them kept growing, perhaps we'd be in a better position. I know I certainly wouldn't mind maintaining something that was created in the last decade of the previous millennium knowing it was built with some sort of self-control and discipline in mind, and that the people working on it with me had the same mindset as well.


Simple - if you restrict the software industry, the US loses to China or any other country that doesn’t give a damn. And unless you censor the internet, there’s absolutely no way to prevent illicit software from crossing the border.

Would a business get in trouble for using it? Sure. But if all the businesses in your country are at a competitive disadvantage because the competition is so much brighter elsewhere, and that "sloppily constructed" software is allowing international competition to have greater productivity and efficiency, your country is hosed. Under your own theory, imagine if the US was stuck with ~2007 technology while China was in 2024. The tradeoff would be horrific - like, Taiwan might not exist right now, horrific.

Regulating software right now would kill the US competitive advantage. It narrows every year - that would do it overnight. The US right now literally cannot afford to regulate software. The EU, which can afford it, is already watching talent leaving.

There’s also the problem of the hundreds of billions of lines of code written before regulations running in production at this very moment. There are not enough programmers on earth that could rewrite it all to spec, even if they had decades. Does Google just get a free grandfathered-in pass then, but startups don’t?


I hope you realize that "sowwy, there's too much code :3" will not fly with whatever government decides to regulate software after the next major cock-up. We can either grow up and set our own terms, or we can have them forced on us by bureaucrats whose last computer was an Apple II. Choose.


Bull - regulators can’t change reality.

The fact that China is X number of years behind us, is easily demonstrable.

The amount of code running in the US, Y, is relatively easy to estimate by asking around.

Proving that the amount of time it would take to modify Y lines of code to match any given law will exceed X number of years, thus putting us behind China, is also fairly easy to demonstrate, even if the exact amount of time is not.

Even our Apple II-era regulators know that going beyond that much effort (call it Z) is suicidal politically, economically, technologically, you name it. They might not understand tech, but they know it’s everywhere, and falling behind is not an option.

On that note, stop stereotyping our legislators. They have smartphones, younger aides, many of the oldest ones are retiring this cycle, etc.


> If Rome wrote an engineering manual, it would still be quite valid today.

“How to conduct water efficiently: first, collect a whole bunch of lead. Then construct pipes from said lead…”


Gravity is not constant; instead it varies by location and by height.

Bubble sort, however, is always bubble sort. A similarly large portion of what engineers do in software is constant.
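
To make the "constant" half concrete, here is bubble sort in Python; a throwaway sketch, but the algorithm itself is exactly what it was fifty years ago, whatever changes around it:

    # Bubble sort: repeatedly swap adjacent out-of-order elements until sorted.
    def bubble_sort(xs):
        xs = list(xs)                            # work on a copy
        for i in range(len(xs)):
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return xs

    print(bubble_sort([5, 1, 4, 2, 8]))          # [1, 2, 4, 5, 8]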


I'd imagine we wouldn't have ActiveX controls in the first place.


Wishful thinking - the IRS is still running on COBOL; our nuclear weapons until a few years ago on Windows 95. The NYC subway still has a lot of OS/2.

Standardization does not stop bad engineering. Those who think it does have not witnessed the catastrophe a bad standard can cause. Go download and implement the Microsoft Office OOXML standard - it’s freely available, ISO approved, 6000 pages, and an abomination that not even Google claims to have correctly implemented.


You're making some points for me. You are assuming COBOL, Windows 95, or OS/2 are bad because they're old. Such assumptions are the antithesis of "engineering."


Old technology isn’t necessarily bad in itself. It’s well documented and understood.

Where it’s bad is when the equipment to run that software no longer is manufactured. You can’t get a new computer to run Windows 95. Not even in the military. Your only option is to virtualize, adding a huge possible failure mode that was never considered previously.

Where it’s bad is when changes are needed to adapt to modern environments, and nobody’s quite sure about what they are doing anymore. There’s no test suite, never was, the documentation is full of ancient and confusing terminology, mistakes are made.

And on and on…


It sounds as if you're saying that these were bad things because they were always bad. And maybe they were. But we might never have any software at all if we only had good software.


I'm not saying they're bad, because I don't know.


Apologies. I misread your intentions.


This is so wrong

Most suspension bridges were built without a theoretical model, because we didn't have one yet. Theory caught up much later.

Innovation often happens in the absence of theory.


>Most suspension bridges were built without a theoretical model

That's not true, even for the first suspension bridge ever built (in the early 1800s), but it is true for example that many useful and impressive aircraft were built before the development of a physical theory of flight.


Galloping Gertie is an example in America.

Your definition of theory only fits if you scope it so narrowly that it's useless to the problem space... because the point is that theory didn't entirely cover that space. And bridges did collapse because of that.

But lack of theory didn't mean lack of rigorous testing. Gertie was built based on theory. Many other bridges were based on test results... and did fine.


You've retreated from, "built without a theoretical model, because didn't have one yet," way back to, "theory didn't entirely cover that space." This is commendable.

>Many other bridges were based on test results

I'm going to go out on a limb a little and assert that not a single bridge was built out of steel or iron in the last 200 years in the US or the UK without a static analysis of the compressive and tensile forces on all the members or (in the case of bridges with many hundreds of small members) at least the dozen largest members or assemblies.


It's disingenuous to read my comment as saying no theory existed, ever.

It should be obvious that when you talk about theory covering a product, there either is a theoretical framework, or there isn't.

In the case of suspension bridges there wasn't. There was no mathematical theory to explain how the bridge stayed aloft, or how much it could carry.

What bridge builders of high quality did was make mock (small) models, and test how many rocks they could put on them.

I think you will concede that that isn't a theoretical model. It's a practical one.

And this happens very often. People experiment and build useful things, but no one understands why they work. Until later people come along and explain the phenomenon.


This issue goes beyond CrowdStrike and points to the general approach to security, which is buying products off the shelf to satisfy regulators and insurers while not actually caring what they do or how they work.

I'm not saying tech shouldn't be regulated, but our current model of "buy this thing to shed liability" doesn't work. The worst part is, the people who saw this coming (i.e. your IT department) probably can't do a damn thing about it because it's mandated at high levels in the company either for "cyber insurance" requirements or some other regulation. Madness.


> The worst part is, the people who saw this coming (i.e. your IT department) probably can't do a damn thing about it because it's mandated at high levels in the company either for "cyber insurance" requirements or some other regulation.

I've worked with many excellent IT people who feel this way, but the vast majority of my experience with IT departments has been that as long as the contract covers what it needs to, they don't actually care if it solves the problem or not. At a previous job, software similar to crowdstrike was installed on my workstation over a weekend, and I came back to 20% slower compile times (I was working on them at the time so I had dozens of measurements). I had ETL traces showing the problem was the software, but IT refused to acknowledge it because the vendor contract said there was no performance impact expected for our workload.
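
For anyone who wants to gather the same kind of evidence, the measurement itself is simple: time the same clean build repeatedly with the agent's filtering enabled and disabled, then compare medians. A rough, self-contained sketch; the build command is a placeholder, not a real project:

    import statistics
    import subprocess
    import time

    BUILD_CMD = ["make", "clean", "all"]   # placeholder; substitute your real build
    RUNS = 10

    def timed_build():
        start = time.perf_counter()
        subprocess.run(BUILD_CMD, check=True, capture_output=True)
        return time.perf_counter() - start

    if __name__ == "__main__":
        # Run once with the endpoint agent enabled and once with it disabled,
        # then compare the two reported medians.
        samples = [timed_build() for _ in range(RUNS)]
        print(f"median build time over {RUNS} runs: {statistics.median(samples):.1f}s")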


That is my experience, too. I attribute it to IT / sysadmin jobs having a lower bar to entry and becoming more of a "watered down" business unit that just follows orders without much say or care for anything.


Most IT departments wouldn’t have seen this coming, and certainly would’ve been right to not base their entire security strategy around it. I’m not sure where this narrative is coming from. Falcon delivered and still delivers real, genuine security benefit to its customers. That does not mean that it eliminates all risk, and does not mean that it doesn’t introduce risk of its own.

It’s literally a game of tradeoffs like all engineering problems. This shouldn’t be that foreign to anyone here. Suddenly HN is full of security experts that are fuelled with 20/20 hindsight and recency bias, explaining how companies could’ve dodged this bullet without considering what very real bullets were being dodged by using Falcon in the first place.


> Suddenly HN is full of security experts that are fuelled with 20/20 hindsight and recency bias

That is incorrect, many in tech saw these blanket IT policies being implemented and didn't like the prospects but couldn't change anything. At my workplace, policies like password rotations every 90 days (NIST recommends against), resource heavy machine scans, and nonsensical firewall rules are all a result of the company buying "cyber insurance".

> It’s literally a game of trafeoffs like all engineering problems

Adding a single point of failure to all of your systems is a pretty big tradeoff to make for questionable gains.

> Falcon delivered and still delivers real, genuine security benefit

Rhetorical question, but I'll ask: why did some of the machines affected in the CrowdStrike outage even need EDR software installed in the first place? Examples are flight status displays, critical 911 and healthcare machines, warehouse cranes, etc., things that don't immediately pass the smell test for having an internet connection.


To your final question: those machines likely had a connection to the internet at some point, directly or indirectly through something else, which may have left them vulnerable.

It speaks to more than just EDR solutions; it's also about appropriate segmentation of critical endpoints on the network. Flight status displays may well have had an internet connection.

To your middle point, I don't think people understood the reality of how (or whether) Crowdstrike could become a single point of failure on their systems. We now know it was a single point of failure that caused systems to completely shut down, but up until that point I don't think that potential was well understood, nor how likely it was.


This may end up in one of those court evidence videos or lawsuits - this isn’t a funny thing.

This would have been a closed moment (just a bunch of security nerds discussing something), but instead it is now freely available to a wider general public with major grievances, who can use it to lampoon them.


> This may end up in one of those court evidence videos or lawsuits - this isn’t a funny thing.

I didn't take the CrowdStrike executive as making light of the situation, at all. If anything, I thought his speech took it seriously, acknowledged it was a major, major fuckup, and basically said he was accepting the trophy as a mark of shame and as a cautionary tale for future CrowdStrike employees.

I thought the exec accepting this was a true class act (to emphasize, saying that in no way should imply that I think it absolves CrowdStrike of responsibility, or liability, for what happened).


Context is everything. They had every chance to own up from the day of until now. A ‘lulz haha we goofed up’ in a nerdy security conference doesn’t seem like the right place or time.


I'm not sure what your definition of "own up to it" is, but they issued an apology day-of.


Apologies don’t mean anything from a c-level suit (George Kurtz) that has a known history of causing outages. The culture of accountability at crowdstrike is a facade.


Got it, so to own up to it they have to change the culture... overnight?


[flagged]


totally owned them with 'clownstrike'. glad you were proud enough of it to use it twice


> Love how you post from a throwaway account.

:eyeroll:

As if "xyst" somehow lets us identify you?


Can make it funny, via a T-shirt:

When I use REGEXP I use it in my KERNEL CODE

Tragedy and comedy, same coin...



Computer security issues cropped up during the Vietnam War era, and the US did the work and found actually effective computer security models. We're in a society that has effectively memory-holed them.

Why is it necessary to have scanners running 24x7 on everything a computer is about to run?

Why is it necessary to have Operating Systems that rely on ambient authority?

Blaming Crowdstrike does nothing but distract from the fundamental design failure we ignore every day in our operating systems, Linux, MacOS, Windows, et al.


Meanwhile they're still blaming Microsoft, as if it isn't possible to run the code they update outside of the kernel and use the kernel-mode code only for observation and action, not logic.
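
For illustration, the split being described looks roughly like this. A conceptual sketch in Python for brevity (obviously not actual driver code); the names and the fail-open default are assumptions for the example, not how any particular vendor behaves:

    # Conceptual sketch: the kernel component only reports events and enforces
    # cached verdicts; all frequently-updated parsing/detection logic lives in
    # a user-mode service that can crash without taking the OS down with it.
    EVENT_QUEUE = []     # stands in for the kernel -> user event channel
    VERDICT_CACHE = {}   # stands in for verdicts pushed back into the kernel

    def kernel_shim_on_process_start(image_path):
        # Kernel side: no parsing of update content here, just report and enforce.
        EVENT_QUEUE.append(image_path)
        return VERDICT_CACHE.get(image_path, "allow")   # assumed fail-open default

    def user_mode_service(rules):
        # User mode: the logic that changes with every content update.
        while EVENT_QUEUE:
            path = EVENT_QUEUE.pop(0)
            VERDICT_CACHE[path] = rules.get(path, "allow")

    if __name__ == "__main__":
        print(kernel_shim_on_process_start(r"C:\temp\payload.exe"))   # "allow" (no verdict yet)
        user_mode_service({r"C:\temp\payload.exe": "block"})
        print(kernel_shim_on_process_start(r"C:\temp\payload.exe"))   # "block"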


I work in IT and I happened to be the poor bastard on call when Clown Strike took out the majority of our infrastructure. If it wasn't for my own personal refusal to use cloud-based bullshit, we would probably have been down for days instead of hours. The fact that people like my IT director saw nothing wrong with this and are taking 0 steps to prevent such bullshit makes me quite worried that I will soon have to deal with some other catastrophic cloud-based failure in the near future.

I keep repeating ad nauseam "only idiots rely on other people's computers", and I stand behind that statement 100%.


What's strange is that there are managers/executives who understand why it's bad to have a SPOF (single point of failure) in the internal infrastructure but who are OK with having a product/service from an external vendor as a SPOF. As if having a contract and paying money for it means it is created and maintained by infallible superhumans (as opposed to the internal engineers they don't trust). Such misplaced trust puzzles me.


My director mistakenly thinks the more it costs the better it is; he refuses to even consider anything FOSS. Before he came along we ran everything in house, and we still had issues of course, but downtime was nearly nonexistent because we could take action immediately instead of waiting for whatever cloud service messed up today to feel like getting around to it.

And of course we now get to pay monthly for the privilege of being at someone else's mercy as opposed to before when we paid once and went on our merry way.


I 100% agree. Additionally, I am amazed to see that people can pay outrageous fees for cloud services such as Azure VD. For a fraction of the yearly cloud budget, companies can create crazy stable, offline-capable infrastructure themselves.


You sound like a very aggressive person. I don’t think I’d like to work with you regardless of whether or not you were right. Maybe you’d get your point across better if you weren’t so aggressive about it.


k


Can only do so much when idiot CTOs take their advice from CTO summits, consultants with their own perverse incentives, and of course random conferences


Previous Pwnie Award winners for comparison:

https://en.wikipedia.org/wiki/Pwnie_Awards


most of the previous "most epic fail" awards are naturally dominated by microsoft.

blame and shame can continue, but if you believe "production failures" such as this one are due to bad processes, then m$ definitely played a big role here.


I'm wondering how on earth the CEO and CTO have managed to keep their jobs after this fiasco?


So 911 was not working in several cities and hospital workflows were slowed to a halt, and they have time to go to defcon for a joke?


Are there hospitals that haven't fixed their computers yet? Obviously CS messed up, but aside from paying for damages and making process changes, I'm not sure what else they can do at this time. Warning people to not repeat their mistakes doesn't seem like a bad use of time.


CS deserves the blame here, but putting CS into a critical system like 911 is IMHO a huge mistake (though whoever did this likely knew they would dodge the blame, so why should they care).


I think they’re owning up to their mistakes instead of dodging the issue. I still feel that if they did the right testing, they shouldn’t be blamed for everything. It’s pretty standard for IT teams to avoid auto-updates and instead manually review them—especially in critical sectors like healthcare, aviation, and government. For instance, at my workplace, we’re not allowed to auto-update VsCode.

They mentioned they ran tests which unfortunately returned false positives. While it’s true they could’ve been more thorough, the affected companies also dropped the ball by not doing their own checks


> I still feel that if they did the right testing, they shouldn’t be blamed for everything.

This update crashed 100% of the Windows systems it got installed on, which means either their testing did not involve actually loading it on real world computers at all or that blue screening and boot looping did not cause the test to fail. It is objectively clear that they did not do the right testing. There is no excuse for this update having ever left the earliest stages of a proper test process.

It's not like this is a case of an unexpected interaction with a configuration not found in the test lab.
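
To be concrete about the bar being described: before publishing, install the update on a handful of real or virtual Windows machines, reboot them, and confirm they come back and check in. A self-contained sketch that only simulates that loop; the VM names and the "agent reports in" probe are hypothetical stand-ins for a real test lab:

    # Pre-release smoke test: block the release unless every test machine
    # survives installing the update and rebooting.
    TEST_VMS = ["win11-x64", "win11-arm64", "win2019", "win2022"]

    def install_and_reboot(vm, update_path):
        # Stand-in for: push the update to the VM, reboot it, and wait for the
        # agent to check back in. Returns False if the machine never comes back.
        print(f"installing {update_path} on {vm} and rebooting...")
        return True   # a real lab would actually probe the machine here

    def smoke_test(update_path):
        for vm in TEST_VMS:
            if not install_and_reboot(vm, update_path):
                print(f"{vm} did not come back up: block the release")
                return False
        print("all test machines survived; proceed to a staged rollout")
        return True

    if __name__ == "__main__":
        smoke_test("content-update-001.bin")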

> It’s pretty standard for IT teams to avoid auto-updates and instead manually review them—especially in critical sectors like healthcare, aviation, and government.

This component was not able to be controlled in this way. Systems that were configured to be delayed on other CrowdStrike updates still got this particular update immediately with no ability for IT departments to control them.

> They mentioned they ran tests which unfortunately returned false positives.

Again, whatever tests they actually ran clearly didn't involve actually loading the update in to the actual driver. Their explanation sounds like they may have validated the formatting of their update or something like that but then just sent it.

> While it’s true they could’ve been more thorough, the affected companies also dropped the ball by not doing their own checks

No they did not because they could not. They may have dropped the ball when installing Crowdstrike in the first place, but the whole reason this was such a widespread thing affecting so many high priority systems is that it wasn't able to be controlled in the ways IT departments would want.


> This component was not able to be controlled in this way. Systems that were configured to be delayed on other CrowdStrike updates still got this particular update immediately with no ability for IT departments to control them.

I had to look this up because I had not heard about this. I didn't understand that this bypassed companies' protections. I take back what I said; I guess I'm used to companies like those having poor IT standards and then, once something goes wrong, pretending that they had no part in it.


They weren't even nominated, so this is such an epic failure they won it as a late entry.

Also, that pony isn't typically super-glued to that structure, so this was also a special trophy! :)))))))))


This comes across as incredibly tone deaf. People suffered degraded medical care, billions were lost in the airline industry, billions more were lost in productivity, and ultimately it's time that people cannot get back. Yet these clowns are accepting joke awards as if this is something to hang on your trophy wall.

This is actually a c-level executive at ClownStrike, by the way.

> Michael Sentonas serves as President and is responsible for CrowdStrike’s product and go-to-market functions, including its sales, marketing, product & engineering, threat intelligence, privacy & policy, corporate development, corporate strategy and CTO teams

https://www.crowdstrike.com/about-crowdstrike/executive-team...

The whole C-level executive suite at ClownStrike needs to go. This company needs a real CTO like Jeremy Rowley. Although I suspect a good person like him would never join the ranks of ClownStrike


Did people actually watch the video??? I just don't understand how they think Michael Sentonas was making a joke of all this. If anything, he was acknowledging the horrible outcome of what happened.

I don't think this absolves CrowdStrike of responsibility at all, but what would you like him to do, commit hara-kiri on stage?


I watched the video. I saw this asshole executive with a huge shit eating grin on his face the entire time he gave his PR-managed speech and lapped up the applause of perhaps the stupidest audience in tech history.


I would like to have had him wear a red clown nose & walk to the stage to the soundtrack of The Empire Strikes Back. Nothing more, nothing less.


Clearly, step down. Including CEO (George Kurtz)

The shit rolls downhill, starting from the c-suite. These clowns clearly cannot change the org and are blind to the issues. Keeping the same leadership means nothing will change. The fact that they even poke their head up for what is clearly a marketing/PR stunt without showing any substance shows how clueless they are.

Guy has “20 years” of experience which clearly doesn’t amount to shit. Maybe 20 years of junior experience and falling upwards.


People who write "ClownStrike" aren't contributing to discourse. Downvote and move on


Yeah, the event was very clearly a crowd stroke.


Well, this is the second time their CEO caused a major outage by pushing a flawed update for a security product. This whole thing is probably a joke to him.


No, the most epic fail is still held by Boeing...they launched an alpha capsule to ISS and still cannot get the astronauts back to Earth...


It took our IT department until Monday afternoon to put out a message reaffirming their confidence in crowdstrike. Before any postmortem on their side or ours. I'm guessing they got offered a big discount to renew.

My IT team have a lot more confidence in crowdstrike than I have in them.


Serious companies should have better QA. This includes Crowdstrike customers.


Wonder what kind of deliberation led to them accepting the award.


something along the lines of, how do you reach your most influential customers all at once with a sincere message. this was the right thing to do.

anyone who makes serious decisions will see acknowledging this in front of peers was correct. it's funny how the hacker ethic of celebrating failures as lessons becomes impossible when you have a chorus angling for leverage all the time. the failure mode of most tech is catastrophic, where all the convenience you get from it disappears suddenly and randomly. I'd be mad about the lost time during the recovery and over missed flights or even health services, but managing that risk is the job.

to anyone else, next time something fails and messes up your plans or puts you in a spot, try to remember a time when you had a chance to do something well but didn't because you were thinking, "not my problem."


They screwed up. They know it, and everybody else knows it. Trying to pretend they didn't would just make them look even more lame.

Or, viewed from the other side: Owning your failures makes you a grownup.


Shhh, the people here want blood.

BTW, did you know that there's an endless stream of "satisfying" drama on YouTube? I heard that Mr. Beast is finally in some hot water!


The outage caused actual human deaths. Yeah, most people here probably think the priority is criminal justice, which you might call "blood" in a dishonest attempt to make us appear cynical when they're the ones accepting funny nerdy awards after causing so much chaos.

Maybe next time a doctor causes death because of their negligence, they should accept an "oopsie award"? It would be le funny lulz XD


In the cases where outages caused human deaths you’re 100% right that there should be worse consequences.

But in many cases, hopefully it’s also a lesson to the places where people were harmed, to never let one piece of software get in the path of life or death without redundancies.

(Not at all defending their screwup, I just don’t think EVERYONE deserves the same restitution, some more than others.)


Refusing to show up generates the same, if not more, negative PR without the opportunity to show humility and promise to do better.


Hubris: the belief that they are special and will get away scot-free with all the damage they caused.


Did you click through to the video? Because the acceptance speech seemed to show the opposite of hubris to me. Specifically in owning up to the mistake, and using the award as a reminder to do better in the future.


I’m sure he does not represent the PR and legal teams at CrowdStrike. I’d take anything he says with a grain of salt


CrowdStrike is publicly traded and he's accepting the award as president of the company. You bet your ass he does represent the PR and legal teams here.


Exactly. In fact when I saw this I was impressed that he said things that his PR and legal teams would’ve STRONGLY advised him not to say.

If anything, his attitude of (paraphrasing) “we accept this; we screwed up, and we will prominently display it as a reminder to our staff to never let this happen again” was about the best response he could’ve given.


Interestingly, multiple studies have shown that doctors owning up to mistakes and apologizing results in smaller med mal settlements. I believe a few carriers recommended it.

As an attorney, I’ve made mistakes and screwed things up. The initial instinct is to be less than candid and not admit anything. Then, after a sweaty sleepless night, I bit the bullet and was open and honest with the client, admitted fault, apologized, and offered to do what I could to make it right. Every reptilian synapse was screaming “don’t do it,” but it was the right thing to do, and I have no doubt, cost me much less in the end.


"No such thing as bad press" I assume.


Comes a point when some people just have an "ethical breakdown (breakthrough)". It's a positive thing. It's where recovery starts. He's owning it. There's no absolution until you throw yourself in front of the lions. At this point who cares what the PR and legal teams have to say. They'll be lucky to have a job in a few months.

I really hope he makes the most of a great opportunity to tell some truth, so that we can break the cycle of bullshit solutions causing further pain and loss in the future: Something like;

   "Thanks for the award. Well, we all knew this managed endpoint
   cybersecurity shit was never gonna fly. And on Windows? Seriously?!
   You all knew it too, but you pays yer money and takes yer chance
   for a lucky charm to keep the auditors and insurance ghouls
   away. So here we are. We all got caught with our pants round our
   ankles. It was a good racket while it lasted. Oh well... Anyone
   hiring?"


Probably the firm they outsourced their public relations to thought this would be a good idea. It's backfired.


They understand who’s buying their product. It’s not the information security teams who cleaned up this mess, but rather the operations and end user compute teams.


Never in a hundred years would I have expected to see the CEO of Crowdstrike at Defcon. The two are at opposite extremes of the corporate spectrum.


This is how counter-culture is systematically uprooted


It would’ve been the best place to say sorry, but as far as I understand this would have legal consequences.


The only reason this could be funny is because the software industry has found a way to excuse itself from any liability.

There is no other industry where someone could cause so much damage and laugh about it, not least because the liability alone would have led to the company's collapse.

Can you imagine a company hired to reinforce a bridge against ship strikes instead causing its collapse?

How long is that company gonna last? Even if no one dies or is injured it will be run out of business.

Only in Tech can such a company not only survive but laugh about it.

And that’s even before we get to how amateurish the mistake these guys made was.


Fuck up the world's computers, piss off all of its IT teams, and then send people $10 Uber Eats gift cards as if that'll get you anything, maybe a Coke at best... but it's further admitting fault. That's like, a tip. Like here, have a sandwich while you fix our fuckup.

They don't care. Management will all get six-figure bonuses too for 'weathering the storm' after the mishap, and will probably get more money because look what they can withstand: literal technology murder, and they get away with it.

It's almost movie-grade evil villainy tier stuff lol


Honest question, what should they do? Uninstall the company and give up? Ignoring their actual response, what is a good response to this?


If my company causes billions in damages and endangers human lives, I can't imagine why my company isn't bankrupted and dissolved.


Ok, but the market makes that decision, not the company. Crowdstrike has no choice but to accept the sentence the market hands it. It’s just that the market appears to have sentenced it to…barely anything. Blame those still using CrowdStrike after this incident.


> Blame those still using CrowdStrike after this incident.

I think you'd have to ask "why are they still required to use CrowdStrike or any AV provider?" I think once you find the answers to these questions you realize this is not a properly functioning product market.

How you can then build a publicly traded company on the back of a complete and total lie is another subject, but it's certainly also implicated in the above questions.


If your company causes damage at society-scale (hell, even if it does major damage to one person's life), the state should be ready to intervene and make the company pay the tab for the damage they caused? Like, that doesn't sound very controversial.


Yea. Their contracts likely have clauses for all of that. I say likely, but we already know this is true because it's come out.

The thing is, crowdstrike isn't the only incompetent party here. Many major companies (looking at Delta) probably made it worse for themselves with a very poor response after.

So should crowdstrike pay beyond a reasonable measure because of Delta's poor response?


No contract clause can protect you from a gross negligence tort.

(Or equivalent in one's respective civil law system.)

This might be the easiest gross negligence tort case to show and litigate -- still hard, but if everyone starts the lawsuits they cannot pull out the contract to protect themselves. They will try, of course, and they will fail in all but the obvious cases.

What you cannot sue them for is unforeseeable damages -- e.g. I lost my dream job because the computer died during the interview. But a company ceasing operations is generally fair game. And plaintiffs can argue that no reasonable person could foresee and mitigate against this disaster, so the failure is not due to the plaintiff's own negligence.


Reckless typically requires conscious disregard of risk. Arguably, that would require Crowdstrike emails from programmers saying “this is risky, we need to test it” and management responding “F it! We’ll do it live!”

If nobody in CS realized how dangerous their process was, it’s not reckless.


That's interesting, but my sniff test isn't passing. "Reckless driving" doesn't require me to know it's a bad idea to do 100 miles per hour in a 25; it's reckless whether I realize it or not, right? IANAL, but the only offense I can think of that requires knowledge to be at fault is slander, at least in the USA.


Actually generally the legal system would decide that, not “the market”.

I.e. investors have assigned roughly zero probability to CrowdStrike bearing the full cost of this incident, and set the market price accordingly.


The market as a tool for punishing bad players is far from perfect. It's why we still have monopolies and see consumer antitrust and similar suits in court. Advocating for shifting blame to customers still using Crowdstrike is ignorant of the problem and further signals a dishonest approach to the issue at hand.


Equifax, for example. They probably caused way more damage leaking all our info


The company should be fined so heavily that they become a government asset.

With current shareholders wiped out entirely, it should then be re-listed.


IMO the company that this should have happened to is PG&E. I think California could have forced them into liquidation and bought their assets. No bailout required, complete loss for shareholders, and CA could potentially have fixed much of its disastrous utility situation at a reasonable price.

A company-ending fine or judgment against Crowdstrike wouldn’t come with any great reason for a public takeover — Crowdstrike could cease to exist and the overall ecosystem would be fine.


Outside of Alaska and Wyoming, Silicon Valley has the worst power and internet of anywhere I've ever lived (worse than AR, MN, and ND by a long shot), measured either in incremental cost or uptime/availability. The fact that PG&E keeps requesting additional rate increases "for fire safety" and immediately kicks those back to shareholders isn't a great look either.


Hide themselves from the face of the world for at least 5 years until people somewhat forget about them.


Because people on the receiving end are the same - they accepted and rolled out the update without so much as "canarying" it. SolarWinds was the same - the customers weren't bothered even by mismatched integrity hashes. It is a tacit pact in our industry - we all screw up and cut each other slack. Who will cast the first stone?


> they accepted and rolled out the update without even as much as “canarying” it.

Well, no; AIUI part of the problem was precisely that this update was pushed in such a way that it skipped any canary system in place. There might be a separate conversation to question what percentage of their users were taking advantage of its staged rollout features, but it's rather immaterial when the incident in question bypassed them even if users had configured it sensibly.
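
For concreteness, here's a minimal sketch of what a canary/staged-rollout gate means in practice. Everything in it is illustrative -- the ring names, percentages, and crash-rate threshold are made up, and this is not CrowdStrike's (or any vendor's) actual mechanism -- but it shows why a force-pushed, fleet-wide content update defeats the whole idea:

    // Illustrative only -- not CrowdStrike's rollout logic. A minimal canary-ring gate:
    // each ring receives the update only if the previous, smaller ring stayed healthy.
    struct Ring {
        name: &'static str,
        percent: u32, // hypothetical share of the fleet in this ring
    }

    // Hypothetical health gate: halt if more than 0.1% of hosts in a ring crash.
    fn healthy(crash_rate: f64) -> bool {
        crash_rate < 0.001
    }

    fn rollout(rings: &[Ring], observe_crash_rate: impl Fn(&Ring) -> f64) {
        for ring in rings {
            println!("pushing update to '{}' ({}% of fleet)", ring.name, ring.percent);
            let crash_rate = observe_crash_rate(ring);
            if !healthy(crash_rate) {
                println!("halting rollout: {:.1}% crash rate in '{}'", crash_rate * 100.0, ring.name);
                return; // damage stays confined to the smallest ring
            }
        }
        println!("rollout complete");
    }

    fn main() {
        let rings = [
            Ring { name: "canary", percent: 1 },
            Ring { name: "early", percent: 10 },
            Ring { name: "general", percent: 100 },
        ];
        // Simulated telemetry: the canary ring reports mass crashes, so the push stops there.
        rollout(&rings, |ring| if ring.name == "canary" { 0.95 } else { 0.0 });
    }

When content is force-pushed to every host at once, a gate like this never gets a chance to trip, no matter how sensibly a customer configured their rings.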


But the customers installed CS software that could do this, so they are partly to blame. I do not think you will find that Tesla would allow a third-party update to its cars, or that an oil rig would allow third-party updates to critical parts of its systems. So it's about understanding the context. I think in a lot of places this is a risk that is OK, but maybe not everywhere. And I hope some companies with critical systems will learn from this.


> But the customers installed CS software that could do this, so they are partly to blame.

It depends on if/how it was communicated. If there's a big red box in the user manual that says, "this software might take updates that completely bypass any phased rollout you configure", then yes it was probably irresponsible to use it. If, however, the software lets you configure phased rollouts and fails to mention that they might just get ignored, then I don't see how the customer can be blamed at all. (And in both cases, if CS shipped such an update with exactly zero testing whatsoever, which strains credulity but is what I've read, then they still get most of the blame.)


Crowdstrike can force-push an update at any time of their choosing that the connected device will grab and load, is my understanding.


Don’t you see that you’re only enforcing my point?


No, because "canary" in the context that you used it, has a specific meaning. If you believe they should have tested CrowdStrike more or been more skeptical of their claims before licensing, that's independent of the user/administrators doing canary-style testing.


What’s the default? And what did their technical account manager recommend? My guess, no canary ring.


Crowdstrike should be held accountable, but so should any reasonably sized enterprise that allows code to be pushed fleet-wide without testing. All the large enterprises I've worked for required Windows patches to be tested before being pushed to production, so why are Crowdstrike updates treated differently?


It clearly states in the post mortem that rapid response updates were pushed globally without any say from the customer: https://www.crowdstrike.com/wp-content/uploads/2024/08/Chann...


So in theory, if a company like CrowdStrike were compromised by criminals, a disgruntled employee, or the NSA, the impact could not be avoided?


Given the cybersecurity landscape it is not unusual for security software updates to be pushed globally without the option to test or even stagger them. When a potent new vulnerability becomes public knowledge (or semi-insider knowledge) and especially if there's already a PoC available, organizations only have minutes to a few hours before threat actors begin utilizing it.

APTs and organized crime groups have 24/7 staff to weaponize and integrate new vulnerabilities into their workflow as rapidly as possible, or have contracted other groups to provide this service.


So, you would prefer that they not accept this "award", and thereby avoid admitting that they messed up?

And honestly, crowdstrike is more likely to go under than a company that failed to reinforce a bridge. Their mistake caused measurable harm to many well-funded companies that have the resources to sue crowdstrike in court.

If crowdstrike survives, it will be because there isn't a lot of competition in their market, not because they can excuse themselves of liability.


I would have liked to see the face of CrowdStrike's legal counsel as they accepted this award. There is no way they ran it by them.


Statistically speaking, it seems likely people really did die from this mistake, if only indirectly, for example due to delays in medical care caused by the outage.



I don't think this is all that accurate. In the engineering space, Boeing has so far accepted responsibility for two fatal crashes and the fucking door falling out of an airplane and is still in business.


We've professionalized industries like engineering and medicine because incompetent practitioners are a threat to public health and safety. Software is now in everything and incompetent practitioners have been a threat to public health and safety for a long time now yet we do nothing about it.


Blaming individuals when systems fail: does it work reliably?

And how can certification work across borders between jurisdictions?

I've seen a fair share of engineering disasters in my own developed country where a few signatures by engineers didn't prevent the causes.

Regulations and jail don't seem to be enough of a disincentive? How do you force someone to do a good job?

Open source is not very compatible with certification.

Certification either (1) makes all software proprietary, or (2) requires people to sign off that a particular piece of open source software is safe, or (3)... maybe we should disallow the liability-disclaimer clauses of open source software licenses?


In construction, Grenfell happened and witnesses demanded immunity from prosecution to testify, because they knew they’d broken numerous laws in its construction and certification. Residents at similar buildings are the ones paying to make them safe for habitation, not the crooks that built them. Professionalisation is not a magic bullet.


As if "professionalized industries" like engineering and medicine don't ever commit mistakes.


Found the gatekeeper


Crowdstrike and the countless other software failures before it are literally the proof that the gates need to be kept. The only question is whether we start doing our own gatekeeping or it eventually gets forced on us by heavy-handed legislation, like it was for doctors and civil engineers.


I'm sorry, when did the switch flip in this industry where we decided we didn't want to hire people with expertise and experience?


Make sure to pick a surgeon off the street next time you need an operation. Don't want to gatekeep the profession, after all.


Not necessarily from the street, but I do expect bio graduates to be trained in health care at scale, without being limited by residency programs. There's no point training 10 biology graduates only for 9 of them to work as waiters or on OnlyFans.


The time to take yourself seriously is before the stuff happens. Being stuffy now doesn't add anything. Accepting the award means they have something to show every single new hire and everyone in that office will have a physical reminder to do better in the future.

I get the other points about consequences, but I don't think accepting this award is in anyway problematic. It's one of those things that I expect only people that care about "appearances" would complain about.


A less charitable way to look at it is that they weren't taking things seriously before the incident, and they're still not taking things seriously now.

What the most appropriate way to view it is, I don't know. I think I'd need to know way more than I do about Crowdstrike leadership.


This wasn't even their first serious blunder this year, just the most damaging and visible. The nature of their mistakes seems exceedingly preventable too, with them failing at textbook SRE practices. Their CEO has now been at the helm of two different companies that have had similar problems under his leadership. The evidence keeps piling up, and people want to keep making excuses for negligent behavior. Why should we excuse facts for hypotheticals?


I agree. They show up, cop to it, and collect a memento mori that will hopefully help motivate improvement in the future. They have a lot of work to do to repair their reputation, and I don’t think they’re foolish enough to think that this is anything more than a small step on a long path.


Ever had someone make a mistake that cost you time or money and then tried to laugh it off as no big deal? That's what this feels like.

The security industry needs to grow up.

The best thing they could have done is fire the CEO, apologize profusely, and then use their army of salespeople to help make things right on a one-on-one basis with their customers.


I'm sorry, how has CrowdStrike at all demonstrated that they're going to do better?


Wait till you learn about finance, or oil and gas, or mining, or the countless other legacy industries that have quietly run America since the Rockefeller days. Many parts of Texas lost power for weeks due to Beryl, and all the power companies got was a slap on the wrist (despite being explicitly warned about this scenario many times).


Hell look at Grenfell in London! 72 people died and the people that designed it wanted immunity from prosecution to testify at the inquiry! Similar buildings are charging the cost of upgrading the cladding to residents instead of doing the right thing and eating the cost themselves.


You would not want for a sympathetic ear if you also criticized these companies. The point is not that CrowdStrike is uniquely incompetent, not at all. Every critical organization needs to be held to a higher standard, not just incompetent security firms.


Companies are too easy a target...

What were the regulatory incentives? Was the electricity market designed to encourage resiliency?

There was a systemic fault - it needed a collective solution, not one that relies on individual companies doing the right thing?


I can think of at least one industry where the price of failure is almost always borne by the users and not the companies. Very closely integrated with tech as well.


100% agree. As mentioned in another post, the acceptance of this joke award is completely tone deaf.

Hospitals, banks, airlines, governments, and of course various IT operations at companies that are forced to use this endpoint security crap and Windows were impacted. Many people suffered degraded quality of care at medical facilities. Surgeons losing access to critical imaging/labs during surgery. Probably many canceled and rescheduled surgeries as well.

People’s flights were delayed or canceled. Imagine having to take a last-minute flight to visit a loved one on their deathbed, only to have it canceled because ClownStrike's shit and incompetent IT departments/CTOs fucked them over.

Many people/businesses unable to access critical banking services.

Then the amount of lost productivity for office workers. Many hours lost for IT folks, often even working into the weekends. Time lost to dealing with ClownStrike bullshit when that time could have been spent with their families and friends.

Fuck ClownStrike, George Kurtz, and this latest clown, Michael Sentonas


It just illustrates how low the stakes are in tech really.


This attitude is why we have a culture of fear and do-nothings.


To add insult to the injury, these people call themselves "software engineers".


There is precious little real “engineering” that goes on in the field of “software engineering”, industry wide, in terms of ensuring that our software is reliable and secure. Performance and development velocity always seem to take the driver's seat in a whole lot of software.

See also this talk by Bryan Cantrill, “Scale by the Bay 2018: Rust and Other Interesting Things” https://youtu.be/2wZ1pCpJUIM where he talks about software platform values, in the sense of what different programming languages and other things focus on. He also touches on the fact that while higher level layers in the stack might value security it kind of falls apart when the very microcode in our CPUs sacrifices security for performance.

This was in a period of time where Spectre https://en.m.wikipedia.org/wiki/Spectre_(security_vulnerabil... and Meltdown https://en.m.wikipedia.org/wiki/Meltdown_(security_vulnerabi... had reared their ugly heads.


> There is sparsely little real “engineering” that goes on in the field of “software engineering”, industry wide

There surely is actual engineering, but it's scattered unevenly across companies. It's funny that Crowdstrike did fuzz their code but didn't even check for correct arity. I think the cybersecurity industry isn't as strong an adopter of sophisticated engineering techniques as, for instance, web development, where new testing techniques evolve every few years.


I really don't think that's true. All software is undertested, and it's likely that there isn't a significant difference between web apps and security apps.

Having said that, writing ring 0 drivers in an unsafe language sounds like an invitation to disaster. That's what went wrong with CrowdStrike. You don't need any testing to avoid crashing the OS when given a bad virus definition file. (Making the virus definition file do something useful... sure, you're gonna need tests for that.)
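
To make the "bad definition file" point concrete, here's a hedged sketch of the kind of up-front validation the "correct arity" comment upthread is getting at. The record format below is entirely made up (it is not CrowdStrike's actual channel-file layout); the point is simply that a content update is untrusted input, so you check the declared field count against what the consumer expects and fail closed, instead of indexing past the end of a table in kernel mode:

    // Purely illustrative: a made-up "definition file" record, not CrowdStrike's format.
    // The idea: validate counts and bounds up front and return an error instead of faulting.
    #[derive(Debug)]
    enum ParseError {
        TooShort,
        WrongFieldCount { expected: usize, got: usize },
    }

    const EXPECTED_FIELDS: usize = 20; // hypothetical arity the consumer was built against

    fn parse_entry(record: &[u8]) -> Result<Vec<u32>, ParseError> {
        // First byte (hypothetically) declares how many 4-byte fields follow.
        let Some((&count, rest)) = record.split_first() else {
            return Err(ParseError::TooShort);
        };
        let count = count as usize;
        if count != EXPECTED_FIELDS {
            // An unchecked consumer would blindly read field 21 of a 20-field table here.
            return Err(ParseError::WrongFieldCount { expected: EXPECTED_FIELDS, got: count });
        }
        if rest.len() < count * 4 {
            return Err(ParseError::TooShort);
        }
        Ok(rest
            .chunks_exact(4)
            .take(count)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect())
    }

    fn main() {
        // A malformed record claiming 21 fields: rejected instead of read out of bounds.
        let mut bad = vec![21u8];
        bad.extend(std::iter::repeat(0u8).take(21 * 4));
        println!("{:?}", parse_entry(&bad));
    }

Whether in Rust or C, the design choice is the same: parse and validate at the trust boundary, and treat vendor-pushed content the same way you'd treat attacker-controlled input.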


Performance is dead in the trunk next to a shovel and some quicklime. I know people say they take performance seriously, but as someone who is also a user of commercial software, I am reminded of James Baldwin: "I don't believe what you say, because I see what you do".


On the other hand (maybe I'm just playing devil's advocate here): nobody died (I hope, at least! It's possible, if some hospital equipment, 911 calls, ... failed...) from this incident, despite it being at such a huge scale that almost everyone knows about it. It's almost as if humanity can be... fine... when all this computing equipment fails.


Many hospitals, including emergency rooms, were shut down. Maybe no single death can be directly tied to the event, but I'm pretty sure this effectively resulted in excess deaths.

Maybe some of those inconvenienced had time to stop and contemplate the world, but there are parts of the world where computers stopping doesn't leave people just fine.


Agreed, and I'm sure there are tons of anecdotes just in this community. My daughter, for example, is a night-shift labor and delivery nurse at a level three metro hospital. They were heavily impacted. All of the phones were down, all of their internal messaging was down, translation services were down; it was a rough couple of nights.

I don’t have any direct experience with crowdstrike, but in my experience with security vendors in particular, they make it very difficult for customers to inject useful change management into the process. I’ve been in infosec for nearly thirty years and “my people” also need to shoulder some of the blame for catastrophizing delays in delivery of updates to preventative and detective controls. I’ve always known this but spending the last year ‘outside’ in central platform delivery and operations for a large financial has really brought that to light. Fortunately I know how to speak security so it helps us navigate but many aren’t so fortunate.


Stopping air travel has probably canceled a lot of unnecessary meetings and slowed down global warming.

Maybe events like this provide value in that they identify which systems are actually mission critical.


It also probably caused a lot of people to not be able to say goodbye to their loved ones, or miss their holiday, or whatnot. People don't travel exclusively for frivolous reasons.


I am not suggesting that airlines should see their operations as non-critical (and let a third party make arbitrary untested updates to their critical systems).

But maybe some organizations using air travel can learn from this. I am still hoping that the pandemic will have had some lasting positive effects (on top of all the pain it has caused).


Then what value do they provide?


Glad you find it funny, chaps. Forwarding this to my legal dept.


This shows great grit and ownership. Props


That seems unwise and tone deaf - to laugh it up on stage after causing easily tens of billions of dollars in losses worldwide.



