CrowdStrike debacle provides road map of American vulnerabilities to adversaries (nytimes.com)
279 points by jmsflknr 55 days ago | 379 comments




Really interesting to me that none of the commentators I've seen in the press have even hinted that maybe an OS that requires frequent security patches shouldn't be used for infrastructure in the first place. For just one example, I've seen photos of BSODs on airport monitors that show flight lists -- why aren't those built on Linux or even OpenBSD?

Security is not a feature that can be layered on. It has to be built in. We now have an entire industry dedicated to trying to layer security onto Windows -- but it still doesn't work.


> why aren't those built on Linux or even OpenBSD

The vendor who makes the software has always written for Windows (or in reality, wrote for either DOS or OS/2 then transitioned to NT4). History, momentum, familiarity, cost, and ease of support all are factors (among others, I'm sure).

Security is a process, not a product.

And yes, distros require frequent updates, though more to your point, you can limit the scope of installed software. I'm sure airport displays don't need MPEG2, VP1 and so on codecs, for instance.

It's also important to remember that there is a lot of 'garageware' out there with these specialized systems. Want SAML/OIDC support? We only support LDAP over cleartext, or Active Directory at best. Want the latest and greatest version of Apache Tomcat? Sorry, the vendor doesn't know how to troubleshoot either, so they only "support" a three year old vulnerable version.

Ran into that more than a few times.

Given the hypothesis of what caused the BSOD with CrowdStrike (a NULL pointer dereference), using a memory-safe language would have been appropriate -- it's fairly easy in this case to lay the blame with CS.

Microsoft supplies the shotgun. It's the vendor's responsibility to point it away from themselves.


> I'm sure airport displays don't need MPEG2, VP1 and so on codecs, for instance.

They don't, until the day the airport managers are approached by an advertising company waving the wads of cash the airport could be 'earning' if only they let "AdCo" display, in the top 1/4 of each screen, a video advertising loop. At which point, those displays need the codecs for "AdCo's" video ads.


Boy do I sure hate you for saying that. I mean at some point you are right. That is the future. But god am I mad at you for reminding me this is the world we live in.


That is not the future but the present. I have already seen flight information panels alternating with ads every few seconds in some airports.


Absolutely (sigh)! But with a deployment of devices like that, the operator has a solid central management system from which they could push software as-needed.


Wow,

Security is a process, not a product...

The vendor who makes the software has always written for Windows (or in reality, wrote for either DOS or OS/2 then transitioned to NT4). History, momentum, familiarity, cost, and ease of support all are factors (among others, I'm sure)...

That's starting the argument with "weight loss is about overall diet process, not individual choices" and then hopping to "ice cream for dinner is good 'cause it's convenient and I like it".

The statement "Security is a process, not a product." means you avoid shitty choices everywhere, not you make whatever choices are convenient, try to patch the holes with a ... product ... and also add an extra process to deal with the failures of that product.


The statement "Security is a process, not a product" refers to no _product_ can be a security strategy. _Processes_ are part of security. The security landscape keeps evolving and what was appropriate even 5 years ago may not be appropriate today. You have to evolve your strategy and countermeasures over time as part of your _processes_.


The statement "Security is a process, not a product" refers to no _product_ can be a security strategy.

That's the negative part. The positive part is that security considerations have to run through an entire organization because every part of the organization is an "attack surface".

The whole concept of CrowdStrike is that it's there to prevent individual users from doing bad things. But that leaves the problem of CrowdStrike doing bad things. The aim of security as process is avoiding the "whack-a-mole" situation that this kind of thinking produces.


That's not what a CEO wants to hear.

They want to hear that they can pay $X to this service provider and tick all of the cover-your-ass boxes in the security checklist, where $X is the cheapest option that fits the bill.


[flagged]


A null pointer consists of 4 (or 8) zero bytes, and the symbol for the zero byte is ␀ (letters NUL). I don't think we need this level of pedantry.


Right now on the frontpage: 'CrowdStrike broke Debian and Rocky Linux months ago, but no one noticed'

[1] https://news.ycombinator.com/item?id=41018029


> an OS that requires frequent security patches

> Security is not a feature that can be layered on. It has to be built in

This is a common misunderstanding: an OS that receives frequent security updates is a very good thing. It means attention is being paid to issues being raised, and risks are being mitigated. Security is not a 'checkbox'; it's a never-ending process, because the environment is always in a state of flux.

So to flip it, if an OS is not receiving updates, or not being updated frequently, that's not great.

What you want is updates that don't destabilize an OS, and behind that is a huge history and layers of decisions at each 'shop' that runs these machines.

Security is meant to be in layers and needs to be built in.

> but it still doesn't work.

It does work, as evidenced by how silent the 'scene' has been for so long; what we as humans notice is the one incident where it didn't.


This sort of thinking is one of the main problems with the industry, in my opinion.

We've got a bunch of computers that mostly don't make mistakes at the hardware layer. On top of that, we can write any programs we want. Even though the halting problem exists, and is true for arbitrary programs, we know how to prove all sorts of useful security properties over restricted sets of programs.

Any software security pitch that starts with "when the software starts acting outside of its spec, we have the system ..." is nonsense. In practice, "acting outside its spec" is functionally equivalent to "suffers a security breach".

Ideally, you'd use an operating system that has frequent updates that expand functionality, that is regularly audited for security problems, and that only rarely needs to ship a security patch. OpenBSD comes to mind.

If software has frequent security updates over a long period of time, that implies that the authors of the system will continue to repeat the mistakes that led to the vulnerabilities in the first place.


I think that’s an oversimplification. If you have a Windows system handy, look for a file named “errata.inf” [0]. It’s a giant configuration file that is full of tweaks to make dodgy hardware work reliably.

Hardware, software and firmware are all prone to mistakes, errors and corner cases that are surprising. Security issues generally live in the intersection of systems with different metaphors. Hardware is not immune from issues, and software can help reduce that impedance mismatch.

[0] Found an instance here, no claim to its veracity or safety: https://www.gherush92.com/documents/744E.asp?type=2&file=C%3...


> and that only rarely needs to ship a security patch. OpenBSD comes to mind.

How is that accomplished? Are OpenBSD programmers somehow vastly more competent, such that they make security mistakes only 0.1% as often as other OSes?

I find that hard to believe. People are people.

> If software has frequent security updates over a long period of time, that implies that the authors of the system will continue to repeat the mistakes that led to the vulnerabilities in the first place.

Why would that be the case? Authors come and go, systems live on.

Security updates arise from a combination of auditing/testing and competence. 100 times as many security updates can arise simply because one OS is being used and battle-tested 100x more than another.

Nobody's smart enough to write code that "only rarely needs to ship a security patch". Not at the scope of an entire OS with thousands of people contributing to it.


> Are OpenBSD programmers somehow vastly more competent

Put simply, yes. If you read on OpenBSD's website about the philosophies and practices that drive how the OpenBSD project is run, you'll have an idea.


OpenBSD still has security updates. Software packages commonly installed on OpenBSD-based systems often issue security updates. OpenBSD has a much smaller footprint than Windows and still has security updates.


You realize that you are personally insulting 100k people you've never met by judging their individual skills and abilities despite knowing nothing about them?

It makes it very hard to put any credence into your opinion when you are so judgemental with no information.


> Are OpenBSD programmers somehow vastly more competent

It's not about competence, it is about priorities.

OpenBSD obsesses about security, so that's what drives the decision-making.

All public companies are driven by profit above all, with the product being just a mechanism to get more profit. As a direct consequence, quality (and security, which is part of quality) is not the top priority. Security is only relevant to the extent its absence reduces profits (which very rarely happens).


CCC has a talk about the effectiveness of OpenBSD decisions.

TL;DW: it isn't as effective as people think.


Remote update is a nice way of saying remote code execution. It is really really hard to ensure that only the entity that you want to update your system, can update your system, when facing a state-funded adversary. Sometimes that state adversary might even work in concert with your OS vendor.

That's before even addressing mistakes.


"If your adversary is the Mossad, YOU'RE GONNA DIE AND THERE'S NOTHING THAT YOU CAN DO ABOUT IT." [1]

Not patching is insane -- you'll let script kiddies in. Patching might not stop the next Stuxnet author, but you'll slow them down _and_ have fewer script kiddies.

A lot of people seem to be focusing on how the band-aid of automatic security updates can be ugly without considering the hemorrhaging that it's actually stemming. Nobody's stepping up with a realistic solution to the problem, which means we're stuck with the band-aids.

[1] https://www.usenix.org/system/files/1401_08-12_mickens.pdf


Is that really so hard? Isn’t the problem mostly solved by signing your update and verifying the update at the client? As long as you can keep the private key secret, that should be enough, right? Or are we assuming you can’t keep a single key private from your adversary?
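For concreteness, a minimal sketch of that scheme with the Python cryptography package's Ed25519 API (the filename is made up, and a real pipeline would keep the private key in an HSM rather than in process memory):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Publisher side: sign the update payload once, offline.
    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()       # baked into the client at build time
    payload = open("update.bin", "rb").read()   # hypothetical update artifact
    signature = private_key.sign(payload)

    # Client side: refuse to install anything that doesn't verify.
    try:
        public_key.verify(signature, payload)
    except InvalidSignature:
        raise SystemExit("update rejected: bad signature")

The mechanism itself is the easy part; as the reply below notes, the hard parts are key custody and the provenance of what gets signed.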


Yes, this is really hard.

You could get a SolarWinds-type situation where the adversary has the signing keys and the ability to publish to the website.

You might also find that the vendor ships a library (like xz/liblzma) as part of their invisible or hidden supply chain that is able to be compromised.

You might find that one of the people working at the company makes a change to the code to enable remote access by the adversary in a targeted collaboration/attack.

The problem isn't the signing key (although I could delve into the lengths you'd need to go to in order to keep that secret under these threat models) - the problem is what they sign. A signed final release binary or series of packages isn't going to address the software source code itself having something added, or its dependencies being compromised.


Except for the first point, these things aren't exclusive to remote updates, though. I thought we were talking about the challenges of remote updates compared to other methods (like replacing the system or manually updating it with installation media). Supply-chain and insider attacks would affect those, too.


Frequent security updates are a good thing; frequent security auto-updates are not, at least in situations like this. Technology that runs 24-hour services such as airports and train stations should not be updated automatically just like that, because all software updates have high potential to break or even brick something. Automation is convenient and does save money that would otherwise pay for the additional labor of manual updates, but in cases like this, it should be understood that it's better not to break the airport, and to roll out updates manually, in stages.
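To make "in stages" concrete: one common trick is deterministic cohorting, so each machine's ring assignment is stable across runs. A toy sketch in Python; the hashing scheme here is just an illustration, not how any particular vendor does it:

    import hashlib

    def in_rollout(host_id: str, rollout_pct: int) -> bool:
        # Map each host deterministically into a bucket 0..99, then compare
        # against the current rollout percentage. Widening rollout_pct over
        # days (1 -> 10 -> 100) confines early breakage to a small ring.
        bucket = hashlib.sha256(host_id.encode()).digest()[0] * 100 // 256
        return bucket < rollout_pct

An operator would bump rollout_pct only after the earlier rings stay healthy.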


Airport staff need to be able to support them. Not HN types.

Most people know how to use a windows computer.

Most IT desktop support knows how to use and manage windows. Even building facilities folks can help support them.

Microsoft makes it easy to manage a fleet of computers. They also provide first party (along with thousands of 3rd parties) training and certifications for it.

Windows is the de facto business machine.

Most signage companies use windows.

Finding someone who knows a BSD is not easy.


Most people don't know how to tell what's going wrong with a Windows computer.

A Windows computer that relies on cloud services, as an increasing and often nonsensical subset of its functionality does, can often only be fixed by Microsoft directly.

Microsoft intervenes directly and spends billions of dollars annually on anticompetitive tactics to ensure that other options are not considered by businesses.

And with this monopoly, it has shielded itself from having to compete on even crucial dimensions like reliability, maintainability, and security.


Signage isn't running full-fat Windows. They are using stripped-down, embedded-focused versions.


> Airport staff need to be able to support them.

I know of a very small airport where what is displayed over HDMI is essentially Firefox at fullscreen with power saving disabled so the screen does not blank. Some of them are Intel NUCs, some of them are Raspberry Pis with an HSM in a box. These devices basically "boot to Firefox" with relevant credentials read off an internal TPM/HSM.

Those among airport staff who do not know how to use a computer at all can get them working by just plugging them in.

> Most people know how to use a windows computer.

They know enough to open a browser.

> Most IT desktop support knows how to use and manage windows.

They know how to cope with Windows, at best.

> Finding someone who knows a BSD is not easy.

BSD is everywhere, and in far more places than Windows, like almost every car sold after 2014. But you never ever see BSD because it's already working, with nothing for the end customer to do.


> Airport staff need to be able to support them. Not HN types.

Airport staff are not debugging the windows install. They power-cycle it and see what happens, otherwise call the vendor to come in.

So there's no actual reason other than laziness to build kiosk-mode computers on Windows.


Airport staff don't maintain infrastructure; at best they maintain front ends to it.


You consider signage infra? Same with conference rooms. Most of the places I have worked have facilities type people working on it. Tier 3 is usually a direct phone call away for them

You would send an engineer into an airport to reboot a sign?


Of course not. That's the point I was making.


At some airports, staff does maintain infrastructure.

At others, airline staff is responsible for it. And just like airport staff, a tech who can deal with Firefox on Windows is cheaper than someone who can troubleshoot the same in Linux or a more custom system.


Yup.

Another take to be done here is: computers shouldn't have unfiltered internet access all the time.

Whitelist it and once every 3 days open the internet gates.

(Easier said than done)


I know a BSD. Half of the things you wrote above are wrong.


For many CTOs/CISOs it is more important to have a good target to shift responsibility onto when things go awry than to have a reliable/secure system. A Big Brand is a good target; an open-source project like OpenBSD is not. I doubt any CTO will be fired for choosing Windows+CrowdStrike (instead of Linux/BSD) despite many millions in losses.

"Nobody ever gets fired for buying IBM" is as true as ever at least in the corporate world.


> I doubt any CTO will be fired for choosing Windows+CrowdStrike (instead of Linux/BSD)

I was personally involved in a meeting where my firm's leadership advised a client who did fire their CTO and a bunch of other people for what was ultimately putting what they thought were smart career moves over their actual responsibilities.

Unfortunately, as you just pointed out, the CEO, other execs, and board are often just as incompetent as the CTOs/CISOs who have such a shit-brained mindset.


Or don't use an OS at all. We need to think about minimizing the use of software in critical infrastructure. If that means less efficiency because you have to be near something to maintain it then so be it. That would be good for jobs anyway.


Even unikernel applications have an OS compiled into the application. It's necessary to initialize the hardware it's running on, including the CPU and GPU and storage.

I suppose you could build it as a UEFI module that relies on the UEFI firmware to initialize the hardware but then you get a text only interface. But then the UEFI is the OS.

But this outage was not an OS problem. It was an application bug that used invalid pointers. If it was a unikernel it still would have crashed.


I've read that it was a driver of some kind, not an application. Applications can't cause BSODs (I hope).


How exactly would a lot of end user systems function without one?


You can run single-purpose software on bare metal, and many OS-agnostic toolkits for things like user interfaces exist


I'd like to pose two questions:

1. How does the software obtain new data at run time?

2. How do you make sure that thing doesn't pose a security hole when a vulnerability gets discovered? (Assuming this never happens is unrealistic.)


Vulnerabilities in what, though? If you make an application so simple that it can only fetch data through an API and display it, there's simply not much more that it can do. And a simple application is easy to audit. So it would be ideal if we could bundle this (akin to compiling) and deploy it on bare metal.
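For a sense of scale, the fetch-and-display core really can be tiny. A sketch (stdlib only; the endpoint and JSON field names are invented):

    import json
    import urllib.request

    FEED = "http://fids.internal/api/departures"  # made-up internal endpoint

    # Fetch the departures feed and render it as plain text.
    with urllib.request.urlopen(FEED, timeout=10) as resp:
        flights = json.load(resp)

    for f in flights:
        print(f"{f['flight']:<8} {f['destination']:<20} {f['status']}")

Everything else (updates, watchdogging, redundancy) is where the real complexity hides.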


The answer to both questions is robust organizational infrastructure. To be frank, I think a minimal Linux system as a baseline OS serves most use cases better than a bare-metal application, but many applications have self-contained update systems and can connect to networks. Self-repairable infrastructure is a necessity, both in terms of tooling and staffing, for any organization for which an outage or a breach could be catastrophic, and the rise of centralized, cloud-reliant infrastructure in these contexts should be seen as a massive and unacceptable risk for those organizations to take on.

Organizations being subject to unpatched vulnerabilities and unable to manage their systems competently are direct results of replacing internal competency and purpose-built systems with general-purpose systems maintained and controlled by unaccountable, distant tech monopolies.


> the rise of centralized, cloud-reliant infrastructure in these contexts should be seen as a massive and unacceptable risk for those organizations to take on

I agree with you but I also want to play the devil's advocate: using software like CrowdStrike is not what I would call being "cloud-reliant". It's simply using highly-privileged software that appears to have the ability to update itself. And that is likely far more common than cloud-reliant setups.


Yea, and use of highly privileged software with the ability to update itself that the organization has no oversight of should be the most suspect. Software is used by nearly every organization for drastically different needs, and I think there will never be adequate security or reliability for any of them if software providers continue to consolidate, generalize, and retain ever more control of their offerings. Personally, I think the solution is local-first software, either open-source or grown within the organizations using them, which necessitates having that capability within orgs. The whole "buy all our infrastructure from some shady vendor" model is a recipe for disaster


Most of these OS need to run a variety of applications from different vendors though.


To pick on your airport example a bit… all of the times I’ve gotten to enjoy a busted in-seat entertainment system, I’ve found myself staring at a stuck Linux boot process. This goes well beyond the OS.


It's typically Android.


To clarify: Android is based on Linux.


To clarify: only the Linux kernel; nothing else from GNU/Linux is exposed as an official userspace API.


The ones I've seen are definitely not Android. But I don't have any data to argue which OS is most common among in flight entertainment systems.


Those sorts of things just need to boot to a web browser in full screen with some watchdog software in the background, launching from a read-only disk (or network image). Get a problem? Just unplug it and plug it back in. Make it PoE-based so you can easily do that automatically, and stick them on a couple of distros (maybe even half on BSD, half on Linux; half using Chrome, half on Firefox).
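The watchdog itself can be nearly trivial. A sketch, assuming Firefox's real --kiosk flag and a made-up display URL:

    import subprocess
    import time

    URL = "http://fids.internal/departures"  # hypothetical internal endpoint

    # Relaunch the kiosk browser whenever it exits or is killed. Pair this
    # with a read-only root filesystem so a power cycle always boots clean.
    while True:
        proc = subprocess.Popen(["firefox", "--kiosk", URL])
        proc.wait()
        time.sleep(5)  # brief backoff before relaunching

In practice you'd also put something like this under the init system's own restart supervision.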


A web browser is an unbelievably complex piece of software. So complex that there are now only two. And also so complex that there are weekly updates because there's so many security holes.


> So complex that there are now only two

There are more than two, and the vast majority of the time people don't need anywhere near the complexity that modern browsers have shoved into them. A lean browser that supported only a bare minimum of features would go a long way to reducing attack surface. As it is now, I already find myself disabling more and more functionality from my browsers (service workers, WebRTC, JS, SVG, webgl, PDF readers, prefetch, mathml, etc)


There are more than 2 browsers, but only 2 rendering engines, which are the complicated part of the browser.


More than two there too. For example: WebKit, Blink, Gecko, LibWeb, Servo, Goanna, Presto, and Libwww or whatever Lynx is using these days.


You’re right, I completely forgot WebKit. I would say there are currently 3 competitive engines, the rest are not very popular.


The first three cover basically 99% of browsers.


Yeah, options exist, but it's not a very diverse ecosystem in practice. I'm excited and optimistic about Ladybird for that reason. We need more options.


We've seen this week that the world does not want options. It wants a single point of failure in all infrastructure so that nobody is blamed for making the wrong choice.


When you're accessing a closed departure-board display, that isn't a problem.


>We now have an entire industry dedicated to trying to layer security onto Windows -- but it still doesn't work.

What makes you think so?

How is Linux better in that area?


I'm sure we've all heard the phrase "We're a Windows shop" in some variation.

I understand the reasons for it, and why large, billion-dollar companies try to create some sort of efficiency by centralising on one "vendor", but then this happens.

I don't know how to fix the problem of following "Industry Trends" when every layer above me in the organisation is telling me not to spend the time (money) to investigate alternative software choices which don't fit into their nice box.


The outage was not because of the OS. It was a kernel driver that attempted to use invalid memory.

The same crash could happen with any kernel driver in any operating system.

You've never seen Linux crash because of a driver bug?


Yes, I'm well aware. I wasn't trying to conflate a CrowdStrike problem with a Microsoft problem. Having said that, in this particular incident, the problems were specifically limited to Windows OS.


I read the T&C of this CrowdStroke garbage and they have the usual blurb about not using it in critical industry. Maybe we just charge & arrest the people that put it there and this checkbox-software mess stops real quick.


/set Devil's Advocate mode:

From the reporting so far, no one has died as a result of the CrowdStrike botch. For my money, that sounds like it's not being used in 'critical industry'.

/unset

There were several 911 service outages included in the news yesterday, so I would definitely agree those fall into the category. I haven't seen how many hospitals were deeply affected; I know there were several reports of facilities that were deferring any elective procedures.


I almost had to defer a procedure for one of my cats because my vet’s systems were all down. This meant they couldn’t process payments, schedule appointments, use their X-ray machine, or dispense prescriptions. (Thankfully, they had the ingenuity to get their diagnostic equipment online through other means, and our prescriptions had already been dispensed so we didn’t have to reschedule.)

I would imagine it’s the same story at human hospitals too that ran afoul of this. I wouldn’t expect life-critical systems to go offline, but there’s many other more mundane systems that also need to function.


The public T&C is for small businesses. Any large business is going to be negotiating very different terms which are not public.


>Really interesting to me that none of the commentators I've seen in the press have even hinted that maybe an OS that requires frequent security patches shouldn't be used for infrastructure in the first place.

Nobody's commenting on that because it's the wrong thing to focus on.

1) This fuckup was on CrowdStrike's Falcon tool (basically a rootkit) bricking Windows due to a bad kernel driver they pushed out without proper hygiene, not on Windows's security patches being bad.

2) Linux also needs to get patches all the time to be secure (remember xz?). It's not just magically secure by default because of the chubby penguin; it's only as secure as its most vulnerable component, and xz proved it has a lot of components. I'd be scared if a long period went by and I saw no security patches being pushed to my OS. Modern software is complex and vulnerabilities are everywhere. No OS is ever bug-free and bulletproof enough to believe it can be secure without regular patches. Other than TempleOS, of course.

The lesson is whichever OS you use, don't surrender your security to a single third party vendor who you now have to trust with the keys of your kingdom as that now becomes your single point of failure. Or if you do be sure you can sue them for the damages.


It's shocking to me how many people on HN are not understanding this concept that Windows had nothing to do with it.

It's just as likely they could crash a Linux machine by releasing an update to their Linux software that also referenced invalid memory.

Am I the only one that's seen drivers in Linux cause a kernel panic?


Because it suits their anti-Windows agenda ("M$" and so on), while ignoring that CrowdStrike also botched Linux distributions, and no one noticed, because they weren't being used at this scale.


> XZ proved it has a lot of components

microkernels, microkernels, microkernels! https://en.wikipedia.org/wiki/Tanenbaum%E2%80%93Torvalds_deb...


> Linux gets security patches all the time

1) While CrowdStrike can be run on Linux it is less of a risk to use Linux without it than Windows. I don't think most Linux/BSD boxes would benefit from it. It could be useful for a Linux with remotely accessible software of questionable quality (or a desktop working with untrusted files) but this should not be the case for any critical system.

2) There is a difference between auto-updates (common in Windows world) and updates triggered manually only when it is necessary (and after testing in non-prod environment). Also while Linux is far from being bug-free, remotely exploitable vulnerabilities are rare.


>2) There is a difference between auto-updates (common in Windows world) and updates triggered manually only when it is necessary (and after testing in non-prod environment).

Again, the auto-updates that caused this issue were developed and pushed by CrowdStrike, not by Windows. That tool does the same auto-updates on Linux too. On the Windows side you can have sysadmins delay Windows updates until they get tested in non-production instances, but again, this update was not pushed by Windows, so sysadmins couldn't do anything about it.


> I don't think most Linux/BSD boxes would benefit from it.

EDR isn't antivirus. It logs and detects more than it prevents, and you need that on Linux as much as Windows. You can do incident response without it if you are shipping your logs somewhere, in the sense that you can do anything without any tool, but it's certainly a lot easier with.

Possibly you need it less than on Windows since it's easier (for now) to do kernel stuff with eBPF, but then somebody has to do the kernel stuff.

Speaking as a professional red teamer, no OS has a ton of RCE, but applications do, Linux applications no less than Windows ones. Applications aside I'd rather be up against Windows in the real world because of Active Directory and SMB and users that click stuff, but Linux running a usual array of Linux server stuff is OK too.


Ubuntu Pro? Its Livepatch feature is specifically designed to push kernel updates without requiring a reboot.


Every year, multiple times per year, there are reports of Microsoft Windows systems having either mass downtime or exploitation. It's kind of amazing that critical systems would rely on something that causes so much frustration on a regular basis. I've been running systems under Linux and Unix for decades and never had any downtime, so it's nice to know that Linux is pretty solid and always has been. The worst that's ever happened has been a process that might go down during an upgrade, but never the whole system.


> why aren't those built on Linux or even OpenBSD?

Or even ChromeOS, which has insane security.

> but it still doesn't work.

It works momentarily, but there will always be 0-days; the people who make the exploits intimately know the Windows API internals.


> Or even ChromeOS

ChromeOS is a Linux distro BTW


Linux is vulnerable too (though not as vulnerable as Windows, of course); it's just not targeted by hackers because its market share is so small. That wouldn't be the case if, say, half of all users ran Linux.


There are far more servers running Linux/BSD than there are running Windows.


Its market share on servers (a juicy target) is not small at all.


And that sees plenty of attacks too. But here Windows wasn't under attack or a Windows vulnerability exploited, CS just fucked up and companies were stupid enough to put all their trust in CS.


I've never managed Linux IT departments--how do the management tools compare to what Microsoft offers, such as tooling for managing thousands of computers across hundreds of offices?



There's no excuse in today's world not to write fantastic unit tests, especially with LLMs. Plug for how we enable that here: https://github.com/codeintegrity-ai/mutahunter


Layering is absolutely possible, but more at the network layer than the individual computer layer.

Minimal software and OS running on Linux as a layer between any Windows/whatever machines and internet connectivity. Minimize and control the exact information that gets to the less-hardened, less-trustworthy, more complicated computers.


Remember when operating systems only got updates through service packs?

We moved to a more frequent update cycle because when a critical vulnerability was found, no one wanted to wait 6-12 months for the service pack.


I'm sorry, but even Linux requires frequent security updates due to its large ecosystem of dependencies. Updating them, just like on Windows, is more or less required by every cybersecurity standard.


On the other hand, OpenBSD doesn't require very frequent patching, assuming a default install, which comes with batteries included. For a web server there's just one relevant patch for 7.5 since April: https://www.openbsd.org/errata75.html


OpenBSD is a non-starter for many companies because it doesn't have LTS and releases are relatively frequent.


I agree that all dependencies should be treated as attack surface. For that reason, systems for which dependencies can be more tightly controlled are inherently more secure than ones for which they can't. The monolithic and opaque nature of windows and other proprietary software makes them harder to minimize risk about in this way


That's beyond their level of comprehension.


Security is not a feature that can be layered on.

There's an entire industry for guard-railing LLMs now. Go figure.


In the current economic environment, something doesn't have to be wise or even feasible to have an "industry"


> why aren't those built on Linux or even OpenBSD?

Because in the non-Silicon-Valley world of software, if you pick Linux and it has issues, fingers will get pointed at you. If you pick Windows and it has issues, fingers will get pointed at Microsoft.


This sort of emergent behavior is a feature, not a bug.

Operating systems that don't require frequent security patches aren't profitable.

Anyway, this is the step of late-phase capitalism that comes after enshittification. Ghost in the Shell 2045 calls it "sustainable war". I'd link to an article, but they're all full of spoilers in the first paragraph.

It probably suffices to say that the series refers to it as capitalism in its most elegant form: It is an economic device that can continue to function without any external inputs, and it has some sort of self-regulatory property that means the collateral damage it causes is just below the threshold where society collapses.

In the case of CrowdStrike, the body count is low enough, and plausible deniability is high enough, that the government can get away with not jailing anyone.

Instead, the event will increase the money spent on security theater, and probably lead to a new regulatory framework that leads to yet another layer of mandatory buggy security crapware (which CrowdStrike apparently is).

In turn, that'll lower the margins of anyone that uses computers in the US by something like 0.1%, and that wealth will be transferred into the industry segment responsible for the debacle in the first place. Ideally, the next layer of garbage will have a bigger blast radius, allowing the computer security complex to siphon additional margins.


I don't think CS type endpoint protection is appropriate for a lot of cases where it's used. However:

Consider the reasons people need this endlessly updated layer of garbage, as you put it. The constant evolution of 0-days and ransomware.

I'm a developer, and also a sysadmin. Do you think I love keeping servers up to the latest versions of every package where a security notice shows up, and then patching whatever that breaks in my code? I get paid for it, but I hate it. However, the need to do that is not a result of "late-stage capitalism" or "enshittification" providing me with convenient cover to charge customers for useless updates. It's a necessary response to constantly evolving security threats that percolate through kernels, languages, package managers, until they hit my software and I either update or risk running vulnerable code on my customers' servers.


You're making my point. You're stuck in a local maximum where you're paid a lot of money to repeatedly build stuff on sand. You say you hate it but you have to do it.

That's not strictly true, but it's true in an economic sense:

You could just move your servers to OpenBSD, and choose to write software that runs on top of its default installation. There have been no remotely exploitable zero days in that stack for what, two decades now? You could spend the time you currently use screwing with patches to architect the software that you're writing so that it's also secure, and so that you could sustainably provide more value to whoever is paying you with less effort.

Of course, the result would never obtain FIPS, PCI, or SOC-2 compliance, so they wouldn't be able to sell it to the military, process credit cards, or transitively sell it to anyone that's paid for SOC-2 compliance.

Therefore, they can either have something that's stable and doesn't involve a raft of zero days, or they can have something that's legally allowed to be deployed in places that need those things. Crucially, they cannot have both at the same time.

Over time, an increasing fraction of our jobs will be doing nothing of value. It'll make sense to outsource those tasks, and the work will mostly go to companies that lobby for more regulatory capture.

Those companies probably aren't colluding as part of some grand conspiracy.

It's also in their best interest to force people to use their stuff. Therefore, as long as everyone acts rationally (and "amateurs" don't screw it up -- which is a theme in the show), the system is sustainable.


> There have been no remotely exploitable zero days in that stack for what, two decades now?

Incredible how easy it was to prove this wrong in less than 5 minutes.

https://www.cvedetails.com/cve/CVE-2023-38408/


A pretty bleak picture, and probably a little bit exaggerated, but it could be a very good plot for a novel of some kind.


[flagged]


I would like to link to one of the many articles about it (singular), but all the articles (plural) about it ...

Two example pages (both have lots of spoilers; the second is worse about that):

https://www.reddit.com/r/Ghost_in_the_Shell/comments/og7ags/...

https://ghostintheshell.fandom.com/wiki/Sustainable_War


> I've seen photos of BSODs on airport monitors that show flight lists

The kiosk display terminal is not something I care about that much.

> We now have an entire industry dedicated to trying to layer security onto Windows

Too bad we have no such layering in our networks, our internet connections, or in our authentication systems.

Thinking about it another way: there's actually no specific system in place to ensure your pilot does not show up drunk. We don't give pilots breathalyzers before the flight. We absolutely could do this, even without significant disruption to current operations.

We have no need to actually do this because we've layered so many other systems on top of your pilot that they all serve as redundant checks on their state of mind and current capabilities to safely conduct the flight. These checks are broader and tend to identify a wider range of issues anyways.

This type of thinking is entirely missing at the computer network and human usability layer.


"What Happened to Digital Resilience?"

Was there ever such a time? If so then tell me when it was.

"The latest chaos wasn’t caused by an adversary, but it provided a road map of American vulnerabilities at a critical moment."

I've no doubt that road maps of American vulnerabilities are currently being planned, mapped out, and stockpiled for future use by those who aren't on the best terms with the US.

In one way I'm amazed at how lackadaisical the US and others are towards these threats and that they have not done more to harden the vulnerabilities. On the other hand, it's obvious: cost is one factor, but I reckon another, bigger one is 'convenience'. Hardening systems against vulnerabilities means making them less convenient/easy to use, and people instantly balk at that.

Remember, this happened big-time when Microsoft introduced Windows, especially Windows 95. To capture the market, Microsoft made everything as easy as possible for nontechnical users—just click on something and it'd happen; things would happen with ease. And all this happened without due consideration to security.

When viruses, vulnerabilities, and breaches got out of hand, restrictions were introduced, which meant users had less freedom to do what they'd gotten used to doing. What Microsoft did was get the world used to slack operating procedures, and efforts to rein this in have met with user resistance ever since.

We're now stuck with a major problem that was easily foreseeable even before Microsoft launched Windows 95. Fixing it will be extremely difficult.


> In one way I'm amazed at how lackadaisical the US and others are towards these threats and that they have not done more to harden the vulnerabilities. On the other hand, it's obvious: cost is one factor, but I reckon another, bigger one is 'convenience'. Hardening systems against vulnerabilities means making them less convenient/easy to use, and people instantly balk at that.

"Show me the incentives, and I'll show you the outcomes." - Charlie Munger.

We do not incentivize companies to operate secure, redundant, reliable computer systems. We incentivize companies to make the number at the bottom of the spreadsheet beat the expectations some analyst in Lower Manhattan set 90 days prior. And since companies handle the majority of societal work in the United States, that's how most critical systems are designed.

Now, there's a chance that this will play out in court, and that Crowdstrike will have to be bought out to make up for the damages their customers suffered starting on July 19th. However, that will take years, and the outcome could very well be that the plaintiffs will receive symbolic or even no damages. By then, the market will have hedged, captured regulatory authorities, cut its losses, and just altogether moved on. The assets will be purchased in a firesale by people who see this as "creative destruction" and won't care that peoples' lives were put at risk because of this.

And the cycle will continue.


> We do not incentivize companies to operate secure, redundant, reliable computer systems.

Except in the gambling industry. As part of a long-standing tradition, companies in the gambling industry are usually contractually required to take financial responsibility for errors. GTECH's annual report, before they were acquired by an Italian company, says "We paid or incurred liquidated damages with respect to our contracts in an amount equal to 0.61%, 0.18%, 0.50%, 0.47% and 0.14% of our annual revenues in fiscal 2006, 2005, 2004, 2003 and 2002, respectively."[1]

So, forcing a transaction processing service to take full responsibility for errors cost, at worst, 0.61% of revenue. This is sufficient to force gambling companies to use unusually good security technologies.

The Nevada Gambling Commission has technical rules.[2]

* "On-line slot systems may only communicate with equipment or programs external to the system through a secure interface. This interface will specifically not allow any external connection to directly access the alterable data of the system." Which means no privileged "security" systems such as Crowdstrike.

* "Gaming device application access to the system based game must be logged automatically on the system component of the game and on a computer or other logging device that resides outside the secure area and is not accessible to the individual(s) accessing the secure area." Which means the really important info must not only be logged, the logs have to be kept where the people who run the systems can't get at them. There are more logging requirements. Most things require two logs, one used for normal operation and a remote backup with tamper resistance and secure hashes.

* "Conditions for changing active software on a conventional gaming device or client station that is part of a system supported or system based game: (a) Be in the idle mode with no errors or tilts, no play and no credits on the machine for at least two (2) minutes; (b) Not be participating in an in-house or inter-casino linked payoff schedule..." There's more, but the general idea is that to change anything, you have to take the component being changed down to the idle, fully backed up state. Only then can changes be applied. All of which are logged.

The gaming industry has faced hostile actors for decades. They have reasonably strong defenses. Yet they're still very profitable.

[1] https://www.sec.gov/Archives/edgar/data/857323/0000950123060...

[2] https://gaming.nv.gov/uploadedFiles/gamingnvgov/content/Home...


Baking in resiliency is expensive. It's not obvious to me that it would be better to deal with that than to deal with issues like this once in a blue moon. Why not let the markets decide? If this ends up costing a bunch of money, it will be fixed; if it doesn't, it wasn't that big of a deal.


Because there's stuff money can't buy back, and in a lot of cases, that's human life and health. (1)

And do the markets really decide? Do you really think the C-suite of Crowdstrike is going to spend the rest of their lives destitute for the losses they caused? Of course not. We have laws on the books that limit liability of businesses in these situations, and the "let the market decide" crowd are the first people to tell you these laws are a good idea because you can't possibly expect George Kurtz to do business in an environment where his 3 billion dollar fortune could be completely wiped out as the result of a court case, no matter how much damage his company did.

Meanwhile the people who were screwed by this whole thing will be lucky to get a few grand out of a class-action judgment or settlement in five years.

Markets _never_ actually decide. Not in a way that makes peaceful human society possible. You have to introduce systems to give minor players a way to redress grievances, or they'll find their own, often through less-than-sporting means.

(1)


> (1)

Citation needed



...And I can't agree more.

The question is what can be done, if anything. But I've a solution in my wildest dreams as a dictator. :-)


You introduce consequences against the people who have created the system that we're in now.


Even assuming you could narrow this down to a small enough set of people that can credibly be held responsible for creating the system we have now, and assuming you could impose consequences on them without violating their civil rights, and assuming they learnt their lesson and would actively take precautions to avoid their actions leading to such a systemic failure in the future, at best this would only influence those particular actors to avoid the previous failures. The next systemic failure would look quite different on the ground and come from different individuals pursuing different goals who would not have learnt any of the previous lessons. The only people who would see the connection would be more experienced people and/or intellectuals looking from a higher zoom level, but likely would not be empowered to really do anything to stop it given all the direct financial incentives motivating a much larger group of people to direct action.

If our culture had more respect for elders and/or thinkers that could be a start, but even then it would still be an uphill battle in a capitalist society.


"…failure in the future, at best this would only influence those particular actors to avoid the pervious failures. "

Not if laws were like the Monopoly square that has 'Go directly to jail' stamped in bold all over it.

Just a few decent lockups would put the shivers down the backs of those so inclined.

Trouble is, governments have failed to implement the necessary laws. Unfortunately, as we've seen, Big Tech is too big, too powerful, and too money-rich to be challenged effectively by governments.


GP said "the system that we're in now", not the specific executive decisions and operational practices employed at Crowdstrike.

I agree with you the latter could be addressed through accountability, but I struggle to see what kind of law would work the way you intend here. In general, regulation helps large corporations because they have the resources to maintain nominal compliance, as well as the layers/lawyers to maintain plausible deniability if things go sideways. Regulation tends to undermine competition, which further cements their power and has many negative effects that span well beyond obvious failures due to poor engineering practices.


OK, but I'd argue regulation kept large corporations nominally in check before the greed-is-good mantra along with the belief that the only responsibility a corp has is to its shareholders—ideas that took hold and became prominent in the 1980s (Friedman, Hayek, Chicago School, et al).

Big Tech is now so big and powerful that it essentially does what it does with impunity; fines for breaching laws are just a part of doing business, and they have negligible effect on the bottom line.

The way to fix the problem is not only to hold companies who violate the laws responsible but also, equally, their employees, external advisers, accountants, etc.

Combine this with requiring people responsible for certain corporate functions, such as those who make policy decisions with respect to the way corporations police laws, check for breaches of antitrust/monopoly acts, etc., to be licensed similarly to the way electricians and plumbers are licensed. Take away their licenses and they'd not be able to carry out their jobs.

I reckon this will eventually come to pass but I'd venture it'll come to Europe long before the US.


And (as you mentioned in your parent post) send people to prison, at least when the case is egregious enough! Or cause people to lose their ability to be employed. Microscopic fines for companies just aren't working. Judgments where they have to hand their customers a token $9 gift card or give them free credit monitoring aren't working. There needs to be real consequences for wrongdoing.


Yeah, right. I don't want to sound like some socialist demagogue that has it in for big corporations because that's not my position. For many things, vehicle manufacturing, semiconductors, etc., etc. we need large corporations with the ability to scale production, and so on.

The issue is with ethics and being fair and giving everyone a fair go. And for that companies have to behave ethically and within the law. Right, most would say that's just being naïve as that's not how the world works in practice, and I'd agree. And that's why we have laws, they ensure some semblance of balance or order is maintained. The trouble is that in a capitalist society where competition is encouraged that 'balance' can easily be tipped. And it's dead easy for this to happen, especially so these days given there's so much money at stake. To get the edge it's more than enough encouragement for players to start acting underhandedly.

I won't pursue that further because many books have been written about it except to say I don't believe we'll ever achieve an ideal world where everyone acts reasonably and fairly. I'd also suggest that living in a completely ideal world would be intolerable, we'd lose all sense of objectivity. Society needs some degree of things not going right or not working correctly to keep it on edge.

That said, I'm of the opinion that we've gone too far in the dog-eat-dog race to the bottom and that we urgently need a correction. This can't be just left to corporations to correct through self regulation because it won't happen, and more to the point society's view of the role of corporations has changed over the last 40 - 50 years.

When I was growing up decades ago, most people perceived that corporations had a dual role, which was to benefit both shareholders and society. That view has shifted—or at least it has in the corporate world—to one where a corporation's primary or principal raison d'être is to maximize shareholder profits. The evidence is clear; for one, it's why Boeing is in trouble—its accountants now wield the power, and these days engineers have precious little say when it comes to the amount spent on safety margins, etc. The consequences of the policy shift are now becoming obvious.

It's up to society to redress this imbalance. If those in corporations have broken existing laws then the Law shouldn't ignore it (as it seems to have done with antitrust laws in recent times). Not only should violating corporations be brought to heel and punished but so too should the perpetrators who drive them (corporations don't magically do things without human direction).

Nevertheless, I think it would be counterproductive to conduct a witch hunt. Instead, we need stronger, less ambiguous laws that restate the rules very clearly. It's just not good enough to assume that most people are both reasonable and ethical, because there'll always be those down at the end of the bell curve who'll push the limits. These people must be told in no uncertain terms what the rules are and the consequences of violating them. As I see it, society (hence governments) has not done enough to ensure this happens. And I'd argue that, at least in part, it hasn't happened because of the shift in business ethics since the 1980s (for reasons mentioned in my earlier post). Evidence suggests that business practices are now so askew and out of balance that we're well overdue for society to correct them.

I recall a story from about four decades ago that emphasizes what's gone wrong with corporate ethics (which was somewhat of a shock to me when I read it). First, let me say that I read this quite a while ago so I may be contorting the facts somewhat. Also, I'm now uncertain where I first read about it but I think it was either from the columnist John Dvorak or Robert Cringely in InfoWorld. (Please don't hold me to that if I'm wrong.)

It concerns the second-sourcing of Intel's 8088 CPU (back then, government required second-sourcing of components to guarantee supply), and one of the second-source suppliers was NEC. Intel and NEC entered a patenting/cross-licensing agreement in the mid 1970s so that NEC could make the 8088. This meant NEC had copies of Intel's masks for production.

NEC thought it could improve on Intel's 8088 design, and without obtaining Intel's agreement it took the liberty of launching its own chips, namely the V20 and V30, which were over double the speed of Intel's offering. Needless to say, Intel was rather miffed and accused NEC of copyright violation for having reverse-engineered the 8088's microcode and used it in its V-series processors.

The scuttlebutt was that either before the matter went to court or afterwards during the private out-of-court settlement negotiations the conversation between both parties went something to the effect:

Intel: "You reverse-engieered our microcode for your V20 thus violating our copyright, this was outside our cross-licensing agreement ."

NEC: "Prove it."

Intel: "Whilst you rewrote the reverse-engieered code to hide and obfuscate your tracks, there was one small bit where you didn't. As it was a bug that we'd not removed, you did not know how it worked or why it was there so you included it just to ensure things worked property. This gave you away."

NEC: "So what, so now what are you going to do about it?"

Intel: "Sue you for damages."

NEC: "OK, even before production we'd anticipated you'd likely sue us so we had to estimate the all-up costs and whether it would be economically viable to proceed. We did the sums and figured out that in the event of you taking legal action it would take you x years to obtain judgment against us in the US court system and by then we would have not only amortized our development costs, paid you reparations but also we'd have made enough profit from our V-series processors for our actions to have been well worthwhile. We just made a pragmatic decision that made economic sense."

Note: the emphases are mine.

I'm unclear whether my summary is grounded in fact or is an apocryphal account given by columnists who were reporting on the case back then, but the way I've recounted it here is what I took away from those news reports.

What's key about this account is that a large public corporation would actually stoop so low in its business practices and act in such a dishonest and disingenuous manner, but also that it was prepared to get caught, and that this was deemed an acceptable or valid way of making a profit. What's even more telling is that there was no large public outcry when it became known.

My point is we have to accept that the types of people who run companies and set their policies will likely always think like this and that they'd so act if given half a chance (especially so if their bonuses are linked to profits).

That said, at present there are insufficient checks and balances to ensure these people will quickly quash any such ideas the moment they come to mind. The only way I can see this happening is for society to deem such behavior to be so unacceptable that it pushes for laws that are strong enough to both sanction corporations to the extent that shareholders will revolt and that the perpetrators will be punished with actual jail time.

Trouble is, I don't see such laws being introduced anytime soon. It's possible they will eventually, but unfortunately I'd venture that won't be until after things get much worse.

https://en.m.wikipedia.org/wiki/NEC_V20


This is an area where studying Ukraine's experience will be very useful (and probably has already been useful)

There were years of cyberattacks against pretty much every piece of critical infrastructure they have. Things went down, there were disruptions, but they adapted. Sometimes by falling back to low-tech solutions, sometimes by building robustness into new systems and purging the old (much easier to justify politically when the problem is tangible and immediate).

I seem to recall that one of the first things we did when tensions started ramping up was sending teams of cyber security experts from the NSA to help them lock down and root out infiltrations.


How nice of the NSA to help them after their exploit was leaked (vulnerability known for many years before that) and weaponized by Russia to attack Ukraine.


> This is an area where studying Ukraine's experience will be very useful

Are they unique in any way? Or is it just yet _another_ case of Windows software being deployed in critical roles and basic 0day vulnerabilities and exploits being applied against it?

If so.. the lesson has been known for decades.

> sending teams of cyber security experts from the NSA

It's nice to know our security agencies have time for games of whack-a-mole.


| Sometimes by falling back to low-tech solutions

My first thought in all this was wondering if there's a business opportunity for a consulting firm or startup that designs and manages offline paper backup systems that can quickly and seamlessly integrate back with digital systems once they come back online.


The problem is that if you aren't regularly training employees on those manual fallback systems, when you have to suddenly activate them, nobody will know what to do. Even if they have been trained on what to do, the processes will not be second nature. In real use, they will hit situations that the paper forms or training didn't cover, and will have to make up something on the spot, which they will each do differently.

Fully comprehensive, regularly trained manual operations are very expensive to develop and test. Only the most safety-critical organizations will be able to justify and have the resources to effectively implement them. Air-traffic control, hospitals, nuclear plants, etc. And, they already have done it.


An interesting idea. Another commenter pointed out the need to train staff regularly on its use.

Something else that would be a challenge for your idea would be how to handle the orders of magnitude efficiencies in scaling gained from digital technology.

Emergency call centres often (or at least should!!) have these paper or card based backup processes - notes are taken on the card based on the call, addresses written manually, and the card is carried to a dispatch desk (perhaps sitting on a handheld radio if the main system is unavailable), passing information to response teams. Handling of each call requires more people, and gets you a much lower throughput (manually writing addresses, without lookups for spelling correction, reading them with phonetics over the radio to drivers etc).

How many times have you tried to call a business during an incident or disruption and been unable to get through on the phone, because they aren't staffed to a level that can handle any significant % of their customers calling at once? (Often, these companies lack tech company style realtime status pages as well, which could arguably reduce call numbers).

I do think there's some merit in trying to help organisations improve process and procedure resilience, but it doesn't strike me that it will be effective unless normal staffing levels are nearer the levels needed for "crunch" operations (or people are kept "on call" at extra cost to be available).

There are however a lot of good lessons that should be learned from the wider fiasco around technology resilience and systems design, and part of that should include independent (with as close to entirely independent failure modes as possible) redundancy systems.


"…offline paper backup systems that can quickly and seamlessly integrate back with digital systems once they come back online."

It's not offline paper backups that are needed but rather the reverse—offline paper-based systems used as masters!

The heart of any critical infrastructure (specifically the part that's most vulnerable) is comparatively small next to the large masses of ancillary data, and thus could be managed as a paper-based database (as such records were before computers).

With computers and IT infrastructure as they're currently implemented (not as computer science says they ought to be), a secure filing cabinet/paper-based database is much more secure than an ephemeral one that has no physical presence and takes precious little effort to shove from one end of the planet to the other. The caveat, of course, is that the database must be secured against physical access and located in a secure building, etc.

Let me state why. Comparatively speaking, in recent times criminals have breached very few secure bank vaults and the like. The number is so small I can't remember when I last heard of a bank vault robbery. Another way of looking at it is to ask yourself when gold was last stolen from Fort Knox or the Bank of England, or $100 bills stolen from the US treasury/mint before their distribution.

Why so few robberies, you may well ask. We've had hundreds of years of experience locking up these valuables, and although the current systems used to secure them aren't watertight and likely never will be, they're nevertheless sufficiently secure that the few breaches that do occur from time to time are manageable. With physical security, we've found a workable balance between protection and usability.

Given the few robberies that do occur, it's not worth the effort of tightening security further; doing so would not only add considerably to the cost but also make physical access more difficult, and thus less convenient, because of the additional protocols that would have to be put in place to reach the higher security level.

Also, think for a moment: if you could gain access to a secured paper-based database, how quickly could you copy it, and how would you copy it? Right, both would be very difficult. On the other hand, once an electronic database has been breached, megabytes if not gigabytes can be sucked out within seconds.

In practice, the electronic/digital world has nothing as 'bulletproof' as a physically secure system. Given the statistics (the rate of cyber breaches, personal data stolen, Bitcoin thefts, etc. that occur not on a yearly but on a daily basis), one simply can't argue that collectively IT/electronic systems are more secure than physical, paper-based ones.

Back to the physical database: a secure paper-based database would always be offline; if some data are needed from it, they have to be extracted manually, then vetted and encrypted before being put online (that's if it's actually necessary to put highly secure stuff online at all).

As things stand, owners of information have a choice: store it in an electronic system and enjoy the operational advantages that offers, or use secure paper-based storage and suffer the inconvenience. One can't have it both ways.

The reason we have so many data breaches is that the average punter far prefers electronic data systems for their convenience. On the evidence, convenience is seen as more important; in practice its value far outweighs data security and integrity.


The "cyber agencies" focus on offence, because that's easy to score points with and appear to be doing something, whereas defence is a very boring job of securing a zillion outdated endpoints. Or trying to get profitable megacorps to do something less vulnerable and less profitable.


Offense is also easy in that there is a ton of software out there, and you just need to find one vulnerability. There is a "win" condition. Defense is impossible, as there is a ton of software and you need to protect all of it all the time; there is only a "lose" condition.


The "cyber agencies" could focus on pen testing of domestic companies, and issuing fines to insecure ones.


> Was there ever such a time? If so then tell me when it was.

The 90s and into the early 2000s at least. You would get laughed out the room and then fucking fired if you hooked anything critical up to the internet.


"You would get laughed out the room and then fucking fired if you hooked anything critical up to the internet."

Perhaps this happened where you were, and lucky you it seems you were in a good environment.

But back then I was in IT management and I had precious little power to stop it, especially given that other senior managers were the culprits. The operation's primary role was something other than IT. Moreover, I saw very similar problems in other organizations that I was familiar with.

Also, during that period I was with another outfit whose principal function was surveillance (not of people but of information and physical assets), and I can assure you that whilst the system worked well, try as we might it wasn't watertight.


This happened everywhere. I worked at a company offering management agents that had additional features if they hooked up to the internet ("cloud management" or "SaaS" before that term existed). Hospitals would never hook stuff to the internet. Industrial control systems, etc. were all huge show-stoppers.

No offense, but I think you’re thinking of a later era. Most protocols back then had literally no auth or auth that was never deployed. The thought of critical systems safe enough to be exposed to the internet was just really unfathomable.


I agree. Constant internet access and the assumption that other people should be able to push new code to your machine and have it run without you even being aware of it has killed all hope of resiliency.

I miss the days when any application that dared to phone home, even just to check for updates, was considered spyware. Today there are huge numbers of people who have access to install and run whatever new code they want on our systems whenever they feel like it. If it's not the AV software, it's the browser, or the video card, or the mouse driver, or Windows itself. It's totally unmanageable.


> Was there ever such a time? If so then tell me when it was.

It was a goal for a long time, and I'd say we used to be more resilient pre-cloud, pre-SaaS, pre-auto-update-everything. When every software installation was on a private network, with fundamentally different architectures (both machine and topology), and with a wide selection of even very poor-quality software, things were a lot more resilient than what we have today.

Today a single outage in a single service (say AWS) can grind a large number of companies to a halt. A bad update like this one immediately impacts everyone all at once and has a domino effect. That didn't use to happen.

We've been concentrating our collective architecture into a few best-practice tools, but those all become single points of failure, not only for digital attacks but for misconfigurations, mismanagement, company failures, exhausted underpaid engineers, optimizations, etc.

> Hardening systems against vulnerabilities means making them less convenient/easy to use and people instantly balk against that.

This isn't necessarily true, and I'd argue quite the opposite has been happening in the security industry over the past decade or so. People realized that hard security would only cause users to find simple, predictable bypasses that would overall _weaken_ the security posture. You just have to look at the evolution of NIST recommendations around passwords to see this happening.

Must change a password every 90 days, it can't be the same as your last 10 passwords, and there are complex composition requirements? Well, users are going to use the minimum size in predictable patterns and just increment a number at the end. Those old password hashes you have to keep around to check whether the user is reusing a password? Those are a liability that, when broken, tell the attacker which pattern each user is using. That's not the case anymore, and a lot more usable security has been rolled out that is entirely (or almost entirely) transparent to end users.
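
For a concrete sense of the newer guidance, here's a minimal sketch of a NIST SP 800-63B-style check in Python: a length floor plus screening against known-breached passwords, with no composition rules and no forced rotation. The breach list here is a stand-in; real deployments screen against something like the Have I Been Pwned corpus.

    # Minimal sketch of an 800-63B-style password check (illustrative only).
    def password_acceptable(candidate: str, breached: set) -> bool:
        if len(candidate) < 8:                 # 800-63B minimum length
            return False
        if candidate.lower() in breached:      # reject known-compromised values
            return False
        return True                            # no composition rules, no expiry

    breached = {"password", "letmein", "summer2024"}   # stand-in breach corpus
    print(password_acceptable("correct horse battery staple", breached))  # True
    print(password_acceptable("Summer2024", breached))                    # False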

Think about how prevalent and bad captchas used to be on the web and how easy they were to circumvent. Cloudflare's and Google's captcha solutions are pretty transparent and have much greater efficacy than the old ones.

Did Microsoft's general and ongoing laxness contribute to bad security practices? Absolutely, but that ecosystem is a weird outlier by the nature of how inherently unstable that environment was, and it has never (except for maybe a brief peak) been a core foundation of internet infrastructure, just enterprise infrastructure, unfortunately. They definitely never got the memo about usable or transparent security. I hope they're at least trying behind the scenes now.


"This isn't necessarily true,"

Correct, but on evidence and in practice it's a totally different matter.

Read my other posts here, especially my comment on physical security vs IT security. Unfortunately, the evidence backs my assertions.


>> "What Happened to Digital Resilience?"

> Was there ever such a time? If so then tell me when it was

It seems very plausible that "digital resilience" has been a buzz phrase repeated often enough in meetings of security-adjacent corporate bureaucrats that some number of people convinced themselves it was a real thing.

And the same divorced-from-specifics approach allows these decision makers to paper over any and all choices that inherently weakened security 'cause the triage needed to partially protect the resulting structurally insecure system can be presented with similar glowing buzz phrases.


What makes you think only a foreign adversary might want illegitimate access to our computers?


In a twisted way, CrowdStrike just gave western civilization a forced disaster-recovery and resilience test. An actual attack won't be rolled back within an hour.

In case you don't know, CrowdStrike is hardly the only company with large-scale access to this many companies, governments, and resources. It takes one rogue employee to deploy a disk wiper that destroys every computer (including Linux and macOS), and affected systems won't recover at all. It would be months before critical systems were back online; the global economy would come to a halt worse than it did with COVID in such a scenario.

It isn't "why didn't Crowdstrike do better" (although they should have), it is more, why isn't technology in critical systems more resilient to one vendor screwing up or getting hacked?

For example, let's say it wasn't just a boot loop but a disk wiper that erased every boot disk. Is there any reason not to have PXE booting of a recovery image, or a backup image already configured, on servers, ATMs, kiosks, point-of-sale systems, etc.? Even if UEFI and BIOS were erased, it is technically not impossible to have an auto-recovery mechanism implemented, right?
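
To make that concrete, here's one hedged sketch of the idea for UEFI Linux hosts: count consecutive failed boots and, past a threshold, point the firmware's BootNext at a PXE/recovery boot entry. The entry number, paths, and the existence of a recovery entry are all assumptions for illustration; a Windows fleet would need a different mechanism (e.g. WinRE).

    # Hypothetical boot watchdog (illustrative; paths and entry IDs assumed).
    import subprocess
    from pathlib import Path

    STATE = Path("/var/lib/bootwatch/failures")
    RECOVERY_ENTRY = "0003"   # assumed firmware boot entry for PXE recovery

    def record_boot_attempt(max_failures: int = 3) -> None:
        STATE.parent.mkdir(parents=True, exist_ok=True)
        failures = int(STATE.read_text()) if STATE.exists() else 0
        if failures >= max_failures:
            # One-shot: firmware boots the recovery entry on the next restart.
            subprocess.run(["efibootmgr", "--bootnext", RECOVERY_ENTRY], check=True)
            subprocess.run(["reboot"], check=True)
            return
        STATE.write_text(str(failures + 1))

    def mark_boot_successful() -> None:
        # Called by a late-boot service once health checks pass.
        STATE.write_text("0")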

If you have never been in an incident response (IT and security incidents) root cause analysis, I don't blame you for not thinking deeper about the root cause, but that is the type of root cause analysis that has been missing despite over a decade of rampant ransomware, disk wipers, and supply chain risks.

Finding someone to blame and be angry at is easy and doesn't solve the root cause. Making hard technical decisions and not wasting this opportunity (never waste a good crisis) to push for resilient technology investments actually solves the root cause behind this and other repeating problems.


if the firmware is totally nuked, you'd need backup firmware. at some point, all of this crap can be made non-recoverable, but that isn't the real problem to solve.

imma take your comment one step further and say that the emphasis on security is coming at the expense of discussions on resilience. and security matters a lot less, especially financially, than resilience.


Availability is one of the core tenets of security. Security = a measurement of confidentiality, integrity, and availability.

Backup firmware and boot images can be configured as read-only.


This has been an open secret for decades. Just a handful of major OS and browser vendors, constantly shipping patches to their systems and most software having such vast software supply chains that it's effectively impossible to audit anything, let alone truly certify anything as safe, and "security" software just expands the attack surface.

Everyone in the industry knows this.

Interesting to see the NYT just catching up.


> Interesting to see the NYT just catching up.

Maybe it has to do with some major incident that happened yesterday, and the fact they are a news company?


It's the equivalent of not writing about Boeing until the day a 737 MAX crashes right in front of your newspaper offices.


It feels more like writing about Boeing and then writing about Boeing again after the crash, considering the Times has been writing about cyber security and American vulnerability for a while:

https://www.nytimes.com/2021/02/06/technology/cyber-hackers-... https://www.nytimes.com/2018/01/03/business/computer-flaws.h... https://www.nytimes.com/2013/07/14/world/europe/nations-buyi... etc


Readers wouldn't have cared nearly as much.

NYT: Boeing is run by bean-counters and isn't taking engineering seriously anymore.

Boeing: That's not true. Our aircraft fly thousands of times a day, every day, and are very safe.

Who would find that very interesting, absent any relevant, dramatic current events?


Exactly. The problem with for-profit media is that it requires the attention of its audience.

Everyone bitches about regulation and taxes, for reasons real and imagined, but applying laws and rules to businesses before something happens is the point of them.


How is that a problem of media?

Isn’t it more of a problem of the population at large?


Any media that does not have the attention of its audience is pretty pointless, no?


> Readers wouldn't have cared nearly as much.

This bears repeating.

If you complain about a risk before a disaster structs, you're fearmongering.

If you complain about a risk after disaster structs, you're flogging a dead horse.


*strikes


> It's the equivalent of not writing about Boeing until the day a 737 MAX crashes right in front of your newpaper offices.

In order to write about Boeing they'd have to have an angle and the resources to go on a fishing expedition to create an interesting story for people to read and talk about.


If you are a non-US company you have to be insane to use this CrowdStrike service. The FBI can legally use a secret warrant[1] and force CrowdStrike to inject a DLL into your infrastructure!

[1] https://en.wikipedia.org/wiki/United_States_Foreign_Intellig...


Are you sure that is correct? I was under the impression that the US government could order companies to turn over data, but that it could not compel them to actually do work. This was the center of the dispute between the government and Apple after the San Bernardino shooting: Apple was within its legal rights to refuse to provide assistance. https://en.wikipedia.org/wiki/Apple%E2%80%93FBI_encryption_d...

The lengths that the NSA and CIA would go to to implant backdoors (interdicting shipments of laptops/phones and doing the work themselves) further suggest that they cannot compel this sort of action.


That case was theatre/kayfabe. The FBI was using an emotive case to turn public opinion against encryption and to set some legal precedent. The goal wasn't really to unlock the phone, which could be, and was, done by other means.

If they have a path to covertly compel action as a state secret under National Security / anti-terror laws we will only hear about it from whistleblowers. It won't be something the target can disclose let alone test in court.

FWIW I also don't believe in Apple's nobility as resisting on users' behalf. They happily bow to the state and remove apps for e.g. organising protests or monitoring deaths in US wars, do CSAM scanning, etc. IMHO their interest in encryption is to prevent jail-breaking and protect their App Store cash cow.

> interdicting shipments of laptops/phones and doing the work themselves

I don't think that proves anything about their powers. Given the option, I'm sure they would prefer to install things themselves without third-party knowledge or consent.

We have evidence of complicit action e.g. black rooms like Room 641A. I think the nature of "consent" and "obligation" gets pretty grey when it comes to the security agencies. They don't get results using court orders. I'm sure they have assets employed as staff in security sensitive positions.


How do you get PCI DSS compliant? That's more important in the real world than paranoia about the FBI.


I guess it isn't actually that hard since CrowdStrike can offer this operating recklessly. /s


You think they can't/don't do that to force Microsoft to push an "update" that does the same thing?


I doubt they'd even need to go through Microsoft


Just told my family yesterday that if we are ever in a real war, expect everything to stop working within 8 hours. We will go back to cash and paperwork, but it will be painful and slow.


Looking at two countries in an actual long-running war, both kept using cashless means, with actual increases in usage:

https://cbr.ru/eng/press/event/?id=18776

https://bank.gov.ua/en/news/all/drugiy-rik-povnomasshtabnoyi...



It's not a war, it's a genocide.

At this point it's not cash that is missing, it's absolutely everything that sustains human life, and there's probably not enough working stuff left to even barter.


This isn’t really all hell breaking loose actual war. If it were Kyiv would have been a ruin years ago.


Throughout this war, 62k Russians are confirmed KIA because we know their names and faces [~], and estimates of total Russian KIAs vary from 120k by a Russian outlet [^] to 565k by the Ukrainian Armed Forces [_].

In comparison, total KIA losses of the Soviets in the Afghanistan war were 14k-26k, and the Americans in the Vietnam war lost 58k KIA + 150k WIA over 10 years.

In short, this is the biggest war in Europe since WW2. But hey, it's not war enough because not enough Ukrainians are dead or something, idk.

[~] https://t.me/pechalbeda200

[^] https://meduza.io/en/feature/2024/07/05/a-new-estimate-from-...

[_] https://t.me/GeneralStaffZSU/16238


It's really not a full mobilization though. Yes, 62k casualties seems like a lot. When Russia is fully mobilized in total war, however (the sort of war that NATO planners fear the most), it goes through millions of casualties and takes over half the European continent in the process.


I don't think that "number of deaths" is a proxy for "infrastructure stops working".

One of the world's (what we thought) superpowers has been trying for the last two years to destroy the infrastructure of a country with 33M inhabitants. They may not have fully mobilised, but they are definitely expending all their military equipment. Long-range/tactical missiles. Air assets. Naval assets. Cyber warfare.

The result in Ukraine : unimaginable human suffering, but electricity and the internet are still working over there.

When the nukes start flying, that's another matter though. But in that case our problem will not be that our credit cards stop working.


They are very much not sending all their equipment. They are very much not in a total-war economy. The conflict is highly constrained. In an unconstrained war, Kyiv would be leveled already. Ukraine would be plowed over. Western powers have done a lot of work to set up guardrails for this conflict. The modern Russian army is 1/30th the size of the Red Army, for reference on the present level of mobilization and what could theoretically be employed should Russia actually be fighting a war for the survival of the Russian state.


You are again talking about mobilisation. Yes. They can enlist millions of untrained men.

But in terms of total assets deployed, they are currently all-in. Attrition being what it is, they are drawing down their Soviet stockpiles at a prodigious rate and are currently reactivating 1940s and 1950s equipment.

This is all extremely well documented by open sources. People are counting tanks on military bases using satellite images. Check out Perun on YouTube. He's a defense economics expert who posts a 70-minute PowerPoint presentation every Sunday, complete with sources and references.

https://m.youtube.com/channel/UCC3ehuUksTyQ7bbjGntmx3Q


Well yeah, it's still not going all out even if they are using what they have. Russia going all out produces something like 1,500 T-34s a month and 3,000 PPSh a day. Like I have been trying to say, they aren't in total-war mode. People are still working for domestic companies doing normal work. They aren't being conscripted into tank factories. If they were, it would be a different story, that's for sure.


Where are you getting those production numbers from? This isn't a video game where every turn the bear player gets 1,500 new tanks: their true capacity depends on complex supply chains, skilled people, and the impact of sanctions on their ability to pay for everything. You're claiming figures at least an order of magnitude higher than they're reportedly hitting, and it's really hard to believe that they're slacking that hard on a core part of a war critical to Putin staying in power.


I'm citing WWII production numbers.


The limiting number for a "Russia goes bananas and decides to steamroll into Portugal" event, based on what we've seen in the invasion of Ukraine so far, seems to not be the number of soldiers but the amount of functional equipment.

Russian production numbers sound good if you ignore that most of them refer to stored Cold War-era equipment being reactivated. Their main and most successful staple has been artillery, and that mostly worked because it could be fired from the safety of a side of the border NATO pretty much told Ukraine not to cross, for a time. It also seems like the "saber rattling" Russia did in the lead-up to the invasion by positioning military assets around the Ukrainian border was less an intimidation tactic and more a necessary part of the process.

I'm not saying Russia couldn't do a lot more damage in an all-out war into the West even without involving nuclear weapons (which already assumes European countries with nuclear weapons or the US wouldn't use them either). But based on the underwhelming performance of the Russian military relative to its supposed numbers, I don't think Russia could have pulled off the kind of Blitzkrieg you're envisioning, let alone once supply lines become a problem. Especially if you consider that the plan for the invasion of Ukraine clearly was built around a surprise attack on Kyiv, which failed spectacularly because the terrain and weather meant the tanks had to drive slowly in a line and somehow Russia didn't bother providing infantry support.


I'm not suggesting that Russia vs the West would be successful. I'm only suggesting that this conflict is nowhere near as mobilized and driven as what Russia showed itself capable of in WWII. Apparently this is a controversial take, given the responses I've been getting.


It's not a controversial take. You just lack understanding of the English language.

OP: Looking at two countries in an actual long running war...

You: This isn’t really all hell breaking loose actual war.

Nobody used the term "all hell breaking loose". You did. You redefined the conversation to be about 100% mobilization (from Russia), and then got all pissy that people called you out.


I hope nothing I wrote could be considered "pissy." I thought I was articulating my point well enough, but I guess a nerve was struck given how many people pounced to say my opinion is wrong, using throwaways to boot. Opinions are opinions; they are neither right nor wrong.


Given that Russia is not facing an existential threat comparable to that of Nazi Germany, I don't think there's any way the conflict could be anywhere as driven as that.

Could Russia mobilize more? Yes, absolutely. But based on what we've been seeing they lack the supply lines, resources and frankly military competence (which is unsurprising given how Putin deals with anyone but yes-men) to be able to do anything with that if they had it. Also, as I said, the NATO response to invading, say, Poland would also look extremely different from the current NATO response to invading a non-NATO border state.

The Soviet Union's main advantage during WW2, other than receiving immense support from the Allies (both in supplies and military equipment), was that it was fighting the Nazis on their back foot. The Nazis had made a similar fumble as Russia did in Ukraine by misjudging the climate and seasonal weather and they also didn't have a reliable supply line. The Soviet Union did deal a devastating blow in Stalingrad but the Nazis there were pretty much stranded in hostile territory without support at that point and many soldiers were suffering from frost-related health issues. When the Soviet Union actually invaded Germany the Nazis' troops were already spread out all over Europe and into Africa and losing ground in the various occupied territories.

The Soviet Union suffered massive casualties and contributed more sacrifices to defeating fascism than any other country involved, but militarily its capabilities do not translate into any scenario involving modern Russia that NATO analysts would lose sleep over. Russia is "holding back", yes, but so is NATO and especially the US. Even the "modern" equipment the US has started sending to Ukraine is decades behind what the US military has available. Ukraine's stronger allies have very much been holding out on "the good stuff" and instead mostly cleaned out their dusty reserves. Russia, OTOH, doesn't seem to have the production capacity or resources to churn out its newer equipment (which is still years behind what the US etc. have access to) and is already falling back on decades-old stock and desperately buying ammunition from North Korea of all places.


If Russia is holding back, it seems like a strategic error. Why have they not brought in more conventional weaponry and personnel if it would bring them victory?

I would guess that NATO planners have been adjusting their assessment of what Russia is capable of when completely mobilized. The answer sure looks a lot like "way weaker than we imagined, pretty well ineffective in the face of significant resistance, the only reason to pay any attention them at all is they have ~half the nuclear weapons worldwide."


Simple game theory. If they escalate the conflict, there is a potential that western allies would also escalate in response. Russia toes the line between funding a minor conflict and disrupting its domestic economy in favor of a centrally planned wartime economy. Having an active conflict to engage in is also a benefit in and of itself. The U.S., for example, has the most advanced military in the world because it has engaged in a more or less continuous series of conflicts since WWII, which allow it a unique opportunity to experiment with tactics and technology that for most other nations remain theoretical and simulated.


So, in addition to the loss figures from before, there was a major mutiny, with a city of a million people (Rostov-on-Don) captured for a day and seven aircraft downed. And recently the Russian ministry of interior released figures showing that cases of organized crime have risen by 76% compared to before 2022 [_], because that's what happens when you take a bunch of convicts (some of them convicted for life) and give them all a "get out of jail" card for 6 months of running around with a gun.

You are claiming this is actually beneficial for the Russian Federation because all of that is outweighed by experiments in tactics, correct?

[_] https://www.moscowtimes.ru/2024/07/15/mvd-otchitalos-oroste-...


Things can be good for the military establishment and bad for the people. Being able to iterate on military tech in an active conflict is a unique opportunity for military planners and engineers. Just look at the United States military and how much was learned in the last couple of decades of war.


Oh my bad, I thought you were being serious for a second.


I am being serious. This is not a 100% war by any measure.


You've just said that there's a mode the Russian Federation can just switch into that will allow it, a country with a 144mln population, to gain the upper hand in a hypothetical war against the EU, a union of 447mln people, while having inferiority in all kinds of technology, from practically non-existent semiconductor manufacturing to inferior metalworking, thanks to which Russian howitzers have an 8-10 km smaller effective range than European ones.

The high-speed "Sapsan" trains from Moscow to St. Petersburg are German, not the other way around. The cranes they used to build the Crimea bridge were Dutch, not the other way around. The optics they were putting in their newest tanks were French, not the other way around. But of course, all these guys need to take over Europe is just "mobilization", whatever that means. With several times fewer people and inferior technology. Right.

You are either seriously misguided or just trolling.


I have never said they can beat the West. I only highlight that once upon a time, Russia was producing something like 1,300 T-34s a month. A true centrally planned wartime economy is really something else entirely when it's applied to a continental power like Russia or the United States.


Russia can never go into full-mobilization mode because its armed forces are busy with their shenanigans abroad in Africa and the Middle East.

While we focus on democratic debates based on their spoon-fed misinformation campaigns, Wagner is literally conquering Central African countries one by one.


The Ukraine war paints largely the opposite picture.

Outages are largely limited to physical infrastructure that’s attacked by missiles. Russia isn’t a slouch in digital warfare, either.


Ukraine depends a lot on American services. Russia is not at war with the US.


That's a good point -- Russia doesn't want to massively escalate against the US with an all-out cyberattack. I've often wondered if total war against Russia or China would show how fragile our internet-connected infrastructure is, with e.g. important people's bank accounts vanishing with no evidence they ever existed.


Funny you mention only "important people's bank accounts". Because if they just wiped all the poor people's accounts, that would be enough for complete internal revolt.


I would hope that the accounts cannot be so thoroughly deleted.

But the point is valid.


Exactly.


Just storm the EDR company's offices, slap guns to the devs' heads, and push geofenced destruction.


"Leave the world behind"


Israel is doing well after 10 months.

No lack of hostile hackers.


They're not fighting a peer power.


thank god that israel has very strong defense and cybersecurity sectors


"Diversity" (but not in the sense of marginalized people)

If more of the critical machines were running different OS's, the damage would be contained.

When we talk about the dangers of "monoculture" it's usually about plants. The same danger applies to computing infrastructure.


We're already there. The fact that we didn't see civilization collapse is evidence that there is a ton of infrastructure not running Windows and Crowdstrike.


This wasn't nearly as bad as it could have been. What if the crash wasn't just a crash but resulted in data corruption? And what if it took longer to stop the rollout and deploy a fixed version? How long would it have taken to recover from this kind of incident? If affected machines didn't fix themselves after several reboots but needed to be actively reimaged?


For a long time after Burroughs was almost ancient history, banks still ran Burroughs machines. They've probably thrown in the sponge by now.

I'm sure IBM mainframes are still running critical stuff, too.


On top of that, I am still struggling to understand how the people in charge of orgs that run highly critical systems were OK with the idea that a third-party software provider could push patches at any time to the software they provide.

Sorry for being harsh with my following statement, but I believe that the companies affected by Crowdstrike share some responsibility on what happened yesterday.


You're making the mistake of assuming that the people running those companies care about anything other than their job security, and buying in solutions is the best way to have a ready-made scapegoat when things go wrong. The mantra "no-one ever got sacked for buying IBM" still holds, you can just substitute "Oracle", or "Microsoft", or now - apparently - "Crowdstrike".


They are OK with "push at any time patches to the software" because that's a big part of what they are paying for: rapid response to threats.


>Ping reply from 127.0.0.1

The threat is inside the building!


- Pushing patches is objectively a good idea; rapid response to threats and all.

- What's bad is the instant global 0->1 rollout, instead of something more gradual (blue/green/canary, whatever you call it). With a gradual rollout policy this whole thing could have been caught at their first couple of guinea-pig customers, and not the whole world. A minimal sketch of the idea is below.
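
Here's what that could look like, with made-up cohort sizes and thresholds; nothing here reflects CrowdStrike's actual pipeline:

    # Illustrative staged rollout: expand by cohort, halt on elevated crashes.
    def staged_rollout(hosts, apply_update, crash_rate,
                       stages=(0.001, 0.01, 0.1, 1.0), max_crash_rate=0.01):
        updated = 0
        for fraction in stages:
            target = int(len(hosts) * fraction)
            for host in hosts[updated:target]:
                apply_update(host)        # push to this cohort only
            updated = target
            # Check the health of everything updated so far before expanding;
            # a real system would also soak-wait between stages.
            if crash_rate(hosts[:updated]) > max_crash_rate:
                return "halted: roll back and page a human"
        return "fully rolled out"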


You don't understand the word objective. It is beyond arrogant to think that controlling when a customer's day gets ruined is your prerogative. Let them make that decision.


It's not harsh. The tide went out and it turns out a lot of people were swimming naked.


I think I agree with you. On the other hand, I can also imagine that if auto-updates weren't the case, then 90% of installations would be on terribly outdated and probably vulnerable versions. It's hard to imagine a common-sense middle ground.


One could make the argument that automatically patched software is, in aggregate, more secure/less problematic than chronically under-patched software that requires manual, human attention.


One could, but in the old days when vulnerabilities happened, they didn't hit everyone at once.

And if it hit your system, the vendor's first response would be "are you on the latest update? that's been fixed."

(Unless the latest update IS the problem. In that case, being lazy was a good defense.)


They share the whole of the responsibility for it. "My antivirus was updating" is not an acceptable excuse for a service to be down.


As I understand it, customers do have control, but in this instance CrowdStrike overrode the settings of the customers.


Surprisingly, the mantra "if it works, don't touch it" doesn't really work so great.


They chose a major vendor and it checks off a compliance requirement.


> If more of the critical machines were running different OS's, the damage would be contained.

Not if they were running the same CrowdStrike.


given it's a kernel module (AFAIK), how could that be if it were different OS's?


Regardless, much code would be shared. Likely including the offending null pointer access of this case.


> Regaress

"regardless" ?

So you don't actually know, is what you're saying.


Thanks. Typo corrected.


Not necessarily. CrowdStrike isn't even the #1 player in this space, but this still happened because of network effects. The number of platforms you'd need for this much safety is impractically high.


I'm not saying you're wrong, but:

"Network effects"? You mean like, "I'd be fine, but I depend on a service from a Windows machine, so I'm still screwed" ?

> The number of platforms you'd need for this much safety is impractically high

I don't see why this becomes an impossible problem. If all the essential services are not provided by a single software infrastructure, then we have the required diversity, right?


> "Network effects"? You mean like...

In the case of airports, losing ATC at just a few major US airports would effectively paralyze the network. Or yes, the case you mentioned where you depend on four SaaS offerings, and odds are one of them will go down.


Like I said: I'm not arguing with you.

It's ironic that the original DARPA justification for packet-switching was, if a nuclear war takes down some nodes, the packets will still get through somehow.


Computers are not people. No need to be afraid to discriminate.

Windows is shit.

Mac is more or less.

Linux is best of all.


You do realize that CrowdStrike also runs on Linux and that there have been a variety of instances of bad CrowdStrike updates breaking Linux machines, right?

https://access.redhat.com/solutions/7068083


Massive worldwide computer outage affecting enterprises with Windows machines running CrowdStrike, a very popular piece of software that is sold as hacking protection but which is, in reality, used by C-suite execs to spy on employee behavior. It is installed with extraordinary permissions and is difficult to fix or remove by design.

I wonder if this will teach absolutely anyone a lesson about anything.


I think it will. MS has published a number saying it was 8.5 million machines, which I don't believe, but seeing the effort that's gone into the response even at my own relatively mid-sized org, there are super simple questions like: how the heck do we even get to these devices when half the crew works remote?

The response is, and always will be: how much will this cost? We now have the opposing figure: how much will it cost if we don't do it?


Do you believe it was more or less than 8.5 million machines?


I suspect more.


I'm sure it was more. MS's crash statistics come from a long pipeline of WER reports, and I know for a fact that some organizations disable WER or even blackhole it along with other diagnostics.


Wasn't the 8.5 million an estimate? I thought Microsoft took the telemetry they got and then adjusted it based on the estimated ratio of machines where telemetry is disabled.


Can we please get more information about the spying features? Some screenshots would be great! Thanks!


Not as evil as they make it sound. Process execution, detailed timestamps, and network metadata capture are core features of every EDR tool that exists (CrowdStrike, MDE, SentinelOne, etc.). They can just be abused to monitor user behavior, in addition to threat hunting or detecting malicious activity. Telemetry isn't inherently evil, but organizations need to establish privacy and usage governance around security tools to prohibit abuse.
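
For a rough idea of what "process execution telemetry" means in practice, here's a toy sketch using the cross-platform psutil library. The event fields are illustrative, not any vendor's actual schema, and real EDRs subscribe to kernel callbacks rather than polling:

    # Toy process-start telemetry collector (polling; illustrative only).
    import time
    import psutil

    seen = set()
    while True:
        for proc in psutil.process_iter(["pid", "name", "exe", "create_time", "username"]):
            key = (proc.info["pid"], proc.info["create_time"])
            if key not in seen:
                seen.add(key)
                print({
                    "event": "process_start",
                    "pid": proc.info["pid"],
                    "image": proc.info["exe"],
                    "user": proc.info["username"],
                    "ts": proc.info["create_time"],
                })
        time.sleep(1)   # real EDRs hook kernel events instead of polling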


> It is not hopeless.

> “We are optimistic that A.I. is actually allowing us to make significant — not transformative yet, but significant — progress in being able to identify vulnerabilities, patch holes, improve the quality of coding,” Kent Walker, the president for global affairs at Google, said at the Aspen forum.

I disagree. If the only hope is some vague promise of bs AI, there is no hope indeed.


Yep. This is like arguing that the best way to prevent school shootings is to give teachers guns.

I don't think AI would have done a better job convincing CrowdStrike's leadership team that a staged rollout strategy is worth the cost.

These issues are caused by people, not technology, so more technology isn't going to fix it.


There is some point where you should redefine what it means to be an adversary. To be practically forced into a position that leads to this level of harm, by actors you don't want to perceive as adversaries, is something you may want to analyze.

The purpose of a system is what it actually does, not what it claims to do but fails at every time. Turning everything vulnerable and fragile while some big strategic, global plan looms ahead makes you a disposable asset, a sacrificial victim in some higher-level chess game. And you can consent to that with your decisions.


Here's an interesting exercise: what's the minimum quantity of explosives that would lead to a 1% drop in western GDP? Would doubling it lead to a 2% or a 4% drop? Is the relationship linear?

I don't have an answer, but thinking about it makes one understand how incredibly fragile our complex logistics chains (and indeed our economy) are. One day all this complexity will collapse upon itself and we'll wonder what happened.


Bomb in TSMC clean room. Almost any size. That takes out the AI market. 1% of GDP gone. However, it's less than linear; not many targets of such critical importance.

The ability to get the bomb in the right place is far more important than the quantity of explosives, as was demonstrated by the recent suicide sniper's miss.

The IRA's 1993 Bishopsgate bombing (https://en.wikipedia.org/wiki/1993_Bishopsgate_bombing) was estimated to cause more economic damage than all other IRA bombings put together. It's interesting that (apart from the first WTC bombing) American terrorists have stuck strictly to guns and not attempted car bombs.


> It's interesting that (apart from the first WTC bombing) American terrorists have stuck strictly to guns and not attempted car bombs.

That is not true. The Oklahoma City bombing is the first that comes to mind, where the explosives were carried on a truck. But there are many others; there is a whole Wikipedia list about them: https://en.m.wikipedia.org/wiki/Category:Car_and_truck_bombi...


https://en.m.wikipedia.org/wiki/Category:Deaths_by_car_bomb_...

16 is not "many", and quite a lot of those are decades old. Whereas https://en.wikipedia.org/wiki/List_of_school_shootings_in_th... is just an endless scroll. It's a form of terrorism that America accepts as routine as car accidents.


"Stuck strictly" implies exclusivity. That and the "apart from the first WTC bombing" imply that the first WTC bombing was the only vehicular explosive attack in the USA.

I agree terrorists in the USA use guns a lot more than vehicle-borne explosives. If that is what you had written, I would not have commented at all.


If you could get 4 people 81mm mortars (and some training), it's highly likely you could shut down 10% of US gas refining by attacking just 4 facilities along the TX/LA coast. It's very possible you could also do this with drones and avoid getting caught for some time, though your payloads might be a bit lighter. Refineries are large, but typically weak targets with critical areas. This is something Ukraine has been exploiting against Russia.


Probably not a lot. Blowing up a ship in the middle of the Panama or Suez canal might do it, especially if you wreck it badly enough to block the canal for months. Even easier if you target a big oil tanker.

I don't think this is linear though. It's easy to target a weak point to inflict a small amount of damage, but hitting say 10% of GDP would mean targeting multiple sectors of the economy and putting millions of people out of productive work.


How long before our evident incompetence as a profession comes back to bite us in the form of more draconian regulation about who and what is allowed to run in kernel space, or other privileged contexts, on critical infrastructure?


CrowdStrike's widespread deployment is encouraged by regulation.


Is that a "bite"? I have wished for this for a long time.


Robert C. Martin has been talking about this same topic for years.

He believes that, just like the medical field, the software industry must self-organise before governments start imposing draconian measures on how software should be developed.


Whenever that time comes it would be at least 50 years too late.

But I hope to see it in my lifetime.


The problem I think is that it would just take the form of regulatory capture. A few companies would be blessed, and the rest of us locked out. And we'd still have screwups like yesterday, but this time with Government Approval.

Already it's amazing how the media is presenting this like it's a natural disaster, instead of an entirely preventable display of incompetence... A business entity whose shares only dropped 10% after causing untold billions of damage to the economy.

Gives us all a bad name.


> Already it's amazing how the media is presenting this like it's a natural disaster, instead of an entirely preventable display of incompetence

Amazing? Super predictable I'd say.

> Gives us all a bad name.

That is sadly true.


> The problem I think is that it would just take the form of regulatory capture. A few companies would be blessed, and the rest of us locked out. And we'd still have screwups like yesterday, but this time with Government Approval.

Yeah agreed. It would require no corruption... which is the true fantasy trope of our times.


I usually agree that we are heading towards regulation (software engineering is already a regulated title where I live), but in this case CrowdStrike had such a blast radius exactly because of regulation.


What security software runs in user space? Even on the Linux side I struggle to name any except Snort or the open-source rootkit scanners. How would you enforce security policies in user space?


That entire thing was caused by stupid draconian regulation.

Systems like CrowdStrike are mandated, not hired freely.


1. It’s a good time to reread the article that got Dan Geer famous on “monocultures” => https://ccianet.org/wp-content/uploads/2003/09/cyberinsecuri...

2. Also a great time to start prepping for AI Incidents => https://thedataexchange.media/ai-incident-response/


There is no "Digital Resilience" because that is perceived as too expensive, a cost center with hard to quantify value. So it's easier to try and carve out everything that doesn't fit into a spreadsheet, everything that isn't core business, and everything that is not able to present what value it generated.

If general IT had the abilities of sales, marketing, or insurance, there might be a chance that the business would take the responsibility to have the internal knowledge and capabilities to assert control over their systems. But they don't, and as such they won't and instead shove that responsibility over to a third party generalist elsewhere with enough paperwork to have both parties feel their asses are covered.

As long as everything seems to be working, the signals that still get through are project failures, be they complete failures or just time and/or money being consumed beyond plan, with maybe some requirements getting cut. But as soon as enough stuff breaks at the same time, we get news outlets writing articles about resilience, and the greater public suddenly no longer agreeing with what is effectively just the result of the status quo, because it impacts them directly.


Externalizing a threat, from a national news source. Thought experiment: a healthy society has plural viewpoints and plural economic strengths. What if a core and entitled group of groups imposed their "security" on a plural society, for their own profit at the expense of the majority? What if their security is monocultural and internally inconsistent, without the ability to admit error? What if there is a reflex to blame external groups specifically to divert attention from an internal and unbalanced chain of actions, controls, and monetary flows?

What is the response of a Free Press to news stories exercising reflexive blame-game from allied core groups with major monetary interests in the outcomes?


> externalizing a threat

Yes, it's illustrative of the USA. Due to monopolies, lack of local control of infrastructure, etc., a feature is rolled out that grinds hospitals, airports, etc. to a halt. Surely due to forces we're all familiar with: a rush to get profit-making features out, a neglect of correctness and stability, cost cutting, etc.

Then we have the New York Times, considered the sober voice of the establishment. What is discussed? Reflection on how entirely US-internal corporate processes led to this? No. A thought experiment about what if some external actor, perhaps one tired of US imperialism or something, had performed this.

I read this after seeing Hulk Hogan rip his shirt off at the RNC in an Idiocracy prophecy manifested, while the other presidential candidate, immersed in the dementia of the gerontocracy, clings to power amidst his cohorts' pleading that he step aside.

As I watch the US arming Ukraine to fight Russia, I think back to 1986 and Gore Vidal's plea for an alliance with Russia, lest Americans become either farmers or just entertainment for the more efficient Asians. Another prophecy which seems due to come to pass.


If CrowdStrike's system wasn't able to prevent a kernel driver that's all zeros from getting by, you can be sure a malicious payload would have breezed right through.


It wasn’t a driver.


Oh yeah, at a quick glance looks like that file could have had any payload and it would have been loaded right into the kernel.


To send a malicious payload into the kernel, you would have to take over CrowdStrike's deployment infrastructure first, right? I hope the program doesn't just accept updates from anywhere.


What makes you think so?


Earlier there were some screenshots showing an entirely zero-filled .sys file but now we know that's not what the payload was.


The fire more deadly than enemy fire is friendly fire. Adversaries cannot do any harm unless they get in, and even if they get in, the damage is limited to the access of the account they run under. But AVs are invited in, which renders the first line of defense useless. Making it worse, they run with SYSTEM privileges, which are higher than Admin privileges. And we just witnessed what can happen when an AV goes rogue.


The only vulnerability here was CrowdStrike's EDR product that runs exclusively in ring 0, and the entire corporate & technical class that lazily relied on this flawed security model and centralized this incompetence.

As much as some people want to believe that Microsoft is blameless here, I hold them partly responsible. They need to create a stable API in their kernel and force third party security vendors to use it.


I haven't worked in a Windows environment for a long time so was a little surprised how much of the online commentary suggests people in that environment are comfortable or at least resigned to the necessity of unattended live third party updates on critical infrastructure. I can't see any justification for that on the *nix side of things and hope that culture never transfers over.


I have worked at a few Windows places and they did not allow live updates. All updates were managed by IT and tested before being pushed out.

This situation was basically caused by an antivirus definition update.

Without any information, my guess is that CrowdStrike specifically doesn't provide a means for enterprises to manage these CrowdStrike updates, because that would cause potentially weeks or months of delays before critical antivirus updates were applied.

I'm more curious why they need CrowdStrike on every system in the first place. I can understand employee computers but servers and critical systems should have other security measures in place to make them less available to attack in the first place.


> I'm more curious why they need CrowdStrike on every system in the first place. I can understand employee computers but servers and critical systems should have other security measures in place to make them less available to attack in the first place.

The boring answer(s) is compliance and necessity. You need a security solution installed & running for a bunch of certifications, and maybe you also need it to secure some unmaintained software you have to run. Plus, even if the machine isn't front-facing, a lot of these EDR solutions are installed to prevent or monitor for lateral movement.


For the non-tech folks, this probably felt like one step away from an attack from an adversary.

I have a different take. This was still far from being an adversarial attack. There was no security breach. The failed configuration came from an SDLC that remained secure and fully under CrowdStrike's control. It was a terrible bug, but it was not an attack.


I would not call it a bug. I would call it a severe process or systemic failure. Their SDLC clearly did not include any sort of phased rollout or canary deployments. Bugs are inevitable, what matters is being able to catch them before you push them to every end user on the planet.


If CrowdStrike's system wasn't able to prevent a kernel driver that's all zeros from getting by, you can be sure a malicious payload would have breezed right through.

There was no validation, no phased roll-out, and almost certainly no multi-person verification. I'd bet dollars to donuts this was pushed out by a low/mid-level functionary, in an action that could have been carried out by dozens if not hundreds of employees. There may not have been a security breach, but this was still just one minor breach, one distracted open laptop in a cafe, or one disgruntled or paid-off insider away from absolute armageddon.

It wasn't an attack, but it was a raccoon that came in through an unlocked screen door in the back of Fort Knox.

If someone had used this to deliver a ransomware package, they'd be buying a mega-yacht right now.


It’s not a driver, but a configuration file.


Sources I've seen said that a .SYS file with all zeros caused the BSOD. A configuration file shouldn't cause a blue screen.

EDIT: It is in the 'drivers' directory and has a .SYS extension, but it is something called a "channel file". I couldn't get much info on what a channel file does other than "something something named pipes".


It's a ".sys" file but it's not a driver binary at all. It's a binary configuration file, and from what I gather it's a sort-of packed table. The actual kernel driver mounts it, parses the contents, and uses it as configuration. The ".sys" extension is probably for the believability of being a driver so users would leave it alone.


Why does IT even pay $$$ for CrowdStrike? Time to uninstall it and figure something else out. Just use Linux or ChromeOS.


People downvote you, but in this context you are mostly right. In the case of airlines there is no reason to use Windows: check-in software is web-based, and ChromeOS is a perfect fit. Same goes for banks; bank tellers mostly use web browsers to access banking applications.


Probably has to do with IT wanting to keep using Windows to justify its own existence ;).

Web + Chrome is so much better. Then just use QNX or something for embedded. What is the actual reason that our $600k confocal microscope has to run Windows?

QNX or Unix is much better for scientific and healthcare equipment.


How do you detect if your ChromeOS device gets breached? Linux apps run in sandboxes, so even user-level HIDS won't function.


Mission-critical use of Linux still needs malware and breach detection software. It's not as simple as switching OSes.


As if Linux doesn't have malware or security breaches.


It's like these people didn't notice the raft of CVEs on hardened VPN and firewall devices lately. Cisco IOS regularly has CVEs. Android and iOS both have critical RCEs. And remember Mirai?


Can someone from one of the major services comment on why they don’t run the N -1 policy on Falcon? My onboarding sales engineer recommended this to me years ago to avoid this situation. Why do critical infrastructure companies run bleeding edge updates like this?


So yeah, let's not use a company like this as a best practice. Everything about this reeks of worst practices riding the wave of regulatory capture.


The problem is that, for a security scanner to detect threats properly, it needs to sit in the kernel. There should be a mode that allows scanners to read but not crash the system -- some sort of sandbox for all this kernel access.


> some sort of sandbox for all this kernel access

Yeah, like eBPF in Linux or System Extensions in macOS.


Kent Walker's betting the farm on AI spotting future f*ckups? One born every minute!


Kent Walker of…Alphabet?


The same!

“We are optimistic that A.I. is actually allowing us to make significant — not transformative yet, but significant — progress in being able to identify vulnerabilities, patch holes, improve the quality of coding,” Kent Walker, the president for global affairs at Google, said at the Aspen forum.


Really, the problem is that all this critical infrastructure runs on Windows. Critical systems should effectively be appliances that run with a very minimal footprint. If you absolutely need to monitor them you can export disk snapshots or something out of band that can't impact operations.


On the one hand - you can read this as a PSA for the apathetic and/or clueless 99.9%.

On the other hand - it's d*mn hard to imagine that any of America's "A List" or "B List" adversaries didn't have a far-more-detailed road map, years ago.


I'm sure there's a few adversaries who could pull something like this off, and have 0-days ready. But if they use them, the US could see that as a hostile action and get very upset about it.


they can just bribe a company to do it :D


Does the last part of your comment imply that USA should just give up and accept all its adversaries already have backdoors and nothing can be done about it?


> accept all its adversaries already have backdoors

This is actually a really useful hypothetical standpoint to work out security from.

Designing systems that start from the assumption of insecurity helps us build more robust protocols and management. Qubes OS starts from the position that all VMs are or soon will be compromised. Zero-trust in network design assumes the bad guys already have the whole network. Plenty out there would like to shrug and say "the endpoints are all rotten too" (especially with phones which are a veritable hell to secure) and move trust into the application via trusted execution methods.

> and nothing can be done about it?

No, that doesn't follow. It's prudent to be realistic about threats, but there's always a way out, at a cost. The cost, in a complexity crisis, is throwing away a lot of what we've done.


> Qubes OS starts from the position that all VMs are or soon will be compromised. Zero-trust in network design assumes the bad guys already have the whole network.

So what does Qubes OS do to protect against a hypervisor bug? Those must exist.

How do you ensure that your systems are still working and retrieving data from databases, etc, when bad guys have the whole network and can block all communications?


The answer to both those questions is that you can't. So you either need to make other provisions at different levels of the stack or design your architecture to make them irrelevant to your security model.


Yeah I am completely with you and I agree.

But it seems that reducing costs is more important than even preventing people on life support in hospitals from dying.

What a world.


No, and I've no idea where you got that from. Here's the HN Title:

"CrowdStrike debacle provides road map of American vulnerabilities to adversaries"

My assertion: America's serious-threat adversaries already had far more detailed road maps, years ago. The intel value of whatever "road map" data they got from the CrowdStrike debacle was pretty marginal.

Neither the HN Title nor I said anything about backdoors. And within two paragraphs, the NYT story makes it clear that CrowdStrike's Big Oopsie had nothing to do with bad guys hacking anything.


Did any large company clean their datacenters after they stopped using SolarWinds?


CrowdStrike has really redefined malicious compliance.


Or it's just a front for the NSA and CIA.


Yeah, didn't the US just ban Kaspersky, over fears that Kaspersky could cause such an outage (among other fears)?

Turns out our homegrown CrowdStrike was just as bad as our fears over Kaspersky were. Perhaps worse.


The Kaspersky issue could have been better handled by simply requiring divestment, or by requiring a US-appointed auditor to investigate and produce reports to assuage such concerns, as was proposed in the case of TikTok.


> over fears that Kaspersky could cause such an outage (among other fears)?

Citation? I thought it was the "other fears"; this is the first time I'm hearing accidental outages were one of the concerns.


Not GP, but the decision and reasoning is at https://public-inspection.federalregister.gov/2024-13532.pdf (I am not claiming any specific “other fears”, just linking to the source)


Thanks. Yeah, i, ii, and iii all talk about malicious events, not accidental.


Yes, but any accidental outage from an entity like Kaspersky would have been considered non-accidental regardless of the actual root cause. If CrowdStrike were Russian, the headlines would be a bit more suspicious about yesterday's event. Or if it had brought down Russian infrastructure, Russia would probably have suspected American involvement, even if it were just accidental.


I suspect that regardless of which country CrowdStrike is from, the question would still arise: "should we really outsource information security protection of our critical infrastructure to country X?"

Naturally, the question of malicious intent would most likely be more or less prevalent depending on whether country X is considered an adversary.


Umm, they (the adversaries) already knew? I've been in cybersecurity for 18 years. We told customers about issues like this all the time.


> It was, by all appearances, purely human error — a few bad keystrokes that demonstrated the fragility of a vast set of interconnected networks in which one mistake can cause a cascade of unintended consequences.

Cute. It's always those bad keystrokes. If only these crowdstrike employees worked on their good keystrokes that morning. I blame management.

> Russian hackers working on behalf of Vladimir V. Putin bring down hospital systems across the United States. In others, China’s military hackers trigger chaos, shutting down water systems and electric grids to distract Americans from an invasion of Taiwan. ... Among Washington’s cyberwarriors, the first reaction on Friday morning was relief that this wasn’t a nation-state attack. For two years now, the White House, the Pentagon and the nation’s cyberdefenders have been trying to come to terms with “Volt Typhoon,” a particularly elusive form of malware that China has put into American critical infrastructure.

So we have "cyberwarriors" and "cyberdefenders", while the Russians, China, etc. have "hackers". If ever there was a doubt about what the NYTimes really is.

> The fear is, in an election year, that the next digital meltdown may have a deeper political purpose.

Oh dear. More bad keystrokes on the way?

Did anyone glean anything of value from the article? There were a lot of words but no substance.


This piece was written by someone covering national security and the Biden administration for the NYT. It’s a global issue exposing vulnerabilities across the board. It’s journalism like this that’s the real vuln. Word.


Agreed. They would have never written this if they remembered to don their tinfoil headgear first.


Huh?


Expect more depth and less bias from the media you consume.


That doesn't seem related to your first comment?


Besides ticking off a few boxes, there’s not much substance to the piece. It’s framed as a domestic issue, not a roadmap of vulnerabilities on a global scale. If I’m reading the NYT, I expect more effort.


Wouldn't any memory-safe language help prevent this NULL pointer access? Why are all these crucial pieces still written in C/C++, when it's obvious to anybody keeping even remote track of CVEs that these languages are just not up to the task with today's climate of a 24/7 shadow internet war? (The one that's likely been going on for at least 25 years at this point?)

When will we learn?

You hate Rust -- fine. (Not fine, but OK; I guess people get super triggered over it, and it's a reality I can't change, though I'm still baffled by it, because they throw away reason for emotion, and these people should really know better.) Fine. Just use Golang or any other GC language, really (Java or C# as well, if you must).

When will we abandon convenient routine and start adapting to modern realities? ("Modern" being at least 25-year old here but hey, I am willing to give you some leeway and not roast you too much. Let's assume these are "modern" realities, f.ex. just the last 5 years.)


We're all waiting for your anti-malware Rust Win32 kernel module...

OK, but seriously, I don't believe this will ever happen, and I don't really think this is a language debate, nor do I want to engage in one.

This is about putting critical infrastructure connected to the internet that's running an operating system that you can't trust out of the box. Since the Windows OS is susceptible to so much malware you need all these third party services (which you also can't trust or audit, but it's absolutely better than not having anything) on top of the OS.

There was a whole host of companies that had zero problems, not because they're using Rust, but because they have much better security practices and quality infosec employees.


> This is about putting critical infrastructure connected to the internet that's running an operating system that you can't trust out of the box. Since the Windows OS is susceptible to so much malware you need all these third party services (which you also can't trust or audit, but it's absolutely better than not having anything) on top of the OS.

Agreed, they should not be using Windows in the first place. That should have been the first line of defense.

> There was a whole host of companies that had zero problems, not because they're using Rust, but because they have much better security practices and quality infosec employees.

Fair enough, I only commented on one layer of the security stack -- so your remark that expands the scope is valid and welcome.

> We're all waiting for your anti-malware Rust Win32 kernel module...

I am done working for free. If I were paid to do it, I am sure I would have done better than this poor confused soul who allowed a NULL pointer dereference, a mistake most C/C++ interns quickly learn to avoid.


>Agreed, they should not be using Windows in the first place

CrowdStrike borked RHEL a month ago: https://access.redhat.com/solutions/7068083 -- literally the same situation, unbootable machines.

The reality is that shitty software broke everything. Why do we have to drag the OS into this?


Dunno, I guess I naively thought the quality of Linux drivers was higher. But on the other hand, if the same confused randos are writing them, then you're right that it would make no difference.


I didn't know that. So that makes this two strikes?


My understanding is this was not a case of null pointer access that could be caught by a compiler really... but of a corrupt data file making a mess all over the place... running in kernel space, where no segfault is safe.

The root issue is giving privileged access to a business entity you think you can trust, but clearly can't.

I'm a fulltime Rust developer, but I don't think Rust saves you here.


Wouldn't a strongly typed language like Rust still catch a bad datafile?

E.g. loading it would require you to set up a maximum size and a valid configuration struct?


It could.

We haven't seen the code but it could be something like:

  char *ptr = parsefile(file_we_released_without_testing);
  if(ptr[0]=='A') { } // BSOD loop
parsefile returns NULL unexpectedly.

So this style of error can be addressed by using a safe language. Or static analysis. Or code reviews. Or not doing this stuff in the kernel. Or formal methods. Or fuzzing.

As someone else said, you likely can't easily use Rust for Windows kernel modules/drivers. I'm sure a strong enough engineering team could do it (e.g. transpile Rust to C), but I'm not sure it's the biggest engineering problem CrowdStrike has. Microsoft has a complete tool-chain for developing these, and it's usually C/C++ or assembly.
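
For comparison, a hedged sketch of the same scenario in Rust (same made-up names as the C snippet above, with parsefile returning an Option): the compiler forces the no-data case to be handled before any indexing can happen.

  // Hypothetical counterpart to the C snippet: the parser returns
  // Option<Vec<u8>> instead of a pointer that might be NULL.
  fn parsefile(path: &str) -> Option<Vec<u8>> {
      std::fs::read(path).ok().filter(|data| !data.is_empty())
  }

  fn check(path: &str) {
      match parsefile(path) {
          // Indexing is only reachable once we know data is non-empty.
          Some(data) if data[0] == b'A' => { /* handle the 'A' case */ }
          Some(_) => { /* other content */ }
          None => { /* bad file: log and bail out, don't crash the kernel */ }
      }
  }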


An unhandled null in Rust will still cause a panic, and still cause the boot loop.


I'm not a Rust expert but wouldn't you pick some ("null-safe") type that can't be null in Rust? A reference?


I don't think it matters; if you have any exception in the critical boot path, you will end up with this. Rust cannot fix this. Microkernels might.


Something like this Go snippet:

  package main

  import "log"

  // badfilethatcrashesC stands in for the malformed input file.
  var badfilethatcrashesC = "bad.sys"

  // parsefile returns a string; string is a value type and can never be nil.
  func parsefile(name string) string {
      return "" // pretend parsing failed and produced no data
  }

  func thatfunctionthatcrashedinC() {
      defer func() {
          if err := recover(); err != nil {
              log.Println("panic occurred:", err)
          }
      }()
      result := parsefile(badfilethatcrashesC)
      if result[0] == 'A' { // out-of-range index panics; recover above catches it
      }
  }

  func main() { thatfunctionthatcrashedinC() }
So: using a type that can't be nil, and recovering from runtime panics (you have to do that yourself, but it can be enforced by coding standards, and it can be done up the stack for all code, e.g. as HTTP handlers do by default in the Go standard library). More importantly, these errors are not segfaults in Go: there are "exceptions" you can and should catch, and there are exceptions you can't.


You have all that in C++ too. Exceptions are near zero cost and used everywhere, sometimes even in embedded stuff too.


Sure. I speak C++ ;) You can do this in C++ but I think it's generally more crash prone than Go. Based on personal experience of ~20 years of C++ and ~10 of Go I've debugged many a core dump in C++ and I think zero in Go. You can restrict yourself to the somewhat safer parts of C++ for sure.


If what you say is true -- OK. Then would you say that the tweet posted earlier that showed the NULL pointer access was incorrect or misleading?


Null pointer access caused by bad data is entirely conceivable... esp when you overwrite parts of kernel memory with nulls.


I see. That doesn't make it better though, and OK let's forget about other languages.

I mean that the least you can do before a pointer dereference is check for a few bad sentinel values, NULL being one of them.

Seems like a rather amateur mistake to me.


From that point of view, there are several million amateurs, some of them quite highly paid, out there writing terrible code.


Well, that statement is 100% true.

I've been an idiot as well in the past. Happily some of us actually learn though!


A safer language like Rust won't help you against bad practices and poor QA processes. This is the kind of error that you should catch with automated testing, even before pushing the change to the main branch.


Not just QA; security assurance, code reviews, static and dynamic testing, threat surface analysis, unit testing, and pentesting either didn’t exist or weren’t sufficiently applied.

I have to imagine that this bug has existed for quite some time and I’d be curious to know what other input validation errors they have, considering the amount of untrusted input they evaluate at ring 0 originating from userland.


Again, there are safe ways of doing this. For example, Wuffs exists: https://github.com/google/wuffs

At the very least, big-money security software companies should be parsing untrusted content with some kind of rigorously safe approach, not just squirting it through a big pile of C/C++.

And don't get me started on the whole concept of undefined behavior in those languages. To quote I. I. Rabi, "Who ordered that?"


>At the very least, big-money security software companies should be parsing untrusted content with some kind of rigorously safe approach

The malformed files were updates from CrowdStrike itself. It's not exactly "untrusted content".


It is untrusted data in the sense of files being read from disk that are not part of the signed kernel driver code.


The files in question reside within C:\windows, which requires admin privileges to write to. If untrusted data can end up there, you're already on the other side of the airtight hatchway[1].

[1] https://devblogs.microsoft.com/oldnewthing/20220907-00/?p=10...


Fuzzing...

I'd love to hear from an engineer on the project, but unfortunately we likely won't.


Highly unlikely anyone except governments or top-paying corporations with custom-negotiated T&Cs will see a detailed post-mortem, unless someone blows the whistle. Would love to read an AMA.


I agree. My point was that using a language whose compiler will not let you build your production binary if you make a certain class of mistake could have been one extra line of defense, and who knows, it might have prevented this particular problem this one time.

But I am in full agreement with you that sloppy programmers cannot truly be helped. They just screw up and move on like nothing happened. Sigh.


You can still do unsafe, and you do need unsafe in some cases.


And human beings can indeed write safe "unsafe" code. But to do so consistently, you have to be very smart, very cautious, and somewhat lucky.


Indeed, ensuring that unsafe is isolated and obeys certain semantics is a superpower that few languages have; Rust plus Kani is a good and modern way to achieve this.
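
For anyone curious, roughly what a Kani proof harness looks like (a minimal sketch; first_byte is an invented example function, while #[kani::proof], kani::any(), and kani::assume() come from Kani itself):

  // Invented example function we want to prove panic-free.
  fn first_byte(buf: &[u8]) -> Option<u8> {
      buf.first().copied()
  }

  #[cfg(kani)]
  #[kani::proof]
  fn first_byte_never_panics() {
      let len: usize = kani::any();
      kani::assume(len <= 8); // bound the input size to keep the proof tractable
      let buf = vec![0u8; len];
      let _ = first_byte(&buf); // the proof fails if any input can panic
  }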


OK, sure, but don't you think that being a world-wide antivirus vendor should have warranted a better process?


Oh absolutely. This is utterly unacceptable. The ease with which CS pushed a bad build willy-nilly to prod, in what seems to be a single-phase release, is absurd.

Something of this nature would have gotten our entire team fired. The number of phases, and the thoroughness and exhaustiveness of the protocols we follow to ensure we don't push bad builds, would take most engineers aback... but we have to. With great power comes great responsibility.


Memory-safe languages (for goodness' sake, even the crap I write in Python qualifies!) are the very minimum that is needed; not using them for anything critical is simply crazy. Yes, do all the other things, but at least put out the blazing fire in your basement while you are implementing your fire-safety strategy.


Even with memory-safe languages you can shoot yourself in the foot, and on Windows, AFAIK, you need to stick with C/C++ for this kind of low-level programming.

BTW, to use your metaphor: until two days ago they didn't even know there was a fire in the basement, nor a basement.


How do you write a driver in Python?


You can write bugs in any language. The problem here was monoculture and critical dependence on one supplier, not the choice of programming language.


You can indeed write buggy/unsafe code in any language. But it's a lot easier to do in notoriously unsafe languages like C/C++, which for some maniacal reason we seem to have based the world's digital infrastructure on.

C++ was a terrible, terrible mistake.


I'd argue it's several things, these two things included.

To screw up so legendarily requires a concert of bad decisions.


Yes. The term "normalization of deviance" comes to mind. Even a simple phased rollout would have caught this one at a tiny fraction of the damage observed.
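
The mechanism is tiny, too. A hedged sketch (all names invented): deterministically bucket each host and gate the update on a rollout percentage that you ratchet up as telemetry stays clean.

  use std::collections::hash_map::DefaultHasher;
  use std::hash::{Hash, Hasher};

  // Deterministically map a host ID to a stable bucket in 0..100.
  fn rollout_bucket(host_id: &str) -> u64 {
      let mut h = DefaultHasher::new();
      host_id.hash(&mut h);
      h.finish() % 100
  }

  // Ship the new content only to hosts inside the current canary percentage.
  fn should_receive_update(host_id: &str, rollout_percent: u64) -> bool {
      rollout_bucket(host_id) < rollout_percent
  }

Start at 1%, watch crash telemetry, then widen. A boot-looping 1% cohort is a bad day; a boot-looping 100% cohort is a global outage.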


1. It wasn't a null pointer: https://xcancel.com/patrickwardle/status/1814343502886477857

2. If the driver crashed due to a Rust panic, the result (boot loop) would be the same.


Your plan is to replace all the software written in C/C++ with... software that doesn't exist.

It's good to criticize the current state of things, but don't pretend you have a solution.

Also, do we know if Rust would have helped here? Rust doesn't guarantee no crashes -- in fact, panicking (aka crashing) is the default.


From what I've seen, it was a NULL pointer dereference. Dynamic, not static, so it still requires diligence even in Rust.

RE: the panic default, don't get fooled by hobby projects; professional Rust code always does pattern matching and does not default to panics.

The "software that doesn't exist" point is somewhat valid, though it's also a chicken-and-egg problem: not many people are working to make it happen because the current state of affairs is wrongly deemed good enough. And it really is not.


> RE: the panic default, don't get fooled by hobby projects; professional Rust code always does pattern matching and does not default to panics.

That's a "programmers who know what they're doing don't make that mistake" argument. If that were tenable there'd be no need for rust in the first place.


Yeah I don't disagree, you do have a point.

But this is easier to scan for than all the potential memory unsafety in C/C++. It's an improvement.


Windows does not support drivers written in Rust, so perhaps it will be another 5 years.


You can write drivers in Rust - it's just quite hard at the moment. Microsoft published metadata packages for WDK APIs and started creating samples: https://github.com/microsoft/Windows-rust-driver-samples


Good. Microsoft doing something sensible.


How can that be possible? Are drivers not binary files?


Rust can't compile DLLs? AFAIK it can?


Look at Linux. Getting Rust to work with the kernel is a long story of defining APIs, cleaning up the C-side API to make it tenable, coding test filesystems and whatnot to make sure it all works, and getting buy-in and maintenance for all of the above.

Doing the same with zero control over the non-Rust side of the kernel seems completely untenable.


I am not saying there are no challenges. I am saying that CrowdStrike does not seem to have even tried to have a better process. Rust would be only a small part of the picture; just one more layer in the security posture (a small one at that, admittedly).


Exactly. It's like the brown M&Ms in the Van Halen rider; it's not that the M&Ms were the problem, but that they were a test of diligence. People who don't care about details are likely to screw up the big things just as badly as they screw up the little things.

Being a multi-million-dollar company and using unsafe languages today is not a good look. But everyone gets away with it because everyone else is doing it.


This is a kernel driver. Runs in kernel space. Intercepts syscalls. You'd definitely be fighting uphill to write it in Rust. And your code would be riddled with `unsafe` by necessity anyways.


Fair enough; still, Rust's unsafe does not drop all of its guarantees. Quite a lot of them remain in place.

Not saying you can't write bugs in Rust, of course -- that would be crazily delusional. I am saying they needed a better process. And I am saying that a stricter language could have improved the process a bit as well.



