Panics are safe though (they're a controlled crash). The safety we're discussing...

zahllos · 2024-07-21T09:54:01.000000Z

A blue screen of death, as caused in this case, is also a controlled crash, in fact. The processor has fired an interrupt indicating an invalid memory access, and a piece of Windows code runs some emergency logic: namely, dump memory so you can maybe diagnose it later, then reboot.

The reboot part happens because the system is assumed to be in a bad state and allowing it to continue would possibly corrupt data, or in the worst possible case execute exploit code.

This panic handler runs at the same privilege level as the faulty driver and can itself be prevented from running correctly. Notably, file system drivers must function correctly for the memory dump to be written. If they, or filter drivers attached to them, also fault, well, fun times.

You can have faults in an interrupt handler too, for example trying to access paged memory in a page fault handler. That'll trigger a double fault handler, and if you fault in that (a triple fault), the processor will perform a reset and not bother even notifying software. Luckily the double fault handlers and other such cases are usually solely the preserve of OS vendors.

I have no particular point except to illuminate what's going on at the processor level (x86 terminology is used here), and to note that recognizing and aborting from an invalid state is exactly what's happening here, and it is also what Rust memory safety does. In spite of the disruption, that's better than silently corrupting data.
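To make the "recognize and abort" part concrete on the Rust side, here is a minimal sketch using only the standard library. The function names `read_checked` and `read_indexed` are made up for illustration. It shows that an out-of-range index is caught by a run-time bounds check and turned into a panic, while `get` lets the caller handle the same condition without panicking.

```rust
/// Read one element, reporting an out-of-range index as `None` instead of panicking.
/// (Hypothetical helper, for illustration only.)
fn read_checked(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}

/// Read one element with plain indexing. The bounds check happens at run time,
/// and an out-of-range index panics rather than reading adjacent memory.
/// (Hypothetical helper, for illustration only.)
fn read_indexed(data: &[u8], index: usize) -> u8 {
    data[index]
}

fn main() {
    let data = [10u8, 20, 30];

    // In range: both styles return the element.
    println!("{:?}", read_checked(&data, 1));
    println!("{}", read_indexed(&data, 1));

    // Out of range: the checked read yields None and the program continues.
    println!("{:?}", read_checked(&data, 7));

    // catch_unwind is used here only to observe the panic without ending the demo.
    // It assumes the default panic = "unwind" setting.
    let result = std::panic::catch_unwind(|| read_indexed(&[10, 20, 30], 7));
    println!("out-of-range index panicked: {}", result.is_err());
}
```

The index in this sketch is only known at run time from the function's point of view, so the check cannot be resolved by the compiler alone. That is the part of Rust's safety story that works by aborting rather than by rejecting the program up front.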

jamwil · 2024-07-22T06:56:37.000000Z

Forgive my ignorance, but to your last point about Rust aborting an invalid state… Isn’t Rust considered more memory safe because it catches many mistakes at compile time and not runtime?

rabite · 2024-07-21T09:46:50.000000Z

> Panics are safe though (they're a controlled crash).

Here's Linus's commentary on that:

https://lkml.org/lkml/2021/4/14/1099

> I think that if some Rust allocation can cause a panic, this is simply _fundamentally_ not acceptable.

> Allocation failures in a driver or non-core code - and that is by definition all of any new Rust code - can never EVER validly cause panics.

Panics are not acceptable in countless contexts. Plenty of things need to be written to keep working through entire categories of errors. The casual attitude of Rust developers towards error handling is one of the many reasons people have trouble taking it seriously. Reliability and robustness are generally more important than language memory safety in almost all contexts.
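For what it's worth, allocation failure can be surfaced as an ordinary error in Rust. The following is a minimal user-space sketch using the standard library's `Vec::try_reserve`; it is not the kernel's Rust allocation API, and `copy_fallible` is a hypothetical helper named only for this example.

```rust
use std::collections::TryReserveError;

/// Copy `src` into a new buffer, returning an error to the caller
/// if the memory cannot be reserved. (Hypothetical helper.)
fn copy_fallible(src: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // try_reserve reports failure as Err instead of terminating the program.
    buf.try_reserve(src.len())?;
    // Capacity is already reserved, so this does not need to allocate again.
    buf.extend_from_slice(src);
    Ok(buf)
}

fn main() {
    match copy_fallible(b"hello") {
        Ok(buf) => println!("copied {} bytes", buf.len()),
        Err(err) => eprintln!("allocation failed, carrying on: {err}"),
    }

    // An impossible request comes back as an error the caller can handle.
    let mut huge: Vec<u64> = Vec::new();
    println!("impossible reservation rejected: {}", huge.try_reserve(usize::MAX).is_err());
}
```

The design choice is that the caller, not the allocator, decides what an allocation failure means: drop the request, retry later, or shut down cleanly.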

zahllos · 2024-07-21T10:14:26.000000Z

There are indeed many cases where errors need to be recovered from, and one angle in secure Rust code training was quite literally "don't just panic, don't blindly unwrap or leave errors unhandled because that'll kill your thread/process on failure, you should still code for failure cases". If you don't handle them, you are coding denial of service bugs.
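A minimal sketch of that advice, assuming a made-up scenario where a port number arrives as untrusted text. `parse_port` and `port_or_default` are hypothetical names; the point is only the contrast with calling `.unwrap()` on the parse result.

```rust
use std::num::ParseIntError;

/// Parse a port number, handing the error back to the caller
/// instead of unwrapping. (Hypothetical helper.)
fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    input.trim().parse::<u16>()
}

/// Code for the failure case: log the bad input and fall back to a default,
/// so malformed input cannot take the thread or process down.
/// (Hypothetical helper.)
fn port_or_default(input: &str, default: u16) -> u16 {
    match parse_port(input) {
        Ok(port) => port,
        Err(err) => {
            eprintln!("ignoring bad port {input:?}: {err}");
            default
        }
    }
}

fn main() {
    println!("{}", port_or_default("8080", 80));
    println!("{}", port_or_default("not a port", 80));
    // By contrast, parse_port("not a port").unwrap() would panic here.
}
```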

But, in the incident in question, the code is fundamentally not correct. Spatial memory safety violations, or in plain English "trying to call functions or use data that isn't at addresses your code or data lives at" fundamentally is an error. There's a missing part of the state machine to detect and stop before just exploding. In userspace this is a segfault and your process dies. In kernel, you get a bugcheck and the whole system reboots.

There are scary alternatives. The first, in kernel, is that you suppress all invalid writes and allow the errant code to keep writing, until it hits some other data. The system stays up, but you have out of control data writes so who knows what that's doing.

The second is that the execution flow of the process can be hijacked, i.e. Sergey Bratus' weird machines, or in plainer language, owning kernels in critical infrastructure. This is usually undesirable.

pixelesque · 2024-07-21T09:49:57.000000Z

Panics in a user-space application are likely safe and the correct thing to do.

Panics in a real-time system or a kernel are quite possibly not.
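The user-space half of this can be sketched with the standard library alone. Assuming the default panic = "unwind" setting, a panic on a worker thread unwinds that thread and the parent sees it as an `Err` from `join`. `run_isolated` is a hypothetical helper for illustration.

```rust
use std::thread;

/// Run a job on its own thread and report whether it finished or panicked.
/// (Hypothetical helper; assumes the default panic = "unwind" setting.)
fn run_isolated<F>(job: F) -> Result<u32, String>
where
    F: FnOnce() -> u32 + Send + 'static,
{
    // join returns Err if the worker thread panicked.
    thread::spawn(job)
        .join()
        .map_err(|_| "worker panicked".to_string())
}

fn main() {
    println!("{:?}", run_isolated(|| 42));
    // The panic ends the worker thread only; main keeps running.
    println!("{:?}", run_isolated(|| -> u32 { panic!("boom") }));
    println!("main thread still alive");
}
```

There is a supervisor here (the parent thread, and behind it the OS) to absorb the failure. A kernel or a hard real-time loop has no equivalent layer above it, which is the distinction being drawn.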

immibis · 2024-07-21T11:18:59.000000Z

In a hospital system nobody cares whether it was the kernel or the application that caused people to die.

bdd8f1df777b · 2024-07-21T09:46:20.000000Z

This incident, the blue screen of death, is exactly the same as a panic.

uecker · 2024-07-21T09:42:12.000000Z

The problem here was that the kernel process got a fault, so a panic wouldn't have made a difference.

tomohawk · 2024-07-21T09:49:13.000000Z

Panics are only safe if you have an OS to catch you. They are definitely not safe in the CS context.