Coincidentally, I also stumbled upon a way to make the kernel of Apple Silicon Macs panic and restart while developing the https://lowtechguys.com website.
That's.. wow, ok. How exactly did you end up in that source code at that specific line?
I know you're well versed in reverse engineering Darwin, and I'm reading your posts and trying to improve my skills in this daily, but this seems way over my skillset.
Did you debug this using KDK or m1n1? Do you have a setup always ready for debugging a Darwin kernel?
In theory I have all of those, but currently I have none, so it's manual work. Your best friend in diagnosing a kernel crash is a KDK. If you have one that matches your build, it will have symbols in it. With a little bit of math you can take the backtrace in the crash log and slide it appropriately to match the binary. Personally I use LLDB for this. Here's an example of what this looks like on an x86-64 kernel (Apple silicon has its own math but it's largely the same): https://github.com/saagarjha/unxip/issues/14#issuecomment-10.... The kernel is typically compiled with optimization, so there's a lot of inlining and code folding, but with function names, source files, and instruction offsets it's pretty trivial to match it to the code Apple publishes.
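To make the "little bit of math" concrete, here is a minimal sketch of the sliding step. It assumes the panic log gives you a kernel slide and raw backtrace addresses. The function name `unslide` and the sample values are made up for illustration and are not from a real log.

```python
def unslide(addresses, kernel_slide):
    """Subtract the KASLR slide so addresses line up with the on-disk binary."""
    return [addr - kernel_slide for addr in addresses]


# Hypothetical values for illustration only, not taken from a real panic log.
KERNEL_SLIDE = 0x0000000012A00000
BACKTRACE = [0xFFFFFF8012C1A2B3, 0xFFFFFF8012D40F10]

for addr in unslide(BACKTRACE, KERNEL_SLIDE):
    # These unslid values are what you would look up against the KDK's symbols.
    print(hex(addr))
```

Each unslid address can then be handed to a symbolicator; in LLDB, something like `image lookup -a <address>` against the KDK kernel should give the function, file, and line, assuming the KDK matches the build.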
In this case I do not have a KDK for that build. In fact Apple has been unable to produce one for a couple of months, an inadequacy which I have repeatedly emphasized to them because of how critical they are for stuff like this. Supposedly they are working on it. Whatever; in lieu of that I got to figure out how good the tooling for analyzing kernels is these days, which was my real goal anyways.
For this crash log I downloaded the IPSW file for your build, 22A400. All of them get linked on The iPhone Wiki, e.g. https://www.theiphonewiki.com/wiki/Firmware/Mac/13.x. Once you unpack the IPSW (it's a zip file) there are compressed kernelcache files inside. Apple changed the format of these this year so most of the tooling breaks on it, but https://github.com/blacktop/ipsw was able to decompress them. Then I loaded it into Binary Ninja, which apparently doesn't support them either but compiling this person's plugin (+166 submodules, and an LLVM & Boost build) gets it to work: https://github.com/skr0x1c0/binja_kc.
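Since the IPSW is just a zip file, finding the kernelcache members can be sketched with the standard library. This assumes the members have "kernelcache" somewhere in their name; the helper name `list_kernelcaches` is hypothetical. It only lists them; decompressing the new format still needs a tool like the one linked above.

```python
import zipfile


def list_kernelcaches(ipsw_path):
    """Return the names of archive members that look like kernelcache files."""
    with zipfile.ZipFile(ipsw_path) as ipsw:
        # Assumption: kernelcache members have "kernelcache" in their name.
        return [name for name in ipsw.namelist() if "kernelcache" in name.lower()]
```

Calling `list_kernelcaches("path/to/your.ipsw")` would give you the candidates to extract with `ZipFile.extract` before handing them to a decompressor.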
From there you can load up the faulting address from the crash log and see what the function looks like. In this case, a bunch of junk has been inlined into it but there's a really obvious and fairly unique string reference for "invalid knote %p detach, filter: %d". From there, you can compare it against the actual source code to see which one matches the "shape" of the function you're looking at. I happened to also pull up an older kernel which did have a KDK available and then compared its assembly to the new one to match it up to ptsd_kqops_detach. The disassembly of the crashing code is obviously doing a linked list walk so you can figure out exactly which line it is from that.
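If you want to check that the string is actually in a given kernelcache before opening a disassembler, a rough sketch is to search the decompressed bytes for it. The helper `find_string_offsets` is made up for this example, and the sample bytes are a stand-in rather than real kernel data.

```python
def find_string_offsets(blob, needle):
    """Return every byte offset at which `needle` occurs in `blob`."""
    offsets = []
    start = blob.find(needle)
    while start != -1:
        offsets.append(start)
        start = blob.find(needle, start + 1)
    return offsets


NEEDLE = b"invalid knote %p detach, filter: %d"

# Stand-in bytes for illustration; a real run would read the decompressed kernelcache.
sample = b"\x00" * 16 + NEEDLE + b"\x00" + b"padding"
print(find_string_offsets(sample, NEEDLE))
```

Note that these are file offsets, not virtual addresses, so this only tells you whether and how often the string occurs. Following the cross-reference from the string to the function that uses it is still the disassembler's job.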
If I wasn't lazy I might also fire up a debugger to see why the function had walked off the end of the list but without KDKs things get pretty bad, not that they're very good to begin with. I don't have a m1n1 setup (I should probably do this at some point) and the things I do have, like remote debugging or the VM GDB stub, are not really worth suffering through for a Hacker News comment.
I was in the process of trying to get I²C working through the built-in HDMI port of the Apple Silicon Macs (the one containing the MCDP29xx HDMI-to-DP converter chip) and have been hitting a lot of dead ends while looking at kexts and opaque firmware blobs. This is going to help a lot as the KDK seems to contain logging messages related to DDC that I've never seen before.
I also found SIP disabled + Frida very useful for debugging without going through the KDK/m1n1 route. Not sure if it also helps with kext code though, I mostly used it for SkyLight and other private frameworks, but it's very nice to be able to also alter the code while it is running in realtime, or sometimes simply log specific function calls with argument values to get an idea what action causes which code to run.
Unfortunately patching the kernel or injecting your own code into it is quite difficult, unlike the situation in userspace. Though I haven’t gotten a chance to try it, I think running a kernel debugger through m1n1 is the best strategy for doing dynamic analysis of the kernel.
I don’t really see the problem with getting faster responses by contacting Apple’s security team directly for potential vulnerabilities when compared to the general-purpose bug tracker.
Usually big companies such as Discord give perks to the bug hunters who find bugs. Apparently Apple doesn't have that. There are probably people at Apple who won't admit that they have bugs, even though every operating system has bugs; the code is too big to not contain a single bug or exploit.
I found 2 crashes in OS X back in Yosemite. I have reported them with every release since.
I have no idea if they work on the ARM Macs, but I will have the ability to check in a couple of days. Probably nothing exploitable, but still a hard crash.
If that still fails, virtualization tools provide debugging interfaces you can use to step the execution of the virtualized CPU; e.g. VMware’s “debugStub” feature.
You can't with Apple Silicon. It's shameful in my opinion. You can still load a core dump or view the state after an NMI but you can't run the kernel under a debugger.
It's surprisingly easy to stumble into crash bugs when playing around with processes.
I remember a decade or two ago I ran into a linux bug where the kernel would panic if a process was killed with an open descriptor on its /proc entries. That is:
open /proc/$pid/something;
kill -9 $pid
#kernel crash
We unfortunately discovered this when using fuser in a runscript to kill stale versions of a process, eg:
sudo fuser --kill --namespace tcp 80 # kill whatever is listening on port 80
This would reliably cause kernel panics every so often, with one straightforward shell command. This ended up causing a big problem because it was part of a runscript which ran on bootup. But, it normally would do nothing so it went unnoticed until the app in question had a startup problem, leaving a copy of itself dangling listening on the port -- and instead of killing the old instance, it began crashing the entire system in a loop. Oops.
I remember a time when you had to be careful to not reveal your IP address to untrusted peers (e.g. on IRC) because a single specially malformed packet called the "Ping of Death" would reliably crash any internet-connected Windows PC.
That was a wild time. Nobody talked about security back then. The idea that everything in our lives would eventually run over the internet just wasn't on people's minds.
It's freezing the querying of process status, which is very not good, but that isn't the entire kernel. If it was the entire kernel, you wouldn't be able to use Ctrl+C.
In the long dark ago there was a program called 'crashme' which would generate and run random code from user space to see if it could cause kernel panics.
It's very easy to freeze a system as a non-root user; cause too many interrupts, consume too many resources, etc. Many kinds of infinite loop will lock a system hard. Hell, you can crash systems with too many packets.
And it's very easy to cause ps to hang. Many different kernel syscalls hang / are blocking. Mostly you see this with kernel features dependent on a resource that doesn't resolve itself, like a stuck disk, network filesystem, etc. But other various quirks of the system can cause blocking.
While what you say is true, these are nonetheless kernel bugs.
The kernel should never let any user process consume so many resources as to cause a system freeze.
The kernel must not only be able to preempt any user process at any time, stopping it from consuming all CPU time, but it must also prevent any user process from completely filling an SSD or HDD, because that can prevent many programs from starting.
Preemptibility is an optional kernel design feature. Not all kernels have it and not in all ways. If it's intentionally designed that way it's not a bug. No kernel stops users from filling up disks (though some filesystems have such limits as features, which most of us turn off).
Pretty much the only kernels that are totally preemptible are RTOS and they still don't stop you from shooting yourself in the foot.
A computer's job is to do whatever you ask it to do; that includes using all disk or memory up, if that's what you really want. There's not really a way to prevent breaking the OS (or just using up the battery) without preventing you from using all the computer you bought.
A phone is different since it always has to be able to make phone calls.
The things you mention cause "freezing" by asking the kernel to do too much so that it doesn't have time to deal with other stuff. Those issues are unfortunate, but really hard to completely avoid.
The interesting thing here is that the described bug isn't just overloading the kernel with work or starving it of resources, it's something which seems completely innocent.
Not surprised. I wrote some kqueue code in C once that not only froze the Mac kernel but caused the entire computer to crash. I reported the issue to Apple, and never really heard back. They don't really care, as long as all the Mac store apps work, in my experience.
This was one call to kqueue with incorrect (but not particularly malicious, just normal C silliness) arguments, and boom!
I also just tested now and reproduced the bug. The key thing is that you are pasting back into the calculator – which presumably is just stripping letters.
The behavior is maybe a bit surprising, but I could also see it being defensible. You can't type "1e20" into the calculator, so why would you be able to paste it in?
Outside of plain text, I don't think clipboard operations are necessarily expected to be reversible.
That’s a good point. Maybe you should be able to type “1e20” in scientific mode? It would make the calculator more full-featured and prevent “losing” data if you copy/paste in the middle of a long calculation.
"e" in scientific mode is a shortcut for the "ln" function.
That said, I tested again, and once again copying "1e20" and pasting it into the Calculator works just fine. It's definitely not treating it the same as pressing each key separately. I'm testing on macOS 13.0.1.
Bugs do get fixed (eventually; they're not always timely about it depending on severity), but Apple's feedback systems are and always have been a black hole. Basically, as a reporter, the only time you hear anything back from Apple about a bug report is if they need additional information; nothing else in their process is visible externally (until you go back and retest a few macOS releases later and your bug is or isn't fixed).
I distilled the problem in a repo so it can be reproduced with a single command: https://github.com/alin23/m1-panic
I found it while on Monterey and reported it 2 times through Feedback Assistant, but it still happens on Ventura.
NOTE: Don't try it without saving all your work, it has a very high chance of restarting your computer forcefully.