Our software[1] got broken by a kernel change a few years ago. The experience was quite interesting.
We were making use of `/proc/PID/pagemap`, which is a kernel-generated file that previously would show the physical addresses of all the pages in a process's address space. Unfortunately, with the Rowhammer exploit, exposing this information - even for one's own processes - to unprivileged users went from being harmless to a security risk.
The first we saw of the change was when newer kernels started reporting zeros for all physical addresses, unless we ran as root. We raised this on the LKML, explaining that we'd been relying on this feature to implement a somewhat esoteric optimisation.
Linus replied very helpfully - the security fix trumped userspace compatibility but he could see a secure way of getting us the information we really needed, given the technique we'd described. He invited us to submit a kernel patch and gave a few hints about potential gotchas.
I did work one up but the kernel community actually jumped on it as an opportunity to do more cleanup, so I ended up just signing off on the patch they produced. It was all a remarkably smooth and efficient process.
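For anyone who hasn't poked at pagemap before, here is a rough Python sketch of what reading it looks like. It assumes the layout described in the kernel's pagemap documentation: one 64-bit entry per virtual page, with the page frame number in bits 0-54, a "swapped" flag in bit 62 and a "present" flag in bit 63. The helper names (`decode_entry`, `entry_offset`, `read_entry`) are made up for illustration and this is not our actual code. Without sufficient privileges, expect the PFN to come back as zero on newer kernels, as described above.

```python
import ctypes
import os
import struct

PAGEMAP_ENTRY_BYTES = 8        # one 64-bit entry per virtual page
PFN_MASK = (1 << 55) - 1       # bits 0-54: page frame number (if present in RAM)
SWAPPED_BIT = 1 << 62          # bit 62: page is swapped out
PRESENT_BIT = 1 << 63          # bit 63: page is present in RAM


def decode_entry(entry: int) -> dict:
    """Split a raw 64-bit pagemap entry into the fields used here."""
    present = bool(entry & PRESENT_BIT)
    swapped = bool(entry & SWAPPED_BIT)
    # The low bits only hold a PFN when the page is present and not swapped.
    pfn = (entry & PFN_MASK) if (present and not swapped) else 0
    return {"present": present, "swapped": swapped, "pfn": pfn}


def entry_offset(vaddr: int, page_size: int) -> int:
    """Byte offset in the pagemap file of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_BYTES


def read_entry(vaddr: int, pid: str = "self") -> dict:
    """Read and decode the pagemap entry for one virtual address (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(entry_offset(vaddr, page_size))
        raw = f.read(PAGEMAP_ENTRY_BYTES)
    if len(raw) != PAGEMAP_ENTRY_BYTES:
        raise OSError("short read from pagemap")
    (entry,) = struct.unpack("=Q", raw)  # native byte order, unsigned 64-bit
    return decode_entry(entry)


if __name__ == "__main__":
    buf = ctypes.create_string_buffer(4096)
    buf[0] = b"x"  # touch the buffer so its page is actually mapped
    try:
        print(read_entry(ctypes.addressof(buf)))
    except OSError as exc:
        print(f"could not read pagemap: {exc}")
```

Run unprivileged, I'd expect this to report the page as present with a PFN of zero; run as root, the PFN should be populated.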
Heh, as someone who maintains another exotic debugger[1] my experience has been that the kernel development process is kind of a pain. We've had good experiences with people fixing regressions once they're discovered but getting new features into the kernel has been difficult. I think it took 10 revisions for me to get cpuid faulting into the kernel, including multiple review cycles where I was first told "change X to Y" and then in a subsequent cycle told "change Y to X".
Yeah, for similar reasons, /proc/PID/wchan now shows just "0" (for other users' processes) on newer kernels, unless you run as root. It's the same with /proc/PID/stack, but implemented in a different way: I can open() that file successfully, but the read() syscall on the opened file descriptor returns an EACCES error...
$ ls -l /proc/$$/stack
-r-------- 1 tanel tanel 0 May 26 21:52 /proc/967141/stack
$
$ cat /proc/$$/stack
cat: /proc/967141/stack: Permission denied
$
$ sudo cat /proc/$$/stack
[<0>] do_wait+0x1c3/0x230
[<0>] kernel_wait4+0xaf/0x150
[<0>] __do_sys_wait4+0x85/0x90
[<0>] __x64_sys_wait4+0x1e/0x20
[<0>] do_syscall_64+0x49/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
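If you want to check for yourself which syscall is the one failing, a small Python sketch like the following separates the open() step from the read() step and reports the errno name for whichever fails. The `probe` helper is hypothetical, just for illustration, and what you actually see for /proc/PID/stack will depend on your kernel version and configuration.

```python
import errno
import os


def probe(path: str, nbytes: int = 4096):
    """Return (stage, errno_name, data), where stage is 'open', 'read' or 'ok'."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # open() itself was refused (or the path does not exist)
        return ("open", errno.errorcode.get(exc.errno, str(exc.errno)), b"")
    try:
        data = os.read(fd, nbytes)
    except OSError as exc:
        # open() succeeded, but the kernel rejected the read()
        return ("read", errno.errorcode.get(exc.errno, str(exc.errno)), b"")
    finally:
        os.close(fd)
    return ("ok", None, data)


if __name__ == "__main__":
    for path in ("/proc/self/stack", "/proc/self/wchan"):
        stage, err, data = probe(path)
        print(path, stage, err, data[:60])
```

On a system behaving like the transcript above, I'd expect the stack file to fail at the 'read' stage with EACCES rather than at 'open'.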
Edit: Adding one more comment - my impression has been that the "no userland-visible changes" promise applies to system calls - how procfs presents data as human-readable text in the /proc files has changed every now and then before (I recall the sar command showing wrong numbers after a kernel update, for example).
We've found userspace ABI does change in some surprising ways occasionally but mostly it doesn't matter to anybody.
e.g. we've seen the format of signal stack frames change in the past but nobody relies on that layout so it's OK.
/proc is another one along those lines - technically somebody could probably complain if it changes but if nobody shouts then it will just drift over time.
Then you do have to fill out a contact details form but you will be able to download a trial of UDB, our interactive debugger. That gets you the Time Travel functionality.
If you want the full LiveRecorder experience (additional tool, library, etc.) then you do have to request a demo: https://undo.io/about-us/contact/request-demo/ - Mention that you had exchanged messages with me (Mark Williamson - Architect @ Undo) and I can help from my side if you have any technical issues.
[1] Time travel debugging - http://undo.io