A reference counting bug which leads to local privilege escalation in io_uring (flattsecurity.medium.com)
119 points by todsacerdoti on June 21, 2021 | 46 comments



It's still strange that there is zero testing of these functions (e.g., that fix doesn’t come with any tests).

Because there isn’t any requirement for testing, these functions are allowed to become super complex, which makes it harder to see where errors could occur.


Linux is extremely optimized for happy path coding.


Why is this strange? I've never worked on a project where fixes required tests beyond some QA guy/gal stating "it works, done".


From [1]

> MariaDB includes test cases for all fixed bugs.

It's not something I've practiced much myself, but writing a test case that reproduces the bug and then fixing the bug seems like a more reasonable form of TDD to me.

I've seen it done once, when I reported a bug to another team I worked alongside: I had been stress testing a new feature and found the bug. Instead of having to rerun the stress test, they wrote a unit test that reproduced it at a much smaller scale, which gave them a much shorter feedback loop to confirm the fix worked.

[1] https://mariadb.com/kb/en/mariadb-vs-mysql-features/


Sure, but that is actually more the exception than the rule.


In many projects a test case is required which reproduces the bug and then shows that it's fixed.
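To make the reproduce-then-fix workflow concrete, here is a minimal Python sketch. It models a kernel-style reference count in user space; `RefObj`, `UseAfterFree`, `submit_buggy`, `submit_fixed`, and `regression_test` are hypothetical names for illustration, not io_uring or kernel code. The regression test fails against the buggy version (the object is freed while the caller still holds a reference) and passes against the fixed one.

```python
class UseAfterFree(Exception):
    """Raised when a freed object is touched again (models a kernel UAF)."""


class RefObj:
    """Toy refcounted object; hypothetical, not the kernel's refcount_t."""

    def __init__(self) -> None:
        self.refs = 1          # the creator holds the first reference
        self.freed = False

    def get(self) -> None:
        if self.freed:
            raise UseAfterFree("get() on freed object")
        self.refs += 1

    def put(self) -> None:
        if self.freed:
            raise UseAfterFree("put() on freed object")
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # last reference gone: "free" the object


def submit_buggy(obj: RefObj) -> None:
    """Hypothetical async request: takes one reference but drops two."""
    obj.get()
    obj.put()  # completion path drops the request's reference...
    obj.put()  # ...and an error path drops it again (the bug)


def submit_fixed(obj: RefObj) -> None:
    """The same request with balanced get/put."""
    obj.get()
    obj.put()


def regression_test(submit) -> bool:
    """Reproduces the bug: the caller's own reference must survive a submit."""
    obj = RefObj()
    submit(obj)
    try:
        obj.put()  # caller drops its reference last
    except UseAfterFree:
        return False
    return obj.freed and obj.refs == 0
```

Checked in alongside the fix, a test like this makes a future change that reintroduces the extra `put()` fail CI instead of shipping.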


Further grist for the mill about the effectiveness of seccomp-style filtering for multitenant Docker, since it's unlikely anyone was filtering out `io_uring_setup`.


People can do whatever they want with seccomp-bpf obviously, but is it really that uncommon to use it for whitelisting? As for kernel vulnerabilities being a weakness of sandboxing in general, if anyone still doesn’t understand that by now it must be willful and I don’t know if they can be helped.


No matter how you mask off attack surface for the kernel, you're not super likely to want to disable io_uring, is the point I'm making. It's easy to find recent threads here with people sticking up for shared-kernel multitenant isolation.

(Be forewarned that I'm talking my book a bit here, since we have a commercial thingy built on multitenant VMM isolation).
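A rough illustration of the allowlist vs. denylist point: a denylist written before io_uring existed (it arrived in Linux 5.1) lets the new syscalls through, while a default-deny allowlist rejects them until someone adds them. This is a simplified Python model, not seccomp-bpf itself; the policy sets and function names are hypothetical, and real filters match syscall numbers and return actions rather than booleans. It also does not capture that, once a ring is permitted, the operations submitted through it are not individual syscalls, so per-syscall filtering does not see them.

```python
# Hypothetical policies; real seccomp-bpf filters match syscall numbers, not names.
ALLOWLIST = frozenset({"read", "write", "openat", "close", "fstat", "exit_group"})
DENYLIST = frozenset({"ptrace", "kexec_load", "init_module", "bpf"})

IO_URING_SYSCALLS = ("io_uring_setup", "io_uring_enter", "io_uring_register")


def allowlist_permits(syscall: str) -> bool:
    """Default-deny: anything not explicitly listed is rejected."""
    return syscall in ALLOWLIST


def denylist_permits(syscall: str) -> bool:
    """Default-allow: anything not explicitly listed is permitted."""
    return syscall not in DENYLIST


def summarize(syscalls):
    """Map each syscall to (allowed by allowlist, allowed by denylist)."""
    return {s: (allowlist_permits(s), denylist_permits(s)) for s in syscalls}
```

For example, `summarize(IO_URING_SYSCALLS)` maps every io_uring syscall to `(False, True)`: blocked by the allowlist, waved through by the denylist.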


BTW, while on the topic, what do you think about a heavyweight host kernel running a guest VMM (a hardened Firecracker) attached to the network through a dedicated network interface? Would you feel it's 'better' than a shared kernel/OS + namespaces? Or is it 'smallest hardened root hypervisor or no go'? Not sure I'm making sense...


The heavyweight host (which is the normal state of affairs) is problematic attack surface; moving the workload into a hardened VMM on that improves security regardless.


Thanks Thomas for the insight.


Isn't the standard pattern dropping privileges after the setup is finished?
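A sketch of the drop-after-setup pattern, under a simplifying assumption: seccomp filters can be stacked but never removed, and when several are installed the most restrictive verdict wins, modeled here as "every filter must allow". `FilterStack` and `SETUP_SET` are hypothetical names. The point it shows is that denying `io_uring_setup` after setup does not remove `io_uring_enter`, which the already created ring still needs.

```python
from typing import Callable, List

Filter = Callable[[str], bool]  # True means "allow" in this simplified model


class FilterStack:
    """Simplified model of stacked seccomp filters (hypothetical names)."""

    def __init__(self) -> None:
        self._filters: List[Filter] = []

    def install(self, f: Filter) -> None:
        # Filters can only be added, never removed, so policy only tightens.
        self._filters.append(f)

    def permits(self, syscall: str) -> bool:
        # The most restrictive verdict wins: every installed filter must allow.
        # (An empty stack allows everything, like a process with no filter.)
        return all(f(syscall) for f in self._filters)


SETUP_SET = {"read", "write", "close", "io_uring_setup", "io_uring_enter"}

stack = FilterStack()
stack.install(lambda s: s in SETUP_SET)         # phase 1: setup policy
ring_created = stack.permits("io_uring_setup")  # the app creates its ring here

stack.install(lambda s: s != "io_uring_setup")  # phase 2: drop setup rights
```

After phase 2, `stack.permits("io_uring_enter")` is still true, so the ring's entry point remains reachable attack surface even though no new rings can be created.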


> sticking up for shared-kernel multitenant isolation.

Seems like willful snake oil.


For a moment, I thought 'escalati' (in the title of the submission) was some kind of professional term that had so far evaded me. It sounds pretty elegant. But of course, the title was just cut off. Almost disappointing.


Escalati: the secretive guild of hereditary escalator engineers who maintain the escalators in the Illuminati's secret volcano lair (escalator reliability engineering is a major concern when world leaders are frequently escalating over giant cauldrons of molten lava)


Escalati: Cousins to the Air Conditioning Repairmen: https://www.youtube.com/watch?v=KrcY6PXkGuE


What a puzzling fragment of American culture you just unearthed for us! It says s03e06, so it survived a surprisingly long while. Was this popular among the HN crowd? Was it all this absurd?


This is really a minor part of the show "Community". Whilst Community definitely deserves a watch, it is a weird comedy about a community college. It is not focused on technology in any way.

Give the show a try, it is good! But don't expect a tech focus.


The Escalati - a secret society controlling the world by means of privilege escalation.


As opposed to Iluminati, who try to do the same with smart lightbulbs?


Have we lightened up yet?


You got it backwards. Remember that when people say “illuminati” they are speculating about occultists, not about illumists.


The same - though I read it as being a tongue-in-cheek plural for escalation in a security context. Perfect for high-falutin' conference papers!


pwn2own: escalati the boxen!


>Escalati

Found the name for my next CTF team.


escalati: plural of escalatum

OR

escalati: The beings who control the illuminati


Here's the CVE

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-2022...

For some reason the article doesn't link there :(


Thank you. But the CVE entry seems to disagree with the headline?

> The highest threat from this vulnerability is to data integrity, confidentiality and system availability.

Or is this more of a "read/modify /etc/shadow or /sbin/su" kind of thing?


As I read it: it's a kernel UAF; memory corruption, in the context of the kernel. There's a secondary attack vector related to the refcount mishandling, where you can obtain control of file table entries after an `execve`, even if you exec a SUID, which is also bad.
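One way to see why the refcount matters for the `execve` angle: roughly, exec unshares the file table only if it is shared, i.e. its reference count is above one. The toy model below (hypothetical names, not the kernel's `files_struct` code, and not necessarily the exact path this CVE takes) shows that if an async context keeps a pointer to the table without taking a reference, exec sees a count of one, keeps the table in place, and the SUID program ends up sharing its file table with the attacker's context.

```python
class FileTable:
    """Toy stand-in for a per-process file descriptor table."""

    def __init__(self) -> None:
        self.refs = 1  # the owning process's reference
        self.fds = {0: "stdin", 1: "stdout", 2: "stderr"}


class Process:
    def __init__(self) -> None:
        self.files = FileTable()

    def execve(self) -> None:
        # Modeled on exec's unshare step: copy the table only if it is shared.
        if self.files.refs > 1:
            old = self.files
            new = FileTable()
            new.fds = dict(old.fds)
            old.refs -= 1
            self.files = new


def attach_async_context(proc: Process, take_ref: bool) -> FileTable:
    """Hypothetical async worker that keeps a pointer to proc's table."""
    if take_ref:
        proc.files.refs += 1  # correct: the pointer is counted
    return proc.files         # buggy path: pointer kept, no reference taken


def attacker_controls_suid_fds(take_ref: bool) -> bool:
    proc = Process()
    worker_view = attach_async_context(proc, take_ref)
    proc.execve()                      # e.g. exec of a SUID binary
    proc.files.fds[3] = "/etc/shadow"  # the privileged program opens a file
    return 3 in worker_view.fds        # can the async context see/alter it?
```

With the reference taken, exec copies the table and the worker's view is stale but harmless; without it, both sides operate on the same table.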



Any idea what the diagrams were generated with? It looks graphviz-y to me.


Strangely, the Red Hat bug tracker listed in the CVE has this issue closed as "NOTABUG". I guess it's not technically Red Hat's problem?

https://bugzilla.redhat.com/show_bug.cgi?id=1873476


> The affected code was not introduced into any kernel versions shipped with Red Hat Enterprise Linux making this vulnerable not applicable to these platforms.

Might explain the strange status.


It would be nice if the title mentioned what was affected, perhaps something like "CVE-2021-20226: io_uring privilege escalation via reference counting bug".


That's easy. We don't need CVE numbers in titles. The information is trivially available to anyone who needs it.

(Submitted title was "CVE-2021–20226 a reference counting bug which leads to local privilege escalati".)


If anything, [GNU/Linux] would be more relevant.


What does this have to do with GNU? Afaict it affects any Linux kernel of the relevant version.


You’re probably right that this is the one time where GNU is unrelated to the problem. Congratulations.


So HN should be optimized for people who don't click the link?


Perhaps the titles at least should be optimized for people deciding whether to click the link.


I think these articles are more aimed at the postmortem aspect; reading about a bug that happened so you can try to avoid it when you're designing a similar system. So it doesn't really matter that it affects Linux, or what io_uring is, etc. The lesson is relevant even if you use Foobian GNU/OpenBSD emulated under Windows 11 on an M2 Mac.

If you are just looking for notifications that you should patch your system, you probably want a method other than HN for that -- you will miss a lot of critical patches.


Man. I've been on the io_uring train since basically the beginning.

Since then, I've seen reason after reason not to ever use it. Between the skewed performance tests, the dubious funding (coming from Facebook), and the several security risks (including this one), I just don't see it taking off.


Skewed performance tests? Dubious funding? I'm not sure what you mean. Can't Facebook fund any technical work?

I see some reasons not to use it (like it being a Vulkan-like, low-level-only API, or portability issues, or some missing APIs in my case), but 'it was funded by Facebook' and 'it has security risks'? I'm not sure there's an area in the kernel without corporate funding/support, or without past security bugs. We keep finding vulnerabilities in IPv6, SCTP... Yes, the Linux kernel dev process is lacking, but likely with different symptoms and causes?


The project has a long history of skewing the performance benchmarks and making wild claims like "60% faster than epoll" and the creator gets very defensive when anyone questions it.

There are claims about SQPOLL that simply cannot be reproduced. Several ongoing threads about it on the liburing tracker.

There have been privilege escalation problems from the start that seem like they aren't being addressed.

It feels as though Facebook wants to have something revolutionary at the cost of quality and they're using the Linux kernel to do so, though I realize that's my own hot take.

Again, I've been following this (and writing code for it) since I could get my hands on the dev branches. It's only really good for filesystem I/O in its current form as that's the biggest focus they have for it. They (Facebook) care less about other resource types (e.g. sockets).

You can choose to believe me or not, I suppose.


Thanks for taking the time to expand a bit more on your experience with the code and the community around it. I'll have a look.

Right now I'm more interested in the chaining aspect than raw performance, but I know some of my high-throughput network workloads show far better latency with the complete removal of syscalls on recv and send, though keeping up with the completion queue is hard-ish. I still prefer DPDK for this kind of network stuff right now, but only because my use case is perfectly suited to it (no fragmentation, no complex protocol, constant data stream...), and DPDK ain't no party either.
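For readers wondering why keeping up with the completion queue matters: io_uring's submission and completion queues are ring buffers shared between user space and the kernel, indexed by free-running head and tail counters masked by (entries - 1). The Python sketch below models only that indexing scheme for a single-producer, single-consumer ring; `Ring`, `push`, and `pop` are hypothetical names, and unlike the real CQ, which has its own overflow handling, this toy simply refuses new entries when the consumer falls behind.

```python
MASK32 = 0xFFFFFFFF  # indices are modeled as free-running 32-bit counters


class Ring:
    """Single-producer/single-consumer ring with free-running indices."""

    def __init__(self, entries: int) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.slots = [None] * entries
        self.head = 0  # advanced by the consumer
        self.tail = 0  # advanced by the producer

    def __len__(self) -> int:
        return (self.tail - self.head) & MASK32  # correct across wraparound

    def push(self, item) -> bool:
        if len(self) == len(self.slots):
            return False  # full: the consumer fell behind
        self.slots[self.tail & self.mask] = item
        self.tail = (self.tail + 1) & MASK32
        return True

    def pop(self):
        if len(self) == 0:
            return None  # nothing to reap
        item = self.slots[self.head & self.mask]
        self.head = (self.head + 1) & MASK32
        return item
```

With `Ring(4)`, pushing six completions without popping accepts the first four and rejects the last two, which is the "keeping up is hard-ish" problem in miniature.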
