I love the line: I spoke a lot with Todd Mortimer. Apparently I told him that I felt return-address protection was impossible, so a few weeks later he sent a clang diff to address that issue...
The volume and quality of tools coming out of the OpenBSD community in recent years has been absolutely awe-inspiring.
I'm presently able to do the entirety of my personal project work on the OS and I'm only a couple of tools from being able to do the same professionally.
I also must be missing something. XORing the return address on the stack with the stack pointer is similar to other stack protection mechanisms. I forget the precise name of it, but I'm pretty sure one of the existing stack protection tools does exactly this? MSVC's /GS feature is similar but slightly different in that it XORs the return address with a random value initialized on module load.
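For concreteness, the two schemes can be modeled in a few lines. This is a toy sketch of the arithmetic only, not actual compiler output; the cookie value and addresses are made up:

```python
# Toy model of two return-address mangling schemes (illustrative only;
# real implementations operate on machine registers, not Python ints).

MODULE_COOKIE = 0x5A3C9F2107B4D6E8  # hypothetical random value set at module load

def mangle_with_sp(ret_addr, stack_ptr):
    """Retguard-style: XOR the saved return address with the stack pointer."""
    return ret_addr ^ stack_ptr

def mangle_with_cookie(ret_addr):
    """Cookie-style: XOR with a per-module random value."""
    return ret_addr ^ MODULE_COOKIE

ret, sp = 0x00007F3A12345678, 0x00007FFE00001000
# XOR mangling is an involution: applying it twice restores the original,
# which is why the same instruction works in both prologue and epilogue.
assert mangle_with_sp(mangle_with_sp(ret, sp), sp) == ret
assert mangle_with_cookie(mangle_with_cookie(ret)) == ret
```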
However, the claim that ROP is impacted seems a bit flimsy to me. After all, ROP only requires that a C3 (RET) or C2 xx yy (RETN 0xYYXX) byte sequence be present at the end of a gadget; these sequences do occur at the end of functions, but they also occur in other places (such as anywhere the byte C3 arises in compiled machine code). ROP tools are programmed to look for the C3/C2 XX YY sequences and do not know or care whether those sequences sit at the end of a function. The post claims that by transforming the ends of functions, ROP will be affected; but given that it apparently makes no attempt to remove C3 and C2 bytes from elsewhere in the machine code, ROP tools will in fact continue to work just fine.
Basically the whole thesis of this patch seems to be that "existing stack protection methods will change function epilogues and therefore break ROP". I don't think it will have much of an effect on existing ROP tools. What am I missing?
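The kind of scan I'm describing is trivial to sketch: a gadget finder just walks raw bytes looking for C3/C2 with no notion of function boundaries. This is a toy illustration, not any particular tool:

```python
# Minimal sketch of how a ROP tool finds candidate gadget end points:
# scan raw machine code for RET (0xC3) and RET imm16 (0xC2 xx yy),
# regardless of whether they sit at a function's epilogue.

def find_ret_offsets(code: bytes):
    offsets = []
    for i, b in enumerate(code):
        if b == 0xC3:
            offsets.append((i, "ret"))
        elif b == 0xC2 and i + 2 < len(code):
            imm = int.from_bytes(code[i + 1:i + 3], "little")
            offsets.append((i, f"ret {imm:#x}"))
    return offsets

# Here the first 0xC3 is part of an immediate operand, not a real
# epilogue, yet the scanner still reports it as a gadget terminator.
blob = bytes([0x48, 0xC7, 0xC0, 0xC3, 0x00, 0x00, 0x00,  # mov rax, 0xc3
              0xC3])                                      # ret
print(find_ret_offsets(blob))  # [(3, 'ret'), (7, 'ret')]
```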
> but given that it apparently makes no attempt to remove C3 and C2 bytes from elsewhere in the machine code, ROP tools will in fact continue to work just fine.
To use ROP you need not only the RET instruction, but the code before it. You want to execute some existing function and return only then, not just return.
Buffer overflow attacks rely on overwriting the return address, which is stored on the stack, with the address of some code the attacker wants to execute. But if, before returning, the function XORs the value the attacker wrote with a value they do not know, the attacker cannot start a ROP chain.
Though, as with ASLR, it is possible to defeat this with a leak. An attacker who can defeat ASLR can likely defeat this as well.
As I said to the other user who replied to a similar comment, these observations apply only to exploitation of stack buffer overflows, and hence don't rebut what I've said about this not mitigating ROP as a general technique (which is also used in the exploitation of non-stack-based vulnerabilities like use-after-free).
Exactly, but also, ROP is not solely about RET instructions; the general technique is applicable to other forms of control transfer, like unconditional absolute/relative jumps, as well. Some time ago I did an analysis on this topic [1] using radare2 - I was curious how many ROP gadgets are present in a healthy instruction stream (I call those implicit; they are mostly comprised of function epilogues) and how many gadgets can be formed by jumping into the "middle" of some instruction (explicit gadgets).
The idea was to get rid of dangerous ModRegRm/SIB/XOP bytes at the compiler level; see the last table at [2] for example ModRegRm bytes - if your compiler decides to move something between RAX and {RDX,RBX}, it will unavoidably emit C2/C3 bytes as well. Another issue is immediate and constant values, which are literally embedded in the instruction stream, so if I have code like:
something = 0xc351c131485958
For which the compiler can generate:
movabs rdx,0xc351c131485958
Just by using an unfortunate value for something at the code level, I've actually introduced a new gadget into the program:
pop rax // 0x58
pop rcx // 0x59
xor rcx, rax // 0x48 0x31 0xc1
push rcx // 0x51
ret // 0xc3
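The decomposition above is easy to double-check by dumping the constant's little-endian encoding, i.e. the immediate bytes as they would appear after the movabs rdx, imm64 opcode (48 BA) in the instruction stream:

```python
# Dump the imm64 operand of "movabs rdx, 0xc351c131485958" byte by byte;
# x86 stores immediates little-endian, so the low byte comes first.
imm = 0x00C351C131485958
body = imm.to_bytes(8, "little")
print(body.hex(" "))  # 58 59 48 31 c1 51 c3 00

expected = bytes([0x58,              # pop rax
                  0x59,              # pop rcx
                  0x48, 0x31, 0xC1,  # xor rcx, rax
                  0x51,              # push rcx
                  0xC3,              # ret
                  0x00])             # high byte of the 64-bit immediate
assert body == expected
```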
Not sure how it turned out, but I heard someone from GCC was trying to implement a mitigation strategy based on this idea.
> To use ROP you need not only the RET instruction, but the code before it. You want to execute some existing function and return only then, not just return.
Ok, you found RET in some unexpected place, like an immediate value. But do you want to execute the code before it? Most likely it is just garbage.
Usually you want to return to mprotect() and then chain somewhere else from it. With this mitigation even if you manage to jump to mprotect() function, you will not be able to make it chain to the next function you want.
Yes, gadgets arising from non-epilogue instances of C2/C3 are used frequently. In fact they are most often critical and the ROP exploit would not work without them.
You need to hit the first ret in the vulnerable function to enter the ROP chain, and before that ret the value at the top of stack will be (de-)mangled.
If the return address on the stack is overwritten by an attacker, it needs to be overwritten with a ROP gadget adjusted for the mangling.
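A sketch of the adjustment described above, assuming the mangling is "saved return address XOR stack pointer" and that the attacker has a stack leak (the addresses here are made up):

```python
# With mangling "stored = ret ^ sp", the epilogue XORs the stored value
# with sp before returning. An attacker who knows sp pre-mangles the
# gadget address so the epilogue's XOR undoes it.

def epilogue_demangle(stored, sp):
    return stored ^ sp  # what the hardened epilogue computes before ret

gadget    = 0x00007F3ADEADBEEF  # address the attacker wants to reach
leaked_sp = 0x00007FFE00001F80  # known only if a stack leak exists

payload = gadget ^ leaked_sp    # value written over the saved address
assert epilogue_demangle(payload, leaked_sp) == gadget

# Without the leak, any wrong guess for sp sends control somewhere
# unpredictable instead of into the chain.
wrong_guess = leaked_sp ^ 0x1000
assert epilogue_demangle(payload, wrong_guess) != gadget
```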
Your comments only apply to exploitation of stack buffer overflows, which have largely been rendered extinct due to compiler-based strategies like this one. Exploiting, say, a use-after-free vulnerability still may require ROP but does not require corruption of a return address on the stack. Given that the proposed defense supposedly targets ROP in general and not exploitation of stack buffer overflows specifically, my points still stand.
Oh, yes. There are other ways to kick off ROP chains that do not involve stack corruption. For these attacks, Retguard will only pollute the gadget space by inserting these return-address permuting instructions before some fraction of the c3 bytes in a program (a little under 50%, depending on the program).
Actually removing c2/c3 bytes and actively reducing the gadget space is a different endeavour. There has been a bunch of academic work in this regard, with varying levels of success. Some would say it is a fool's errand to try to remove all the ROP gadgets, but that's what fools are for. Stay tuned. :-)
I feel like I’m missing something here. An infoleak is required to successfully ROP against ASLR (otherwise the attacker doesn’t know what to overwrite the return address with). Once an infoleak is available, the address of the stack can be leaked. I’m not really sure this does much beyond requiring attackers to modify their existing exploits.
It increases the complexity of the attack. Stack cookies already make ROP harder these days, but guessing the cookie byte-by-byte only has a complexity of 8*256 (on OpenBSD), whereas XORing the return address with another value increases the complexity further. And that is good news for programs that fork a lot (like nginx) and hence don't get fresh ASLR/stack cookies for every request (unlike e.g. sshd on OpenBSD, which does fork/exec to ensure ASLR/cookies are refreshed).
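The numbers behind that 8*256 figure: a forking server that never refreshes its secrets lets an attacker brute-force an 8-byte cookie one byte at a time, versus guessing a full 64-bit value in one shot.

```python
# Worst-case guesses to recover a secret from a forking server that
# reuses the same cookie in every child (byte-at-a-time oracle attack),
# compared with guessing a whole 64-bit value at once.

byte_by_byte = 8 * 256   # 8 bytes, up to 256 tries per byte
all_at_once  = 2 ** 64   # one 64-bit guess with no partial feedback

print(byte_by_byte)      # 2048
print(all_at_once)       # 18446744073709551616
```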
OpenBSD has been expanding the fork+exec model throughout its source tree; since the OpenSSH preauth work done by Damien Miller, many more daemons have followed. The list includes bgpd/ldpd/eigrpd/smtpd/relayd/ntpd/httpd/snmpd/ldapd and most recently slaacd & vmd.
A few remain but are being converted as they are discovered.
How does that work? Would the kernel walk the stack to change all the saved cookie values in the forked copy? I doubt the kernel even knows where the saved cookie values are stored on the stack. Also, that would make fork quite slow, depending on how deep the stack was when the fork happened.
The post-fork canary value could be paired with the stack pointer at which it became valid. If not valid, the process could walk a linked list of pre-fork canary and stack pointer pairs, to find the correct value to use. Would be interesting to see the performance hit on such an approach.
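A toy sketch of that lookup; this is purely hypothetical, not an existing mechanism, and all the names and addresses are invented for illustration:

```python
# Hypothetical scheme: each fork records the canary that was live and the
# stack pointer at which it became valid; a return walks the list to find
# which canary applies to the frame being checked.

class CanaryEpoch:
    def __init__(self, canary, valid_below_sp, older=None):
        self.canary = canary                  # cookie value for this epoch
        self.valid_below_sp = valid_below_sp  # frames below this SP use it
        self.older = older                    # link to the pre-fork epoch

def canary_for_frame(epoch, frame_sp):
    # Stacks grow down, so frames created after the fork have lower SPs.
    while epoch is not None:
        if frame_sp < epoch.valid_below_sp:
            return epoch.canary
        epoch = epoch.older
    return None

pre_fork  = CanaryEpoch(canary=0x1111, valid_below_sp=0x7FFFF000)
post_fork = CanaryEpoch(canary=0x2222, valid_below_sp=0x7FFFD000,
                        older=pre_fork)

assert canary_for_frame(post_fork, 0x7FFFC000) == 0x2222  # new frame
assert canary_for_frame(post_fork, 0x7FFFE000) == 0x1111  # pre-fork frame
```

The per-return list walk is exactly where the performance question above would bite: deep pre-fork stacks mean long walks.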
"ROP attack methods are impacted because existing gadgets are transformed to consist of '<gadget artifacts> <mangle ret address> RET'. That pivots the return sequence off the ROP chain in a highly unpredictable and inconvenient fashion."
I'm not seeing how it's unpredictable and inconvenient. It's predictable if the stack address can be leaked (via a frame pointer leak, for example). It doesn't seem that inconvenient. Instead of including the address of a gadget in the chain, include the gadget xor the leaked stack address. What's the unpredictable and inconvenient part that I'm not seeing?
You have it - if a stack address can be leaked, and you can follow the control flow to figure out the difference between the address you leaked and the address you're going to be dumping your ROP chain into, then you can just XOR the gadget address with the stack address, and do the same math for any down-chain gadgets that happen to have this XOR instruction injected into them.
But you don't always have stack address leaks. Presently, in order to ROP you need (a) a leaked address to the space where your gadgets live and (b) the ability to write your ROP chain somewhere where the program will return into it. With this scheme, you now also need (c) the exact address where you are writing your ROP chain.
Not all info leak vulnerabilities leak arbitrary memory of the attacker's choosing. If they did, stack canaries would be pretty useless. So for those cases where a stack address leak is unavailable, this raises the bar against ROP.
I think the LLVM CFI options only apply to C++ programs?
Microsoft's CFG is cool, but works on the other end to this - by protecting CALL instructions instead of RET.
Stack cookies are similar, yes. This mechanism combines two ASLR-randomized values (the caller address and the stack address) to control program flow. Stack canaries use a constant random cookie per object file to detect stack smashing. Retguard raises the bar for successful ROP attacks, just as CFG raises the bar for function pointer corruption attacks.
Full disclosure, I am the author of the clang diff that kicked this off. The appeal of this mechanism is that it is cheap enough to apply in every function, pollutes the ROP gadget space with instructions that permute the next return address, and requires attackers to have more information in order to kick off ROP chains (the address where they are writing their ROP chain). I know about some of the other stuff people are working on (CPI seems promising), and look forward to turning those things on when possible. Meanwhile, this mitigation is cheap, localized inside every function, and doesn't require any program modification in the vast majority of situations (programs that inspect stack frames without consulting the unwind info may have problems), so is fairly easy to turn on everywhere.
Regarding cost, a few years back, I implemented a shadow stack system via static binary rewriting. The overhead was very low, 1-2%. SafeStack claims < 0.1%.
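For readers unfamiliar with the technique, the shadow-stack invariant can be modeled in a few lines. This is a conceptual toy, not how a binary-rewriting implementation actually works (those operate on instrumented call/ret sites and a protected memory region):

```python
# Toy model of a shadow stack: every call pushes the return address to a
# separate protected stack, and every return checks the in-memory saved
# address against the shadow copy before transferring control.

class ShadowStack:
    def __init__(self):
        self._stack = []

    def on_call(self, ret_addr):
        self._stack.append(ret_addr)

    def on_return(self, ret_addr_on_stack):
        expected = self._stack.pop()
        if ret_addr_on_stack != expected:
            raise RuntimeError("return address corrupted")
        return expected

ss = ShadowStack()
ss.on_call(0x401000)
assert ss.on_return(0x401000) == 0x401000  # clean return passes

ss.on_call(0x401000)
try:
    ss.on_return(0xDEADBEEF)               # corrupted return is caught
    assert False, "corruption went undetected"
except RuntimeError:
    pass
```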
Ah, Return Flow Guard is cool - I did not know that MS had done that!
I like SafeStack, but was disappointed to learn about the limitations with shared libraries. Some SafeStack is better than no SafeStack though, and it can probably be turned on without too much effort.