Cool. If I understand the LLVM code correctly, it's inserting the following instruction sequence into the code:
mov r11, [cookie]    ; load this function's cookie
xor r11, [rsp]       ; prologue: mix in the return address
...                  ; function body
xor r11, [rsp]       ; epilogue: re-XOR the return slot
cmp r11, [cookie]    ; equal only if the return address is intact
je 2                 ; skip the int3 padding on success
int3
int3
ret
(where r11 might be some other suitable temp register as needed). cookie points to an 8-byte chunk of .openbsd.randomdata, a section that is initialized at binary load time by the kernel to contain random data. The canary is one of 4000 possible values, named "__retguard_0" through "__retguard_3999", presumably to avoid having the kernel generate an unbounded amount of random data - the section is limited to 1MB in size.
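The arithmetic above can be sketched in a few lines of Python (a toy model with made-up constants, not the real implementation): the prologue computes cookie ^ retaddr, the epilogue re-XORs the return slot, and the comparison succeeds only if the slot was left unchanged.

```python
# Toy model of the RETGUARD check; the constants are illustrative.
def retguard_check(cookie, retaddr_at_entry, retaddr_at_exit):
    temp = cookie ^ retaddr_at_entry   # prologue: mov r11,[cookie]; xor r11,[rsp]
    temp ^= retaddr_at_exit            # epilogue: xor r11,[rsp]
    return temp == cookie              # cmp r11,[cookie]; je over the int3s

cookie = 0x5DEECE66D
ret_addr = 0x7F00DEADBEEF
assert retguard_check(cookie, ret_addr, ret_addr)            # untouched stack passes
assert not retguard_check(cookie, ret_addr, 0x41414141)      # overwritten slot traps
```

The XOR algebra is what makes this work: cookie ^ r ^ r == cookie for any r, so any change to the return slot breaks the equality.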
This makes ret instructions fairly hard to use for rop purposes. Unlike the original design, which xor'd [rsp] directly, this new approach preserves return prediction so it should have a lesser effect on performance. With the changes to reduce polymorphic gadgets in place, this should make ROP attacks significantly less palatable. Also, in the original design, an arbitrary leak made rop attacks feasible as you could just place xor-encrypted return addresses on the stack. With the new design, you need repeatable register control too, assuming the temp register isn't spilled, which raises the bar quite a bit.
It's also worth mentioning there's previous work to reduce the amount of polymorphic gadgets in the instruction stream, including a new framework for clang:
The underlying assumption is that the XOR'd value cannot be crafted by the attacker. If read access to [rsp] is possible, this scheme is vulnerable to an information leak of the return address (or of another program pointer that lets the attacker derive the return address), because cookie = retaddr ^ [rsp].
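The algebra behind that concern can be sketched as follows (toy values, purely illustrative): an attacker who can read both the return address and the stored XOR'd slot recovers the cookie, and can then craft a consistent fake (retaddr, slot) pair that passes the epilogue check.

```python
# Sketch of the leak algebra; all values are made up for illustration.
def forge(leaked_retaddr, leaked_slot, fake_ret):
    cookie = leaked_retaddr ^ leaked_slot   # cookie = retaddr ^ [rsp]
    return cookie ^ fake_ret                # slot value that will verify

cookie = 0x123456789AB
real_ret = 0x400123
slot = cookie ^ real_ret                    # what the prologue would store
fake_ret = 0x414141
fake_slot = forge(real_ret, slot, fake_ret)
assert (fake_slot ^ fake_ret) == cookie     # epilogue check would pass
```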
Still, this is better than the original RETGUARD, which was easier to attack in both directions: either a leak of the stack slot to get the return address, or vice versa.
In exploit development parlance, a gadget is a block of assembly instructions executed outside their intended order by an attacker-induced control transfer. A gadget might start in the middle of a basic block, for instance, and be invoked when an attacker uses a memory corruption vulnerability to overwrite a function pointer with an address they control.
"Return oriented programming", which is kind of a dumb name, is the idea of harvesting gadgets from the text of a program and then using them as primitives for a new program. Gadgets are stitched together by the "return" instruction (hence the name ROP). When used by attackers this way, "ret" isn't really "returning" so much as it's being used as an arbitrary indirect jump mechanism.
This uses OpenBSD's random-data memory [0][1] feature, which was used by the stack protector to provide per-shared-object cookies.
RETGUARD is more than just an improved stack protector; as explained in the commit message, it protects function epilogues that are close to return instructions.
this is basically the xor canary approach originally pioneered by the Stackguard guys (i'm pretty sure you were already around at the time though probably forgot such old history as did the rest of the world apparently ;). the OpenBSD implementation suffers from a few problems, mostly their own making:
1. if they can't find a register to load the cookie into, they'll silently skip instrumentation (i'm not sure how that would happen in practice but the silent treatment when omitting a security feature is a non-starter).
2. if they can find such a register then it'll be spilled to the stack and restored in the epilogue, so a normal buffer overflow can control both the xor'd retaddr and the retaddr itself and the only thing standing in the way of exploitation is the secret cookie value - not unlike with Stackguard/SSP.
3. one would think that a per-function cookie is an improvement but... they're shared among threads (in userland) or everything (in the kernel) so infoleaks are just as catastrophic as before (it'd certainly help if someone described a proper threat model for this defense). at least the kernel side should use a per-syscall cookie to make it somewhat resemble an actual defense mechanism (and there's some more described in my presentation).
4. the int3 stuffing before retn must be someone's joke 'cos it sure as hell won't prevent abusing the retn as a gadget. it does introduce a mispredicted branch for every single function return however.
Hey PaXTeam, thanks for having a look! I wrote the implementation, so I can answer some of these.
1. We don't silently skip instrumentation. If we can't find a free register then we will force the frame setup code to the front of the function so we can get one. See the diff in PEI::calculateSaveRestoreBlocks().
2. We do spill the calculated value to the stack. This is unavoidable in many cases (non-leaf functions). It would be an optimization to not do this in leaf functions, but this would also mean finding a register that is unused throughout the function. This turns out to be a small number of functions, so we didn't pursue it for the initial implementation.
3. I'm not sure what you mean by the cookies are shared. Do you just mean that they are all in the openbsd.randomdata section? They have to live somewhere. Being able to read arbitrary memory in the openbsd.randomdata section would leak them, yes, though this doesn't seem to have been a problem for the existing stack canary, which lives in the same section. I see that RAP keeps the cookie on a register, which sounds like a neat idea. I'd be curious to see how you manage to rotate the cookie arbitrarily.
4. I'm glad you like the int3 stuffing. :-) We could always make the int3 sled longer if it turns out these rets are still accessible in gadgets that terminate on the return instruction. Have you found any?
Anyway, I'm happy to see your commentary on this. You guys do some nice work! If you have other suggestions for improvement I'd be happy to hear them. You can email me at mortimer@.
1. both insertReturnProtectorPrologue and insertReturnProtectorEpilogue check hasReturnProtectorTempRegister before proceeding with the instrumentation. so either the changes to calculateSaveRestoreBlocks are not enough to prevent that condition from ever triggering or these checks should be asserts at most or just be eliminated altogether.
2. sure but then this means that RETGUARD is not an improvement over Stackguard/SSP which is not how it's marketed...
3. shared means that entities of a class (threads in a process in userland, every single process/thread in the kernel) see the exact same cookies so leaking a cookie from one entity can allow exploitation by another. this is especially detrimental to the kernel side protection. frequent enough cookie rerandomization can help narrow this channel (RAP has a per-thread cookie in the kernel that is updated on each syscall, and there's some more to reduce infoleaks across kernel stacks, it's all in the presentation).
4. any normal path leading up to the ret is a gadget and int3 stuffing does nothing to prevent that (the underlying logic here is that if one can retarget a return to arbitrary addresses then he has already leaked enough information so bypassing the cookie check is a no-brainer too). not only that but in the bsd.mp kernel i just checked, of the 32199 ret (0xc3) bytes only 20236 are actual retn insns, the rest are inside insns. so this int3 stuffing leaves many other instances available. Red Hat tried similar gadget elimination a while ago but no one's using the gcc feature as far as i can tell.
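The bytes-inside-instructions point can be reproduced in miniature (the byte strings below are hand-picked for illustration, not taken from any real kernel): a naive scan for 0xc3 overcounts, because the same byte also shows up inside immediates and other instruction encodings, and those unintended rets are untouched by int3 stuffing.

```python
# A 0xc3 byte is only a ret if it begins an instruction.
actual_ret = bytes([0xc3])                                        # ret
inside_insn = bytes([0x48, 0xc7, 0xc0, 0xc3, 0x00, 0x00, 0x00])   # mov rax, 0xc3

text = inside_insn + actual_ret
print(text.count(0xc3))   # naive byte scan finds 2 "rets"
# A disassembler would report only one real ret instruction; the other
# 0xc3 lives inside the mov immediate, where int3 padding can't reach it.
```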
It's informative in that it tells one that one should stay away from grsec's patch sets and why.
When the BDFL of one's kernel says something like that, combined with how radioactive the community interaction seems to have been in the past, the notion that one might get sufficient support or have positive interactions with the wider community while using the grsec kernel fork is dubious at best.
The idea that you should avoid grsecurity because Linus Torvalds, the "BDFL", says so is so messed up I actually don't know how to rebut it.
Either way: I'm not saying Retguard is based on RAP --- my question was, "what's the relationship between the techniques". But if there is a relationship, OpenBSD should be explicit about it, so that we can keep track of the evolution of memory corruption countermeasures.
I do not care whether you think people should or shouldn't run grsecurity.
I don't avoid grsec because Linus says so. I avoid it because he says they break things when they don't need to. Security is a trade-off, and I trust Linus's judgment there, and his insistence that the operating system should never break user space unless it absolutely has to. If it does break, I sure don't want to depend on a group that goes against the spirit of the GPL (by punishing users who redistribute, cutting them off from future access) and that sues prominent members of the community essentially for pointing this out; that pretty much guarantees I'd be stuck going to them for support and unable to redistribute their work as the GPL permits. No thanks to that kind of coercion!
I too am curious what the technical basis is for the patch but your assertion that drama is irrelevant is dangerous when applied to community projects that exist because of the goodwill of their members. grsec should be called out, when mentioned, because of their demonstrated ability to make the code they do produce less than useful because of the encumbrance it carries due to its origin.
None of this has anything to do with my question and I'll ask that you not use my comments as a coat rack to hang your unrelated concerns about grsecurity off of.
I'm asking a research question, not a user question.
I was responding to your comment that the concerns raised by GP were drama and irrelevant. They are relevant to this subthread. I've said what I wish to say on the matter as well.
Oh, please. If you want to throw stones about behavior in the Linux kernel you're going to need to throw them at a hell of a lot more people than Grsec - there's decades of shit piled up.
Linus has always been a total blowhard when it comes to... everything, but in particular when it comes to security. I wouldn't take his opinions too seriously on the matter.
The fact is that grsec still maintains the state of the art for memory safety mitigations.
grsec may maintain the state of the art, but that doesn't matter if one can't use their code because of the potential for breakage and the toxic licensing conditions under which one would have to use it. I also would not want to support a group which, according to the OSI, violates the spirit of the license upon which their state-of-the-art work is based, and which certainly violates generally accepted community norms.
Has anyone seen whether this new ReturnProtectorPass is also upstream in clang? It was written at the end of 2017 AFAIR; that was when we last discussed this here.
CFI still looks more promising to me though, since it protects the CALL side. But for protecting RET, this is better than the old gcc/clang stack cookie, of course.
It depends on the function. Many things contribute. If your CPU can keep the cookie in cache, then loading it repeatedly will be relatively fast compared to hitting main memory. If your branch predictor quickly learns the jump over the int3 instructions, that will also be fast. If the function is very short, then the retguard instrumentation adds relatively more instructions, so it will have a larger impact than on a long function, etc.
I found that the runtime overhead was about 2% on average, but there are many factors that contribute.
I made a program called Meta-CVS in 2002 that stores a versioned directory structure with permissions and symbolic links along with the files in an ordinary CVS repository.
The need to secure repos was identified by Karger in the MULTICS evaluation. It was a requirement in TCSEC security certification. A great summary of the issues is below, by David A. Wheeler:
I actually wasn't clear at all. To me, it's more about code browseability, and I've found it much easier to spelunk through codebases for fun in a GitHub-like interface. (Not much to do with the developers' workflows, I suppose.)
That's really where it's coming from, wasn't really trying to be snarky or anything.
If you can say "easily" for this, you can't have tried actually importing the OpenBSD CVS repository into git. cvs2gitdump doesn't do too bad of a job, but doesn't attempt tags and branches. I haven't found another conversion tool that gets anywhere close.
I'm not suggesting that CVS to Git is easy. I'm just saying that Github doesn't present any kind of barrier to adoption of Git (in any project, not just OpenBSD).
No, Facebook and Reddit, big as they are, don't make their own browsers and are still individually minorities in the grand scheme of the world wide web.
Git is used by a comparatively small captive audience; most git users are invested in GitHub. MS is well positioned to run their "embrace, extend, extinguish" play if they wanted to.
It works for them. Changing would be a lot of work for little benefit.
Note that there is a mirror on github at https://github.com/openbsd and to my understanding, developers who prefer git use that, but the official source tree is in CVS.
You could say the same about most stuff OpenBSD rewrites for improved security. Many of those things aren't even critical; they just do the rewrites as part of code maintenance. Then they use CVS instead of a high-integrity/security VCS. It's a little strange/inconsistent compared to the general pattern of replacing old, insecure stuff.
The security of one and security of another aren't an apples to apples comparison.
The underlying security of the operating system and user applications running in it has very different risks and benefits versus the integrity of source code commits and who gets to make them.
The latter is something they're equipped to deal with without changing tools. They've decided that the costs of making that technology change aren't worth the benefits that it provides and I mostly agree.
That was my first thought as well; then I realised CVS has not been maintained for more than a decade. I know not everyone likes Git, but wouldn't SVN be a better solution?
Have you considered that CVS may have been finished for over a decade? OpenBSD has been using it for a long time, and it clearly meets their needs, or they'd choose from one of the many other options.
In my own experience, it's nice to use finished software, step off of the upgrade treadmill, and get to the end of the learning curve.
Are you suggesting that there is nothing about CVS that frustrates its users? Surely there is always room for improvement, even if for performance reasons.
Feature bloat, totally agree. But even for bug fixes and performance improvements, I have a hard time believing this is truly finished.
I suspect that any remaining user frustrations are either issues with fundamentals of the design, impossible to fix without frustrating other users, or don't rise up to the level of a bug report, let alone a patch.
Based on the timeline of CVS, I doubt there are that many large performance issues that can be fixed without significant risk of breakage. In my experience, CVS was primarily limited by network and disk I/O, both of which are generally much improved since the time of active CVS development.
Keep in mind that the effective scope of CVS is also shrinking as many users move on to other software; that means any issues are less likely to surface.
I vaguely remember the OpenBSD people started a rewrite of CVS at one point. I am not sure where that went, but looking at other things they have rewritten (NTP, SMTP, HTTP, ...), I would be surprised if that did not work out.
It's not especially active, but you can see the last change sets were in the past year, so "not maintained for more than a decade" doesn't apply to what they're using.
That is not a fork, that is OpenCVS which is a brand new, from-scratch implementation. It's not yet being used to host the OpenBSD code (but some AnonCVS mirrors use it).
It would certainly jibe with John Gilmore's story about how the NSA worked through the standards bodies to keep IPSEC easily exploitable by making the design too difficult to implement properly:
Their behavior around Simon & Speck and how they refused to reveal details on how exploitable they could be also seems to be similar to their previous tactics.
This is why it's worrisome that Google intends to implement Speck in Android and have pushed it to the Linux kernel, too.
Don't have all my sources on hand, but the last time I looked into this, the general conclusion I came to was that there's evidence to suggest someone was in fact paid to put vulnerabilities into the IPSec stack of OpenBSD. But there was no evidence that those vulnerabilities were ever written, or, if they were, that they ever made it into the source tree.
I believe OpenBSD conducted an audit of their tree when rumours of an IPSec backdoor started and didn't find anything alarming.
Pretty ancient stuff to bring up, especially in this context. Here's the last denial I recall by one of the people accused of planting backdoors in OpenBSD. Note the date.
It appears that there is a continuous audit of source code. So, even if a malicious hole was planted, it ought to be discovered in the years of repeated auditing. Cheers to OpenBSD!
The issue has been discussed many times on HN. My guess is that people don't want to revisit it (and I don't know enough off the top of my head to write a good answer). Look at HN history and you can find much of what you need.