Ginseng: Keeping secrets in registers when you distrust the operating system (acolyer.org)
162 points by tptacek on April 5, 2019 | 60 comments



This seems completely nonsensical! The data in registers hits memory at context switch time and they do nothing to stop that. So this provides no protection at all...


It's running in a TEE/SGX/TrustZone enclave, so this is inherently firewalled from routine OS code already.

Whether this is worthwhile, or even works without holes, is sort of an open question. I agree it sounds a little heavy on the serpent fat, but the technical promise is definitely achievable.


No. The code they claim to protect runs in userspace, not in the TEE.


The second sentence of the article is:

    It arranges things such that this sensitive data is only ever in the clear in registers (never memory),
    and is saved in an encrypted memory region called the secure stack on context switches


Not in their example. Data ends up in x0 and x1, and they don't talk about how they do this magic secure context switch. What stops the OS from taking an IRQ just when they've placed secret values into x0 and x1?? Where do they imagine x0 and x1's values will go when that IRQ is taken?


> What stops the OS from taking an IRQ

From the article

For data confidentiality, in addition to the call stack management described previously, Ginseng must also intercept all exceptions to save sensitive registers to the callstack before the exception can be handled by the OS. Ginseng intercepts exceptions using dynamic trapping. A NOP instruction is inserted at the beginning of the exception vector code in the kernel, and replaced by a call to the secure monitor at runtime when sensitive data enters registers. (Once the registers are clear of sensitive data, the NOP is restored). Once the OS serves the exception, control is handed back to the app. GService manages the return address to ensure we resume at the correct point in the sensitive function.
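
As a rough sketch of that dynamic-trapping idea (my own illustration, not the paper's code; the slot address, function names, and cache maintenance are assumptions):

    #include <stdint.h>

    #define A64_NOP   0xD503201Fu   /* AArch64 NOP encoding */
    #define A64_SMC_0 0xD4000003u   /* AArch64 SMC #0: trap into the EL3 secure monitor */

    /* The reserved word at the head of the kernel's exception vector code.
       In the real system this page would be writable only from the Secure world. */
    static volatile uint32_t *vector_nop_slot;

    /* Called when sensitive data enters registers: every exception now
       traps into the secure monitor before the OS handler runs. */
    static void arm_dynamic_trap(void)
    {
        *vector_nop_slot = A64_SMC_0;
        /* real code must also clean the D-cache and invalidate the I-cache here */
    }

    /* Called once the registers are clear of sensitive data. */
    static void disarm_dynamic_trap(void)
    {
        *vector_nop_slot = A64_NOP;
    }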


The original commenter's point is that the OS does not need to deliver an exception to the process in order to read register contents: any interrupt (software or hardware) hits the kernel first. The kernel can then happily pull the contents of any registers before switching context back to the target app.

Basically the threat model seems to be “malicious kernel”, but if the kernel is actually malicious then it can do whatever it wants before the target has an opportunity to “protect” itself.
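
A sketch of what that looks like in practice, assuming ordinary Linux/arm64 kernel internals (nothing Ginseng-specific): once any trap has been taken, the user's x0/x1 are sitting in the saved register frame and the kernel can just read them.

    #include <linux/sched.h>
    #include <linux/printk.h>
    #include <asm/processor.h>   /* task_pt_regs() */

    static void snoop_user_registers(struct task_struct *victim)
    {
        struct pt_regs *regs = task_pt_regs(victim);   /* saved user-mode frame */
        unsigned long x0 = regs->regs[0];              /* whatever the app had in x0 */
        unsigned long x1 = regs->regs[1];

        pr_info("snooped x0=%016lx x1=%016lx\n", x0, x1);
    }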


Interrupt or exception, the handling mechanism remains the same: the CPU jumps to a handler routine by retrieving its address from a table. On Intel x86, both interrupt and exception handlers are stored in the same table.

From Intel x86 manual

To aid in handling exceptions and interrupts, each architecturally defined exception and each interrupt condition requiring special handling by the processor is assigned a unique identification number, called a vector number. The processor uses the vector number assigned to an exception or interrupt as an index into the interrupt descriptor table (IDT). The table provides the entry point to an exception or interrupt handler (see Section 6.10, “Interrupt Descriptor Table (IDT)”).

The allowable range for vector numbers is 0 to 255. Vector numbers in the range 0 through 31 are reserved by the Intel 64 and IA-32 architectures for architecture-defined exceptions and interrupts. Not all of the vector numbers in this range have a currently defined function. The unassigned vector numbers in this range are reserved. Do not use the reserved vector numbers.

Vector numbers in the range 32 to 255 are designated as user-defined interrupts and are not reserved by the Intel 64 and IA-32 architecture. These interrupts are generally assigned to external I/O devices to enable those devices to send interrupts to the processor through one of the external hardware interrupt mechanisms (see Section 6.3, “Sources of Interrupts”).

https://en.wikipedia.org/wiki/Interrupt_vector_table

https://en.wikipedia.org/wiki/Interrupt_descriptor_table
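
To make that concrete: on x86-64 the CPU finds the table through the IDTR register, and anything running in ring 0 can repoint it. A minimal sketch (illustrative code using the standard architectural layout, not anything from the paper):

    #include <stdint.h>

    struct idt_gate {                 /* one 16-byte 64-bit interrupt gate */
        uint16_t offset_low;
        uint16_t selector;            /* kernel code segment selector */
        uint8_t  ist;                 /* bits 0-2: interrupt stack table index */
        uint8_t  type_attr;           /* gate type, DPL, present bit */
        uint16_t offset_mid;
        uint32_t offset_high;
        uint32_t reserved;
    } __attribute__((packed));

    struct idtr_desc {
        uint16_t limit;
        uint64_t base;
    } __attribute__((packed));

    static struct idt_gate my_idt[256];   /* vectors 0-31 architectural, 32-255 free */

    static void load_my_idt(void)         /* privileged: ring 0 only */
    {
        struct idtr_desc d = { sizeof(my_idt) - 1, (uintptr_t)my_idt };
        __asm__ volatile("lidt %0" : : "m"(d));
    }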


The IVT is controlled by the OS, so a malicious kernel can force whatever IVT content it wants.

But it's still not relevant, as the kernel can arbitrarily halt execution of any process at any point, and therefore can read the content of all registers whenever it wants them.

Again, if your attack model is a malicious kernel nothing you're doing in user mode is going to protect you. If you're using kernel APIs to install protections against that, you're still dealing with a malicious kernel that can ignore or wrap whatever you do.

If you're trying to mitigate/protect against kernel bugs, that's a different threat model, but the same general problems exist, just by accident, so they're less likely to leak useful information.


> But it's still not relevant, as the kernel can arbitrarily halt execution of any process at any point, and therefore can read the content of all registers whenever it wants them.

Not if the kernel can't handle interrupts? The only way for the kernel to "steal control" from the process is through an interrupt (software or hardware). If all interrupts are first handled by the secure monitor, which handles saving the registers securely, then it's OK.


Userspace obviously can't directly impact the "secure monitor", e.g. a hypervisor, so what you're saying is that we effectively have two operating systems:

1. the OS in charge of the hypervisor

2. the OS in charge of the virtualised system.

We're saying that we don't trust 2, so we're going to get the kernel from 1 to intercept interrupts. But that means we already have a trustworthy kernel, the one running the hypervisor, so why aren't we just using that?


Because one might be small and formally provable and the other might be Linux.


That is also complete nonsense. I could have a whole other core in the system (nowadays computers have more than one) which can swap the vectors back to the original ones instead of the "secured" ones.

In fact, if I was a malicious kernel, I would do precisely that

This whole idea will not work unless your hypervisor is a real complete hypervisor. This attempt at a halfway-hypervisor-lite is doomed to failure for this and many other reasons.


Ginseng modifies the kernel and protects sensitive parts by using the CPU's trusted execution environment[1], which has a higher privilege level than the kernel. From the paper:

We now describe Ginseng’s runtime protection against such accesses. The runtime protection heavily relies on GService, a passive, app-independent piece of software in the Secure world. GService ensures the code integrity, data confidentiality and control-flow integrity (CFI). It does so only for sensitive functions to minimize overhead. It also modifies the kernel at three points, when booting, when modifying the kernel page table, and when handling an exception. Since we do not trust the OS, the kernel may overwrite the modifications. However, when any of these modifications is disabled, the kernel will infinitely trigger data aborts trying to modify readonly memory, thus ensuring sensitive data are always safe.

I would have to look at the source code to find out more, but the GitHub repository has been deleted.

[1] https://en.wikipedia.org/wiki/Trusted_execution_environment

https://en.wikipedia.org/wiki/ARM_architecture#Security_exte...

https://en.wikipedia.org/wiki/Software_Guard_Extensions
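
A minimal sketch of the page-table mediation described in the quoted passage, with hypothetical names and a made-up SMC function ID (the real interface isn't visible now that the repo is gone): the kernel's page tables are mapped read-only, so every update has to go through a call into the Secure world, where GService can refuse anything that would map a sensitive function's code pages.

    #include <stdint.h>

    #define GSERVICE_SET_PTE 0x83000001u          /* made-up SMC function ID */

    static inline uint64_t gservice_smc(uint64_t fid, uint64_t a0, uint64_t a1)
    {
        register uint64_t x0 __asm__("x0") = fid;
        register uint64_t x1 __asm__("x1") = a0;
        register uint64_t x2 __asm__("x2") = a1;

        __asm__ volatile("smc #0"
                         : "+r"(x0), "+r"(x1), "+r"(x2)
                         :
                         : "x3", "memory");
        return x0;                                /* 0 on success, by convention */
    }

    /* Kernel-side replacement for a direct PTE store; the tables themselves are
       read-only to the kernel, so bypassing this just data-aborts forever. */
    static int set_kernel_pte(uint64_t *ptep, uint64_t new_pte)
    {
        return (int)gservice_smc(GSERVICE_SET_PTE, (uint64_t)ptep, new_pte);
    }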


No amount of modifications in the kernel will make this safe. What if I load a module that fixes my IRQ handlers? Or do they forbid modules? What if I find an exploit in the kernel? Or did they somehow make a 100% exploit-free kernel?

This sort of thing is exactly why TEEs exist. It cannot be done half in userspace, half in the TEE.


From what I have understood, Ginseng sets sensitive memory regions to be accessible only by the TEE and then unmaps these regions from the kernel memory space. If your kernel module tries to read/write/map these regions, a memory violation exception is raised and then handled by the TEE.


That literally means that every time you take an interrupt, you also take an extra fault into the hypervisor (since your CPU cannot read the vector, because it has been protected). In that case, forget any ideas of speed. The whole point of hardware assisted virtualization was to prevent that situation. These guys suggest going decades back in terms of performance. No thanks.


Unless I entirely misunderstand the paper, they’re using a hypervisor. Which makes me wonder what the point is.


Intercept every irq?!?

Ouch, my performance!

.

Write to kernel code pages?!?

Ouch, my cache! Oh, and my security!


Interception only occurs when the sensitive data are in the registers. That is, rarely.

And yes, you have to trust Ginseng more than you trust the OS, it's the whole point and an explicit initial premise.


This seems far more complicated than necessary. This design assumes that a trusted execution environment is available, and it also assumes that a trusted hypervisor is available. If you have both of those, then why can’t you just run your software in an enclave directly protected by the trusted hypervisor?


I read it more carefully. Egads! They invented their own hypervisor that essentially poorly imitates Xen’s PV mode. I wouldn’t trust this thing at all.

Here are some likely holes:

They prevent the kernel from mapping “sensitive” code into its own address space, but they don’t seem to prevent the kernel from mapping it into a user address space with write permission, which is just as bad. (Also, the kernel already maps most memory writably in the direct map, and they don’t mention what they do about this, so I would guess that they have a bug.)

They don’t mention protecting sensitive code from DMA.

The kernel can corrupt user code execution in many ways, such as corrupting non-“sensitive” registers. They don’t seem to have a rigorous model to defend against this.

They protect the IDT, but I don’t see anything about protecting the SYSCALL MSRs. The kernel could redirect SYSCALL to skip the magic hook. This might not matter if there are no syscalls in sensitive regions.
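
For the SYSCALL point, the sketch below is roughly what that redirection would look like on x86-64 (ring 0 only; my_syscall_entry is a hypothetical unhooked stub): the 64-bit SYSCALL entry point is simply whatever IA32_LSTAR holds.

    #include <stdint.h>

    #define MSR_LSTAR 0xC0000082u        /* IA32_LSTAR: RIP loaded by SYSCALL */

    static inline void wrmsr(uint32_t msr, uint64_t value)
    {
        __asm__ volatile("wrmsr"
                         :
                         : "c"(msr),
                           "a"((uint32_t)value),
                           "d"((uint32_t)(value >> 32)));
    }

    extern void my_syscall_entry(void);  /* alternative entry stub, no magic hook */

    static void retarget_syscall_entry(void)
    {
        wrmsr(MSR_LSTAR, (uintptr_t)my_syscall_entry);
    }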


I think that every design that relies on trusted hardware needs to be much more up front about it (e.g., put it in the title) so the paper can be less disappointing.

I don't understand academia's obsession with these security theater devices -- it would make sense to see it in industry as a hyped buzzword, but I think it makes for weak scholarship.

edit: as some note below, it is a TEE, not a TPM or an SE. I don't think the distinction should distract from the point, so I have amended the above.


This paper doesn't mention anything about TPM. It mentions TEE (TrustZone, really) and SGX.

TPM is effectively a smart card hanging off a system bus, usually on a separate package.

SGX and TrustZone are CPU features enabling a "secure" run time environment separate from the main run time env.


What makes TPM's security theater?

Also where does it say TPM? The paper references Intel SGX and ARM TrustZone, which to my knowledge are both on-CPU, whereas TPMs generally sit separately on the motherboard.


I don't think the distinction matters that much in this context, so I amended above.

I think they're theater because they do something complicated with ambiguous security benefits. And even if they are used correctly, flaws in the designs, like Spectre/Meltdown/Foreshadow/Rowhammer etc., compromise these use cases.


Because the secret is never in memory, wouldn't it exactly be safe against all the attacks you mention?


Not quite -- for example, the way that registers are implemented in modern processors is super complex.

Here's a sketch of an attack against this:

The registers don't actually get overwritten; they get renamed in modern CPUs. That means the old data is still there, just not logically accessible by non-pipelined instructions. It's possible the data is still sitting in the registers, with an old epoch name. The predictors will be predicting branches and other things based on those registers. So by issuing the right instruction and measuring delay, you might be able to create an oracle to see if you guessed a byte correctly from the stale data.

I also don't think the secrets are actually out of memory; they are out of your memory, where "your" means the kernel and userspace but not necessarily the TEE Secure world memory. This is, of course, the same RAM, just protected by a page table.

""" To ensure code integrity the kernel page table is made read-only at boot time. The kernel is modified to send a request to GService whenever it needs to modify the page table, GService honours this request only when doing so would not result in mapping the code pages of a sensitive function. The kernel is also prevented from overwriting its page table base register so that it can’t swap the table with a compromised one. """

Of course, this sounds like the perfect kind of protection for rowhammer to break. Just flip bits in the page table by doing reads, then overwrite a sensitive function after it's been invoked once and approved, and now you can leak secrets out that way.

etc


You could accomplish that with a TPM but not a TEE. But with a TPM you'd have to trust the OS when you generated the secret, so what's to say that the OS isn't already compromised?



The approach is rather closer to Zircon.

https://fuchsia.googlesource.com/zircon/


So, assuming you don't even trust the OS, how can you be sure you aren't running in a virtual machine or something?

You have to trust the OS to set everything up, do I/O, etc. I don't see how it's tenable to not trust the OS.


Very true, though this seems to plug at least one potential data leak.


Wouldn't you need, in addition to patching the interrupt vector, to make sure there is no existing code running in the kernel when you restore the sensitive registers? This makes the nginx use case seem kind of unrealistic.


This reminds me of the Overshadow paper which VMware published during my time there. We never ended up shipping it but it was a neat proof of concept:

https://www.cs.utexas.edu/~shmat/courses/cs380s/overshadow.p...

EDIT: I see Overshadow[14] was indeed one of the cited references.

Also, direct link to the actual paper is here: https://www.ndss-symposium.org/wp-content/uploads/2019/02/nd...


> we minimize the unsafe part of GService to a small amount of assembly code

Good.

If you want as much security as you can get in an unsafe environment, you're going to need to write the entire routine yourself, sometimes going all the way down to the bare metal if you have to.


I have not worked in C in a long while. Would you mark the assembly section as volatile to avoid the compiler & assembler doing anything to it? Is there any guarantee that the assembler will not aggressively re-optimize assembly-within-C?


Most C compilers vary from fairly to completely hands-off when it comes to inline assembly.

You can also just write it in a separate assembler file and then the C compiler does not see it.

That leaves just the linker, and most optimizing linkers will treat code outside the purview of the compiler as a "black box"; otherwise you wouldn't be able to link with code that makes system calls.
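
A small example of that hands-off behaviour, in GCC/Clang inline-asm syntax: marking the asm volatile and declaring its clobbers keeps the compiler from deleting, duplicating, or reordering it, though it does nothing to stop other C variables being spilled to the stack.

    /* Scrub registers that briefly held secret material (AArch64 here). */
    static inline void scrub_scratch_regs(void)
    {
        __asm__ __volatile__(
            "mov x0, xzr\n\t"
            "mov x1, xzr\n\t"
            :                       /* no outputs */
            :                       /* no inputs  */
            : "x0", "x1", "memory"  /* clobbers: tell the compiler what changed */
        );
    }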


How does link-time optimization affect a separate assembler file?


In general, in the implementations I've seen, "link-time optimization" is a bit of a misnomer. It's more like a glorified version of `gcc -combine`. The "compiler" binary stuffs its half-finished results (IR) into fake object files, and the "linker" binary calls back into the compiler (built as a library) and sends it the IR from all the fake objects, which the compiler combines and builds into one giant, real object file. That object file is sent back to the linker, which goes on to do its normal job. (LLVM ThinLTO is a bit more advanced than that in terms of scalability and incrementality, but it maintains the same "hands-off" approach from the linker's perspective.) On the other hand, if the linker sees an object file passed to it is a real object, it doesn't send it to the compiler and handles it during the normal linking phase instead. And if you build a .s file, you always get a real object file, even if you passed -flto.

TL;DR: It doesn't affect it.


It's generally the linker you need to worry about, but it depends on what you're doing with the assembly.


You can't trust your compiler, but it should be easy enough to inspect the output of a small routine in a hex editor to verify that it hasn't been modified.


Props to the blog post for being quite well written: easy to understand, yet also thorough enough to explain exactly what Ginseng is and how it works. By "easy to understand" I don't just mean the introduction, which goes over the motivation at a high level, but also the lower-level explanation, with a handy C and assembly comparison that helps explain what the transformation actually does.

...On the other hand, the design itself seems like a pretty massive hack. The goal is to turn parts of a userland process into the equivalent of a TEE component, without having to manually separate the codebase into two pieces and set up IPC between them. But although that kind of "automagic" approach is easier to use, it also makes it really easy to write security flaws.

For instance, in the example code:

    void hmac_sha1(sensitive long key_top,
                   sensitive long key_bottom,
                   const uint8_t *data,
                   uint8_t *result) {
        sensitive long tmp_key_top, tmp_key_bottom;
        /* all other variables are insensitive */

        /* HMAC_SHA1 implementation */
    }
It's quite dangerous to say that all other variables are insensitive! It's hard to say for sure without seeing the actual implementation, but SHA-1 requires first expanding the message into a state of 80 32-bit words, before performing 80 rounds of hashing on them. If the state is treated as insensitive, another core could read it out before it actually goes through hashing, in which case the key could be easily recovered. This design might be secure if the SHA-1 function is separate and itself marks all state as sensitive, as long as the key never leaks into memory in between, but that's not how SHA-1 implementations usually work, so I'm pretty suspicious.

I tried to find the actual code to determine whether it's actually vulnerable, but failed: it's supposed to be released as open source [1], but the instructions involve downloading from a GitHub repo [2] which is currently marked as private, I guess by mistake.

...I don't really understand why Ginseng doesn't just mark all variables in a sensitive function as sensitive; it's not like memory for the secure stack is particularly scarce. That still leaves other attack vectors, though.

[1] http://www.ruf.rice.edu/~mobile/ginseng.html

[2] https://github.com/susienme/ndss2019_ginseng_arm-trusted-fir...
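
To make the HMAC worry above concrete, here's roughly how implementations usually begin (my own sketch, not Ginseng's code): the key gets XORed into an ordinary stack buffer before anything is hashed, so unless that buffer, and the expanded schedule inside the compression function, are also treated as sensitive, key material ends up in memory anyway.

    #include <stdint.h>
    #include <string.h>

    #define SHA1_BLOCK 64

    /* First step of HMAC-SHA1 for a key no longer than one block. */
    void hmac_sha1_start(const uint8_t *key, size_t key_len)
    {
        uint8_t ipad[SHA1_BLOCK];                /* plain stack memory */

        memset(ipad, 0x36, sizeof(ipad));
        for (size_t i = 0; i < key_len && i < SHA1_BLOCK; i++)
            ipad[i] ^= key[i];                   /* key material now lives in ipad[] */

        /* sha1_update(ctx, ipad, SHA1_BLOCK);      ...and then in the 80-word
           message schedule inside the compression function. */
    }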


Modern chips have over a hundred registers per logical core, but only 16 or 32 (depending on the architecture) that you can explicitly access.


Do you think in the future more registers will be accessible, increasing performance?


Probably not. The out-of-order execution units in the cores rely heavily on register renaming to remove anti-dependences in the instruction stream and get more instructions executing in parallel. History (so far in most cases) has shown that this is a better approach to getting good performance than having compilers or humans try to statically manage all the physical registers themselves.


Your programs can potentially use all of those registers (or all those assigned to your task, if the particular core uses static partitioning for registers). Unlike the compiler's work, the core does this based on real, current runtime information.


More architected (i.e. program-visible) registers require more bits to encode.


Back when AMD designed x86_64 they asked that question: they concluded that it was better to only expose a few registers, because that was all compilers really needed (compiler writers had learned a lot of tricks from the mess that x86 was), and the smaller instruction encoding that fewer registers allow was better for performance.


But AMD also increased the number of general-purpose registers from ~6 in x86 (eax to edx and the not fully general purpose esi, edi, esp, ebp) to ~16 in AMD64 (rax through r15). ARM has ~32.


> from the mess that x86 was

Is ARM seen as a mess? Are x86's days numbered as ARM catches up and becomes more widespread for personal computing devices?


ARM is much better as an instruction set, but it isn't clear if that will ever matter.


It depends on the complexity of the sensitive variables. This looks like a purely academic exercise.


They are accessible - they aren’t sitting there unused. They’re dynamically allocated to logical names as the program runs.


The upcoming MKTME support in future Intel processors will hopefully make this problem simpler to solve.


AMD calls this SEV and has shipped it for some time. It doesn't help much. Here's one of the latest attack papers also with a summary of previous work: https://arxiv.org/pdf/1901.01759.pdf


What trust do we need the OS to have?

Encryption, keystrokes?

The web has often made desktop applications unnecessary.


It seems like this kinda' just pushes the issue around. For example, how do we know that the Ginseng compiler is trustworthy? Of course, there could be a trusted authority for it but I don't see how this is different from a trusted authority for an OS.


Sufficiently complex operating systems will never be trustworthy, not because of maliciousness, but because of bugs.

Pick any old version of Linux and browse the local privilege escalation attacks. It seems likely that recent versions of Linux have as-yet undiscovered attacks. The existence of even one means that the OS is not trustworthy in the presence of non-trustworthy applications.

As long as Ginseng is more amenable to verification than the Linux kernel, this isn't just pushing the issue around, but rather reducing the work needed.


Ken Thompson talked about this in 1984, in his acceptance speech for the ACM Turing Award.

Reflections On Trusting Trust, http://cm.bell-labs.com/who/ken/trust.html





