Control-flow integrity in Linux 5.13

yosefk · on May 28, 2021

This CFI implementation makes &func be different values in 2 different loadable kernel modules. C says they should be the same value. What do the language lawyers among the compiler writers, who explain why the execution of an entire program is rendered meaningless by undefined behavior, tell us about this? What is the definition of the language compiled with CFI enabled (it's not C but some close relative)?
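To make the scenario concrete, here is a toy C model of jump-table-based CFI. It rests on simplifying assumptions. A checked "function pointer" points at an entry in a per-module table rather than at the function itself, so each module's &func is the address of its own entry. The struct-based table and every name in it (`cfi_entry`, `module_a_table`, `cfi_call`, and so on) are hypothetical. Real Clang CFI emits small machine-code jump-table entries instead. The sketch only shows why the two values can differ while still reaching the same code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of jump-table CFI for functions of type int(int). */
typedef int (*int_fn)(int);

/* Stand-in for one jump-table entry; real CFI emits a tiny jump instruction. */
struct cfi_entry {
    int_fn target;
};

static int twice(int x) { return 2 * x; }
static int square(int x) { return x * x; }

/* Each "module" builds its own table, so each has its own entry for twice. */
static const struct cfi_entry module_a_table[] = { { twice }, { square } };
static const struct cfi_entry module_b_table[] = { { twice } };

#define COUNT(t) (sizeof(t) / sizeof((t)[0]))

/* What "&twice" evaluates to inside each module under this model. */
static const struct cfi_entry *module_a_addr_of_twice(void) { return &module_a_table[0]; }
static const struct cfi_entry *module_b_addr_of_twice(void) { return &module_b_table[0]; }

/* True if p points exactly at an entry of table. Compared as integers, because
   relational comparison of pointers into unrelated arrays is undefined in C. */
static int points_into(const struct cfi_entry *p, const struct cfi_entry *table, size_t n) {
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)table;
    return a >= lo && a < lo + n * sizeof *table && (a - lo) % sizeof *table == 0;
}

/* Checked indirect call: abort unless p is a valid entry in some known table. */
static int cfi_call(const struct cfi_entry *p, int arg) {
    int ok = points_into(p, module_a_table, COUNT(module_a_table)) ||
             points_into(p, module_b_table, COUNT(module_b_table));
    if (!ok)
        abort(); /* a real CFI failure would trap or panic here */
    return p->target(arg);
}
```

In this model `module_a_addr_of_twice() != module_b_addr_of_twice()`, yet `cfi_call` on either one ends up calling `twice`.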

KMag · on May 28, 2021

I'm not a language lawyer, but I'm pretty sure with x86 and x86_64 ELF dynamic symbols, &func is the address of the program linkage table (PLT) entry for the dynamic symbol, in the current ELF executable/library.

The executable has its own PLT. Each library has its own PLT. The executable has its own global offset table (GOT). Each library has its own GOT. A given dynamically linked function will have one PLT entry and one GOT entry in each library that calls it. After the function's actual global entry point has been resolved, its address will be in the GOT entry. However, symbol resolution can be done lazily, and we don't want taking the address of a function to potentially perform the lookup, so I think the function address is just the start of the PLT entry.

For instance, if the executable and several libraries it loads all call printf, they'll each have their own tiny machine code stub (PLT entry) that (after dynamic symbol resolution) just jumps to the actual printf implementation. The executable will use the start of its PLT entry for printf as its address for printf. Each library will use the start of its PLT entry for printf as its address for printf.

The PLT entry is a tiny machine code stub within the same library or executable that does an indirect jump to the address held in a global offset table (GOT) entry. For lazily resolved symbols, the GOT is initialized to the address of a stub that does the dynamic symbol resolution and overwrites the GOT entry with the correct address (and then jumps to that address). Any later function call will still hit the PLT and jump to the address in the GOT, but the GOT will now point to the actual start of the function.
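The lazy-binding sequence can be modeled roughly in plain C. A global function pointer plays the GOT slot, a wrapper plays the PLT entry, and a stub plays the resolver. All names here (`got_greet`, `greet_plt`, `lazy_stub`, `lookup_symbol`) are hypothetical. The real mechanism is linker-emitted machine code plus the dynamic loader's resolver, and it passes a relocation index to a shared resolver rather than using a per-symbol stub.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of lazy binding: one GOT slot for one imported function. */
typedef int (*greet_fn)(const char *name);

/* The "library" implementation the dynamic linker would eventually find. */
static int real_greet(const char *name) { return printf("hello, %s\n", name); }

static int resolver_calls = 0;          /* counts how often the slow path runs */
static int lazy_stub(const char *name); /* forward declaration */

/* GOT slot: initially points at the resolver stub, not at the real function. */
static greet_fn got_greet = lazy_stub;

/* Stand-in for the dynamic linker's symbol lookup. */
static greet_fn lookup_symbol(const char *sym) {
    return strcmp(sym, "greet") == 0 ? real_greet : NULL;
}

/* Resolver stub: look up the symbol, patch the GOT slot, then forward the call. */
static int lazy_stub(const char *name) {
    greet_fn target = lookup_symbol("greet");
    resolver_calls++;
    if (target == NULL)
        abort(); /* a real loader would report an unresolved symbol */
    got_greet = target;
    return target(name);
}

/* PLT entry: every call site jumps indirectly through the GOT slot. */
static int greet_plt(const char *name) { return got_greet(name); }
```

The first `greet_plt` call goes through `lazy_stub` and patches `got_greet`. Later calls still pass through `greet_plt` but jump straight to `real_greet`. The address of `greet_plt` never changes, while the GOT slot's contents do.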

pritambaral · on May 28, 2021

So if I take the address of printf in one library, and ask another library to compare it with their address of printf, will they not be equal?

andrewf · on May 29, 2021

I just tried to make this work and couldn't. I might be missing something? gcc 7.5, Ubuntu 18.04, x86_64. The third-from-bottom line shows both libraries returning the same address.

  $ cat *.h *.c
  // common.h
  #include <stdio.h>
  typedef int (*printf_ptr)(const char * restrict format, ...);
  printf_ptr firstA();
  printf_ptr firstB();
  printf_ptr secondA();
  printf_ptr secondB();
  
  // first.c
  #include "common.h"
  printf_ptr firstA() { return &printf; }
  printf_ptr firstB() { return &printf; }
  
  // second.c
  #include "common.h"
  printf_ptr secondA() { return &printf; }
  printf_ptr secondB() { return &printf; }
  
  // third.c
  #include "common.h"
  int main() {
   printf("%p %p %p %p\n", (void *)firstA(), (void *)firstB(), (void *)secondA(), (void *)secondB());
   firstA()("hello world %d\n", 1);
   secondA()("hello world %d\n", 2);
   return 0;
  }
  
  $ make
  gcc -fPIC -c *.c
  gcc -shared -o first.so first.o -lc
  gcc -shared -o second.so second.o -lc
  gcc -o atprintf third.o ./first.so ./second.so
  ./atprintf
  0x7f6d24e08f70 0x7f6d24e08f70 0x7f6d24e08f70 0x7f6d24e08f70
  hello world 1
  hello world 2

KMag · on May 29, 2021

Ahh... I was wrong! There's one detail I wasn't aware of: if you take the address of a dynamic function, it appears to disable ELF lazy dynamic symbol binding for that function, so that it's guaranteed the GOT entry has been resolved to the global function address before main() is entered. It then just unconditionally uses the GOT entry as the function address.

If we print the address of main and never take the address of printf, we get lazy symbol resolution for printf like I expected:

    #include <stdio.h>
    int main() {
        printf("main is at %p\n", (void *)main);
        return 0;
    }

objdump --section .plt -d a.out shows 2 PLT entries:

    0000000000001020 <.plt>:
        1020:       ff 35 e2 2f 00 00       pushq  0x2fe2(%rip)        # 4008 <_GLOBAL_OFFSET_TABLE_+0x8>
        1026:       ff 25 e4 2f 00 00       jmpq   *0x2fe4(%rip)        # 4010 <_GLOBAL_OFFSET_TABLE_+0x10>
        102c:       0f 1f 40 00             nopl   0x0(%rax)

    0000000000001030 <printf@plt>:
        1030:       ff 25 e2 2f 00 00       jmpq   *0x2fe2(%rip)        # 4018 <printf@GLIBC_2.2.5>
        1036:       68 00 00 00 00          pushq  $0x0
        103b:       e9 e0 ff ff ff          jmpq   1020 <.plt>

That jmpq *0x2fe2(%rip) in the second PLT entry is an indirect jump through the printf GOT entry. The printf GOT entry is actually initialized to point right back at the pushq $0x0 inside printf@plt. That pushes 0 on the stack so the dynamic symbol resolution knows which GOT entry it's lazily resolving. The jmpq 1020 <.plt> jumps to the first PLT entry, which then uses the 0 at the top of the stack to know it's resolving the printf GOT entry.

But, if we ever actually take the address of printf, then printf ceases to be a lazily bound ELF symbol:

    #include <stdio.h>
    int main() {
        printf("printf is at %p\n", (void *)printf);
        return 0;
    }

Note that we lose the dynamic resolution PLT stub for printf (objdump --section .plt -d a.out):

    0000000000001020 <.plt>:
        1020:       ff 35 e2 2f 00 00       pushq  0x2fe2(%rip)        # 4008 <_GLOBAL_OFFSET_TABLE_+0x8>
        1026:       ff 25 e4 2f 00 00       jmpq   *0x2fe4(%rip)        # 4010 <_GLOBAL_OFFSET_TABLE_+0x10>
        102c:       0f 1f 40 00             nopl   0x0(%rax)

And gcc -O2 -S main.c shows it's just unconditionally loading the GOT entry (printf@GOTPCREL) to use as the function address. (Note the assembly shows printf@PLT, but objdump doesn't show this PLT entry. I guess the linker does some link-time optimization there to remove the actual PLT entry.)

    main:
    .LFB11:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movq    printf@GOTPCREL(%rip), %rsi
        leaq    .LC0(%rip), %rdi
        xorl    %eax, %eax
        call    printf@PLT

https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSe... (section named Function Addresses) mentions that things (at least for IBM zSeries) basically work as I originally expected, but also vaguely mentions that some special steps are taken to make function addresses compare as expected. It doesn't specifically mention disabling ELF dynamic symbol resolution.

andrewf · on June 1, 2021

Thanks for the detail! I did manage to see different values for &printf with DLLs on Windows. I was inspired by a long-ago port of a codebase to a newer Visual C++; the third-party binary DLLs kept importing the old C runtime and using its malloc/free, which complicated things.

I put a __declspec(dllexport) on everything in my header file then (abridged):

  C:\Users\andrew\Desktop\winlink>cl /LD first.c
  Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27035 for x64
  Copyright (C) Microsoft Corporation.  All rights reserved.

  /out:first.dll
  /dll
  /implib:first.lib
  first.obj
     Creating library first.lib and object first.exp
  
  C:\Users\andrew\Desktop\winlink>cl /LD second.c
  
  C:\Users\andrew\Desktop\winlink>cl third.c first.lib second.lib
  
  C:\Users\andrew\Desktop\winlink>third.exe
  00007FFE38411080 00007FFE38411080 00007FFE382F1080 00007FFE382F1080
  hello world 1
  hello world 2

If I statically link the three modules together on Windows, then I don't see the different pointers.

bombela · on May 28, 2021

Correct. They won't be equal, since each library has its own table.

KMag · on May 29, 2021

Except I was wrong... sorry. I haven't been able to find documentation that lazy ELF symbol binding is disabled when taking the address of an extern function, but using objdump to dump the .plt section, it's clear printf is no longer lazily bound when you need to take its address. Instead, the entry is marked to be resolved at load time, and the global function entry point is just read directly from the GOT entry.

Part of my assumption that the PLT entry was used as the function address came from trying to figure out why the AMD engineers didn't include an ip-relative indirect addressing mode for the call instruction when they designed x86_64. An ip-relative indirect call could directly call through the GOT and avoid wasting an instruction cache line on the PLT entry. The PLT entry would still be used once for lazy symbol binding, but after that wouldn't be used and wouldn't cause any more cache evictions. Such an indirect call would need to be broken down into several micro-ops internally, but would save instruction cache space.

I did a bit of thinking and came to the conclusion that the PLT entry was still necessary for taking the function address of a lazily bound ELF symbol, so such an addressing mode would almost never be used.

Now that I see the PLT entry is never actually used as the function address, I'm a bit surprised that AMD didn't include an ip-relative indirect addressing mode for the call instruction when designing long mode (x86_64). I'm a mechanical engineer by training who is largely self-taught in software, so I'm sure the AMD hardware engineers (and maybe also the RISC-V folks; I'm less familiar with RISC-V addressing) had very good reasons for not having ip-relative indirect function calls. My best guess is that the complexity increase wasn't worth it and/or there was some hidden performance cost that's not obvious to me.

bcoates · on May 28, 2021

Loadable modules aren't part of the C spec. The meaning of &func in 2 different modules, potentially compiled by 2 different compilers, is specified by the ABI.

CFI is a breaking change to the contract the Linux kernel makes with kernel modules, but it doesn't have anything in particular to do with C the language.

derefr · on May 28, 2021

> What is the definition of the language compiled with CFI enabled (it's not C but some close relative)?

I think I saw it mentioned, in the comments of a recent HN post about undefined behavior in C, that OS kernels aren't written in "standard C": they use a bunch of compiler flags (Linux, for example, builds with -fno-strict-aliasing and -fno-delete-null-pointer-checks) to constrain what the compiler does in the face of UB, more tightly than the standard requires. As such, OS kernels have always been using a "close relative of C" (same syntax, slightly different semantics).

Joker_vD · on May 28, 2021

Could you elaborate on the scenario a bit? Two different loadable kernel modules would define two different (even if identically named) functions "func", so "&func" has to be two different pointers... Or do you mean that if "func" was defined in the kernel, and both of those modules are linked against the kernel dynamically, then "&func" should yield the same value? But they aren't the same even now, without CFI: when "func" is defined in a dynamically loaded library, the "&func" value is a pointer into the calling object file's PLT.

twic · on May 28, 2021

This is a bit like defunctionalization:

https://blog.sigplan.org/2019/12/30/defunctionalization-ever...

It's basically replacing a function pointer with an enum. The kernel version is particularly simple, because the enum values don't have fields, they are a pure discriminant.

Although I think the integritized function pointers aren't enum values or indices into the jump table, they're pointers into it. I think they could be indices; I'm not sure if there's a reason this is a bad idea.
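The index-based alternative can be sketched as plain defunctionalization in C. The "function pointer" becomes an enum with no fields, and every indirect call goes through one per-type table with a bounds check. This is only an illustration of the idea in this comment, not how the kernel's CFI works, and all names (`int_fn_id`, `int_fn_table`, `call_int_fn`) are hypothetical.

```c
#include <stdlib.h>

typedef int (*int_fn)(int);

static int add_one(int x) { return x + 1; }
static int negate(int x) { return -x; }

/* The "defunctionalized" pointer: a pure discriminant with no fields. */
enum int_fn_id { FN_ADD_ONE, FN_NEGATE, FN_COUNT };

/* One table per function type; only these targets are legal. */
static const int_fn int_fn_table[FN_COUNT] = {
    [FN_ADD_ONE] = add_one,
    [FN_NEGATE] = negate,
};

/* Checked call: one unsigned comparison rejects any out-of-range index. */
static int call_int_fn(unsigned id, int arg) {
    if (id >= FN_COUNT)
        abort(); /* stand-in for a CFI failure */
    return int_fn_table[id](arg);
}
```

With indices the check is a single comparison. One plausible downside, which is an assumption rather than something the thread establishes, is that an index is not a callable address. It therefore cannot be handed to code or an ABI that expects a real function pointer without first translating it through the table.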

DSingularity · on May 28, 2021

I am wary of any claim to CFI without secure tracking of the call stack to verify the return destination. You need to compare return destinations against a shadow call stack. If you don't protect this call stack, then attackers will just evolve slightly. You can encrypt/decrypt the return address grsecurity-style, but we know that this can still be bypassed.
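The comparison described above can be modeled as a toy in C, under heavy assumptions. Return "addresses" are plain integers, the regular stack's return slot is a variable that a bug might overwrite, and a separate array plays the shadow stack. The names (`shadow_push`, `shadow_check_and_pop`, `callee`) are hypothetical. Real implementations do this in compiler-generated prologues and epilogues, and some skip the comparison and simply return to the shadow copy.

```c
#include <stddef.h>

#define SHADOW_DEPTH 64

/* Hypothetical shadow stack, assumed to live where an attacker can't easily find it. */
static unsigned long shadow_stack[SHADOW_DEPTH];
static size_t shadow_top = 0;

/* Prologue: record the return site in the shadow stack. Returns 0 on overflow. */
static int shadow_push(unsigned long ret) {
    if (shadow_top == SHADOW_DEPTH)
        return 0;
    shadow_stack[shadow_top++] = ret;
    return 1;
}

/* Epilogue: the return site about to be used must match the shadow copy. */
static int shadow_check_and_pop(unsigned long ret) {
    if (shadow_top == 0)
        return 0;
    return shadow_stack[--shadow_top] == ret;
}

/* A "function" whose regular-stack return slot may be corrupted by a bug.
   Returns 1 if it is safe to return, 0 if the mismatch was detected. */
static int callee(unsigned long *ret_slot, int corrupt) {
    if (!shadow_push(*ret_slot))
        return 0;
    if (corrupt)
        *ret_slot = 0xbadUL; /* simulated stack overwrite */
    return shadow_check_and_pop(*ret_slot);
}
```

The model also makes the follow-up objection concrete. The check is only as trustworthy as `shadow_stack` itself, so an attacker who can locate and write it defeats the comparison.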

jnwatson · on May 28, 2021

It looks like backwards-edge protection is in a different patch. Backwards-edge is a bit easier to implement.

DSingularity · on May 28, 2021

Well, I am not sure I agree. It seems to me that there are significant security complications due to the need to make a runtime comparison to determine the integrity of the return destination. How are we to be convinced that the value at the top of the shadow stack is not attacker-controlled?

ndesaulniers · on May 28, 2021

An attacker could modify the shadow stack, it's just rather difficult to find where it's randomly placed and would require arbitrary read/write capabilities. The kernel zeros out the register used to write to the shadow stack as soon as possible after use. It's not impossible to defeat, but does raise the bar significantly.

I encourage you to read the commit message of shadow call stack kernel patches, which as another commenter notes is only backwards edge protection; CFI is forwards edge protection.

agumonkey · on May 28, 2021

Do you know who works on such topics?

DSingularity · on May 28, 2021

I’m not sure what you mean. I’ve read a variety of papers and articles on this topic. Are you referring to the user that replied to me?

ufo · on May 28, 2021

Does anyone know which tools the kernel developers used to identify all the places in the kernel that use function pointer equality?

Changing how all the function pointers behave is a spooky change to make so they must have had some tools to help them find all the tricky places, right?