I really wish we got cc: warning: UB line: 123 file: foo.c

jcranmer · on Jan 21, 2022

How many warnings do you want for this small function?

  void oh_the_humanity(int *ptr, int val) {
    *ptr = val + 1;
  }

Off the top of my head:

* UB: ptr may be pointing to a float variable. (It's not illegal to cast a float* to an int*; it's only UB when you actually dereference it with the wrong type.)

* UB: val + 1 may overflow.

* UB: potential data race on writing *ptr.

* UB: ptr may be a one-past-the-end-of-the-array pointer, which can be validly constructed, but may not be dereferenced.

* UB: ptr may be pointing to an object whose lifetime has expired.

* UB: ptr may be uninitialized.

* UB: val may be uninitialized.

As you can see, UB is intensely specific to the actual data values; it's not really possible to catch even a large fraction of UB statically without severe false positive rates.
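To make the list above concrete, here is a minimal sketch of which of those hazards the callee could even test for. The name store_incremented and the bool return value are hypothetical, not something from the thread. Only the null check and the overflow check can be written inside the function; the wrong-type, expired-lifetime, one-past-the-end, uninitialized, and data-race cases depend on facts that only the caller knows.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checked variant of oh_the_humanity().
 * Returns false instead of invoking UB in the two cases it can detect. */
bool store_incremented(int *ptr, int val) {
    if (ptr == NULL) {
        return false;   /* null is detectable; dangling or mistyped pointers are not */
    }
    if (val == INT_MAX) {
        return false;   /* val + 1 would overflow a signed int */
    }
    *ptr = val + 1;     /* still UB if ptr is dangling, mistyped, or raced on */
    return true;
}
```

Even with both checks, five of the seven items in the list remain invisible from inside the function, which is roughly the point being made.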

mzs · on Jan 21, 2022

Yeah, I know, I get it; it's more me being wishful. But more seriously, I wish compilers could at least emit a warning when they optimize something away after UB:

  % cat foo.c
  #include <stdlib.h>
   
  int
  foo(int *bar)
  {
   int baz = *bar;
  
   if (bar == NULL) exit(2);
  
   return (baz);
  }
  % cc -O3 -Wall -Wextra -c foo.c
  % objdump -dr foo.o            
  
  foo.o: file format Mach-O 64-bit x86-64
  
  
  Disassembly of section __TEXT,__text:
  
  0000000000000000 _foo:
         0: 55                            pushq %rbp
         1: 48 89 e5                      movq %rsp, %rbp
         4: 8b 07                         movl (%rdi), %eax
         6: 5d                            popq %rbp
         7: c3                            retq
  % cc -O0 -Wall -Wextra -c foo.c
  % objdump -dr foo.o            
  
  foo.o: file format Mach-O 64-bit x86-64
  
  
  Disassembly of section __TEXT,__text:
  
  0000000000000000 _foo:
         0: 55                            pushq %rbp
         1: 48 89 e5                      movq %rsp, %rbp
         4: 48 83 ec 10                   subq $16, %rsp
         8: 48 89 7d f8                   movq %rdi, -8(%rbp)
         c: 48 8b 45 f8                   movq -8(%rbp), %rax
        10: 8b 08                         movl (%rax), %ecx
        12: 89 4d f4                      movl %ecx, -12(%rbp)
        15: 48 83 7d f8 00                cmpq $0, -8(%rbp)
        1a: 0f 85 0a 00 00 00             jne 10 <_foo+0x2a>
        20: bf 02 00 00 00                movl $2, %edi
        25: e8 00 00 00 00                callq 0 <_foo+0x2a>
    0000000000000026:  X86_64_RELOC_BRANCH _exit
        2a: 8b 45 f4                      movl -12(%rbp), %eax
        2d: 48 83 c4 10                   addq $16, %rsp
        31: 5d                            popq %rbp
        32: c3                            retq
  %

That's very similar to something that bit me in embedded, except it was with a pointer to a structure. The compiler realizes I've dereferenced NULL, which is UB anyway, so it sees no need to do the NULL check later and merrily scribbles over the exception vectors or whatever.

foxfluff · on Jan 21, 2022

That's a nice example. It'd definitely be nice to have a warning for this one.

Fwiw GCC does have a related warning flag (-Wnull-dereference) but I'm not sure it's made exactly for this. I believe it works based on functions being annotated for possibly returning NULL, e.g. malloc. It's also not enabled by -Wall or -W because apparently there were too many false positives: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96554

I imagine patches would be welcome. I'm guessing there are more people who like to wish for more compiler features than there are people who like to develop compilers :)

mzs · on Jan 21, 2022

edit: Thanks, I just checked and the warning doesn't work

https://godbolt.org/z/4TP1hfx4j

But your hint led me to -fno-delete-null-pointer-checks, which does the trick

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

And -fno-delete-null-pointer-checks made it into LLVM too. It's good to know, but a little late for when I needed it, cheers :)

  % cc -O3 -fno-delete-null-pointer-checks -Wall -Wextra -c foo.c
  % objdump -dr foo.o                                            
  
  foo.o: file format Mach-O 64-bit x86-64
  
  
  Disassembly of section __TEXT,__text:
  
  0000000000000000 _foo:
         0: 55                            pushq %rbp
         1: 48 89 e5                      movq %rsp, %rbp
         4: 48 85 ff                      testq %rdi, %rdi
         7: 74 04                         je 4 <_foo+0xd>
         9: 8b 07                         movl (%rdi), %eax
         b: 5d                            popq %rbp
         c: c3                            retq
         d: bf 02 00 00 00                movl $2, %edi
        12: e8 00 00 00 00                callq 0 <_foo+0x17>
    0000000000000013:  X86_64_RELOC_BRANCH _exit

astrange · on Jan 22, 2022

If you want a general approach you can turn on ubsan in trap-only mode and see what traps have ended up in your output.
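A sketch of how that might look for the foo() example from earlier in the thread. The build commands in the comment are assumptions based on the documented GCC and Clang sanitizer flags; the exact spelling varies between compiler versions, so check your compiler's manual. The name foo_ubsan is made up for illustration.

```c
#include <stdlib.h>

/*
 * Assumed build commands (verify against your compiler's documentation):
 *   clang -O2 -fsanitize=undefined -fsanitize-trap=undefined -c foo.c
 *   gcc   -O2 -fsanitize=undefined -fsanitize-undefined-trap-on-error -c foo.c
 *
 * Then disassemble the object file and look for trap instructions
 * (for example ud2 on x86-64). Each one marks a spot where the compiler
 * had to guard against possible UB.
 */
int foo_ubsan(int *bar) {
    int baz = *bar;             /* the sanitizer is expected to guard this load */

    if (bar == NULL) exit(2);   /* too late: the dereference already happened */

    return baz;
}
```

In trap-only mode no sanitizer runtime library is linked in, so this also works for freestanding or embedded builds where the full runtime is not available.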

zajio1am · on Jan 21, 2022

But such code might be generated by macros (or some code generator), in which case silent elimination of unnecessary code is expected and wanted behavior.

Too · on Jan 22, 2022

In this particular case the code is wrong and can be fixed easily by swapping the assignment and the comparison.
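A minimal sketch of that swap, using the foo() example from upthread (the name foo_fixed is only for illustration). With the check ahead of the dereference, the compiler has no earlier UB from which to infer that bar is non-null, so the check has to stay:

```c
#include <stdlib.h>

/* Same function as foo(), but the null check comes before the load. */
int foo_fixed(int *bar) {
    if (bar == NULL) exit(2);   /* check first ... */

    return *bar;                /* ... dereference only afterwards */
}
```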

Likewise, code generators shouldn’t be generating this faulty code.

Raising a compiler error here is the only right thing to do.

There are of course more ambiguous examples, though obvious examples like the one above are sadly way too common.

UncleMeat · on Jan 23, 2022

Why can't we say that the original code is wrong then? The whole point of having something be UB rather than implementation-defined is that the language committee believes it represents a bug in your program.

MauranKilom · on Jan 22, 2022

(Or templates, in the C++ case. Think of all the inlined iterator functions...)

Dylan16807 · on Jan 22, 2022

Well if we're not super worried about implementation difficulty right now:

* 1 and 4-7: Don't worry about where the pointer goes, as long as you treat it like an int in this function. If there are going to be warnings about improper pointers, they should be at the call sites.

* 2: If overflow will either trap, wrap, or behave like a bignum, then it's not the dangerous kind of UB, so no warning by default. Consider extending C so the coder can more easily control integer overflow.

* 3: Anything could race. Out of scope, don't worry about it.
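For what it's worth, some of that control over integer overflow can already be approximated today. The sketch below is an illustration, not a proposal from the thread: add_wrap and add_checked are hypothetical names, and __builtin_add_overflow is a GCC/Clang extension rather than standard C.

```c
#include <limits.h>
#include <stdbool.h>

/* Wrap: unsigned arithmetic is defined to wrap around. Converting the
 * out-of-range result back to int is implementation-defined (not UB);
 * GCC and Clang document it as reduction modulo 2^N. */
int add_wrap(int a, int b) {
    return (int)((unsigned)a + (unsigned)b);
}

/* Check: returns true and stores the sum if it fits in an int,
 * returns false if the addition overflowed. Relies on a compiler builtin. */
bool add_checked(int a, int b, int *out) {
    return !__builtin_add_overflow(a, b, out);
}
```

A caller that wants trapping behavior could call abort() when add_checked returns false. GCC and Clang also accept -fwrapv to make plain signed arithmetic wrap across the whole translation unit.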

tlb · on Jan 21, 2022

You can't detect most UB at compile time. LLVM has a system to detect it at runtime. There is a significant performance penalty, but it can be useful during testing.

See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

astrange · on Jan 22, 2022

It’s not actually that significant compared to something like asan. You can ship software with ubsan enabled if you want to.

gpderetta · on Jan 21, 2022

Compilers will routinely warn if they can detect UB at compile time.

The problem is that except in a few trivial cases it is impossible to detect UB at compile time. Even whole program static analysis can only catch a small subset.

foxfluff · on Jan 21, 2022

Yeah, it'd be nice if we could solve the halting problem.