Address Sanitizer Internals (epita.fr)
153 points by todsacerdoti 7 months ago | 29 comments



One thing this explains is why ASan has false negatives. It's a great tool, but the common claim that it fully mitigates memory safety issues is just not true (even assuming your tests actually trigger the memory safety bugs, which, unlike code coverage, you have no way of measuring).


I've never seen anyone claim that.


Some examples from 1 minute with Algolia:

https://news.ycombinator.com/item?id=37479651

“All decent C compilers have compilation options so that at run-time any undefined actions, including integer overflow and out-of-bounds accesses, will be trapped.”

“Despite the hype, by default Rust is not safer than C compiled with the right options, because the default for Rust releases is also to omit many run-time checks.”

https://news.ycombinator.com/item?id=25922430

Upthread someone asks for a “-safe” flag which makes C a safe language. The reply is “It's called AddressSanitizer. You enable it with the compiler flag -fsanitize=address.” Several replies ensue pointing out how wrong this is.


It comes up a lot in HN C++-related comment threads, for starters


I frequently bring up ASan on HN. It's a great way to mitigate C and C++'s shortcomings. But it's not a panacea, and it's unlikely to be described that way here without swift rebuttal.

C or C++ w/o ASan and UBSan is like skydiving w/o a parachute.


I wouldn’t say they explicitly mention ASan, but in general you will see certain well known C++ developers/community members insist that with a set of sanitizers you won’t have to worry about the kind of things safety focused programmers would like added to C++, all the time never mentioning false positives.


ASAN is only a probabilistic sanitizer. Deterministic checks, like out-of-bounds checks or integer overflow checks, are a different matter: in C/C++ compiled with the appropriate options they are the same checks that are done in any programming language where they are enabled by default.

In that case there are no false positives or negatives.


Pray tell: what magic C/C++ compiler options do I add to enable deterministic OOB checks that never produce any false positives or negatives?


If you had bothered to search the gcc or clang manual, you would have found e.g. "-fsanitize=bounds-strict".

There is nothing magic about this option. With it C or C++ is compiled exactly like any other programming language where array bounds checking is implicit.

That means that whenever a new value is computed for a pointer or index that will be used to access an array, the value is compared with the bounds associated with that array.

Such a comparison cannot give false positives or negatives; any address is either within bounds or outside bounds.

This is something that is completely independent of the programming language. Out-of-bounds checking has nothing to do with the syntax or with any explicit features of a programming language. It is just a compilation technique, which can be applied or omitted when compiling any programming language, regardless of whether the language is C or C++ or Ada or Rust.

The only difference is that in better programming language specifications it is required that any conforming compiler must by default insert bounds checks, while in the C and C++ standards the behavior of the compiler is unspecified and the existing compilers have a bad default behavior, so it is the responsibility of the programmer to use the right compiler options.
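
For an array whose size the compiler can see, the inserted check amounts to roughly the following. This is a hand-written sketch, not what gcc or clang actually emit; `table`, `TABLE_LEN` and `table_get_checked` are hypothetical names, and a sanitizer would report or trap where this code returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_LEN 4 /* hypothetical array length, for illustration only */

static const int table[TABLE_LEN] = {10, 20, 30, 40};

/* Stores table[index] in *out and returns true when index is in bounds.
   Returns false and leaves *out untouched otherwise. */
static bool table_get_checked(long index, int *out) {
    assert(out != NULL);
    if (index < 0 || index >= TABLE_LEN) {
        return false; /* out of bounds: refuse the access */
    }
    *out = table[index];
    return true;
}
```

The comparison is only possible because `TABLE_LEN` is known at the point of access. A bare pointer parameter carries no such length.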


Weird, the following program compiles and runs without complaint under `-fsanitize=bounds-strict,address,undefined` and outputs the rather fetching output `��1�I��^H��H���PTE1�1�H�Ǧ@`:

    #include <stdio.h>

    int main(int argc, char **argv) {
        printf("%s\n", argv[-8]);
    }
(https://godbolt.org/z/ef8KMGnce - feel free to try other values to dump out environment variables and other stack crap)

Oh, but maybe that's because the compiler has no model of how `argv` works. Fine, try this?

    #include <stdio.h>
    
    int foo(volatile char *x) {
        return x[0] + x[-64];
    }
    
    int main(int argc, char **argv) {
        char x[1] = {42};
        printf("%d\n", foo(x));
    }
(https://godbolt.org/z/TMzehfGah)

Happily loads way out of bounds, no problem whatsoever - no runtime error, no sanitizer complaints, nothing.

Sanitizers are great, but they are not perfect. They are not a replacement for real array bounds checking; other languages that do real bounds checking do so by carrying the bounds with the object which is categorically impossible under the standard C ABIs.
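
A minimal sketch of what "carrying the bounds with the object" could look like if written by hand in C. The `byte_slice` type and `slice_get` function are hypothetical names invented for illustration, not part of any standard API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the data pointer travels with its length. */
struct byte_slice {
    const unsigned char *data;
    size_t len;
};

/* Reads s.data[index] into *out only if index is inside the slice. */
static bool slice_get(struct byte_slice s, size_t index, unsigned char *out) {
    assert(out != NULL);
    if (index >= s.len) {
        return false; /* the length is available here, so the check is exact */
    }
    *out = s.data[index];
    return true;
}
```

A function taking a plain `char *`, like `foo` above, receives only the address, so there is nothing to compare the index against.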


You did not use arrays, so there were no bounds to be checked.

The C language indeed allows the use of pointers having arbitrary values that cannot be checked in any way.

However it is trivial to avoid the use of such pointers and any decent programmer will never use such pointers, because they are never needed.

Unfortunately, it is difficult to forbid the use of such pointers, because there are too many legacy programs.

That however cannot be an excuse for any programmer who is writing a new program. If someone uses pointers in such a way, that cannot happen through an unwilling mistake, so it is their fault and they have no right to blame the programming language.


New programs are unfortunately bound to old APIs. There's not much you can do here to fix the problem.


If it comes up, then that must be due to frequent confusion between ASAN and the many other completely different kinds of sanitizers, like the array bounds checker.

Unfortunately the documentation for the great number of sanitizers available in gcc and clang is both voluminous and incomplete.

For all of them their internals should have been very clearly documented, like in this article about ASAN.


I've never seen anyone claim that asan fully mitigates memory safety issues in C++. Perhaps you could link to one?


I've never seen that particular claim either, but I did previously believe that asan would reliably detect an out-of-bounds write if and when it occurs.

So I learned something new from the OP (that this type of false negative is possible).
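
A toy model of how that false negative can arise. It assumes a simplified layout of two 16-byte objects separated by 16-byte poisoned redzones and a one-byte-per-byte poison map; real ASan uses a compressed shadow memory and different sizes, and all names here are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy arena layout: [redzone][object A][redzone][object B][redzone] */
enum {
    REDZONE = 16,
    OBJ = 16,
    OBJ_A_START = REDZONE,
    OBJ_B_START = 2 * REDZONE + OBJ,
    ARENA = 3 * REDZONE + 2 * OBJ
};

static unsigned char poisoned[ARENA]; /* 1 = access would be reported */

static void arena_init(void) {
    assert(OBJ_B_START + OBJ + REDZONE == ARENA); /* layout sanity check */
    memset(poisoned, 1, sizeof poisoned);         /* everything is redzone... */
    memset(poisoned + OBJ_A_START, 0, OBJ);       /* ...except object A */
    memset(poisoned + OBJ_B_START, 0, OBJ);       /* ...and object B */
}

/* Models an access at (base of A) + offset. True means the tool reports it.
   Addresses outside the toy arena are simply treated as reported. */
static bool access_via_a_is_reported(long offset) {
    long addr = (long)OBJ_A_START + offset;
    if (addr < 0 || addr >= ARENA) {
        return true;
    }
    return poisoned[addr] != 0;
}
```

In this model an offset of 16 from A lands in the redzone and is reported, while an offset of 32 skips the redzone, lands inside B, and goes unnoticed even though it is out of bounds for A.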


ASAN, i.e. "-fsanitize=address", is a completely different sanitizer from, and unrelated to, out-of-bounds access checking such as "-fsanitize=bounds-strict".

Checking out-of-bounds accesses must detect any out-of-bounds access done at run time. It is done by comparing the pointers or indices used for access with the array bounds.

(At least for gcc: "Initializers of variables with static storage are not instrumented.")

Out-of-bounds access checking and integer overflow detection should normally be enabled by default by any C/C++ developer, excluding only those functions where it has been determined experimentally that disabling the checks measurably improves performance and which have been verified very carefully to prove that such exceptions cannot occur.

Unfortunately, in gcc and clang there is a large number of compilation options related to sanitizers. Most of them should always be enabled, to remove all problems created by the laxity of the C/C++ standards. Instead of listing on the command line the humongous number of such options, many of the most useful are included in "-fsanitize=undefined", but the compiler documentation must be studied to see what else may need to be added. At least for release builds, "-fsanitize-trap=all" is usually also desirable.
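
For integer overflow, the check can also be written explicitly instead of relying on a sanitizer flag. A minimal sketch using the `__builtin_add_overflow` builtin available in gcc and clang; `add_checked` is a hypothetical wrapper name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores a + b in *sum and returns true if the result fits in an int.
   Returns false if the mathematical sum would overflow. */
static bool add_checked(int a, int b, int *sum) {
    assert(sum != NULL);
    /* The builtin returns true when overflow occurred. */
    return !__builtin_add_overflow(a, b, sum);
}
```

The caller decides what to do on failure, whereas a sanitizer build reports or traps at the point of overflow.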


If you want bounds-checked C, use Dlang's -betterC. It's no use trying to convert C to something it isn't.


Yeah, I'm just responding to the guy who is probably the most internet-famous C++ hater in the world. I guess he likes to make up stuff too. The article is good.


adrian_b's comments in this thread are the kind of thing I referred to (my point wasn't that ASan specifically is presented as a panacea but rather various sanitizers in general.)


Adrian's comments are nuanced and truthful. He said gcc offers an out-of-bounds sanitizer and listed some limitations. He does not say it makes C++ memory safe.


There’s nothing remotely nuanced nor truthful about this comment: https://news.ycombinator.com/item?id=40696242

He’s saying that with a compiler flag you can get the same kind of safety as in Ada or Rust. Later downthread he clarifies to say that you only get this safety if you stop using pointers, which he considers “trivial” to do.

This is the same guy who commented a few months back saying the following:

“All decent C compilers have compilation options so that at run-time any undefined actions, including integer overflow and out-of-bounds accesses, will be trapped.”

There is, put simply, no flag that will trap on “any undefined actions”, period. That’s not what sanitizers are capable of. That’s the kind of overhyped comment that isn’t helpful when trying to understand what sanitizers are actually useful for.


Thank you for your truthful and not unnecessarily nuanced comment!


This is great! I found these videos helpful, too: https://youtu.be/Tl1uZ7FBwFQ

Does anyone know of a good explanation of HWAddress Sanitizer internals?


There are multiple versions of HWAsan.

One for ARMv8 with Top-Byte-Ignore: you can use the top byte of memory addresses to store a tag.

When you allocate memory you return the "tagged" pointer and internally store "this region has this tag".

When you dereference a pointer, you check that the tag matches what you expect in your internal data structure.

With memory tagging extensions you can do something similar but the checks are performed by the processor.
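
The tag-and-check idea can be modelled in plain C. This sketch uses offsets into a toy heap rather than real pointers so it runs anywhere; the 16-byte granule, the table size and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { TAG_SHIFT = 56, GRANULE = 16, GRANULES = 8 };

/* One tag per 16-byte granule of a toy heap ("this region has this tag"). */
static uint8_t memory_tags[GRANULES];

/* Puts the tag in the top byte of a 64-bit "pointer" (here just an offset). */
static uint64_t make_tagged(uint64_t offset, uint8_t tag) {
    assert(offset < (UINT64_C(1) << TAG_SHIFT));
    return ((uint64_t)tag << TAG_SHIFT) | offset;
}

/* Records the tag for every granule covered by [offset, offset + size). */
static void tag_region(uint64_t offset, uint64_t size, uint8_t tag) {
    uint64_t first = offset / GRANULE;
    uint64_t last = (offset + size + GRANULE - 1) / GRANULE; /* one past the end */
    for (uint64_t g = first; g < last && g < GRANULES; g++) {
        memory_tags[g] = tag;
    }
}

/* The check done on each access: the pointer tag must match the memory tag. */
static bool access_ok(uint64_t tagged) {
    uint8_t ptr_tag = (uint8_t)(tagged >> TAG_SHIFT);
    uint64_t offset = tagged & ((UINT64_C(1) << TAG_SHIFT) - 1);
    if (offset / GRANULE >= GRANULES) {
        return false; /* outside the toy heap */
    }
    return memory_tags[offset / GRANULE] == ptr_tag;
}
```

Retagging a region when it is freed makes stale pointers fail the check. With an 8-bit tag, two regions can end up with the same tag by chance, so this scheme is also probabilistic.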


Who sanitizes the sanitizer? One of the most hilarious bugs I've previously seen is when someone found a memory out-of-bound access inside the run time support library of Asan.


"For this article, you’ll need the following knowledge:

Basic C understanding (Memory, Stack, Heap, Syscall)."

Obviously, since C doesn't prescribe any kind of heap, stack or syscall behavior (or if they even exist), I assume the author meant something like "Basic understanding of how C is often implemented on certain operating systems and hardware".


Yes, because asan only makes sense in the context of specific (kinds of) implementations.


sanitizers are a constant source of pain


And just like pain, they show you where the (likely) problem is.

If you didn't have pain you'd still get the same damage to the body, you just wouldn't be aware.



