That depends on what you mean by "just like." Zig does not make sound static gua...

pcwalton · on Nov 3, 2021

It's hard to figure out exactly what you're saying, but I think you're trying to imply that bounds checks will allow for the development of some kind of novel use-after-free mitigation. Without a specific proposal I don't know why that would be the case. Zig is not meaningfully different from C++ when it comes to UAF.

pron · on Nov 3, 2021

> Zig is not meaningfully different from C++ when it comes to UAF.

Without pointer arithmetic, unsafe casts (Zig is much easier to write without such casts than C, at least), unsafe unions, and unknown buffer sizes, the set of pointers in a Zig program can be well-defined, as pointers can come into being only in very specific ways (they have a simple provenance). Because the set of pointers is well defined, it can be precisely tracked and analysed. This means that 1. pointers can be traced even with arena allocators or perhaps even other kinds of pools (provided they cooperate with the tool) and 2. dangling pointers can be detected even without being dereferenced. This is simply not something that analysers for C or C++ can do (at least not nearly as easily).

The way to think about it is that in Zig, when an object (including an allocator) is deallocated, you can invalidate the full set of pointers pointing to it, and that set is the only way of generating more pointers into it. That's not the case in C or C++. For one, pointers aren't well defined (unions); for another, a valid pointer could be used to create an invalid one.
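(Illustration: a toy model of the claim above, written in Python rather than Zig. It is only a sketch under stated assumptions: `TrackedHeap`, `Pointer`, `copy`, and `drop` are hypothetical names, not part of Zig or any existing tool. The one property it models is that a pointer can come into being only through `alloc` or by copying an existing pointer, so the set of pointers into an object is fully known when the object is freed.)

```python
class Pointer:
    """Opaque handle; it can only be created by TrackedHeap (hypothetical)."""

    def __init__(self, target):
        self.target = target  # id of the object this pointer refers to


class TrackedHeap:
    """Toy heap in which pointers exist only via alloc() or copy()."""

    def __init__(self):
        self._next_id = 0
        self._live = set()     # ids of objects that have not been freed
        self._pointers = {}    # object id -> set of Pointer handles into it

    def _make_pointer(self, obj_id):
        ptr = Pointer(obj_id)
        self._pointers[obj_id].add(ptr)
        return ptr

    def alloc(self):
        obj_id = self._next_id
        self._next_id += 1
        self._live.add(obj_id)
        self._pointers[obj_id] = set()
        return self._make_pointer(obj_id)

    def copy(self, ptr):
        # Assumption: the only way to derive a pointer is from a tracked one.
        return self._make_pointer(ptr.target)

    def drop(self, ptr):
        # The pointer value goes dead (for example, its variable leaves scope).
        self._pointers[ptr.target].discard(ptr)

    def free(self, ptr):
        # Deallocate the object; every remaining pointer to it now dangles.
        self._live.discard(ptr.target)

    def dangling(self):
        # Report pointers into freed objects, whether or not they get used.
        return [
            p
            for obj_id, ptrs in self._pointers.items()
            if obj_id not in self._live
            for p in ptrs
        ]


heap = TrackedHeap()
owner = heap.alloc()
alias = heap.copy(owner)   # a second pointer to the same object
heap.free(owner)
heap.drop(owner)           # the owner variable goes away after the free
report = heap.dangling()   # the alias is reported without being dereferenced
```

In this model the report contains only `alias`. Nothing comparable is possible if a pointer can be produced by arithmetic or by reinterpreting a union, because such pointers never pass through the registry.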

pcwalton · on Nov 3, 2021

What you've described is a tracing garbage collector (to be pedantic, one where weak pointers are the norm, but the infrastructure and algorithm are essentially the same). In fact, I absolutely agree with you that Zig should adopt tracing garbage collection (a state-of-the-art generational concurrent one with bump allocation in the nursery), and doing so would eliminate most of my complaints about it. Unfortunately, it's unlikely that Zig will ever do this, given everything that I've seen about its design goals of being low-level with no runtime.

pron · on Nov 3, 2021

You've misunderstood me. My point was that Zig's properties allow a kind of precise analysis that is not possible (or very hard) in C or C++, and so it is not true that it is "not meaningfully different re UAF". As a simple concrete example, I hinted at a hypothetical algorithm similar to that of a tracing GC, that could be used not to collect garbage (I specifically mentioned the use of arenas), but to promptly detect all dangling pointers during testing, including those that are not dereferenced, and those pointing at arenas. That alone is already more effective than tooling you could make for C or C++. But those guarantees allow for other kinds of analysis, perhaps static analysis, that will also be more effective than what's affordable in C or C++.
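(Illustration: the hypothetical algorithm is not specified in this thread, so the following is only one possible reading, sketched in Python. `Arena`, `Obj`, and `find_dangling` are made-up names. The pass walks the object graph from the roots as a GC mark phase would, but reports edges that point into a released arena instead of collecting anything.)

```python
class Arena:
    """Hypothetical arena: objects in it are freed together on release."""

    def __init__(self, name):
        self.name = name
        self.released = False


class Obj:
    """Hypothetical heap object with named pointer fields."""

    def __init__(self, name, arena):
        self.name = name
        self.arena = arena
        self.fields = {}  # field name -> Obj (an outgoing pointer)


def find_dangling(roots):
    """Trace from the roots and report pointers into released arenas."""
    dangling = []
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        for field, target in obj.fields.items():
            if target.arena.released:
                # Dangling pointer found; it was never dereferenced.
                dangling.append((obj.name, field, target.name))
            else:
                stack.append(target)
    return dangling


long_lived = Arena("long_lived")
request = Arena("request")

cache = Obj("cache", long_lived)
session = Obj("session", long_lived)
buf = Obj("buf", request)

cache.fields["session"] = session
session.fields["last_buf"] = buf       # long-lived object points into request arena

before = find_dangling([cache])        # nothing has been released yet
request.released = True                # the whole request arena is freed at once
after = find_dangling([cache])         # the stale edge is reported
```

The sketch assumes every pointer field is known precisely, which is the property being argued about; it says nothing about how many of the reported edges would matter in a real program.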

So while the general approach -- of dynamic and static analysis, as opposed to those of Java or Rust -- is in the same broad category of tools for C or C++, their effectiveness, due to Zig's properties, is significantly increased, and their entire cost/benefit is different. I.e. it could find more bugs for a lower price, so much so that the approach, while underwhelming when applied to C or C++, could well compare favourably with others when applied to Zig.

(A GC -- whether tracing or ref-counting -- might well be adopted for Zig's comptime, but that's a whole other matter)

pcwalton · on Nov 3, 2021

> As a simple concrete example, I hinted at a hypothetical algorithm similar to that of a tracing GC, that could be used not to collect garbage (I specifically mentioned the use of arenas), but to promptly detect all dangling pointers during testing, including those that are not dereferenced, and those pointing at arenas.

I'm highly skeptical that there won't be too many false positives with such a tool. Systems programmers routinely create temporary dangling pointers and let the values go dead without dereferencing those pointers. It happens most every time you call free, in fact.
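(Illustration: a small Python model of the pattern described above, with an invented event format and a hypothetical `run` helper. A variable still holds the freed address for a moment after `free` and then goes dead without being dereferenced. A scan taken inside that window flags it; a scan taken afterwards does not.)

```python
def run(trace):
    """Replay a trace of pointer events; return the report of each scan."""
    live_objects = set()
    variables = {}   # variable name -> object it points to
    reports = []
    for event in trace:
        op, *args = event
        if op == "alloc":
            var, obj = args
            live_objects.add(obj)
            variables[var] = obj
        elif op == "free":
            (var,) = args
            live_objects.discard(variables[var])
        elif op == "dead":
            (var,) = args
            del variables[var]   # the value goes dead, never dereferenced
        elif op == "scan":
            # Flag every variable whose target has been freed.
            reports.append(
                sorted(v for v, o in variables.items() if o not in live_objects)
            )
    return reports


trace = [
    ("alloc", "p", "node"),
    ("free", "p"),
    ("scan",),       # p still holds the freed address here
    ("dead", "p"),   # p leaves scope without being used again
    ("scan",),
]
reports = run(trace)
```

The first scan reports `p` even though the program never touches it again, which is the kind of report a developer would treat as noise.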

I also see no reason why you couldn't create such a thing for C and C++. In fact, it exists: the Boehm GC can operate in such a checking mode. The fact that everyone uses ASan instead is a strong indicator that ASan is in fact a better approach.

Finally, precise tracing GC stack/register maps are a lot of work, especially in LLVM which has poor support for them. (I heavily looked into this for Rust.) Without a serious effort (and it is a lot of work) to generate them for Zig I have to consider it vaporware.

pron · on Nov 4, 2021

> I'm highly skeptical that there won't be too many false positives with such a tool.

I'm not advocating for a specific algorithm. I merely used a hypothetical one to demonstrate that there is, indeed, a fundamental difference between Zig and C/C++, even when it comes to UAF.

> I also see no reason why you couldn't create such a thing for C and C++.

Because in C and C++ there is no precise set of pointers, and bad pointers can be created from good ones.

> In fact, it exists: the Boehm GC can operate in such a checking mode.

Which would not work effectively for the reasons I mentioned.

> The fact that everyone uses ASan instead is a strong indicator that ASan is in fact a better approach. ... I have to consider it vaporware.

So from your asserted premise that Zig is no different from C in this regard you conclude that what doesn't work well for C must not work well for Zig and use that conclusion as further evidence of your premise? Your premise is exactly what I contest. Zig is fundamentally different because pointers are known and you cannot manufacture bad pointers from good ones. To demonstrate that difference I sketched a hypothetical algorithm that could work well in Zig but not in C. To point out that my hypothetical example is "vapourware" completely misses the point.

You could try to argue that the fact that pointers in Zig are well-defined and can be created only in carefully controlled ways -- and so are fundamentally different from pointers in C -- cannot be effectively exploited, but I don't think that's an easy argument to make.

--------

(> especially in LLVM

Even if we were to talk about specific tools, there is absolutely no need to base them on LLVM. Zig is specifically designed so that backends are easy to write, and an analysis tool need not use the same backend used by the compiler for production code; in fact, switching backends is meant to be commonplace in Zig development, and it is expected that different ones will be used for development and production)

pcwalton · on Nov 4, 2021

Boehm GC's checking mode works fine in C and C++. The reason why nobody uses it has nothing to do with the fact that it's conservative and everything to do with the fact that Address Sanitizer is just plain better at solving programmers' needs. ASan is about as good as you can do as far as developer tools that find use-after-free problems in memory-unsafe languages like C/C++/Zig go. It would not be a better tool if it were precise at identifying pointers, because of the inevitable false positives that come with trying to scan the whole object graph for dangling pointers in memory-unsafe languages. ASan got so popular precisely because it tries very hard to avoid false positives.

Zig does seem to have some properties that make precise pointer identification possible. But the right conclusion to draw from this is that Zig should use a tracing garbage collector. It's well-known how to use the pointer provenance properties you're talking about to achieve UAF protection: just implement a GC! Trying to get by with things like quarantining memory forever is not going to work in production, and the reasons why Zig programs will supposedly not be vulnerable to UAF problems are unconvincing. It is going to want a GC eventually.

pron · on Nov 4, 2021

> It would not be a better tool if it were precise at identifying pointers, because of the inevitable false positives that come with trying to scan the whole object graph for dangling pointers in memory-unsafe languages.

I've repeated several times that I was merely demonstrating why Zig and C are fundamentally different when it comes to pointers. You're trying to poke holes in a straw man, and worse, you're doing that while drawing on the very premise which I'm refuting, i.e. that Zig programs and C programs are essentially the same. You could just as likely have assumed that Zig programs behave more like Rust programs (or Go programs), and at the time of deallocation there is just one pointer to the object, and voila, zero false positives.

I am not saying that a Zig program should behave like a Rust/Go/Java program, just that your arguments are begging the question. You start with the assumption that Zig and C are essentially the same and then use the resulting conclusions to shore up that very assumption. But Zig programs are about as different from C as they are from Rust.

You insist that if you're not Java or Rust then you must be C. Zig is so revolutionary precisely because it works like none of those. Now, I don't know if Zig's revolutionary design is revolutionarily good. It's far too early to tell. But basing your criticism on a false premise completely misses the mark.

Of course, it's okay to be skeptical of a new idea, just as I'm skeptical that a type-level shrine for accidental complexity is the way to go.

> But the right conclusion to draw from this is that Zig should use a tracing garbage collector.

I disagree, but that's a whole other discussion. But since you seem to claim that such a thing could exist and effectively work for Zig and not for C, it seems you accept both of my points: 1. that pointers in both languages are fundamentally different, and 2. that this can be exploited for analyses that are completely different in their effectiveness than those feasible for C or C++.