
Zig, a language which is explicitly aimed at the same domain as C, has improved semantics for all of these things.

If a pointer can be null, it must be an optional pointer, and you must in fact check before you dereference it. This is what you want. Is it ok to write a program which segfaults at random because you didn't check for a pointer which can be null? Of course not. If you don't null-check the return value of e.g. malloc, your program is invalid.

But the benefit is in the other direction. Careful C code checks for null before using a pointer, and keeping track of whether null has been checked is a manual process. This results in redundant null checks whenever you can't statically prove (by staring at the code and thinking very hard) that a pointer isn't null. So in practice you're likely to get a combination of not checking and getting burned, and checking a pointer which was already checked. To do otherwise you'd have to understand the complete call graph, which is infeasible.

Zig doesn't do any of this. If it's a pointer, you can safely dereference it. If it's an optional pointer, you must check, and then: it's a pointer. Safe to pass down the call stack and freely use. If you want C behavior you can always YOLO and just say `yoloptr.?.*`.
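
Roughly, in Zig (a minimal sketch; the names are just for illustration):

    const std = @import("std");

    // Takes an optional pointer; it must be unwrapped before use.
    fn firstByte(maybe: ?*const u8) u8 {
        if (maybe) |p| {
            // Inside this branch `p` is a plain `*const u8`; dereferencing
            // needs no further checks, here or further down the call stack.
            return p.*;
        }
        return 0;
    }

    pub fn main() void {
        const x: u8 = 42;
        std.debug.print("{} {}\n", .{ firstByte(&x), firstByte(null) });
        // The YOLO version, `maybe.?.*`, unwraps without checking; if the
        // optional is null that's safety-checked UB (panic in Debug/ReleaseSafe).
    }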

Overflowing addition and division by zero are safety-checked undefined behavior, a critical concept in the specification. They will panic with a stack trace in Debug and ReleaseSafe modes, and blow demons out of your nose in ReleaseFast and ReleaseSmall modes. There's also +% for guaranteed wraparound two's-complement overflow, and +| for saturating addition. Also `@addWithOverflow` if your jam is checking the overflow bit. Unwrapping an optional without checking it is also safety-checked UB: if you were wrong about the assumption that the payload carries a value, you'll get a panic and stack trace on the line where you did `yolo.?`.
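
A quick sketch of the flavors (assuming a recent Zig where `@addWithOverflow` returns a result/overflow-bit tuple):

    const std = @import("std");

    pub fn main() void {
        const a: u8 = 250;
        const b: u8 = 10;
        // A plain `a + b` that overflows is safety-checked UB: panic plus
        // stack trace in Debug/ReleaseSafe, nasal demons in ReleaseFast/Small.
        const wrapped = a +% b; // guaranteed two's-complement wrap: 4
        const clamped = a +| b; // saturating addition: 255
        const sum = @addWithOverflow(a, b); // .{ result, overflow bit }
        std.debug.print("{} {} {} {}\n", .{ wrapped, clamped, sum[0], sum[1] });
    }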

Shift operations require that the right-hand side of the shift be an integer type of log2(bit width of the left-hand side) bits. Zig allows integers of any width, so for a: u64, a << b requires that b be a u6 or smaller. Which is fine: if you know values will be within 0..63, you declare them u6, and if your shift amount arrives as a byte, you truncate it: you were going to mask it anyway, right? Zig simply refuses to let you forget this. Addition of two u6 is just as fast as addition of the underlying bytes because of, you guessed it, safety-checked undefined behavior. In release mode it will just do what the chip does.
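
For example (using the single-argument `@truncate` of recent Zig versions):

    const std = @import("std");

    // For a u64 left-hand side, the shift amount must fit in a u6 (0..63).
    fn shiftLeft(a: u64, b: u8) u64 {
        const amount: u6 = @truncate(b); // keep the low 6 bits, explicitly
        return a << amount;
    }

    test "shift by a truncated byte" {
        try std.testing.expectEqual(@as(u64, 4096), shiftLeft(1, 12));
    }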

There's a common theme here: some things require undefined behavior for performance. Zig does what it can to crash your program if that behavior is exhibited while you're developing it. Other things require that you take some well-defined actions or you'll get UB: Zig tracks those in the type system.

You'll note that undefined behavior is very much a part of the Zig specification, for the same reasons as in C. But that's not a great excuse to make staying within the boundaries of defined behavior as pointlessly difficult as it is in C.




Yes, you can surely improve things from C. C is not a benchmark for anything other than footguns per line of code.

The debug modes you mention are also available in various forms in C and C++ compilers. For example, ASan and UBSan in clang will do exactly what you have described. The question, then, is whether these belong in the language specification or are left to individual tools.


As proven multiple times throughout computing history, individual tools are optional, and as such are used less often than they should be.

Language specification is unavoidable when using said language.


Have you wondered why Rust or Python do not have a specification?

For a bunch of languages outside the C-centric world, specifications don't exist.



Documentation and specification are not the same things.

The intuitive distinction is that the latter is for compiler/library developers, and the former is for users.

A specification cannot leave any room for ambiguity or leave anything up to interpretation. If it does (and this happens), it is treated as a bug to be fixed.


mwahahaha. as if there is some divine "language specification" which all compilers adhere to on pain of eternal damnation.

no such thing ever existed.


Given that one can write Fortran in any language, maybe you're right.


It's not just in debug modes. It should be the standard in release mode as well (IMO the distinction shouldn't exist for most projects anyway). ASan and UBSan are explicitly not designed for that.


Worth noting that Zig has ReleaseSafe, which safety-checks undefined behavior while applying any optimizations it can given that restriction.

The more interesting part is that the mode can be individually modified on a per-block basis with the @setRuntimeSafety builtin, so it's practical to identify the performance-critical parts of the program and turn off safety checks only for them. Or the opposite: identify tricky code which is doing something complex, and turn on runtime safety there, regardless of the build mode.
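
Something like this (the functions are made up for illustration):

    const std = @import("std");

    // Hot path: opt out of runtime safety (overflow and bounds checks)
    // for this scope only, even in Debug/ReleaseSafe builds.
    fn sumFast(values: []const u32) u32 {
        @setRuntimeSafety(false);
        var total: u32 = 0;
        for (values) |v| total += v;
        return total;
    }

    // Tricky path: force the checks back on, even in ReleaseFast.
    fn nthFromEnd(values: []const u32, n: usize) u32 {
        @setRuntimeSafety(true);
        return values[values.len - n]; // underflow and bounds stay checked here
    }

    pub fn main() void {
        const data = [_]u32{ 10, 20, 30 };
        std.debug.print("{} {}\n", .{ sumFast(&data), nthFromEnd(&data, 1) });
    }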

That's why this sort of thing should be part of the specification. @setRuntimeSafety would be meaningless without the concept of safety-checked undefined behavior.

I would say that making optionals and fat pointers (slices) a part of the type system is possibly more important, but it all combines to give a fighting chance of getting user-controlled resource management correct.
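
On the slice side, the length travels with the pointer, so bounds checks never rely on a separately tracked size (a tiny sketch):

    const std = @import("std");

    pub fn main() void {
        const buf = [_]u32{ 1, 2, 3, 4, 5 };
        const window: []const u32 = buf[1..4]; // fat pointer: ptr + len together
        std.debug.print("len={} first={}\n", .{ window.len, window[0] });
        // Indexing past window.len is safety-checked UB: a panic with a
        // stack trace in Debug/ReleaseSafe rather than a silent overrun.
    }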

Given the topic of the Fine Article, it's worth briefly noting that `defer` and `errdefer` are keywords in Zig. Both the test allocator and the GeneralPurposeAllocator in safe mode will panic if you leak memory by forgetting to use these, or, more generally, by forgetting to free allocations. My impression is that the only major category of memory bugs these tools won't catch in development is double-free, and that's being worked on.
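
A minimal sketch of that workflow (the helper function is hypothetical; the leak detection comes from std.testing.allocator):

    const std = @import("std");

    // If a later step fails, errdefer releases the buffer;
    // on success the caller owns it.
    fn makeBuffer(allocator: std.mem.Allocator, n: usize) ![]u8 {
        const buf = try allocator.alloc(u8, n);
        errdefer allocator.free(buf); // runs only on error paths after this line
        // ... fill the buffer, possibly failing ...
        return buf;
    }

    test "no leaks" {
        const buf = try makeBuffer(std.testing.allocator, 64);
        defer std.testing.allocator.free(buf); // drop this and the test reports a leak
        try std.testing.expect(buf.len == 64);
    }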


Well, give it a try.

If you can make it work in a way that has acceptable performance characteristics, every systems language will adopt your technique overnight.


I use Rust, which already does this.


Signed overflow is officially a 'bug' in Rust: it traps in debug mode but silently follows LLVM/platform behavior in release mode.

Huh, doesn't that sound familiar?


> silently follows LLVM/platform behavior

This is not the case. It's two's complement overflow.

Also, since we're being pedantic here: it's not actually about "debug mode" or "release mode", it is tied to a flag, and compilers must have that flag on in debug mode. This leaves open the ability to turn the flag on in release mode as well in the future, if it's decided that the overhead is worth it. We'll see if it ever is.

> Huh, doesn't that sound familiar?

Nope, it is completely different from undefined behavior, which gives the compiler license to do anything it wants. These are well defined semantics, the polar opposite of UB.


> This is not the case. It's two's complement overflow.

Okay, here is an example showing that Rust follows LLVM behavior when the optimizer is turned on. LLVM addition produces poison when signed wrap happens. I'm a little bit puzzled about the vehement responses in the comments wow. I have worked on several compilers (including a few patches to Rust), and this is all common knowledge.

https://godbolt.org/z/r6WTxGjrb


The Rust output:

  define noundef i32 @square(i32 noundef %x, i32 noundef %y) unnamed_addr #0 !dbg !7 {
    %_0 = add i32 %y, %x, !dbg !12
    ret i32 %_0, !dbg !13
  }

Let's compare like to like; here's one with equivalent C++ code: https://godbolt.org/z/Y4MnGeof4

The C++ output:

  define dso_local noundef i32 @square(int, int)(i32 noundef %0, i32 noundef %1) local_unnamed_addr #0 !dbg !99 {
    tail call void @llvm.dbg.value(metadata i32 %0, metadata !104, metadata !DIExpression()), !dbg !106
    tail call void @llvm.dbg.value(metadata i32 %1, metadata !105, metadata !DIExpression()), !dbg !106
    %3 = add nsw i32 %1, %0, !dbg !107
    ret i32 %3, !dbg !108
  }

> LLVM addition produces poison when signed wrap happens.

https://llvm.org/docs/LangRef.html#add-instruction

> nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs.

Note that Rust produces `add`. The C++ produces `add nsw`. No poison in Rust, poison in C++.

Here is an example of these differences producing different results, due to the differences in behavior: https://godbolt.org/z/Gaonnc985

Rust:

  define noundef zeroext i1 @test() unnamed_addr #0 !dbg !14 {
    ret i1 true, !dbg !15
  }

C++:

  define dso_local noundef zeroext i1 @test()() local_unnamed_addr #0 !dbg !123 {
    tail call void @llvm.dbg.value(metadata i32 undef, metadata !128, metadata !DIExpression()), !dbg !129
    ret i1 false, !dbg !130
  }

This is because in Rust, the wrapping behavior means that this will always be true, but in C++, because it is UB, the compiler assumes it will always be false.

> I'm a little bit puzzled about the vehement responses in the comments wow.

You are claiming that Rust has semantics that it was very, very deliberately designed to not have.


Rust includes a great deal of undefined behavior, unlocked with the trustme keyword. Ahem, sorry, unsafe. If only...

So if we're going to be pedantic, it's safe Rust which has defined semantics for basically everything. A considerable accomplishment, to be sure.


While this is true, we’re talking about integer overflow. That’s part of safe Rust. So it’s not really germane to this conversation.


Even languages like Modula-2 and Ada, among others, had better semantics than C, but they didn't come for free alongside UNIX.


I know nothing about Zig, but this is pretty interesting and looks well designed. Linus was recently very mad when someone suggested a new semantics for overflow:

—— I'm still entirely unconvinced.

The thing is, wrap-around is not only well-defined, it's common, and EXPECTED.

Example:

   static inline u32 __hash_32_generic(u32 val)
   {
        return val * GOLDEN_RATIO_32;
   }
and dammit, I absolutely DO NOT THINK we should annotate this as some kind of "special multiply". ——

Full thread: https://lore.kernel.org/lkml/CAHk-=wi5YPwWA8f5RAf_Hi8iL0NhGJ...


> The thing is, wrap-around is not only well-defined, it's common, and EXPECTED.

No, it's really not. Do this experiment: for the next ten thousand lines of code you write, every time you do an integer arithmetic operation, ask yourself if the code would be correct if it wrapped around. I would be shocked if the answer was "yes" as much as 1% of the time.

(The most recent arithmetic expression I wrote was summing up statistics counters. Wraparound is most definitely not correct in that scenario! Actually, I suspect saturation behavior would be more often correct than wraparound behavior.)

This is a case where I think Linus is 100% wrong. Integer overflow is frequently a problem, and demanding the compiler only check for it in cases where it's wrong amounts to demanding the compiler read the programmer's mind (which goes about as well as you'd expect). Taint tracking is also not a viable solution, as anyone who has implemented taint tracking for overflow checks is well aware.


It depends heavily on context.

For the kernel, which deals with a lot of device drivers, ring buffers, and hashes, wraparound is often what you want. The same is likely to be true for things like microcontroller firmware and such.

In data analysis or monte carlo simulations, it's very rarely what you want, indeed.


Is it really?

For example, I opened up https://elixir.bootlin.com/linux/latest/source/drivers/firew... as a random source file in the Linux kernel, and I didn't see a single line where wraparound would be correct behavior.

There are definitely cases where wraparound behavior is correct. There are also cases where hard errors on overflow aren't desirable (say, statistics counters), but it's still hard to call wraparound the correct behavior (e.g., saturation would probably work better for statistics than wraparound). There are also cases where you could probably prove that overflow can't happen. But if you made the default behavior a squawk that wraparound occurred, and instead made developers annotate all the cases where wrapping was desirable to silence the squawk, I'd suspect that even in the entire Linux kernel you'd end up with fewer than 1000 places.

This is sort of the point of the exercise: wraparound behavior is often what you want when you think about overflow, but you spend so much of your time not thinking about it that you miss how frequently wraparound behavior isn't what you wanted.


I think wraparound generally is better for statistics counters like the ones in the linked code, since often you want to check the number of packets/errors per some time interval, which you can do with wraparound (as long as the counter doesn't advance by more than its full range within one interval) but not with saturation.
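
A sketch of that, in Zig syntax, with the wrapping operator spelled out:

    // A free-running wrapping counter still yields correct per-interval
    // deltas, as long as it advances by less than its full range per sample.
    fn packetsSince(previous: u32, current: u32) u32 {
        return current -% previous; // wrapping subtraction
    }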


I think it's critical that we do annotate it as a special multiply.

If wraparound is ok for that particular multiplication, tell the compiler that. As a sibling comment says, this is seldom the case, but it does happen: in particular, expecting byte addition or multiplication to wrap around can be useful.

The actual expectation of the vast majority of arithmetic in a computer program is that the result will be correct in the ordinary schoolyard sense. While developing that program, it should absolutely panic if that isn't the case. "Well defined" doesn't mean correct.

I don't understand your objection to spelling that as `val *% GOLDEN_RATIO_32`. When someone sees that (especially you, later, coming back to your own code) it clearly indicates that wrapping is expected, or at least allowed. That's good.
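
In Zig spelling, the hash from the quoted kernel snippet would look something like this (assuming the kernel's 0x61C88647 constant):

    const GOLDEN_RATIO_32: u32 = 0x61C88647;

    // The wrap is spelled out, so a reader knows it's intended rather than
    // an overflow bug waiting for a checked build to catch it.
    inline fn hash32Generic(val: u32) u32 {
        return val *% GOLDEN_RATIO_32;
    }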


Unsigned integer overflow is not undefined in C or C++. You can rely on how it works.

Signed integer overflow, on the other hand, is undefined. The compiler is allowed to assume it never happens and can re-arrange or eliminate code as it sees fit under that assumption.

How many lines will this code print?

    for (int i = INT_MAX-1; i > 0; ++i) printf("I'm in danger!\n");



