
I'm a huge proponent of UBSan and ASan. Genuine curiosity, what don't you like about them?

FWIW, there once was a real good-faith effort to clean up the problems, Friendly C by Prof Regehr, https://blog.regehr.org/archives/1180 and https://blog.regehr.org/archives/1287 .

It turns out it's really hard. Let's take an easy-to-understand example, signed integer overflow. C has unsigned types with guaranteed wraparound (modulo 2^N) arithmetic, and signed types with UB on overflow, which leaves the compiler free to rewrite the expression using the field axioms if it wants to. For "a = b * c / c;" it may emit the multiply and divide, or it can eliminate the pair and replace the expression with "a = b;".
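
A minimal sketch of that difference (the function names are mine, purely for illustration):

  #include <stdio.h>

  /* Signed: overflow in b * c is UB, so the compiler may apply the field
     axioms and fold the whole expression down to b. */
  int signed_muldiv(int b, int c) { return b * c / c; }

  /* Unsigned: overflow wraps modulo 2^N by definition, so b * c / c is not
     equivalent to b and the multiply/divide pair must be kept. */
  unsigned unsigned_muldiv(unsigned b, unsigned c) { return b * c / c; }

  int main(void) {
      printf("%d\n", signed_muldiv(7, 3));              /* 7 either way */
      /* assuming 32-bit unsigned: 3 * 0x55555556 wraps to 2,
         and 2 / 0x55555556 is 0 */
      printf("%u\n", unsigned_muldiv(3u, 0x55555556u)); /* 0, not 3 */
      return 0;
  }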

Why do we connect interpreting the top bit as a sign bit with whether field-axiom-based rewriting should be allowed? It would make sense to have a language which splits those two choices apart, but if you do that, either the result isn't backwards compatible with C anyway, or it is but doesn't add any safety to old C code even as it permits you to write new, safe C code.

Sometimes the best way to rewrite an expression is not what you'd consider "simplified form" from school, because of the availability of CPU instructions that don't match simple operations, and also because of register pressure limiting the number of temporaries. There's real-world code out there that has UB in simple integer expressions and relies on being run in the correct environment, either an x86-64 or an ARM CPU. If you define one specific interpretation for the same expression, you are guaranteed to break somebody's real-world "working" code.

I claim without evidence that trying to fix up C's underlying issues is all decisions like this. That leads to UBSan as the next best idea, or at least, something we can do right now. If nothing else it has pedagogical value in teaching what the existing rules are.


I'm not sure I understand your example. Existing modern hardware is, AFAICT, pretty conventionally compatible with multiply/divide operations[1] and will factor equivalently in any common/performance-sensitive situation. A better area to argue about is shifts, where some architectures are missing compatible sign extension and overflow modes; the language would have to pick one, possibly to the detriment of some arch or another.
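
For example, a quick sketch of the sort of shift corner cases meant here (assuming a 32-bit int):

  #include <stdio.h>

  int main(void) {
      int neg = -1;
      /* Implementation-defined: an arithmetic shift gives -1, a logical
         shift would give INT_MAX on a 32-bit int. */
      printf("%d\n", neg >> 1);
      /* Undefined for 32-bit int: shifts into the sign bit. */
      /* printf("%d\n", 1 << 31); */
      /* Well-defined for unsigned: prints 2147483648. */
      printf("%u\n", 1u << 31);
      return 0;
  }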

But... so what? That's fine. Applications sensitive to performance on that level are already worrying about per-platform tuning and always have been. Much better to start from a baseline that works reliably and then tune than to have to write "working" code you then must fight about and eventually roll back due to a UBSan warning.

[1] It's true that when you get to things like multi-word math that there are edge cases that make some conventions easier to optimize on some architectures (e.g. x86's widening multiply, etc...).


This construction is called a "false range" in English.

https://www.chicagomanualofstyle.org/qanda/data/faq/topics/C... https://www.cjr.org/language_corner/out_of_range.php

The wording change from Permissible to Possible, together with making the list non-normative, was an attempt to clarify that the list of behaviors that follows is a false range and not an exhaustive list.

It's a submarine change because in the eyes of the committee, this is not a change, merely a clarification of what it already said, to guard against ongoing misinterpretation.


Why isn't that fine? The compiler ignored the undefined behavior it didn't detect.

No. No honest person can claim that making a decision predicated on the existence of X is the same as "ignoring" X.

This is the most normal case though, isn't it? Suppose a very simple compiler: it sees a function, so it writes out the prologue; it sees the switch, so it writes out the jump tables; it sees each return statement, so it writes out the code that returns the values; then it sees the function's closing brace and writes out a function epilogue. The problem is that the epilogue is wrong because there is no return statement returning a value; the epilogue is only correct if the function has void return type. Depending on the ABI, the function returns to a random address.
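
To make that concrete, here's a hypothetical function of the shape being described (my own illustration, not code from any particular source):

  enum color { RED, GREEN, BLUE };

  /* Every case returns a value, but control can still reach the closing
     brace when c holds a value outside the enum, and that closing brace is
     exactly where the naive compiler above emits a void-style epilogue. */
  const char *color_name(enum color c) {
      switch (c) {
      case RED:   return "red";
      case GREEN: return "green";
      case BLUE:  return "blue";
      }
  }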

Most of the time people accuse compilers of finding and exploiting UB and say they wish the compiler would just emit the straightforward code, as close to writing out assembly matching the input C code expression by expression as possible. Here you have an example where the compiler never checked for UB, let alone proved the presence of UB in any sense; it trusted the user; it acted like a high-level assembler; yet this compiler is still not ignoring UB for you? What does it take? Adding runtime checks for the UB case is ignoring? Having the compiler find the UB paths to insert safety code is ignoring?


> the epilogue is only correct if the function has void return type

That's a lie.

> Adding runtime checks for the UB case is ignoring? Having the compiler find the UB paths to insert safety code is ignoring?

Don't come onto HN with the intent of engaging in bad faith.


> > the epilogue is only correct if the function has void return type

> That's a lie.

All C functions return via a return statement with an expression (only for non-void functions), via a return statement without an expression (only for void functions), or by the closing of the function scope (only for void functions). True?

The simple "spit out a block of assembly for each thing in the C code" compiler spits out the epilogue that works for void-returning functions, because we reach the end of the function with no return statement. That epilog might happen to work for non-void functions too, but unless we specify an ABI and examine that case, it isn't guaranteed to work for them. So it's not correct to emit it. True?

Where's the lie?

> > Adding runtime checks for the UB case is ignoring? Having the compiler find the UB paths to insert safety code is ignoring?

> Don't come onto HN with the intent of engaging in bad faith.

Always! You too!

The text you quoted was referring to how real compilers handle falling off the end of a non-void function today with -fsanitize=return from UBSan. If I understand you correctly, in your reading a compiler with UBSan enabled is non-conforming because it fails to ignore the situation. That's not an argument as to whether your reading is right or wrong, but I do think UBSan compilation ought to be standard conforming, even if that means we need to add it to the Standard.

To the larger point, because the Standard doesn't define what "ignore" means, the user and implementer can't use it to pin down whether a given UB was ignored or not, and thus whether a given program was miscompiled or not. A compiler rewrites the code into its intermediate representation -- it could be a Z3 SMT solver or a raw Turing machine or anything -- then writes code back out. Can ignoring be done at any stage in the middle? Once the code has been converted and processed, how can you tell from the assembly output what's been ignored and what hasn't? If you demand certain assembly or semantics, isn't that just defining undefined behaviour? If you don't demand them, and instead leave the interpretation of "ignore" to the particular implementation of a compiler, so that any output could be valid for some potential design of compiler, why not allow any compiler to emit whatever it wants?


> The continuity of physical processes is not something that can be proved.

I agree with you, but is there any peer-reviewed publication that can be cited? The idea makes sense to me: firstly, Reals \ Inaccessible Reals = Computable Reals; secondly, you can't ever input an inaccessible real into an experiment nor retrieve one out of an experiment -- but then I'm not completely certain in concluding that no experiment can be devised which shows that inaccessible reals exist in physical space.

I am concerned about this in the field of complexity analysis of quantum computers too; I think that the use of reals in physics is leading to mathematically correct but non-physical results about the complexity theory of quantum computers. Having a paper to point at and say "look, stop assuming your Bloch spheres are backed by uncountable sets, it's leaking non-computable assumptions into your analysis of computation" would be helpful.


From the abstract, "simulations of such systems are usually done on digital computers, which are able to compute with finite sequences of rational numbers only."

Not at all! Digital computers can use computable reals, where a computable real is defined as a function from a rational value (the error bound) to a rational value that is within the supplied error bound of the true value. Do not mistake this for computing in the rationals: these functions which perform the described task are the computable real numbers. There are countably many of these functions, one for each computable real. For instance, note that you can always compare two rationals for equality, but you can't always compare two computable reals for equality, just like reals.
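
A toy sketch of that idea in C (my own illustration; here the precision request is passed as a denominator d, i.e. an error bound of 1/d, which is not the exact representation used in the paper linked below):

  #include <stdio.h>

  /* A computable real, encoded as a function: given a denominator d, return
     a numerator n with |n/d - x| <= 1/d. */
  typedef long long (*computable_real)(long long d);

  /* sqrt(2) as a computable real: floor(sqrt(2) * d) is within 1/d of the
     true value. */
  static long long sqrt2(long long d) {
      long long n = 0;
      while ((n + 1) * (n + 1) <= 2 * d * d)
          n++;
      return n;
  }

  int main(void) {
      computable_real x = sqrt2;
      long long d = 1000000;
      printf("sqrt(2) ~= %lld/%lld\n", x(d), d); /* 1414213/1000000 */
      return 0;
  }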

Hans Boehm (of Boehm garbage collector fame) has been working on this for a long time; here is a recent paper on it: https://dl.acm.org/doi/pdf/10.1145/3385412.3386037


The compiler may remove the nullptr check in:

  ptr->foo = 1;
  if (ptr == nullptr)
     return;
but it may not remove the nullptr check in:

  if (ptr == nullptr)
     return;
  ptr->foo = 1;


To explain why it can be removed in the former:

Since it is UB to dereference a null pointer, the compiler can assume that ptr isn't null after it is dereferenced[1]. Therefore, the if condition will always be false.

In fact if ptr is null, unless the foo field has a very large offset, the behavior you would probably expect would be for the dereference to segfault before reaching the if, so it doesn't really matter if it is optimized away.
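
A sketch of what the optimizer may effectively turn the first snippet into (the struct and function names are mine; it mirrors the snippets above, so nullptr needs C23 or C++):

  struct S { int foo; };

  /* As written: dereference first, check second. */
  void store_then_check(struct S *ptr) {
      ptr->foo = 1;
      if (ptr == nullptr)
          return;
  }

  /* What the compiler may effectively produce: the dereference lets it
     assume ptr is non-null, so the check folds away. */
  void store_then_check_after_opt(struct S *ptr) {
      ptr->foo = 1;
  }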


> so it doesn't really matter if it is optimized away.

There are cases in which the optimization can result in behavior other than segfaulting, see https://research.swtch.com/ub#null


Sure. That's why I said "the behavior you would probably expect", but that isn't necessarily what happens.


I guess I've misunderstood that other story then, thanks.


> If the print blocks indefinitely then that division will never execute, and GCC must compile a binary that behaves correctly in that case.

Is `printf` allowed to loop infinitely? Its behaviour is defined in the language standard and GCC does recognize it as not being a user-defined function.
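
For reference, the pattern being discussed looks roughly like this (my own sketch, not anyone's exact code):

  #include <stdio.h>

  /* May the compiler evaluate a / b before the printf? If printf can block
     forever (say, writing to a full pipe nobody reads), hoisting a division
     by zero above it would crash an execution that otherwise never reaches
     the division. */
  int print_then_divide(int a, int b) {
      printf("dividing %d by %d\n", a, b); /* may block indefinitely */
      return a / b;                        /* UB if b == 0 */
  }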


I’m not sure it can loop indefinitely, but it can block (e.g. if the reader of the pipe is not reading from it and the buffer is full).


> You're right, most times organizations need a fire lit underneath them to change, for Google, it probably was the NSA annotation "SSL added and removed here :^)" on a slide showing Google's architecture from the Snowden leaks.

As an insider, it was not. The move to zero-trust started with "A new approach to China": https://googleblog.blogspot.com/2010/01/new-approach-to-chin...


The symptoms match my experience with a mid-network firewall/router that, not being aware of TCP window scaling, stripped out the scaling factor while leaving the window scaling option enabled. See https://lwn.net/Articles/92727/


It caused a problem when building inline-assembly-heavy code that tried to use all the registers, frame pointer included.

