
Turning all UB into IB would be better so a compiler can provide --overflow=crash or --overflow=wrap.



UB does not prevent that. Compilers already have flags for those, for example gcc's -ftrapv and -fwrapv.
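For anyone who hasn't used those flags: as far as I know, -ftrapv makes signed overflow in addition, subtraction and multiplication trap, and -fwrapv makes it wrap in two's complement. A minimal sketch of the two behaviors written out by hand in portable C, so it doesn't depend on either flag (`checked_add` and `wrapped_add_bits` are made-up names for illustration, not anything from gcc):

```c
#include <limits.h>
#include <stdbool.h>

/* Roughly the "detect it" half of -ftrapv: report overflow instead of
   executing it. Returns false and leaves *out untouched when a + b does
   not fit in an int. The range test itself never overflows. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* The bit pattern -fwrapv would give for a + b: two's complement
   wraparound, computed through unsigned arithmetic, which is defined
   to wrap. Returned as unsigned to avoid converting back to int. */
unsigned wrapped_add_bits(int a, int b)
{
    return (unsigned)a + (unsigned)b;
}
```

A caller that wants the "crash" flavor can abort when `checked_add` returns false; one that wants the "wrap" flavor uses the second helper.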


The difference is that implementation defined behavior must be documented and consistent, you get rid of code magically disappearing (null checks after null pointer deref for example) and becoming security problems when you update your compiler.
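The disappearing null check looks roughly like this. This is a sketch with a made-up `struct packet`, not code from any real project, and whether a given compiler actually drops the check depends on version and optimization level:

```c
#include <stddef.h>

struct packet { int length; };

/* Dereference first, check afterwards. Reading p->length is undefined
   behavior when p is NULL, so an optimizing compiler may assume p is
   non-null from that point on and delete the check below. */
int length_unsafe(const struct packet *p)
{
    int len = p->length;   /* UB if p == NULL */
    if (p == NULL)         /* may be removed as unreachable */
        return -1;
    return len;
}

/* Check first: the branch is reachable without any UB, so it stays. */
int length_checked(const struct packet *p)
{
    if (p == NULL)
        return -1;
    return p->length;
}
```

Both functions agree for valid pointers; only the second one has a defined result for NULL.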


> you get rid of code magically disappearing (null checks after null pointer deref for example) and becoming security problems when you update your compiler.

You're not going to get that. It would make these proposed safety checks unshippably slow. Languages that have bounds checked array access by default rely on the compiler/JIT being able to recognize in the common case that those branches are dead code & eliminating them. Those "magically disappearing null checks" are in the same category. Mandating that they cannot be optimized away, even though the compiler can "prove" that they are not necessary, isn't going to fly.


First up - bounds checks are simply not expensive, that was true a decade or two ago, it is not now. Similarly null checks are not expensive either. Now the standard C++ "I'll throw an exception now" model for errors is expensive for myriad reasons, and the general mode of development for many projects is to just outright disable exceptions entirely at build time due to the costs that come from the "zero cost" exceptions.

That's why this proposal isn't suggesting throwing an exception - think brk or int3, which is essentially what other bounds checked languages like Rust do.

This also isn't mandating that the check can't be thrown away. What it is doing is changing UB to unspecified/implementation-defined behavior. The difference is important: when something is UB the compiler is free to say "UB is not valid, therefore any preceding branches that would result in UB cannot happen" and then remove them. With unspecified behaviour the compiler only loses the ability to make assumptions based on "this UB path cannot be taken".

C and C++ both have a problem in overuse of undefined vs unspecified behaviour. Take integer overflow: there is no reason that that should be UB, on all hardware it has well defined behavior, and the impact of pretending that this isn't the case has been numerous security exploits over the years. Unspecified behaviour does not (by definition) mean that the spec has any prescriptive behavioral rules beyond "this must be consistent across all sites". For example integer overflow should always be 2's complement or it should always trap, the compiler doesn't get to mix and match, and doesn't get to pretend overflow doesn't happen.


> First up - bounds checks are simply not expensive, that was true a decade or two ago, it is not now. Similarly null checks are not expensive either

They're not expensive because they are optimized away. Branches are still relatively expensive in general. Prediction only goes so far.

> Take integer overflow: there is no reason that that should be UB

Yes there is, it allows for size expansion of the type to match the current architecture. Like promoting a 32 bit index into a 64 bit index for optimal array accesses on a 64 bit CPU. Which you can't do if you're needing to ensure it wraps at 32 bits.

It's not how it overflows that's the problem, it's when it overflows. Defining it means it has to happen at the exact size of the type, instead of at the optimal size for the usage & registers involved.
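A small sketch of the kind of loop I mean (function names are made up, and what the optimizer really does varies by compiler and target). With a signed index the compiler may compute `start + i` in a 64-bit register, because overflow "can't happen"; with an unsigned index it has to preserve wrapping at 32 bits:

```c
#include <stddef.h>

/* Signed 32-bit index: start + i overflowing is UB, so the compiler is
   free to widen the index to 64 bits and just step a pointer. */
long sum_from_signed(const int *data, int start, int count)
{
    long total = 0;
    for (int i = 0; i < count; ++i)
        total += data[start + i];
    return total;
}

/* Unsigned 32-bit index: start + i is defined to wrap modulo 2^32, so
   the generated code has to keep that wrap observable. */
long sum_from_unsigned(const int *data, unsigned start, unsigned count)
{
    long total = 0;
    for (unsigned i = 0; i < count; ++i)
        total += data[start + i];
    return total;
}
```

For in-range inputs the two return the same sum; the difference is only in what the compiler is allowed to assume.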

> and the impact of pretending that this isn't the case has been numerous security exploits over the years.

And many of those exploits still exist if overflow is defined, because the security issue was that the overflow happened at all, not the resulting UB failure mode. UB helps here because now it can categorically be called a bug, and you can have things like -ftrapv. That unsigned wrapping is defined and not UB is actually also a source of security bugs - things like the good ol' `T* data = malloc(number * sizeof(T));` overflow bugs. There's no UB there, so you won't get any warnings or UBsan traps. Nope, that's a well-defined security bug. If only unsigned wrapping had been UB that could have been avoided...
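To spell out that malloc case: since the multiplication wraps silently, the check has to be written by hand. A minimal sketch, where `alloc_array` is a hypothetical helper name and not a standard function:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for `number` elements of `size` bytes each, refusing
   any request whose byte count would wrap around SIZE_MAX. */
void *alloc_array(size_t number, size_t size)
{
    if (size != 0 && number > SIZE_MAX / size)
        return NULL;               /* number * size would wrap */
    return malloc(number * size);  /* product is known to fit */
}
```

Without the first test, a huge `number` yields a tiny allocation and every later write through the pointer runs off the end of it.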

But if you want code that is hyper consistent and behaves absolutely identically everywhere regardless of the hardware it's running on (which things like crypto code does actually want), then C/C++ is just the wrong language for you. By a massive, massive amount. Tightening up the UB and pushing some stuff to IB doesn't suddenly make it a secure virtual machine, after all. You're still better off with a different language for that. One where consistency of execution is the priority, abstracting away any and all hardware differences. But that's never been C or C++'s goal.

C/C++ do have problems with UB. But int overflow isn't actually an example of it. And unsigned's overflow being defined is also really a mistake, it should have been UB.


Yep, behavior is consistent with the implementation code.


In fact it is exactly UB that allows that. If, say, the standard specified wrap behaviour, a conforming compiler wouldn't be allowed to trap.


You are right, but OP asserted that IB (implementation defined behavior) would be needed for this. IB would also allow this, but it is not necessary.

All of undefined/unspecified/implementation-defined allow this.

edit:

Whether unspecified or implementation-defined would allow a trap depends on the exact wording. If the wording is along the lines of "[on overflow] the value of the expression is unspecified/implementation-defined" then a trap is not allowed. But it could also be "[on overflow] the behavior of the program is implementation-defined", which would also allow a trap. Point being, unspecified/implementation-defined also have a defined scope, while undefined behavior always just spoils the whole program.

I'm fine with the UB overflow though.


> the behavior of the program is implementation-defined

You're basically defining undefined behavior. The way to do this "properly" would be "the program is allowed to do x, y, z, or also trap"; there's really not much else you can do.


The difference here is that the behavior would need to be documented.

That's why I didn't bother with "the behavior of the program is unspecified", because that really is just undefined behavior.

But yeah, actually listing allowed behaviors and stating that the actual behavior out of them is unspecified/implementation-defined is more practical.

One example is conversion to signed integer in C17:

> ...either the result is implementation-defined or an implementation-defined signal is raised.

https://cigix.me/c17#6.3.1.3.p3
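In code, that rule covers an out-of-range conversion such as `(int)some_unsigned` when the value exceeds `INT_MAX`. A sketch of how to stay off the implementation-defined path by checking the range first (`to_int_saturating` is a name I made up, and saturating is just one possible policy):

```c
#include <limits.h>

/* Convert unsigned to int without ever performing an out-of-range
   conversion: values above INT_MAX are clamped instead. */
int to_int_saturating(unsigned value)
{
    if (value > (unsigned)INT_MAX)
        return INT_MAX;    /* would otherwise be implementation-defined */
    return (int)value;     /* in range, so the value is preserved */
}
```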


Well, it is not that easy. Specifying a subset of allowed outcomes might still prevent future evolution.

For example bound checking, while fairly cheap for serial code, makes it very hard to vectorize code effectively. Then again so does IEEE FP math and that's why we have -ffast-math...
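Rough illustration of the vectorization problem, as a sketch with made-up function names; whether a particular compiler vectorizes either loop is not something I'm claiming here. A check inside the loop gives every iteration a possible early exit, while a check hoisted in front leaves a branch-free body:

```c
#include <stddef.h>
#include <stdlib.h>

/* Check on every element: each iteration may abort, which gets in the
   way of processing several elements at once. */
long sum_checked_each(const int *data, size_t len, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i >= len)
            abort();
        total += data[i];
    }
    return total;
}

/* One check up front: same result for valid input, and the loop body
   is a plain reduction. */
long sum_checked_once(const int *data, size_t len, size_t n)
{
    if (n > len)
        abort();
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```

The hoisting is only legal here because the loop has no side effects before the failing access; once it does, the compiler can't move the trap earlier without changing observable behavior.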

In the end, it is not impossible, it is just tradeoffs.

I'm generally of the opinion that there should be as little UB as possible, but this is probably better done by compilers that can push the limit of static and dynamic checking technology instead of the standard mandating it.


Right, that’s my point at the top of the thread :) I’m just saying if you don’t end up with something that looks like what I just mentioned you’re really not specifying anything, so you might as well go for undefined behavior. This is fine.


Yes, I think we are in complete agreement :)


IB?


Implementation defined behaviour.


Implementation defined behaviour


Ah yeah, "unspecified behaviour" is my known phrase, but I can see it would be confusing to be arguing about whether something should be UB or UB :D


Unspecified behavior and implementation defined behavior are distinct in theory. The latter requires the behavior to be documented by the implementer, while the former does not.

In practice I dare you to find compiler documentation for each implementation defined behavior.

https://timsong-cpp.github.io/cppwp/n4868/intro.defs#defns.u...

https://timsong-cpp.github.io/cppwp/n4868/intro.defs#defns.i...

edit:

C wording:

https://cigix.me/c17#J.1

https://cigix.me/c17#J.3


> In practice I dare you to find compiler documentation for each implementation defined behavior.

For gcc there is this:

https://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html

No idea how complete it is. Also a lot of stuff is architecture/platform specific, not compiler specific, so you won't find it in the general compiler docs but you have to look at the psABI.



