One of the escape analysis examples in the article got me thinking. Turns out this is one of those cases where marking a local variable const _does_ result in different code.
As I understand it, performing a const_cast itself isn't undefined behavior, which is why the compiler cannot assume function f doesn't modify the local. However, if we mark the local variable as const, the compiler is able to assume no modification will occur (because that would be undefined) and optimize the return.
Fascinating. Does anyone know why x escapes here? Is it an optimization miss? Because to my untrained eye modifying x should still be undefined behavior.
Does the way x is initialized make it not "const-y" enough?
I tried gcc, clang, msvc, and icc. They all generated the load and either an add or an inc.
https://godbolt.org/z/han78n9zM
If we define `f` to const_cast away the const and add 10:
https://godbolt.org/z/1dso8ojWG
all the compilers agree that we should get 53.
However, make `y` const, so we have
const int y = 42;
const int x = y;
and now modifying x suddenly becomes undefined behavior, and x + 1 returns 43
https://godbolt.org/z/cqbGasrMM
Similar to how the original assembly showed making `y` const removed the reload and increment/+1 of x.
So, it seems that all these compilers agree that modifying `x` is undefined behavior if `y` is const, but is defined if `y` is mutable.
My best guess is that for some reason, `x` is a constant object only if initialized itself via a constant object. But I couldn't find anything saying that is the case. I asked on the #learn channel of cppslack.
In my example, even though `x` is a `const` local, the assembly still shows that it reloads `x` and adds `1` if we assign to it from a non-const local `y`.
Why can't the compiler rule out that possibility? That looks like undefined behavior code to me, so thus the compiler could rule it out. S::make() is storing a pointer to a local variable (origp), and then that pointer is being accessed after S::make() has returned. I'm pretty sure that's undefined behavior.
The usual justification for having a concept of "undefined behavior" at all is specifically to allow compilers to "rule out this possibility" so they can make this sort of optimization.
Related reading: this year the author submitted a GCC bug-report on this topic, linking back to the 2021 blog post. The bug appears to be still unresolved.
It should be, actually, as long as you give it all the source code or give it object files with the summary data in them.
The summary mod/ref info should be enough to get it right, because it should say the function does not modify the argument, and that should get propagated through.
The tricky case is where it sometimes modifies the argument - the summary data is not flow sensitive, IIRC
Flow sensitive algorithms are still really expensive worst case (N^3 minimum, sometimes exponential), but could be made practical here using BDD's. Most compilers still do not use BDD's to represent this sort of large binary decision data, even though they are very good at it. It's an area where research always felt more advanced than production (IE even if you don't go whole-hog on datalog, the data structures are still very useful for this sort of otherwise-memory/time intensive data)
Does anyone know what guarantees Rust functions have regarding escape analysis? In particular, is the `x` reference in `fn foo(x: &T)` guaranteed to not outlive the function, or would it be legal for `foo` to `transmute()` it into a `&'static T` and have to escape into a global?
It is unsound to transmute `&'a T` into `&'static T`, but it is not UB - as long as all of the subsequent uses of the transmuted reference obey the "real" lifetime of the original reference:
(because it's unsound, it's considered wrong to do this - you should not intentionally write functions whose types are lies, and this one definitely lies, so it should be marked `unsafe` - but this is not automatic UB)
You can run through Miri and confirm there's no UB even though we're modifying `ptr`, whose lifetime has been extended beyond the length of the function.
However, Rust does have extra guarantees here which make this irrelevant to the pessimization problem in the linked article - you cannot ever legally convert a `&T` into a `&mut T` - this is always UB. This means that Rust guarantees that `example` does not modify `x` (unless e.g. it contains an `UnsafeCell`, like a `Mutex`'s contents), and so it does not need to defensively reload its value.
That is to say: Rust, just like C++, makes it legal (but frowned upon) to "leak" a reference beyond the stated lifetime it's provided as. But unlike C++, it is (always!) illegal to "upgrade" a `&T` into a `&mut T`, and thus the fact that it escapes does not hinder other optimizations.
In C and C++, `const` on pointers/references is basically just a comment to programmers - it is part of the type, but doesn't "mean" anything to the abstract machine; the rules don't treat const / regular references/pointers differently, they just say that the types only let you mutate through a mutable pointer.
Obviously, good code should treat it as more than just a comment - using `const` correctly clarifies intent and makes it possible to stay sane as a C++ developer, but the abstract machine doesn't care.
In C++, you can basically always `const_cast` a `const T&` into a `T&` and then modify it without causing UB. A function that accepts a `const T&` is just pinky promising that it will be polite and probably not do that.
It is only UB if the underlying object is "actually const", and even then, it doesn't cause UB until you actually perform the mutation; creating the mutable reference itself is perfectly fine.
it's only invalid if the object that is pointed at is const, as in:
int& upgrade_to_mut(const int& x) {
return *const_cast<int*>(&x);
}
const int y = 5;
const int& y_ref = y;
int& y_ref_mut = upgrade_to_mut(y_ref); // it is actually legal to produce y_ref_mut, but we cannot modify it
y_ref_mut = 6; // this is UB: cannot modify a const object 'y'
The difference is that in Rust, "mutation capabilities" are part of references, and so you cannot create them out of nowhere, that would be UB. But in C++, mutation capabilities are part of the object being pointed at, so as long as they happen to be there when you perform the mutation (e.g. you're not modifying a string literal or a variable declared `const`) then there's no problem.
It's not entirely true that "const" is "just a comment," depending on the use case. In machines with super-limited RAM you can use const on globals to tell the compiler "put this in .rodata"
In other words, "const" (in a global context) can tell the compiler "you don't have to copy this to RAM, just read it directly from non-volatile storage." Obviously, that would be undesirable on a desktop computer, but if you're dealing with a wee little microcontroller, it's very helpful.
`const`-as-comment is specifically limited to pointers and references - `const` on objects definitely does change semantics (it is always UB to attempt to modify a `const` object).
Another good example is string literals (except when initializing a non-const `char[]` variable), which are often allocated in read-only data in the same way, since they are const objects too.
I don't agree it is just a comment to programmer. CppReference says,
> Modifying a const object through a non-const access path and referring to a volatile object through a non-volatile glvalue results in undefined behavior.
and both compiler have tried to take advantage of this in the past.
That is saying exactly the same thing that I said; the qualification of the pointer is irrelevant; you get UB when modifying a const object. In the code snippets above,
int x; // not a const object, so can be freely modified
const int y; // is a const object, so cannot be modified
C/C++ pointer provenance doesn't include information about constness, so it doesn't matter "how" you got a mutable pointer to `x`: you're always allowed to modify `x` through that pointer, even if the pointer "came from" a const reference.
The reason for mentioning a "non-const access path" is that the type system forbids you from modifying through a const access path in the first place, so the program would already be rejected if you tried that.
I'm not saying it's a good idea to go around dropping const qualifications; `const_cast` is mostly evil and should be avoided. But at the level of the abstract machine, it's a no-op, even when going from const -> non-const, other than changing the type of the provided pointer.
The benefit of `const` is that if you don't use C-style casts that discard constness, and you don't use `const_cast`, and you don't use `mutable` or other type-unsafe or const-unsafe features, it's not possible to accidentally obtain a non-const pointer to a const object. Thus C++ actually helps you avoid this UB pretty well. But the fact that general conversion from const to non-const is permitted reduces the kinds of optimizations that can be performed.
It does pass this information to LLVM, in the form of the `readonly` attribute. This seems to be a bug in LLVM that does not optimize the function propely, I don't know why.
Regardless of escape analysis, this problem doesn't exist in Rust, because the language features that lead to this problem don't exist either.
In Rust, moves are built in into the language. Types with destructors can't have implicit copying, and there are no copy constructors. There is no moved-from state of objects, and destructors never run redundantly. Box (Rust's unique_ptr) is statically guaranteed to be always non-null and have a single owner at all times. So the language has no equivalent of omitting std::move, and has no semantically-observable copy to elide.
No, there is no guarantee that the reference will not be valid after the function returns. The only guarantee is that the reference will not be valid after the caller of the function modifies the referent (outside of an UnsafeCell), deallocates it, or creates a mutable reference to it. So there still is some escape analysis possible, in that any escape is scoped to the next modification.
Lifetimes do not affect whether or not something is UB on the language level; they only dictate which operations are safe and which are unsafe, and thus they act as a "social contract" for library-level UB.
Rust borrows must not exceed the lifetime associated with the borrow. By default that indeed means the borrow/reference cannot be moved into a context that outlives the function's execution, so it indeed can't be moved into a static global (unless `unsafe` is used explicitly)
But unsafe could be used inside the (rust equivalent) body of make and take_ownership without the compiler knowing. So would the rust compiler need to take this possibility into account when optimizing or not?
Transmuting lifetimes is one of the operations explicitly allowed for the std::mem::transmute() function [0]; lifetimes are only there to help safe code, and they have no effect on language-level UB. In fact, the 'static reference will be valid until the caller otherwise invalidates it.
Therefore, any use of transmute that extends a lifetime is UB if it causes the reference to outlive the referent. And because it's UB, the backend is going to assume the invariant holds universally and will optimize accordingly.
> Therefore, any use of transmute that extends a lifetime is UB if it causes the reference to outlive the referent.
By itself, a transmute of a currently-live reference cannot make it outlive its referent, in the sense used by the Rustonomicon. As it says later [0], a reference is alive until the point when it is last used, and no further.
Thus, with unsafe code, the lifetime of a reference can extend past the point when it stops being alive. As a consequence, extending the lifetime can never cause UB by itself: only using the reference after it has been invalidated can cause language-level UB.
Since language-level UB is based on liveness rather than lifetime, the compiler must assume that a reference passed to a function is valid even after it returns, up until the caller invalidates the reference by performing an operation incompatible with it remaining alive.
That's not about 'using' a reference, it's about constructing an invalid value – no different than doing transmute::<u32,char>(0x110000). See https://www.ralfj.de/blog/2020/07/15/unused-data.html for an explanation why it's UB.
That's true, but the claim was "it's ok to extend the lifetime of a reference beyond its referent so long as you don't use it after its referent has ceased being live"; what definition of 'used' the compiler might rely on is relevant.
What is UB in unsafe rust? You would be taking the addresses and then using the address of a variable that is still in scope. This assuming that rust has the same guaranteed elision as C++, extending the lifetime of an automatic object into the caller.
I believe the parent is saying "that would be undefined behavior" in answer to your question "would the rust compiler need to take this possibility into account". The compiler would not take this possibility into account, because the fact that references cannot outlive their referents is an invariant of the language that can only be broken by incorrect unsafe code. (To put it another way, in any context where the compiler would take this into account, it would do so regardless of the existence of the unsafe code, because unsafe code still ultimately has to uphold all the same invariants as safe code.)
Sure, but the reference doesn't outlive the referent in c++: the language guarantees that the local object in make is exactly the same object in the caller's caller.
The question is whether rust gives this object identity guarantee as well and, if it does, whether there are other rules (like uniqueness of mutable refences) that still allows the optimisation.
But there could be an object with the same lifetime as the function passed to both f and h, in the article’s example — could they not collaborate still in safe Rust? Or does the compiler know about that?
I'm afraid I don't know about Rust, but D has the scope parameter storage class for this. It enables the following compiler-enforced guarantee:
> The parameter must not escape the function call (e.g. by being assigned to a global variable). Ignored for any parameter that is not a reference type.
You can't do that. Storing the reference as a &'static T will be ok for the lifetime of your function, there is no guarantee that the thing the &'static T points to will be alive after that.
See this godbolt example https://godbolt.org/z/nbzd59E33
As I understand it, performing a const_cast itself isn't undefined behavior, which is why the compiler cannot assume function f doesn't modify the local. However, if we mark the local variable as const, the compiler is able to assume no modification will occur (because that would be undefined) and optimize the return.