For 2., that seems like a defect in the specification of setjmp()? The equivalent "just works" for functions like pthread_mutex_lock() etc. Those calls implicitly add barriers to force reloading.
Think about how the compiler would compile it. The compiler must assume pthread_mutex_lock (like any opaque, not-otherwise-decorated function) clobbers memory. So if the address of 'i' escapes the containing function, the compiler must make sure it is correctly written to memory before the call. If it doesn't escape, the compiler can potentially leave it in a (callee-saved) register, and there still wouldn't be any multithreading issues.
longjmp is special: it makes setjmp return twice, and the effect is observable even for non-escaping variables, so values that must be preserved across the jump need to be forced to memory by declaring them volatile.
GCC specifically marks setjmp as returns_twice, which as far as I can tell prevents the surrounding function from being inlined and additionally treats local variables as effectively volatile around the setjmp call (forcing them to memory even if they don't escape), but that's a GCC extension.
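To make the volatile requirement concrete, here's a minimal sketch of the classic pattern (a hedged example; the names are illustrative):

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf env;

    static void do_work(void)
    {
        longjmp(env, 1);           /* causes setjmp to "return" a second time */
    }

    int main(void)
    {
        volatile int i = 0;        /* without volatile, i would be indeterminate
                                      after the longjmp because it is modified
                                      after setjmp() (C11 7.13.2.1)             */
        if (setjmp(env) == 0) {
            i = 42;                /* modified between setjmp and longjmp */
            do_work();
        } else {
            printf("%d\n", i);     /* reliably prints 42 only thanks to volatile */
        }
        return 0;
    }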
I'm almost out of my depth here, but I believe this isn't (only) about escape analysis. A function call (like pthread_mutex_lock(), or any other) is running on the same thread.
I can see how failure to prove that a variable hasn't escaped a certain scope must prevent compiler reordering when a function of unknown implementation is called -- but not how that failure should require emitting memory barrier instructions.
However I realize that setjmp()/longjmp() isn't about threading either. What those functions do is quite weird.
The compiler won't emit memory barriers for pthread_mutex_lock. It only needs to make sure that all globally observable values (i.e. those whose address has escaped) are flushed from registers into memory. In practice this means that opaque function calls act as compiler memory barriers. Any additional required hardware memory barrier instructions are inside the implementation of pthread_mutex_lock itself.
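A rough sketch of what that looks like from the compiler's side (a plain global and the standard pthread calls; nothing here is special-cased by the compiler):

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int shared_counter;          /* globally reachable, so it "escapes" */

    void bump(void)
    {
        pthread_mutex_lock(&lock);      /* opaque call: the compiler must assume it
                                           can read or write any escaped object, so
                                           shared_counter is (re)loaded from memory
                                           after this point ...                     */
        shared_counter++;
        pthread_mutex_unlock(&lock);    /* ... and the store must reach memory
                                           before this call                         */
    }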
Yes. I can see how setjmp()/longjmp() would need additional/special treatment by escape analysis. Now I'm only wondering why the problem would be limited to (syntactically) automatic local variables. If the control flow (returns twice etc) is surprising to the compiler, couldn't that affect optimizations to non-local variables too?
It could, but there is no latitude about it specified in the standard. Only automatic locals are allowed to turn to pixie dust after a longjmp, and only if they have been modified since the setjmp.
Thus, if other things are a problem, the compiler just has to slow down in the section of code where setjmp is being used and not do those optimizations (without needing to be told via volatile).
By the way, I ran into an issue quite recently where a setjmp-like routine (not setjmp itself) caused a problem with access to a global variable, under GCC.
This was caused by -fPIE builds, enabled in some toolchains and distros.
The global variable in question was accessed, under the hood, via a global offset table or something like that. Basically, a layer of indirection due to the position independence of the code. The invisible pointer variables needed to access the global are, of course, themselves local.
The problem was that code executed since the setjmp-like function had prepared the value of a hidden local in order to access that global variable. When the longjmp-like function was executed, this was trashed. Yet the code assumed the value was stable; that it could access the global variable through that address. The result was a mysterious crash.
Not sure if the issue is reproducible with real setjmp and longjmp.
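Sketching the general shape of the scenario (hypothetical code, shown here with the real setjmp/longjmp, which, as said, may well not reproduce it since the compiler knows about those):

    #include <setjmp.h>

    static jmp_buf env;
    int global_state;                  /* under -fPIE, reached through a hidden
                                          GOT/PC-relative address computation   */

    static void unwind(void)
    {
        longjmp(env, 1);               /* the real routine was longjmp-like,
                                          not longjmp itself                    */
    }

    void example(void)
    {
        if (setjmp(env) == 0) {
            global_state = 1;          /* the compiler may compute the address of
                                          global_state here and cache it          */
            unwind();
        }
        global_state = 2;              /* if the cached address lived in a register
                                          restored by the longjmp-like routine, it
                                          comes back stale -> mysterious crash     */
    }

    int main(void)
    {
        example();
        return 0;
    }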
Calls to functions like pthread_mutex_lock() don't magically add memory barriers. From the perspective of the compiler they are regular function calls. They "work" because of escape analysis, but that applies to any function that is called.
It's true that the C specification could have said that the setjmp() function is "special". Then the only way to implement it would be to spill most local variables to the stack before the call. I suppose the C authors didn't want to introduce this special case (is there any other function that a C compiler is required to treat specially?).
Pretty sure that pthread_mutex_lock() etc. have to add memory barriers, in some way or another, depending on the architecture. Regular function calls shouldn't require full inter-thread memory synchronization just because escape analysis doesn't know the callee.
However setjmp()/longjmp() are different beasts entirely, and the problem here isn't related to multi-threading and thus not related to hardware memory ordering.
Those memory barriers are in the function implementation. They don't exist at the call site. Again, from the perspective of the compiler it's a regular function call.
> However setjmp()/longjmp() are different beasts entirely
Yes, exactly, they are not "equivalent" to pthread_mutex_lock() at all, which is what you suggested in the beginning. A call to pthread_mutex_lock() is a regular function call as far as the compiler is concerned.
Nope! The memory-synchronizing properties of those functions are at the specification level. POSIX says so, and so the implementation has to make it so, somehow. That could involve recognizing those functions in the compiler. Usually external function calls are good enough to act as a compiler barrier (the compiler won't reorder accesses around those locking calls), so the function then just has to contain the hardware memory barriers.
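For the "compiler barrier" part specifically, GCC's explicit spelling of it is an empty asm with a memory clobber; an opaque external call has roughly the same effect on the optimizer, while the hardware barriers live inside the lock implementation (a GCC/Clang-specific sketch):

    /* A pure compiler barrier: emits no instruction, but the optimizer must
       not cache memory values in registers across it (GCC/Clang extension). */
    #define compiler_barrier()  __asm__ __volatile__("" ::: "memory")

    int flag;

    void wait_for_flag(void)
    {
        while (!flag)
            compiler_barrier();        /* forces flag to be re-read each iteration;
                                          note this alone gives no hardware ordering
                                          and no atomicity, so it is NOT a correct
                                          inter-thread primitive by itself          */
    }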
It's not even clear what part you're dismissively replying "Nope!" to.
I'll be explicit: on POSIX systems that implement the POSIX threads extension, there is a header file called pthread.h that declares a regular function called pthread_mutex_lock(), and that function can be called as a regular function by an ISO-C-compliant compiler.
(POSIX also allows defining macros that can achieve the same effect, possibly more efficiently, but pthread_mutex_lock() et al. have to exist also as regular function definitions.)
So the point remains: pthread_mutex_lock() works not because the C compiler treats it specially. That makes sense since the C standard doesn't even mention it. Unlike setjmp() it's not part of the C standard, and it doesn't need to be, because none of its behavior requires compiler support, beyond what is already required by the platform ABI.
Here's a better analogy. The equivalent just works if we use C++ exception handling instead of setjmp. You can change variables after a try, and those values will be reliably observed in the catch.
setjmp and longjmp are a module that you can write in a small amount of assembly language, without changing anything in the compiler to support them. (Provided it has a bona fide volatile.)
Exception handling is a fairly complex feature requiring support in the compiler, with various strategies that have different performance trade-offs.
> Here's a better analogy. The equivalent just works if we use C++ exception handling instead of setjmp. You can change variables after a try, and those values will be reliably observed in the catch