It could, but then we're getting into hairy language-definition issues: does Lua specify at what point updates need to hit memory? And this is a perfect illustration of why a language ought to be explicit about such guarantees - here such a change would have observable effects, and changing observable behaviour in an optimization is dangerous territory even when it is well defined.
That has nothing to do with being high level; C++, for example, also had no built-in guarantees for a long time. Making no promises about timing simply means that threading is either a non-concept in the language (see C++ before C++11) or that the language declares everything to do with threading unspecified or undefined.
That's all fine when you don't need to know the effects of your code. It breaks down the moment you want to interface with a system (like Snabbswitch) that uses memory as an I/O mechanism; at that point, if you can't depend on language guarantees, you have to depend on implementation guarantees - and the same point stands.
> Either the whole thing is interpreted or the memory access is trivially inlined.
In the context of a discussion about the extent of optimization which can be done, if it is inlined, then we're back to square one and do have to worry about optimization. Your argument is circular.
I mean inlining it opaquely. The optimizer won't alter the injected machine code. (Either because it already ran, or because it manipulates something other than machine code, or because the injected opcodes are marked as untouchable.)
Treating things as a function call when optimizing is actually a really good way to preserve volatile memory accesses even in something as complex as a C compiler.
Inlining a function opaquely is a very weird thing for an optimizer to do without a special annotation, as it loses a substantial proportion of the benefit of inlining.
And it would certainly be a weird thing to do for a hypothetical optimizer as aggressive as the one that was the starting point for this thread - namely, one willing to partially evaluate even loop contents during compilation.
In any case, this again goes back to my original point: the importance of a language defining the semantics of when updates hit memory.
What I'm saying is that it's really easy to make memory accesses explicit without putting it in language semantics.
If you have an aggressive optimizer, you can mark cases that bypass the aggression with a handful of lines of code and still end up with something that's very fast.
In fact doing it with function calls can be easier and better than teaching the compiler about memory access semantics. Code for calling functions is a lot better tested than code for handling volatile.
It is not at all weird.
I'm not talking about a way to trick the optimizer, I'm talking about a way to make an optimizer and a forced memory access coexist in a super-easy fashion, by very slightly altering the compiler. Normal functions would be inlined transparently. Only the special memory access functions would be opaque.
Memory-mapped I/O is a platform-specific thing. It shouldn't really be baked into the language. Especially when the simplest way of putting it in the language pessimizes you all over the place in code that doesn't need those specific accesses. And even that would be a notable bloat; Lua does not have a big spec.
> If you have an aggressive optimizer, you can mark cases that bypass the aggression with a handful of lines of code
Now either your code is implementation-dependent, or the way you mark those cases is part of the language semantics. And we're back to the original argument.
> In fact doing it with function calls can be easier and better than teaching the compiler about memory access semantics
You are teaching the compiler about memory access semantics if you add rules to the optimizer such as that it can't elide memory accesses in inlined functions.
> by very slightly altering the compiler
... in other words, by altering the language semantics - either for just this implementation, or in general.
> Memory-mapped I/O is a platform-specific thing. It shouldn't really be baked into the language.
I agree, but this is one of the reasons why language semantics ought to cover what memory access guarantees are provided when and how, because otherwise you can't reliably even use such OS level facilities without digging into how a specific implementation works.
> And even that would be a notable bloat; Lua does not have a big spec.
It doesn't take a big spec to say "memory writes inside functions [where ... <insert additional criteria>] should be treated the same as I/O for the purposes of optimization, and e.g. cannot be elided, coalesced or reordered", or whatever semantics you prefer. Saying "an optimizer is free to elide, coalesce or reorder memory accesses as long as it does not alter the behaviour observable from Lua" would be an acceptable decision too, though in this case it would mean the usage in the article is unsupported and relies on undefined behaviour. Still, knowing is better than not knowing; at least then you can make an informed decision. It's not about giving complex guarantees; it's about not leaving giant gaping unspecified holes that make it unclear, for example, which optimizations are legal and safe - holes that could be plugged with a sentence or two.
Forgive my ignorance, but when writing a sufficiently large program, why would looking into the generated assembly be of use to a programmer? Wouldn't this turn into an arbitrarily large number of instructions to sift through just to measure performance?
You find the 'hottest' code (usually a loop - the code executed most often) and carefully examine the generated assembly for just that part, looking at how it can be made more CPU/JIT/VM-friendly.
This is an optimisation you perform when a small amount of code is consuming the overwhelming majority of your software's execution time (code very commonly has this property, until it is profiled and optimised).
In addition, this case is for a high speed networking library, where every CPU cycle counts.