It could, but then we're getting into hairy language-definition issues: does Lua specify at what point updates need to hit memory? And this is a perfect illustration of why a language ought to be explicit about such guarantees - here such a change would have observable effects, and changing observable behaviour in an optimization is dangerous territory even when it is well defined.
That has nothing to do with being high level; C++, for example, also had no built-in guarantees for a long time. Making no promises about timing simply means that threading is either a non-concept in the language (see C++ before C++11) or that the language declares everything to do with threading unspecified or undefined.
That's all fine when you don't need to know the effects of your code. It breaks down the moment you want to interface with a system (like Snabbswitch) that uses memory as an I/O mechanism; at that point, if you can't depend on language guarantees, you have to depend on implementation guarantees - and the same point stands.
> Either the whole thing is interpreted or the memory access is trivially inlined.
In the context of a discussion about the extent of optimization which can be done, if it is inlined, then we're back to square one and do have to worry about optimization. Your argument is circular.
I mean inlining it opaquely. The optimizer won't alter the injected machine code. (Either because it already ran, or because it manipulates something other than machine code, or because the injected opcodes are marked as untouchable.)
Treating things as a function call when optimizing is actually a really good way to preserve volatile memory accesses even in something as complex as a C compiler.
Inlining a function opaquely is a very weird thing for an optimizer to do without a special annotation, as it loses a substantial proportion of the benefit of inlining.
And it would certainly be a weird thing to do for a hypothetical optimizer as aggressive as the one that was the starting point for this thread - namely, one willing to partially evaluate even loop contents during compilation.
In any case, this again goes back to my original point: the importance of a language defining the semantics of when updates hit memory.
What I'm saying is that it's really easy to make memory accesses explicit without putting it in language semantics.
If you have an aggressive optimizer, you can mark cases that bypass the aggression with a handful of lines of code and still end up with something that's very fast.
In fact doing it with function calls can be easier and better than teaching the compiler about memory access semantics. Code for calling functions is a lot better tested than code for handling volatile.
It is not at all weird.
I'm not talking about a way to trick the optimizer, I'm talking about a way to make an optimizer and a forced memory access coexist in a super-easy fashion, by very slightly altering the compiler. Normal functions would be inlined transparently. Only the special memory access functions would be opaque.
Memory-mapped I/O is a platform-specific thing. It shouldn't really be baked into the language. Especially when the simplest way of putting it in the language pessimizes you all over the place in code that doesn't need those specific accesses. And even that would be a notable bloat; Lua does not have a big spec.
> If you have an aggressive optimizer, you can mark cases that bypass the aggression with a handful of lines of code
Now either your code is implementation-dependent, or the way you mark those cases is part of the language semantics. And we're back to the original argument.
> In fact doing it with function calls can be easier and better than teaching the compiler about memory access semantics
You are teaching the compiler about memory access semantics if you add rules to the optimizer such as that it can't elide memory accesses in inlined functions.
> by very slightly altering the compiler
... in other words, by altering the language semantics - either for just this implementation, or in general.
> Memory-mapped I/O is a platform-specific thing. It shouldn't really be baked into the language.
I agree, but this is one of the reasons why language semantics ought to cover what memory access guarantees are provided when and how, because otherwise you can't reliably even use such OS level facilities without digging into how a specific implementation works.
> And even that would be a notable bloat; Lua does not have a big spec.
It doesn't take a big spec to say "memory writes inside functions [where ... <insert additional criteria>] should be treated the same as I/O for the purposes of optimization, and e.g. cannot be elided, coalesced or reordered", or whatever semantics you prefer. Saying "an optimizer is free to elide, coalesce or reorder memory accesses as long as it does not alter the behaviour observable from Lua" would be an acceptable decision too, though in this case it would mean the usage in the article is unsupported and relies on undefined behaviour. Still, knowing is better than not knowing; at least then you can make an informed decision. It's not about giving complex guarantees; it's about not leaving giant gaping unspecified holes that make it unclear, for example, which optimizations are legal and safe - holes that could be plugged with a sentence or two.
Forgive my ignorance, but when writing a sufficiently large program, why would looking into the generated assembly be of use to a programmer? Wouldn't this turn into an arbitrarily large number of instructions to sift through just to measure performance?
You find the 'hottest' code (usually a loop - the code executed most often) and carefully examine the generated assembly for just that part, looking at how it can be made more CPU/JIT/VM-friendly.
This is an optimisation you perform when a small amount of code is consuming the overwhelming majority of your software's execution time (code very commonly has this property, until it is profiled and optimised).
In addition, this case is for a high speed networking library, where every CPU cycle counts.