Hacker News

This is a great collection, but any listing of microbenchmarks needs a caveat.

Consider the first example, parallel assignment vs sequential assignment. As we can see from the results, parallel assignment is 2.25x slower, which seems like a monumental performance hit, right? If all your application does is assign a few variables and exit, sure, but very few applications are that simplistic. To make a good judgement call on this optimization, you have to understand its impact within the context of your application:

What is the total execution time of your application?

What portion of that execution time is spent on assignment?

What portion of that execution time is spent on the extra allocation of an array due to parallel assignment?
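For reference, a microbenchmark along these lines can be reproduced with nothing but the stdlib Benchmark module (the original listing presumably used the benchmark-ips gem; this is just a rough sketch, and the exact numbers will vary by machine and Ruby version):

```ruby
require 'benchmark'

N = 1_000_000

# Time N iterations of each idiom and report nanoseconds per iteration.
parallel = Benchmark.realtime do
  N.times { a, b, c = 1, 2, 3 }
end

sequential = Benchmark.realtime do
  N.times { a = 1; b = 2; c = 3 }
end

puts format('parallel:   %.1f ns/iter', parallel   / N * 1e9)
puts format('sequential: %.1f ns/iter', sequential / N * 1e9)
```

Note that at these time scales, loop overhead and the VM's own optimizations can swamp the thing you're measuring, which is exactly why benchmark-ips exists.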

At the bottom of the benchmark, we can see the iteration rate for each. Parallel assignment managed a rate of 2521708.9 iterations per second. We can work out the total execution time per iteration from this number:

Single iteration as a fraction of a second: 1/2521708.9

In decimal form: 0.000000396556478 s

Converted to milliseconds: 0.000397 ms

The same conversion for sequential assignment gets us: 0.0001758783 ms.

In each iteration, we save 0.0002206782 ms.
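Conversions like this are easy to get wrong by a decimal place, so it's worth scripting them (the 0.0001758783 ms figure for sequential assignment is taken from the numbers above):

```ruby
parallel_ips  = 2_521_708.9               # i/s reported for parallel assignment
parallel_ms   = 1000.0 / parallel_ips     # ms per iteration, ~0.000397
sequential_ms = 0.0001758783              # ms per iteration for sequential
delta_ms      = parallel_ms - sequential_ms

puts format('saved per iteration: %.10f ms', delta_ms)   # ~0.0002206782
puts "iterations to save 1 ms: #{(1 / delta_ms).round}"  # ~4531
```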

Circling back to my list of questions: what is the total execution time of my application? If my app makes I/O calls (especially network I/O), it could be hundreds of ms. At this delta, it would take over 4,500 iterations of this optimization to achieve an improvement of 1 ms. If we're talking about an operation that occurs locally and is 100% in-memory, execution times may be <50 ms. You'd still need those same ~4,500 iterations to claw back 1 ms, but that 1 ms is now a 2% improvement rather than a rounding error.

At this point, I have to tattle on myself. This is an obtuse method of analysis. Microbenchmarks are hard, and at the i/s rates we're seeing here, there could be confounding factors that the author (and I) haven't accounted for. Things like garbage collection and object caching will have an impact at these time scales. We also have to ask whether our microbenchmark reflects reality. What real-world application repeatedly assigns literals to variables millions of times per second? Extrapolating any meaningful decisions from the microbenchmarks alone is a fool's errand.

The lesson is that microbenchmarks can only tell you so much. A comprehensive approach to optimization involves looking at the total run time and the apportionment of that time in an actual application. This process is called profiling, and the tools for profiling Ruby applications have improved considerably in recent years.

Looking at the parallel vs sequential assignment difference, what you really want to know is whether parallel vs sequential assignment is impacting your application, and to what degree. Profiling tools will tell you where your application spends its time, and where it's allocating memory. This tells you where to look. Microbenchmarks will tell you which idioms you pay a penalty for. The combination of the two allows you to make smart decisions.
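As one stdlib-only illustration of "where it's allocating memory": ObjectSpace can trace allocations back to the source line that created each object. In a real project you'd more likely reach for gems like stackprof or memory_profiler, but they aggregate exactly this kind of information:

```ruby
require 'objspace'

# While tracing is active, every new object remembers the file and line
# that allocated it; memory profilers build their reports from this data.
ObjectSpace.trace_object_allocations do
  greeting = 'hello ' + 'world'   # runtime concatenation allocates a String
  puts "allocated at #{ObjectSpace.allocation_sourcefile(greeting)}:" \
       "#{ObjectSpace.allocation_sourceline(greeting)}"
end
```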

If you have a parallel assignment wrapped in a loop that will execute hundreds of thousands of times every time your application runs, this will show up during profiling. Moving to sequential assignment will likely pay dividends. Otherwise, the penalty paid for parallel assignment is probably minimal. Profiling is a good way to tell the difference.
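As a hypothetical illustration (the method name and workload are invented), this is the kind of rewrite I mean for a profiler-flagged hot loop:

```ruby
# Hypothetical hot path: called hundreds of thousands of times per run.
def distances(pairs)
  pairs.map do |pair|
    x, y = pair                  # parallel assignment on every element
    Math.sqrt(x * x + y * y)
  end
end

# Same method with sequential assignment; behavior is identical, so only
# profiling can tell you whether the difference is worth caring about.
def distances_sequential(pairs)
  pairs.map do |pair|
    x = pair[0]
    y = pair[1]
    Math.sqrt(x * x + y * y)
  end
end

distances([[3, 4]])            # => [5.0]
distances_sequential([[3, 4]]) # => [5.0]
```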



