Microbenchmarks Are Experiments (mrale.ph)
57 points by zdw 5 days ago | 12 comments





It’s cool to see this kind of analysis, even if it’s analyzing a totally bogus benchmark.

If you want to compare language runtimes, compilers, or CPUs then you have to pick a larger workload than just one loop. So, if a microbenchmark is an experiment, then it is a truly bad experiment indeed.

Reason: loops like this are easy for compilers to analyze, in a way that makes them not representative of real code. The hard part of writing a good compiler is handling the hard to analyze cases, not loops like this. So, if a runtime does well on a bullshit loop like this then it doesn’t mean that it’ll do well on real stuff.
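
Concretely, the loop in question is roughly this shape (paraphrasing from memory, sketched in Go; not the exact benchmark source):

    package main

    import (
        "fmt"
        "os"
        "strconv"
    )

    func main() {
        // u comes from the command line, so it can't be constant-folded away,
        // but it is loop-invariant, which is exactly what makes the loop easy to analyze.
        u, _ := strconv.Atoi(os.Args[1])
        a := make([]int32, 10000)
        for i := 0; i < 10000; i++ {
            for j := 0; j < 100000; j++ {
                a[i] += int32(j % u)
            }
        }
        // Print one element chosen at runtime so the result is observably used.
        fmt.Println(a[u%10000])
    }

Everything about it is statically obvious: the trip counts are constants, u never changes, and a[i] is invariant in the inner loop, so an aggressive compiler can hoist that load and fold away much of the arithmetic without actually running a billion iterations. Real code almost never hands the compiler something this clean.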

(Source: I wrote a bunch of the JSC optimizations including the loop reshaping and the modulo ones mentioned by this post.)


> loops like this are easy for compilers to analyze, in a way that makes them not representative of real code

Which makes it a perfectly fine benchmark to measure whether a particular compiler implements these optimisations. The benchmark also highlights fine implementation details. I did not know about Dart's interrupt checks, for instance.

I see these microbenchmarks as genuinely useful, as I can analyse them and the logic behind them, and apply the results to interpreter design. Consider [0] for example. Any sane compiler would do this kind of optimisation, but I've seen only one production interpreter (daScript) doing it.

[0] https://ergeysay.github.io/optimising-interpreters-fusion.ht...
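
To make the fusion point concrete, here is a minimal sketch of the idea (hypothetical opcodes, not daScript's actual instruction set): spot adjacent "load constant" + "add" bytecodes and replace the pair with one fused instruction, so two instructions' worth of work costs a single dispatch.

    package main

    import "fmt"

    // Hypothetical stack-machine opcodes, for illustration only.
    const (
        OpLoadConst    = iota // push constants[operand]
        OpAdd                 // pop two values, push their sum
        OpLoadConstAdd        // fused: add constants[operand] to the top of the stack
        OpHalt
    )

    type instr struct {
        op, operand int
    }

    // run is a plain switch-based dispatch loop; the fused opcode does the work
    // of two instructions but pays for only one round of dispatch.
    func run(code []instr, constants []int) int {
        var stack []int
        for pc := 0; pc < len(code); pc++ {
            switch in := code[pc]; in.op {
            case OpLoadConst:
                stack = append(stack, constants[in.operand])
            case OpAdd:
                n := len(stack)
                stack[n-2] += stack[n-1]
                stack = stack[:n-1]
            case OpLoadConstAdd:
                stack[len(stack)-1] += constants[in.operand]
            case OpHalt:
                return stack[len(stack)-1]
            }
        }
        return 0
    }

    func main() {
        constants := []int{1, 2}
        // Unfused program: four dispatches. Fused program: three. Same result.
        fmt.Println(run([]instr{{OpLoadConst, 0}, {OpLoadConst, 1}, {OpAdd, 0}, {OpHalt, 0}}, constants))
        fmt.Println(run([]instr{{OpLoadConst, 0}, {OpLoadConstAdd, 1}, {OpHalt, 0}}, constants))
    }

In a real interpreter the rewrite would be done by a peephole pass over the bytecode before execution; the saving is simply that the hot dispatch loop runs fewer times.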


> Which makes it a perfectly fine benchmark to measure whether a particular compiler implements these optimisations.

No, because whether the optimizations are “implemented” doesn’t matter.

What matters is whether the optimizations are robust enough to trigger for real code, not just bogus dead loops.


And what if the runtime does poorly on even such a simple loop? Go is surprisingly slower here than Java and Kotlin.

I agree with the author of the blog here - microbenchmarks are just experiments and they can be very useful if you do proper analysis of the results. You can definitely learn something about the runtimes even from such a simple for loop benchmark.


As a VM implementer, I conclude nothing from the observation that Go is slower on this test. Because it’s a bullshit loop that I wouldn’t ever write and neither would anyone else unless they were trying to troll people with nonsense “experiments”.

Drawing information from the fact that Go is slower on this loop is just you being misled by a bad experiment.

If you want to understand Go’s performance, run some real benchmarks. My understanding is that Go is plenty fast.


A simple loop like that running slower tells you that the compiler’s optimization strength is not very good. If it misses optimizations on trivial code like looping and basic arithmetic, it will likely miss even more in complex code. And instead of getting defensive about your language of choice, the right reaction is what the Dart developers did - they improved their compiler. The benchmark actually proved useful to them.

This benchmark is no less real than your “real world” benchmark. Being a simple microbenchmark, it may actually be even more useful than running very complex “real world” code, because it simplifies analysis.


> The hard part of writing a good compiler is handling the hard to analyze cases, not loops like this

But it does mean the compiler can handle the easy cases, and if one is already bad at those, that indicates it won’t do well on harder ones either.

My takeaway from 1 billion loops wasn’t exactly “js is always fast” but “js can be fast while python will always be slow” (talking about their main interpreters of course).


> It’s cool to see this kind of analysis, even if it’s analyzing a totally bogus benchmark.

Yeah, but that bogus benchmark has gone viral on social networks, and people are even writing their own shitty "experiments" based on it.

This post is absolutely wonderful in that it doesn't shit on the original benchmark too much, while explaining patiently how to do it right. Hopefully, the person who started with the bogus benchmark, as well as people following in his footsteps, will learn something important (I myself have posted benchmarks before - though I believe much higher quality - but failed to properly analyse the results as shown here).

Notice that the bogus benchmark ended up catching the attention of a Dart maintainer, and it looks like they found their compiler was missing some easy optimisations... so it may end up having been helpful for every Dart user in the world!


I cannot stop being surprised by how fast Java is in microbenchmarks compared to my experience with real-world Java applications. Probably Java's problem is not the JVM but the culture - a typical Java application probably has many times more code than a C/C++ application doing something similar. This code would come mostly in the form of libraries (frameworks), but a many-layers-deep onion architecture in the application itself is also a common case.

I take away a different lesson: Dart is most likely slow out of the box. The author lists several reasons: int vs. Int64, GC interrupts, load hoist optimization. All of these are issues with Dart. Hence microbenchmarks, even without interpretation or validation, point to issues with the language implementation. They are not "meaningless".
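
For anyone unfamiliar with the last item: load hoisting just means pulling a memory access that cannot change during the loop out of the loop. Roughly, and only as an illustrative sketch (not Dart's actual output, and in Go rather than Dart):

    // Without hoisting: a[i] is re-read and re-written on every inner iteration.
    func innerNaive(a []int32, i, u int) {
        for j := 0; j < 100000; j++ {
            a[i] += int32(j % u)
        }
    }

    // With hoisting: keep the running value in a local and write it back once.
    // This is the transformation the compiler is expected to do on its own.
    func innerHoisted(a []int32, i, u int) {
        acc := a[i]
        for j := 0; j < 100000; j++ {
            acc += int32(j % u)
        }
        a[i] = acc
    }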

It is true, one microbenchmark only shows that said microbenchmark is slow, not that the language as a whole is slow, but the plural of anecdotes is data. If you systematically evaluate a representative set of microbenchmarks (as in the Computer Language Shootout), then it is proof that the language is slow or fast.

Now of course one can argue about what is "representative", but taking random samples of code from across GitHub seems like a reasonable approach. And of course there is the issue of 1-1 translation, but at this point LLMs can do that.


Responding to the title:

I've always viewed the act of computer programming as a bunch of experiments or thought experiments. You first build a hypothesis in code, then you evaluate it and analyze the results.

I like this because it puts the science in computer science.


Tweet OP here: Very good analysis, thanks for sharing.


