Submit a better implementation. It's not possible for any one entity to produce optimal implementations for many languages; that's up to us. Or, if you don't like their rules, make your own benchmark system with new rules.
I don't use TypeScript, and use JavaScript only for what I have to. I don't care about the ranking of either of them; I'm just pointing out a possible flaw in the methodology that should be taken into account when looking at the numbers presented, and what I suspect is a concrete example of that.
This also isn't a criticism of the benchmark game, it's well known that not every implementation is equivalent in the time and effort put into optimizing it. It serves its purpose about as well as can be expected. Unfortunately, using it as the base of further calculations can lead to some of the known quirks of the benchmarks being exaggerated into results that are not always obviously an artifact of the underlying system, as I suspect this is. Making that obvious by pointing it out can be useful.
> Unfortunately, using it as the base of further calculations can lead to some of the known quirks of the benchmarks being exaggerated
You seem not to have considered the possibility that the authors may have simply made a mistake, unrelated to the origin of the programs.
The authors presented at an Oct 2017 conference. Archived benchmarks game web-pages from 2017 do not show the 10x fannkuch-redux differences that the authors report --
> Archived benchmarks game web-pages from 2017 do not show the 10x fannkuch-redux differences that the authors report
I think the September 1st, 2017 benchmark does, though.[1] At that point it's 1,204.93 seconds, compared to the 131.39 seconds on September 18th that you referenced. That makes sense, since the paper could have been finished quite a bit before the conference.
I agree. I just wanted to point it out in case people took the analysis as entirely accurate without looking a little closer. I suspect the entries towards the top of the list are fairly accurate, by nature of their competitive standing in the benchmarks game and their focus as languages on performance. It's much easier for an implementation difference to hide in the natural drift you see in the languages with VMs and interpreters that don't receive as much attention.
That's actually one of the reasons why the TypeScript/JavaScript divide jumped out at me. They mentioned that the bottom of the list was dominated by interpreted languages, and called out TypeScript by name (which surprised me, given the attention JavaScript VMs have gotten). When I then reviewed JavaScript's standing (which was more in line with what I expected), I noticed the difference between it and TypeScript was very pronounced, which is odd when (to my knowledge) TypeScript compiles to JavaScript, and not because it's doing a lot of convenience stuff that would slow it down. That said, I don't use TypeScript, so maybe I'm overlooking something.
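(For context on why that's odd: TypeScript's type annotations are erased at compile time, so for straightforward code tsc emits essentially the same JavaScript you'd have written by hand. A minimal sketch, with a made-up function name, just to illustrate the erasure:)

```typescript
// Hypothetical example function, not from the benchmark sources.
function scale(values: number[], factor: number): number[] {
    return values.map((v) => v * factor);
}

// What tsc emits (targeting ES2015 or later) is essentially the same code
// with the type annotations stripped:
//
//   function scale(values, factor) {
//       return values.map((v) => v * factor);
//   }
```

So, all else being equal, you'd expect equivalent TypeScript and JavaScript programs to land in roughly the same place.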
It took me about a minute looking at their published detailed data to notice the problem. They should have noticed a factor-of-15ish outlier and checked why on earth it was there.
If we assume they understood the relationship between JavaScript and TypeScript then maybe it should have been noticed.
However, the original research has been posted multiple times to proggit and HN since 2017, and I don't recall anyone noticing this problem until now --
It's a study presented at some conference, so while it's not exactly the Higgs boson, they're showing other people data and the conclusions they derived from it. It's 100% their job to understand what their data measures and to notice that one of the measurements is completely bogus for their purposes. The fact that other people hadn't necessarily noticed on message boards before is mildly curious, but it's not really their job.
afaict the evidence is - not - that "one of the measurements is completely bogus".
On the contrary: we can see from archived web pages that other measurements showed the same relatively poor performance with those old versions of TypeScript.
What do the archived pages have to do with this study? The study starts with a snapshot of benchmark game sources. That's a perfectly sensible way to get a bunch of implementations to bootstrap the study. But some of those implementations might be unsuitable for the study, just as (at least) one was, in their case. They don't seem to have noticed that. What's a good, benign explanation that they didn't?
Because it's a different implementation that happens to be 15 times slower than the implementation used in the straight JS version. That's fine for the benchmark game, but it's garbage input to a 'how energy efficient are these languages' study.
Let's say I want to measure the 'energy efficiency' of x86 assembly and JS. I'll use sorting an array of 1000 integers. In my JS implementation, I call Array.sort. In my x86 implementation, I randomly shuffle the array and check if it's sorted, if not repeat until it is. Does measuring the execution times of these tell me anything about the 'energy efficiency' of Javascript vs x86 assembly?
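Something like this, as a deliberately silly sketch (function names are mine, not from any benchmark):

```typescript
// "JS" entry from the thought experiment: just use the built-in sort.
function sortBuiltin(xs: number[]): number[] {
    return [...xs].sort((a, b) => a - b);
}

// "x86" entry from the thought experiment: shuffle until sorted (bogosort).
// Expected work is on the order of n * n!, so any time or energy measurement
// is dominated by the algorithm choice, not by the language or the hardware.
function sortBogo(xs: number[]): number[] {
    const isSorted = (a: number[]) => a.every((v, i) => i === 0 || a[i - 1] <= v);
    const ys = [...xs];
    while (!isSorted(ys)) {
        // Fisher-Yates shuffle
        for (let i = ys.length - 1; i > 0; i--) {
            const j = Math.floor(Math.random() * (i + 1));
            [ys[i], ys[j]] = [ys[j], ys[i]];
        }
    }
    return ys;
}
```

Measuring those two tells you which algorithm is worse, and nothing about the languages; which is the same category of problem as comparing a 15x-slower fannkuch-redux implementation against a tuned one.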
> a different implementation that happens to be 15 times slower
A few TypeScript versions later, that happens to be only 1.6 times slower.
There are good questions to ask about how to handle possible outliers in a study that takes a snapshot of a changing situation and then seeks to make more general claims.
It doesn't matter what happened in the benchmark game later; it's not a study about the benchmark game. These aren't 'good questions'; it's a huge fuckup by the authors of the study. You're simply wrong to keep saying otherwise.
It's not really about how the game works; it's about whether the study is using sane data. In this case, it very much isn't, and it should have been obvious to them that the data is bad.