Submit a better implementation. It's not possible for any one entity to produce optimal implementations for many languages; that's up to us. Or, if you don't like their rules, make your own benchmark system with new rules.
I don't use TypeScript, and use JavaScript only for what I have to. I don't care about the ranking of either of them; I'm just pointing out a possible flaw in the methodology that should be taken into account when looking at the numbers presented, and what I suspect is a concrete example of that.
This also isn't a criticism of the benchmark game, it's well known that not every implementation is equivalent in the time and effort put into optimizing it. It serves its purpose about as well as can be expected. Unfortunately, using it as the base of further calculations can lead to some of the known quirks of the benchmarks being exaggerated into results that are not always obviously an artifact of the underlying system, as I suspect this is. Making that obvious by pointing it out can be useful.
> Unfortunately, using it as the base of further calculations can lead to some of the known quirks of the benchmarks being exaggerated
You seem not to have considered the possibility that the authors may have simply made a mistake, unrelated to the origin of the programs.
The authors presented at an Oct 2017 conference. Archived benchmarks game web-pages from 2017 do not show the 10x fannkuch-redux differences that the authors report --
> Archived benchmarks game web-pages from 2017 do not show the 10x fannkuch-redux differences that the authors report
I think the September 1st, 2017 benchmark does, though.[1] At that point it's 1,204.93 seconds, compared to the 131.39 seconds on September 18th that you referenced. That makes sense, since the paper could have been finished quite a bit before the conference.
I agree. I just wanted to point it out in case people took the analysis as entirely accurate without looking a little closer. I suspect the entries towards the top of the list are fairly accurate, by nature of their competitive standing in the benchmarks game and their focus as languages on performance. It's much easier for an implementation difference to hide in the natural drift you see in the languages with VMs and interpreters that don't receive as much attention.
That's actually one of the reasons why the TypeScript/JavaScript divide jumped out at me. They mentioned that the bottom of the list was dominated by interpreted languages, and called out TypeScript by name (which surprised me, given the attention JavaScript VMs have gotten). When I then reviewed JavaScript's standing (which was more in line with what I expected), I noticed the difference between it and TypeScript was very pronounced, which is odd when (to my knowledge) TypeScript compiles to JavaScript, and not because it's doing a lot of convenience stuff that would slow it down. That said, I don't use TypeScript, so maybe I'm overlooking something.
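(For context on why that's odd: TypeScript's type annotations are erased at compile time, so for straightforward code tsc emits essentially the same JavaScript you'd have written by hand. A minimal sketch, with a made-up function name, just to illustrate the erasure:)

```typescript
// Hypothetical example function, not from the benchmark sources.
function scale(values: number[], factor: number): number[] {
    return values.map((v) => v * factor);
}

// What tsc emits (targeting ES2015 or later) is essentially the same code
// with the type annotations stripped:
//
//   function scale(values, factor) {
//       return values.map((v) => v * factor);
//   }
```

So, all else being equal, you'd expect equivalent TypeScript and JavaScript programs to land in roughly the same place.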
It took me about a minute looking at their published detailed data to notice the problem. They should have noticed a factor-of-15ish outlier and checked why on earth it was there.
If we assume they understood the relationship between JavaScript and TypeScript then maybe it should have been noticed.
However, the original research has been posted multiple times to proggit and HN since 2017, and I don't recall anyone noticing this problem until now --
It's a study presented at some conference, so while it's not exactly the Higgs boson, they're showing other people data and the conclusions they derived from it. It's 100% their job to understand what their data measures and to notice that one of the measurements is completely bogus for their purposes. The fact that other people hadn't necessarily noticed on message boards before is mildly curious, but it's not really their job.
afaict the evidence is - not - that "one of the measurements is completely bogus".
On the contrary: we can see from archived web pages that other measurements showed the same relatively poor performance with those old versions of TypeScript.
What do the archived pages have to do with this study? The study starts with a snapshot of benchmark game sources. That's a perfectly sensible way to get a bunch of implementations to bootstrap the study. But some of those implementations might be unsuitable for the study, just as (at least) one was, in their case. They don't seem to have noticed that. What's a good, benign explanation that they didn't?
Because it's a different implementation that happens to be 15 times slower than the implementation used in the straight JS version. That's fine for the benchmark game, but it's garbage input to a 'how energy efficient are these languages' study.
Let's say I want to measure the 'energy efficiency' of x86 assembly and JS. I'll use sorting an array of 1000 integers. In my JS implementation, I call Array.sort. In my x86 implementation, I randomly shuffle the array and check if it's sorted, if not repeat until it is. Does measuring the execution times of these tell me anything about the 'energy efficiency' of Javascript vs x86 assembly?
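Something like this, as a deliberately silly sketch (function names are mine, not from any benchmark):

```typescript
// "JS" entry from the thought experiment: just use the built-in sort.
function sortBuiltin(xs: number[]): number[] {
    return [...xs].sort((a, b) => a - b);
}

// "x86" entry from the thought experiment: shuffle until sorted (bogosort).
// Expected work is on the order of n * n!, so any time or energy measurement
// is dominated by the algorithm choice, not by the language or the hardware.
function sortBogo(xs: number[]): number[] {
    const isSorted = (a: number[]) => a.every((v, i) => i === 0 || a[i - 1] <= v);
    const ys = [...xs];
    while (!isSorted(ys)) {
        // Fisher-Yates shuffle
        for (let i = ys.length - 1; i > 0; i--) {
            const j = Math.floor(Math.random() * (i + 1));
            [ys[i], ys[j]] = [ys[j], ys[i]];
        }
    }
    return ys;
}
```

Measuring those two tells you which algorithm is worse, and nothing about the languages; which is the same category of problem as comparing a 15x-slower fannkuch-redux implementation against a tuned one.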
> a different implementation that happens to be 15 times slower
A few TypeScript versions later, that happens to be only 1.6 times slower.
There are good questions to ask about how to handle possible outliers in a study that takes a snapshot of a changing situation and then seeks to make more general claims.
It doesn't matter what happened in the benchmark game later; it's not a study about the benchmark game. These aren't 'good questions'; it's a huge fuckup by the authors of the study. You're simply wrong to keep saying otherwise.
It's not really about how the game works; it's about whether the study is using sane data. In this case, it very much isn't, and it should have been obvious to them that the data is bad.