Reporting a bug on a fragile analysis (blog.mozilla.com)
284 points by lordgilman on Nov 17, 2010 | 76 comments



I know that my submission's title is not the same as the blog post's title and that I will get some hate for it. However, the two diffs linked in the post give pretty convincing evidence that IE is picking up on the exact SunSpider test. Furthermore, if you read the last sentence of the blog post the author is more or less beating around the "You're cheating, we've caught you red-handed" bush.


I don't think it's ever a good idea to make a headline more sensational. We should wait for a response from Microsoft before declaring them liars/cheaters. It's entirely possible that Microsoft has a valid explanation.

Personally - on at least two occasions, I've been accused of writing code specifically designed to cause grief/problems for another person, only to have to explain that it was a bug and that their personal test case isn't the only place where it fails.

* And even if the author did directly accuse MS of cheating, that doesn't mean we can't be more correct and rewrite it to be neutral.


I totally agree with you on minimizing the controversy until incontrovertible facts have been found.

In my personal experience, assuming bad will on another person's part kills all chance of civil discussion and severely hurts your chances of finding more facts, because people are on the defensive. Not to mention how hard it is to remove that sort of egg stain from your face.

that'll be 2¢ please!


I don't think it is appropriate at all to change the title or impose a meaning on the original article. Further, assume stupidity, not malice.


Plus, it makes it harder to search for the discussion of the original article.


IMHO, you shouldn't draw a conclusion unless you can corroborate the evidence the blog author provided. There is a reason he didn't title the blog post that way. It's you who are saying MS cheated, so it's on you to prove it with evidence. Also, "more or less" has never meant conclusive. Just a thought: that title is misleading, and a question mark wouldn't have raised this issue at all.


I might edit your post title to include a question mark, but I don't think it's too much of a stretch past that.


You're too generous. The author knows he caught MS red-handed:

"What sorts of code does the analysis work on, other than the exact function included in SunSpider?" (emphasis mine)


This submission demonstrates why you should just stick to the original title of the article you're linking to, instead of coming up with your own flamebait/trolly title.

The issue appears to be a SunSpider bug, not an IE9 bug or "cheat". See http://news.ycombinator.com/item?id=1913368 for more information.

lordgilman, I hope you now realize it would've been wise to wait before passing judgment (especially in a public forum).

Edit: I don't know what's with the downvotes. I'm just going by the HN Guidelines, posted at http://ycombinator.com/newsguidelines.html. If you have a problem, don't downvote me, take it up with pg.


The post you link to has edited its previous findings.

They now state that IE9 is certainly cheating on SunSpider.

I hope you now realize it would've been wise to wait before passing judgment on lordgilman (especially in a public forum).


He simply should've followed the HN guidelines (http://ycombinator.com/newsguidelines.html) and linked to the article and stuck to the original title.


If I may be pedantic for a moment, the HN guidelines only warn against "gratuitous editorial spin." I don't feel the changes were gratuitous at all because (if you look at the last sentence in the blog post) my title is clearly the point Mr. Sayrer is trying to make.


The changes were gratuitous. That's why the author didn't directly say "You're cheating".

He wanted a direct answer while at the same time giving them the benefit of the doubt.

But whatever, your mind is made up.


The problem is the analysis to prove they are cheating is faulty. See my other post at: http://news.ycombinator.com/item?id=1914541

Note, I'm not saying they're NOT cheating. I'm just saying that no one has yet provided strong evidence that they are.


I agree entirely. Did you see my defense when I first posted? http://news.ycombinator.com/item?id=1913109/


I think you mean http://news.ycombinator.com/item?id=1913109 (without the trailing /). Yes, you defended this choice well.


Pretty much all browser vendors agree SunSpider is a bad benchmark, but yet it keeps getting used and abused. All vendors have tweaked their JS engine for SunSpider itself.

Dromaeo is a much better benchmark suite in that it tests actual DOM things rather than pure language stuff. Kraken (also by Moz) also attempts to focus on webapp use cases rather than doing billions of regexes per second.


> Pretty much all browser vendors agree SunSpider is a bad benchmark, but yet it keeps getting used and abused. All vendors have tweaked their JS engine for SunSpider itself.

Still, there is a gap between tweaking the JS engine and running completely different code (a gap which most GPU makers jumped over without hesitation a few years ago, but it's annoying to see the issue crop up again).


Testing DOM performance is a very different beast from testing JS engine performance.

For example, DOM-based tests are useless when trying to compare Node.js performance against other JS engines.


There's a decent Microsoft blog post explaining which aspects of browser performance the different benchmarks test:

http://blogs.msdn.com/b/ie/archive/2010/09/14/performance-wh...

On SunSpider: "The WebKit SunSpider tests exercise less than 10% of the API’s available from JavaScript and many of the tests loop through the same code thousands of times. This approach is not representative of real world scenarios and favors some JavaScript engine architectures over others."


About the Dromaeo tests: IE could call CAPICOM to handle AES, Base64, and RSA in the browser, which is super fast.

And personally I think all browsers could just expose an API for these kinds of encryption and computation-heavy things, like secure random seeds, etc. Implementing those in JavaScript is just a temporary solution.
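Something along these lines, say (a purely hypothetical API shape and helper names, nothing that actually exists):

  // hypothetical: prefer a native implementation when the browser exposes one
  function sha256(data) {
      if (window.nativeCrypto && window.nativeCrypto.sha256) {
          return window.nativeCrypto.sha256(data);   // fast native path (made up)
      }
      return jsSha256(data);                         // pure-JS fallback, assumed helper
  }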


I believe that we need both DOM+JS and pure JS benchmarks.



In case you don't have IE9 installed, the benchmark results (quoted in the previous blog post) are:

  cordic: 1ms +/- 0.0%
  cordic-with-return: 22.6ms +/- 2.7%
  cordic-with-true: 22.5ms +/- 2.7%
(Taken from http://blog.mozilla.com/rob-sayre/2010/09/09/js-benchmarks-c... )


Simplifying for my tiny brain: does this mean that IE9 takes more than 22x longer to complete a "real" test instead of the "optimized" one?


Edit (yet again): My initial conclusions were wrong, and it's nearly certainly cheating. Dammit. I hate being wrong in front of people smarter than me. :<

----

I'm running the same benchmark independently right now. Core i7 in a Win7 64-bit install.

For each test, I did 5 runs and averaged them. I increased the number of loops in each test from 25,000 to 250,000 as well.

  Chrome 9.0.576.0
        Stock: 105.28ms
  With "true": 104.44ms

  MS IE 9.0.7930.16406
        Stock: 10.98ms
  With "true": 181.16ms
That's a pretty interesting jump.

Here's a fun little observation:

  Chrome: 8.6ms
      IE: 10.9ms

  --- tests/sunspider-0.9.1/math-cordic.js        2010-11-17 00:55:29.000000000 -0700
  +++ tests/sunspider-0.9.1-deadcode/math-cordic.js       2010-11-17 01:10:36.000000000 -0700
  @@ -63,6 +63,7 @@
       TargetAngle = FIXED(28.027);
       CurrAngle = 0;
       for (Step = 0; Step < 12; Step++) {
  +        return;
           var NewX;
           if (TargetAngle > CurrAngle) {
               NewX = X - (Y >> Step);
By returning immediately out of the loop, Chrome's time drops by a factor of 12.1, whereas IE's stays pretty much constant.

I suspect what's happening here is that the IE engine is somehow marking that entire function as dead code, and thus not running it; the ~10ms accounts for the time it takes to run that for loop 250k times, but the cordicsincos() code is not being run at all. Ironically, dead code somewhere in the function causes the engine to NOT throw it all away, and it gets run.

In fact, if we just kill that for loop altogether:

  --- tests/sunspider-0.9.1/math-cordic.js        2010-11-17 00:55:29.000000000 -0700
  +++ tests/sunspider-0.9.1-deadcode/math-cordic.js       2010-11-17 01:16:36.000000000 -0700
  @@ -62,20 +62,6 @@

       TargetAngle = FIXED(28.027);
       CurrAngle = 0;
  -    for (Step = 0; Step < 12; Step++) {
  - *snip for brevity*
  -    }
   }
We get the following times:

  Chrome: 8.9ms
      IE: 10.6ms
What I suspect is that the IE engine is seeing "Okay, nothing is returned, and nothing outside of the scope of this function is ever altered", so once it steps into it, it just immediately returns. This is arguably correct behavior! That code is, for all practical purposes, worthless.

If we just move one of the variable references out of function scope (or just remove the var, making it effectively a global variable), IE takes the extra time to run:

  --- tests/sunspider-0.9.1/math-cordic.js        2010-11-17 00:55:29.000000000 -0700
  +++ tests/sunspider-0.9.1-deadcode/math-cordic.js       2010-11-17 01:22:49.000000000 -0700
  @@ -50,11 +50,11 @@
  
  +var CurrAngle;
   function cordicsincos() {
       var X;
       var Y;
       var TargetAngle;
  -    var CurrAngle;
       var Step;

       X = FIXED(AG_CONST);         /* AG_CONST * cos(0) */

  Chrome: 99.9ms
      IE: 217.1ms
Sorry, guys. I like a good IE bash-fest (Hey, it's still slower than V8 when it actually runs the code!) as much as anyone, but I think it's legit here. The benchmark is poorly-conceived, and IE does the right thing with it, though it obviously distorts the scores in their favor. That's a problem with the benchmark, though, not IE.

Edit (like...#14): It could well just be cheating on analysis in this particular case, which I stupidly overlooked. For example, this diff:

  --- tests/sunspider-0.9.1/math-cordic.js        2010-11-17 00:55:29.000000000 -0700
  +++ tests/sunspider-0.9.1-deadcode/math-cordic.js       2010-11-17 02:09:34.000000000 -0700
  @@ -62,7 +62,7 @@

       TargetAngle = FIXED(28.027);
       CurrAngle = 0;
  -    for (Step = 0; Step < 12; Step++) {
  +    for (Step = 12; Step > 0; Step--) {
           var NewX;
           if (TargetAngle > CurrAngle) {
               NewX = X - (Y >> Step);
Results in runtimes of:

  Chrome: 246.8ms
      IE: 956.5ms
Replacing the for with a while also results in long runtimes:

  --- tests/sunspider-0.9.1/math-cordic.js        2010-11-17 00:55:29.000000000 -0700
  +++ tests/sunspider-0.9.1-deadcode/math-cordic.js       2010-11-17 02:12:03.000000000 -0700
  @@ -59,10 +59,11 @@

       X = FIXED(AG_CONST);         /* AG_CONST * cos(0) */
       Y = 0;                       /* AG_CONST * sin(0) */
  +    Step = 0;

       TargetAngle = FIXED(28.027);
       CurrAngle = 0;
  -    for (Step = 0; Step < 12; Step++) {
  +    while(Step < 12) {
           var NewX;
           if (TargetAngle > CurrAngle) {
               NewX = X - (Y >> Step);
  @@ -75,6 +76,7 @@
               X = NewX;
               CurrAngle -= Angles[Step];
           }
  +        Step++;
       }
   }

  Chrome: 103.4ms
      IE: 190.0ms
So, my initial conclusions were wrong. Its dead code analysis is either incredibly narrow, or it was hand-crafted to optimize out that part of the benchmark. Either way it's rubbish.


You missed the point. Yes, dead code analysis may make sense (although a benchmark that includes dead code is probably a bad benchmark...), but this "dead code analysis" fails on utterly trivial variations of the benchmark (http://people.mozilla.com/~sayrer/2010/sunspider/diff1.html, http://people.mozilla.com/~sayrer/2010/sunspider/diff2.html - adding a "true" in the middle breaks it!).

The most likely conclusion is that IE doesn't do any "real" dead code analysis; it just recognizes this particular snippet.


I think that regrettably, you might be right. It's obviously not just checking for a bytecode match (see my var foo example), but it's doing something hinky. I did a simple pow-and-modulo test with the same assumptions and it didn't optimize it away.


This isn't "regrettable", it's simply par for the course from everybody's favorite tech company.

The only thing they will regret is getting caught, much like any sociopath.


It's absolutely regrettable. If this was legit, it would mean that the browser would be faster, the user experience would be better, and developers would be another tiny step closer to having an easier time of things when working in IE. I don't feel sorry for Microsoft here, but I'm a web developer, and I want fast, continually-improving browsers to code against.


The claim that IE is legit hinges on the diffs triggering a valid bug in IE's optimization code (e.g. dead code inside dead code preventing the whole thing from being optimized out), versus foul play in the engine (a hard-coded case for this benchmark).

Can you find any variation on the benchmark code that still allows IE to optimize it, or does it only optimize the exact form of the code used in the benchmark suite?


I changed variable names and declaration order, the number of loops in that inner for loop, and other such things that could possibly change the bytecode (to what effect, I don't know - I'm not a JS VM engineer, obviously) without changing the operations actually performed.

I don't have an explanation for this, though (maybe variable initialization counts as "run up until here"?):

Runs fast (11ms):

  --- tests/sunspider-0.9.1/math-cordic.js        2010-11-17 00:55:29.000000000 -0700
  +++ tests/sunspider-0.9.1-deadcode/math-cordic.js       2010-11-17 01:42:43.000000000 -0700
  @@ -56,12 +56,14 @@
       var TargetAngle;
       var CurrAngle;
       var Step;
  +    var foo;

       X = FIXED(AG_CONST);         /* AG_CONST * cos(0) */
       Y = 0;                       /* AG_CONST * sin(0) */

       TargetAngle = FIXED(28.027);
       CurrAngle = 0;
  +    foo = 1;
       for (Step = 0; Step < 12; Step++) {
           var NewX;
           if (TargetAngle > CurrAngle) {
But if I assign foo after the for loop, it runs slow:

  --- tests/sunspider-0.9.1/math-cordic.js        2010-11-17 00:55:29.000000000 -0700
  +++ tests/sunspider-0.9.1-deadcode/math-cordic.js       2010-11-17 01:43:20.000000000 -0700
  @@ -56,6 +56,7 @@
       var TargetAngle;
       var CurrAngle;
       var Step;
  +    var foo;

       X = FIXED(AG_CONST);         /* AG_CONST * cos(0) */
       Y = 0;                       /* AG_CONST * sin(0) */
  @@ -76,6 +77,7 @@
               CurrAngle -= Angles[Step];
           }
       }
  +    foo = 1;
   }
If I assign foo inside the for loop, it runs fast:

  --- tests/sunspider-0.9.1/math-cordic.js        2010-11-17 00:55:29.000000000 -0700
  +++ tests/sunspider-0.9.1-deadcode/math-cordic.js       2010-11-17 01:44:41.000000000 -0700
  @@ -56,6 +56,7 @@
       var TargetAngle;
       var CurrAngle;
       var Step;
  +    var foo;

       X = FIXED(AG_CONST);         /* AG_CONST * cos(0) */
       Y = 0;                       /* AG_CONST * sin(0) */
  @@ -63,6 +64,7 @@
       TargetAngle = FIXED(28.027);
       CurrAngle = 0;
       for (Step = 0; Step < 12; Step++) {
  +        foo = 1;
           var NewX;
           if (TargetAngle > CurrAngle) {
               NewX = X - (Y >> Step);
Color me boggled.


The point is whether it does the right thing with all similar dead code, or whether a human being has made the same analysis as you and added a shortcut that triggers when it sees this exact benchmark code.

The Mozilla guys clearly know this is dead code; do you really want them and every other JavaScript engine to be adding code targeted at this exact code snippet?


That's a good point; it appears to be doing legit dead code analysis, but the point still remains that if it's custom-tailored to detect that as dead code when it won't do it in the general case, it's cheating.

I may be eating my hat here, because I just replaced the cordicsincos() with the following:

  function numNumNum() {
    var I;
    var num = 10;
    for (I = 0; I < 10; I++) {
      num = num * num * num * num * num % num;
    }
  }
Using the same benchmarking framework, I get these times:

  Chrome: 849.5ms
      IE: 1226.4ms
That would seem to satisfy all the previous conditions - no leaked scope, no return, no external functions - but it doesn't get optimized away. I'd assumed that by "cheating", it would be hot-swapping that benchmark's bytecode for optimized bytecode, or running a function in C or something, rather than just cheating on the dead code optimization. Bad assumptions make for bad benchmarks!

Witch hunt on!


Cheald, could you run this test:

function numNumNum() { var I; var num = 10; for (I = 0; I < 10; I++) { num = num + num + num + num + num - num; } }

See if that changes the results?


Sure. That does, in fact, optimize well!

Edit: I just published my testing setup here: https://github.com/cheald/SunSpider-deadcode

  --- tests/sunspider-0.9.1/math-cordic.js        2010-11-17 00:55:29.000000000 -0700
  +++ tests/sunspider-0.9.1-deadcode/math-cordic.js       2010-11-17 15:08:43.000000000 -0700
  @@ -80,11 +80,15 @@

   ///// End CORDIC

  +function numNumNum() { var I; var num = 10; for (I = 0; I < 10; I++) { num = num + num + num + num + num - num; } }
  +
  +///// End CORDIC
  +
   function cordic( runs ) {
     var start = new Date();

     for ( var i = 0 ; i < runs ; i++ ) {
  -      cordicsincos();
  +      numNumNum();
     }

     var end = new Date();

  Chrome: 19.2ms
      IE: 1.0ms
Curious.


I think this is just fragility, not cheating. I don't know JS super well, but in some languages there are rules associated with certain operations, like preserving overflow/underflow exceptions and such.

In any case I think a few things happened here:

1) For whatever reason the "true" statement caused the compiler to think there was a side-effect potential. I suspect the compiler simply didn't know what to do with it, and they hadn't handled 'true;' or 'false;' as standalone statements in their optimizer. I bet if you put 'true;' in the middle of that loop it will break the DCE.

2) They probably don't do liveness analysis. So they can see that a block doesn't change global state, but they don't look to see whether the following blocks use any of the variables. So if there is any code after a block, they assume that they can't DCE that block.

3) '*' and '%' causing problems may be very specific to those operations, and I'm guessing '/' too.

All in all, I'd say it is a targeted, incomplete implementation, but not cheating, based on what I've seen thus far.
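To illustrate point 1, something like this (a made-up snippet in the style of the benchmark loop, not one of the actual diffs) would be the telling test:

  function stillDead() {
      var i, acc = 0;
      for (i = 0; i < 12; i++) {
          true;            // standalone no-op; the guess is the optimizer treats it as
                           // "unknown statement, may have side effects" and gives up
          acc = acc + i;   // acc never escapes the function, so this is still dead code
      }
  }
If the DCE really reasons about effects, this should stay fast; if it's pattern-matching the benchmark, it won't.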


And also, thanks for doing the run and posting a link to your setup.


This reminds me a lot of benchmarking Haskell, which does significant dead code analysis and thus was breaking benchmarks. The benchmarks were, of course, modified to do more with the looping code and ensure that all the code paths were run, but the interesting moral is that sometimes artificial benchmarks unsurprisingly don't test what you think they're testing.


I have to say, that is some damn good code analysis. The function executes, nothing external happens in the function, nothing is returned from the function: the function is dead code, don't run it.

So yes, while IE9 is technically slower than Chrome, it does good stuff with code analysis, which Chrome should do too. Impressive, by the way. :)

In the end the benchmark should be augmented to force the function to run.


It would be good code analysis if it applied in the general case. That was my initial assumption too - that IE is doing the right thing - but the fact that it fails to apply this same analysis in other cases where the same conditions apply (no external scope modified, no returns, etc) makes it feel awfully suspiciously like it's cheating on the analysis of that function in particular.


I don't have IE9 installed and therefore can't verify the benchmark results. If they are genuine, I'm having a hard time coming up with a different conclusion.

By the way, hilarious idea to make this into a bug report.

EDIT: of course I find it much easier to believe that someone at Microsoft optimized for the benchmark than that someone at Mozilla would fudge the timing results, especially when it's so easy to verify the claims.


> hilarious idea to make this into a bug report.

Actually, the MS guys requested that. I'm sure there's a lesson about soulless bureaucracies somewhere in there...


Bugs are more likely to get fixed if they are posted somewhere visible, instead of sitting in someone's inbox...


um.. Are you using the right version of IE 9? The current version is IE 9 PP7 (1.9.8023.6000), which produced a SunSpider result of 216ms. You are testing the beta version of IE 9, which was released back in September. Of course the old version of IE 9 will have a much slower speed. By the way, the SunSpider result for the IE 9 beta was about 340ms.


What if it just means that IE9 takes 21 more milliseconds to complete a "real" test instead of the "optimized" one?</silly question>


A better test to see if IE9 is cheating is to remove/rearrange code and rename variables. I'd avoid changing operators. Adding a 'true;' or 'return;' may seem harmless, but if their analysis is fragile they may just flag those statements as "may have side effects", or (in the case of the 'return;') they may not do liveness analysis on the other side of the block.

This code (taken from this thread) seems like a good test:

function numNumNum() { var I; var num = 10; for (I = 0; I < 10; I++) { num = num * num * num * num * num % num; } }

Except it uses two new operators: '*' and '%'. Test the same code using '+' and '-'.

This will give a much better idea of whether the analysis is just fragile or this code was being targeted.


At what point does "fragile" become "targeted"? Seriously, if it's that narrow...


Well, there are really three words of interest here: fragile, targeted, and cheating.

Cheating is really doing something like looking specifically for sunspider and then doing DCE based on knowing the function.

Fragile is distinct from cheating in that there is actually a real analysis framework in place, but the analysis can be invalidated easily. For example, it's not uncommon to see analysis assume function calls may write to all globals and modify all byref arguments. Looking at the code you can say, "with interprocedural analysis it's obvious that this function has no side effects", but the analysis may not be that smart. That's an example of fragility.
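A trivial illustration of that kind of fragility (made-up code, not from the benchmark):

  function looksDeadButIsnt() {
      var x = helper();   // helper() is a made-up, side-effect-free function, but an
      x = x + 1;          // analysis that assumes any call may touch globals has to
  }                       // keep the whole body, even though x is never read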

Now with this example, given that the browser is in Beta/CTP, I wouldn't be at all surprised if their framework was simply incomplete. The fact that the 'return;' statement causes a problem, but renaming and reordering variables doesn't, is the clearest indication IMO. It seems to indicate that they aren't doing any liveness analysis on the backside, but they aren't doing simple pattern matching on the text, nor the IR.

Targeting is really about how one brings up the framework. I actually wouldn't be surprised to hear that they did target sunspider, and that sunspider is probably part of their regression suite. With that said, this is EXTREMELY common in the compiler industry.

Now the question you're arguing is: does targeting == cheating? In most cases, no. In fact my suspicion is that what we're seeing here is the result of either an incomplete implementation where they did target SunSpider, or a more complete implementation that broke, but no one noticed because its main DCE test was SunSpider.

If IE9 can turn this around with a fix in their next CTP, it was probably not cheating and just a case of targeting. The reason being that doing a static analysis framework that is capable of being robust in these situations is non-trivial, and not something you just add in post-beta.

And if someone could run the test I posted above with '+' '-' rather than '*' '%', we'd have a first step in our answer. I would do it myself, but I don't know the SunSpider harness and don't have IE9 installed (and getting a new VM on this particular machine is a hassle).


It certainly seems like Microsoft is 'cheating', but it also seems like an excellent but warped example of Test Driven Development: they solved the failing test by the simplest and most direct means available. If time and budget hold out they will refactor later to generalize.

How do the TDD proponents feel about Microsoft's approach? How is it different than the supposedly correct behaviour demonstrated here: http://thecleancoder.blogspot.com/2010/10/craftsman-62-dark-...


Well in this fantasy TDD scenario (as in we don't know what happened in MS so I'm just making stuff up), presumably the product requirement was "make IE9 look very fast on the benchmarks without getting caught cheating".

So sure, they solved the first failing unit test (make IE9 look quick), but don't seem to have written enough unit tests to make sure the second part of the requirement works. So they would fail the acceptance tests and have to keep working on it.

(Wouldn't count myself as a full on TDD proponent but do use it when the time is right.)


The actual blog post title is:

> Reporting a bug on a fragile analysis


I think the title is pretty true based on the diffs, though. In one example, the only difference was adding "true;" in the middle of the code somewhere -- essentially, adding a no-op instruction causes vastly different benchmark results? Definitely fraud.

I wouldn't be surprised if microsoft added code like "if (isSunspiderTest) {loadHandOptimizedAssembly()}"


Adding seemingly trivial things to code can sometimes throw off performance entirely, by disaligning cache tables and such. It's not always cheating.

That said, if this is a pure bug, it seems pretty pathetic. For one, it proves that the engine is not robust. For another, it probably means that someone spent hours upon hours tweaking the code with only the sunspider benchmark as test - analogous to over-fitting the training data. It's really tempting to do this, but it's also a common enough amateur mistake that Microsoft should have best practices to avoid it.

All this is speculative for now. Let's see what they say.


> disaligning cache tables and such

This is JavaScript we are talking about.


Noooooo! I can already imagine the pain writing HTML5 in 2014: "Well, IE9 is too slow to do common thing X reliably, so let's trigger the SunspiderTest optimizations using this hack..."

It's like finding out that IE9 only performs well on the subset of JS needed when you are drawing a fishtank..


The second example is even more of a smoking gun -- adding a "return;" to the end of a function shouldn't affect optimization within an earlier loop. Especially not that much!


IIRC there was this Microsoft website which listed a few HTML demos in which IE9 was way faster than even Google Chrome. I wonder whether they used the same 'technique' there too.


I think that was mostly down to IE9 using hardware acceleration, which Chrome doesn't have yet.


There's a paradigm in machine learning called overfitting: trying to do well on a test dataset by cheating and seeing it first... I think the benchmark should choose tests randomly from a large set of tests and calculate the expected performance over a number of such random runs, not allowing anyone to cheat...


Over-fitting and peeking at the test set are completely different things. Over-fitting may in fact degrade performance on a test set, because it means you are giving too much weight to idiosyncratic patterns in the training data. Peeking at the test data, however, is right out, and should invalidate any results you try to report.


If I understand you correctly, what you are suggesting is that one way to improve dead code analysis would be to start with known dead code and compare the results of the dead code analysis algorithm to the results achieved by "cheating."

Given that SunSpider is a known example of dead code and that using it is easier than writing a new dead code benchmark, your explanation seems somewhat plausible (assuming I am understanding you correctly).

Edit: As a general case, there would seem to be a legitimate rationale for recognizing standard JavaScript snippets and loading pre-compiled routines to improve execution.


This was revealed 68 days ago, but nobody seemed to be interested in it at the time:

http://news.ycombinator.com/item?id=1676827


That's a pretty big conclusion to jump to (that they are cheating the test) based on a small amount of evidence. If they were "precompiling" the JavaScript for the test, and had functionality to "precompile" JavaScript code in the cache, would the fact that they precompiled the benchmark mean they were cheating? No. It wouldn't.

Keep in mind that there is a lot of code, such as jQuery, that is identical but distributed from many sources. It could benefit from similar matching and pre-compilation.

If dead code analysis (and other optimizations) was part of an "offline" compilation step (that's not efficient enough to do online), then changing the code would result in a slower execution path. Once the method body changes, the compiler wouldn't know it was dead without re-running the analysis (the changes could introduce side effects).

Now, this doesn't mean they are not cheating, because there is no evidence either way. But, what you are observing in this case doesn't imply cheating either.


Did you look at the diffs? There's not much room for the kind of excuses you're coming up with.


Yes. I'm not coming up with an excuse.

Other JavaScript engines, like the one in WebKit, minimize the amount of analysis they do of JavaScript source in order to avoid extra overhead. Something like an optimizing compilation pass is generally too slow to be done online; it would delay page load time considerably.

But if it could be done offline, operating on cached, frequently used pages, it could improve runtime considerably.

If one were to implement such a system for JS, it would make sense to use file hashes as keys to the precompiled code index, and fall back on slower methods for cache misses until such time as the offline process could compile the code. Small changes (non-whitespace), like the ones in the diffs, would trigger hash changes.
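In JS-flavored pseudocode, something like this (all names made up, just to illustrate the idea):

  var precompiledIndex = {};                // hash of source text -> optimized code

  function getExecutable(source) {
      var key = hashOf(source);             // hashOf() stands in for any content hash
      if (precompiledIndex[key]) {
          return precompiledIndex[key];     // cache hit: run the offline-optimized version
      }
      return compileQuickly(source);        // cache miss: fall back to the fast,
  }                                         // less-optimizing online path until the
                                            // offline pass catches up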

Given such a system, precompiling the benchmark is not cheating. My point is that you are confusing necessary with sufficient conditions, and are making damning conclusions without proper evidence.


Ok, so your hypothesis is that this benchmark is fairly frequently executed, so that it's reasonable to think that a precompiled version is stored somewhere?

In that case, to avoid the accusation of cheating, the choice of precompiled code should have an algorithmic basis: for instance, something akin to the Alexa rank of the .js files at various CDNs. That would make sure that jQuery would be precompiled, which could well be rational.

But I seriously doubt that such an objective method would include this benchmark code in the IE precompiled payload...


If they have the ability to precompile JS code, they would, of course, precompile the benchmark. Why would you run a benchmark in "slow" mode if you had a fast mode available? There's nothing wrong with precompiling the benchmark.

I'm not saying that's what they are doing, because I don't know. I'm saying that the conclusion of cheating is unfounded.


Could anyone explain what "dead code analysis" is?

Update: I still don't get why "the SunSpider math-cordic benchmark is very fast, presumably due to some sort of dead code analysis." Didn't the author prove exactly the opposite by showing IE is slower when dead code is added to the benchmark? Sorry for the noob question.


Finding code which is executed (so it's not unreachable) but whose results are not used. It's a kind of optimization to speed up program execution by not doing unnecessary work.

http://en.wikipedia.org/wiki/Dead_code

vs

http://en.wikipedia.org/wiki/Unreachable_code
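Roughly, in JS terms (illustrative snippets only):

  // dead code: it runs, but its result is never observable anywhere
  function dead() {
      var x = 0;
      for (var i = 0; i < 10; i++) { x += i; }   // x never escapes and is never read
  }

  // unreachable code: it can never run at all
  function unreachable() {
      return 1;
      var y = 2;   // anything after the return never executes
  }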


replying to your update:

yes, that's exactly the problem :o)

if you start by assuming that sunspider is very fast because of dead code analysis then adding more dead code shouldn't change anything. but it does. so their deadcode analysis seems less like "real analysis" and more like "this looks like the sunspider test so we know we can ignore this piece of code". and if that is true then they are "cheating" because the result no longer reflects normal behaviour - it is tailored exactly for this test.


It seems the hypothesis is that the benchmark originally goes very fast due to dead-code analysis (the function 'cordicsincos' has been marked as dead code, and is therefore not executed).

That the test goes much slower when more dead code is added (code that in no way 'undeadens' the 'cordicsincos' function, indeed code that does nothing at all) implies that the dead-code analysis being done is either not really dead-code analysis at all but simply looking for this specific function (this would be the 'cheating' hypothesis), or, more charitably, the dead-code analysis could merely be extremely fragile.


Dead code is a segment that is not reachable during execution or does not add anything to the result. If you find such code, you can spare CPU cycles and get a better benchmark result.


Dead code analysis removes code that can't possibly have an effect. For example:

    if(false){ whatever }


Removing code that does nothing. For example: "true;"




