Ding, ding, ding, and to be fair, this is the problem with both methodologies. The only way to accurately perform this test is to spider a bunch of sites, save the contents to a locally hosted HTTPD, ensure all third-party JS calls are resolved locally, and test both browsers against the exact same sites as they were spidered at a point in time. You simply cannot account for changes to the markup, ad network beacons, ads, metrics code, or even network routing, all of which could influence the test one way or the other if it isn't run from an identical static cache in a controlled environment. If I were doing this test I'd skip the whole scripting-automation part and simply add a meta-refresh to every page in the cache to take the browser through the content sequentially, giving each page something like 10 seconds to load and render. Simple, simple, and far more accurate.
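A minimal sketch of that meta-refresh step, assuming the spidered pages already live under a local mirror/ directory served from the web root (the directory name, the 10-second dwell time, and the naive <head> injection are all placeholder choices):

```python
# inject_refresh.py - chain every cached page together with a meta-refresh
# so the browser walks the whole mirror unattended, ~10 seconds per page.
# Assumes the pages were already spidered into ./mirror by whatever crawler you use.
import pathlib

DWELL_SECONDS = 10
pages = sorted(pathlib.Path("mirror").rglob("*.html"))

for current, nxt in zip(pages, pages[1:] + pages[:1]):  # last page loops back to the first
    html = current.read_text(encoding="utf-8", errors="ignore")
    target = nxt.relative_to("mirror").as_posix()
    tag = f'<meta http-equiv="refresh" content="{DWELL_SECONDS};url=/{target}">'
    # naive injection right after <head>; good enough for a frozen test cache
    current.write_text(html.replace("<head>", "<head>" + tag, 1), encoding="utf-8")
```

Point each browser at the first cached page and it walks the entire mirror unattended, same pages, same order, every run.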
You don't need to go to such complicated lengths. Just perform enough tests (as in, a statistically large enough sample) and a distribution will form. That also captures the variability of real-world network effects.
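As a rough illustration of what "enough tests" buys you, suppose each run yields one battery-drain figure per browser (the numbers below are entirely made up); with a large enough sample the two distributions either separate or they don't:

```python
# compare_runs.py - summarise many repeated runs per browser instead of
# trusting any single run. All figures below are made up for illustration.
import math
import statistics

def summarise(samples):
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, (mean - 1.96 * sem, mean + 1.96 * sem)  # rough 95% CI

edge_drain = [7.1, 7.4, 6.9, 7.2, 7.0, 7.3]    # hypothetical battery drain, %/hour
chrome_drain = [8.0, 8.4, 7.9, 8.2, 8.1, 8.3]  # hypothetical battery drain, %/hour

for name, runs in (("Edge", edge_drain), ("Chrome", chrome_drain)):
    mean, (lo, hi) = summarise(runs)
    print(f"{name}: mean {mean:.2f} %/h, ~95% CI [{lo:.2f}, {hi:.2f}]")
```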
What about different ads served to different browsers? Someone running, say, Opera will have a different ad profile than a Chrome user even when starting completely blank cookie-wise.
It's hardly complicated. I've put such tests together in an afternoon. In fact, whatever is added in complexity is offset by the fact that fewer tests are necessary. Via this mechanism you also remove any questions about compression, use of HTTP/2, etc., which could otherwise skew the tests based on server-side choices about how data is served to either platform. Equal always equals better.
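For what it's worth, serving the frozen cache identically to every browser is only a few lines; a sketch assuming the same hypothetical mirror/ directory, with no compression or content negotiation in play:

```python
# serve_mirror.py - serve the frozen cache over plain HTTP with no compression
# and no content negotiation, so every browser receives byte-identical responses.
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

handler = partial(SimpleHTTPRequestHandler, directory="mirror")
ThreadingHTTPServer(("127.0.0.1", 8000), handler).serve_forever()
```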
But those metrics are important; if servers serve more optimized pages to Edge users for some reason, that's a freaking important fact to know.
This is about real-world data and real experiences, and how they affect actual users.
You can normalize the tests to the point where there is absolutely zero difference between the browsers, of that I'm sure, but that will not reflect any actual cases that real users experience.
Within those specifications, the objection about ad blocking becomes irrelevant. If the browser just works better, then users don't care and can simply enjoy more battery time.
The case for more normalized tests is to find out which browser is factually better designed/written.
Ads are not that much of a problem; they will even themselves out, and if for some reason MSFT Edge users receive fewer ads, or ads that are less resource intensive, that's also an important metric.
I don't see anything that would somehow create a bias in favor of a specific browser as far as ad networks go; if anything, the stigma/stereotyping of IE/Edge users would probably mean that ad networks are more incentivized to send the baity ads towards those browsers.
As for the network part, well, again, that's an important metric: if certain browsers perform better under adverse network conditions, that's an important factor to know. You do not want to give them the best-case scenario every time.
Giving a page a fixed number of seconds to load is also completely the wrong approach; you want to see how browsers behave when they can't load a page properly or when it takes more time than usual. Maybe some browsers expend more resources by resubmitting the entire request, maybe some browsers do not parse the DOM tree from scratch when some of the requests stall, maybe some browsers have less resource-intensive placeholders for DOM elements, maybe some browsers are better at adjusting the DOM preprocessor for network congestion than others.
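If you want to probe exactly that behaviour, you can simulate stalled or slow responses on demand; a rough sketch of a static server that randomly delays some requests (the mirror/ directory, port, stall rate, and delay range are all arbitrary choices):

```python
# stall_server.py - serve cached pages, but randomly stall a fraction of
# responses to observe how each browser copes with slow or hung requests.
# The mirror/ directory, port, stall rate and delay range are arbitrary choices.
import random
import time
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

class StallingHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        if random.random() < 0.2:              # stall roughly one request in five
            time.sleep(random.uniform(5, 15))  # simulate a slow or hung connection
        super().do_GET()

handler = partial(StallingHandler, directory="mirror")
ThreadingHTTPServer(("127.0.0.1", 8001), handler).serve_forever()
```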
So no, I can't really see how your approach would be any better. The approach that MSFT took was quite good: Netflix, Wikipedia, YouTube, Facebook, etc., with what seems to be realistic user behaviour.
What you want to do is put together a test that produces fair results for fairness's sake; that's not how you evaluate anything, because it would not yield any real-world data.