Your 0.18 sec result is (to use the units they used in the article) 180ms, and i...

vijaybritto · on Feb 15, 2019

I think it's fast because of the L1 cache or something like that. I don't fully understand it, but this is what I got.

acqq · on Feb 15, 2019

The fastest version is the fastest because it's the most cache-friendly of all the versions presented. See e.g.

https://stackoverflow.com/questions/5200338/a-cache-efficien...

But note that robko made an improvement even before making that.
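To make "cache-friendly" concrete, here is a minimal sketch in C. It is not the code from the article or from robko. It assumes a square row-major matrix of doubles and uses matrix transposition only as an illustration; the function names and the `BLOCK` tile size are hypothetical.

```c
#include <stddef.h>

/* Hypothetical tile size; tune it so the tiles being worked on fit in cache. */
#define BLOCK 32

/* Naive transpose: reads src row by row but writes dst column by column,
   so consecutive writes land n elements apart in memory. */
void transpose_naive(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose: handles one BLOCK x BLOCK tile at a time, so the
   cache lines touched in src and dst are reused before being evicted. */
void transpose_blocked(const double *src, double *dst, size_t n)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            /* Clamp the tile at the matrix edge when n is not a multiple of BLOCK. */
            size_t i_end = ii + BLOCK < n ? ii + BLOCK : n;
            size_t j_end = jj + BLOCK < n ? jj + BLOCK : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
}
```

Both functions produce the same result; any speed difference comes only from the order in which memory is touched, and how large it is depends on the matrix size and the machine.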

acqq · on Feb 16, 2019

> made an improvement even before

Or maybe not: my short experiments with the simplified version based on their algorithm and with his JavaScript versions gave some conflicting results. I haven't thoroughly verified them; this note is just to motivate others to try.

robko · on Feb 15, 2019

I get 60ms in C. But in your code, the compiler might decide to remove most of the code since b is not used after being calculated. I checked the assembly code and it does not seem to be the case here, but it's still something to be aware of.

acqq · on Feb 15, 2019

> I get 60ms in C

OK, I get about 80ms for my run with the parameter 1 on my main computer, and 200ms on an N3150 Celeron.

> b is not used after being calculated

I've never before seen a C compiler optimize away the call to the allocator and the accesses to the arrays allocated that way. Maybe it's different now? Hm, dead code elimination... I guess randomly initializing a few values before the loop, and reading and printing a few values after it, should always be safe... Now that I think of it, filling the array with zeroes beforehand too.
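The precautions mentioned here could look roughly like the following sketch in C. It is not the benchmark from the thread: `kernel` is a hypothetical stand-in for the measured computation, and `run_benchmark`, `n`, and `seed` are names made up for illustration. The idea is that inputs the compiler cannot predict, plus a result that is actually used, leave it little room to drop the work.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the computation being timed. */
void kernel(const double *a, double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = 2.0 * a[i] + 1.0;
}

/* Runs the kernel on n > 0 elements and returns a small checksum of b,
   so the output of the kernel is observably used by the caller.
   Returns -1.0 if allocation fails. */
double run_benchmark(size_t n, unsigned seed)
{
    double *a = calloc(n, sizeof *a);   /* arrays start zero-filled */
    double *b = calloc(n, sizeof *b);
    if (!a || !b) { free(a); free(b); return -1.0; }

    /* Randomly initialize a few input values before the loop. */
    srand(seed);
    for (int k = 0; k < 4; k++)
        a[(size_t)rand() % n] = (double)(rand() % 100);

    kernel(a, b, n);

    /* Read a few output values after the loop. */
    double sum = b[0] + b[n / 2] + b[n - 1];
    free(a);
    free(b);
    return sum;
}
```

Printing the returned value from `main`, for example with `printf("%f\n", run_benchmark(1000000, 1u))`, makes the result part of the program's visible output. Checking the generated assembly, as robko did, remains the more direct confirmation.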