
Your 0.18 s result is 180ms in the units used in the article, and if I understand correctly their best WebAssembly result (compiled and executed?) is 300ms. Beautiful.

EDIT: But it could also be that your computer is somewhat faster than theirs. Do you happen to have a very fast CPU? Can you say which one? When I run C-like C++ versions of your code, I get the speeds you get with node.js. Either way, your results are much better overall than what they managed, so it's still great work!

    #include <stdio.h>
    int main(int argc, char* argv[]) {
        enum { height = 4096, width = 4096 };
        unsigned* a = new unsigned[ height*width ];
        unsigned* b = new unsigned[ height*width ];
        if ( argc < 2 ) { // called with no params:
            // measures the overhead when only the allocations
            // and no calculations are done
            printf( "%p %p\n", (void*)a, (void*)b ); // %p: a pointer does not fit in an int on 64-bit
            return 1;
        }
        if ( argv[1][0] == '1' ) // called with 1: the fastest (blocked) version
        for (unsigned y0 = 0; y0 < height; y0 += 64)
            for (unsigned x0 = 0; x0 < width; x0 += 64)
                for (unsigned y = y0; y < y0 + 64; y++)
                    for (unsigned x = x0; x < x0 + 64; x++)
                        b[x + y*width] = a[y + (width - 1 - x)*height];
        else
        for (unsigned y = 0; y < height; y++)
            for (unsigned x = 0; x < width; x++)
                b[x + y*width] = a[y + (width - 1 - x)*height];

        return 0;
    }



I think it's fast because of the L1 cache or something like that. I don't fully understand it, but this is what I got.


The fastest version is the fastest because it's the most cache-friendly of all the versions presented. See e.g.

https://stackoverflow.com/questions/5200338/a-cache-efficien...

But note that robko had already made an improvement even before that change.
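
To put rough numbers on the cache argument, here is a minimal sketch of the arithmetic for the program posted earlier in the thread. The 64-byte cache line, the 32 KiB L1 data cache and the 4-byte `unsigned` are assumptions about a typical desktop CPU, not figures from the thread, and the helper names are made up for illustration.

```cpp
#include <cstddef>

// Assumed, not taken from the thread: 64-byte cache lines and a 32 KiB
// L1 data cache. Check the values for your own CPU.
constexpr std::size_t kLineBytes = 64;
constexpr std::size_t kL1Bytes   = 32 * 1024;

// Matrix and tile sizes as used in the program posted in the thread.
constexpr std::size_t kHeight = 4096;
constexpr std::size_t kTile   = 64;

// Byte distance between two consecutive reads of a[] in the inner x loop:
// the read index is y + (width - 1 - x) * height, so each step of x
// jumps by one whole row of `height` elements.
constexpr std::size_t read_stride_bytes() {
    return kHeight * sizeof(unsigned);
}

// Every read in one pass of the inner loop lands on a different cache
// line (the stride is larger than a line). For the next value of y to
// reuse those lines, all of them must still be cached: `inner` lines.
constexpr std::size_t bytes_to_keep_cached(std::size_t inner) {
    return inner * kLineBytes;
}

// True if that reuse set fits in the assumed L1 size.
constexpr bool fits_in_l1(std::size_t inner) {
    return bytes_to_keep_cached(inner) <= kL1Bytes;
}
```

With these assumptions, the plain loop (inner length 4096) needs 256 KiB to stay cached between two values of y, while the tiled loop (inner length 64) needs only 4 KiB. This ignores cache associativity: a power-of-two stride like the 16 KiB here maps many lines to the same cache set, so the real behaviour can be worse than the simple size comparison suggests.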


> made an improvement even before

Or maybe not: my short experiments with the simplified version based on their algorithm and his JavaScript versions gave some conflicting results. I haven't verified them thoroughly; this note is just to encourage others to try.


I get 60ms in C. But in your code, the compiler might decide to remove most of the code since b is not used after being calculated. I checked the assembly code and it does not seem to be the case here, but it's still something to be aware of.


> I get 60ms in C

OK, I get about 80ms for my run with the parameter 1 on my main computer, and 200ms on an N3150 Celeron.

> b is not used after being calculated

Until now, I've never seen a C compiler optimize away the call to the allocator and the accesses to the arrays allocated that way. Maybe it's different now? Hm, dead code elimination... I guess randomly initializing a few values before the loop, and reading and printing a few values after it, should always be safe... Now that I think of it, filling the array with zeroes beforehand as well.
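
A minimal sketch of that idea, assuming C++11 and `std::chrono` for timing: the input gets known values, the output is zero-filled before the timed region, and a few output values are read afterwards so the stores stay observable. The names `rotate_blocked` and `timed_rotation_ms` are made up for this example, and this makes elimination much less likely rather than impossible.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Blocked rotation with the same indexing as the program posted in the
// thread, for a square n x n matrix. n must be a multiple of tile.
void rotate_blocked(const std::vector<unsigned>& a, std::vector<unsigned>& b,
                    std::size_t n, std::size_t tile) {
    assert(tile > 0 && n % tile == 0);
    assert(a.size() == n * n && b.size() == n * n);
    for (std::size_t y0 = 0; y0 < n; y0 += tile)
        for (std::size_t x0 = 0; x0 < n; x0 += tile)
            for (std::size_t y = y0; y < y0 + tile; y++)
                for (std::size_t x = x0; x < x0 + tile; x++)
                    b[x + y * n] = a[y + (n - 1 - x) * n];
}

// Hypothetical helper: fill the input, time one rotation, and fold a few
// output values into *sample so the result of the loop is actually used.
double timed_rotation_ms(std::size_t n, std::size_t tile, unsigned* sample) {
    std::vector<unsigned> a(n * n);
    std::vector<unsigned> b(n * n, 0u);        // zero-filled before timing
    for (std::size_t i = 0; i < a.size(); i++)
        a[i] = static_cast<unsigned>(i);       // known, non-constant input

    auto t0 = std::chrono::steady_clock::now();
    rotate_blocked(a, b, n, tile);
    auto t1 = std::chrono::steady_clock::now();

    // Read a few results after the loop; the caller should print this.
    *sample = b[0] + b[n * n / 2] + b[n * n - 1];
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling `timed_rotation_ms(4096, 64, &sample)` from `main` and printing both the returned time and `sample` matches the sizes used in the thread. Zero-filling `b` up front also touches its pages before the timed region, so first-touch page faults are not counted in the measurement.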



