
Wow, all these numbers seem insanely bad. 500 milliseconds to transpose 16 million pixels (so 64 million bytes)? A modern CPU should be able to do that at least 10x faster, if not 100x.



They are bad but not way off for that basic for loop, depending on which rotation is being applied.

Using their code on my Intel-based workstation at around 3 GHz with GCC 7.3, it takes around 80-100ms to rotate a 4096x4096 buffer by 90 or 270 degrees, and 14ms to rotate it by 180 degrees.

Max memory bandwidth of something like an i9-9900k is 41.2GB/s. This test reads & writes 128 MiB of data. So max theoretical achievable performance here is around 3-4ms. Max theoretical. So 100x is not really feasible. 10x, though, very much is, as the quick convert shows a peak time of 14ms with a 180-degree rotation.
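To make the arithmetic behind that 3-4ms figure explicit, here is a rough sketch in C. The helper names are made up for illustration, and it assumes 4 bytes per pixel, each pixel read once and written once, and a sustained bandwidth equal to the quoted 41.2GB/s peak, which real code will not reach.

```c
#include <stdint.h>

/* Lower bound, in milliseconds, for moving `bytes` of memory traffic
   at a sustained bandwidth of `gb_per_s` (decimal gigabytes per second). */
static double min_transfer_ms(uint64_t bytes, double gb_per_s) {
    return (double)bytes / (gb_per_s * 1e9) * 1e3;
}

/* Total traffic for rotating a width x height image of 4-byte pixels:
   every pixel is read once and written once. */
static uint64_t rotate_traffic_bytes(uint64_t width, uint64_t height) {
    const uint64_t bytes_per_pixel = 4; /* assumption: RGBA, 8 bits per channel */
    return 2 * width * height * bytes_per_pixel;
}
```

For a 4096x4096 buffer, `rotate_traffic_bytes` gives 134,217,728 bytes (128 MiB), and `min_transfer_ms` at 41.2GB/s comes out at roughly 3.26ms.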

Of course the major source of slowness here is that the reads/writes are not sequential, and the 90 & 270 rotations are achieving a fraction of the possible bandwidth they could as the input reads are jumping around, so every single one is a cache miss and the other 60 bytes in each cache line on the miss will be purged before it's used again.
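For reference, a minimal sketch of the kind of loop being discussed. This is not the article's code, just an assumed straightforward 90-degree clockwise rotation over 4-byte pixels; the function name is hypothetical. Writes walk the destination sequentially, while consecutive reads are a full source row apart (16 KiB for a 4096-pixel-wide image).

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a row-major width x height image 90 degrees clockwise.
   The destination is height pixels wide and width pixels tall. */
void rotate90_naive(const uint32_t *src, uint32_t *dst,
                    size_t width, size_t height) {
    for (size_t y = 0; y < width; y++) {        /* destination row */
        for (size_t x = 0; x < height; x++) {   /* destination column */
            /* Sequential write; the read jumps `width` pixels per step. */
            dst[y * height + x] = src[(height - 1 - x) * width + y];
        }
    }
}
```

Swapping the loop order makes the reads sequential and the writes strided instead, which is the trade-off described in the next paragraph.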

Flipping it would mean the writes are never utilizing a full cache line, either, though. So you can't really "fix" that, not easily at least. So either your read or write bandwidth ends up tanking and you can only achieve roughly 6% of the max (only ever using 4 bytes of the 64-byte cache line) for that half of the problem. Without some clever magic to handle this your max theoretical on a 41.2GB/s CPU drops to around 50ms.
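The 6% figure is just 4 useful bytes out of a 64-byte line. How that turns into a time depends on the model you pick, so here is a hedged sketch with a hypothetical helper: it assumes strided traffic gets bandwidth scaled by the fraction of each cache line actually used, and everything else runs at the full 41.2GB/s. Neither assumption is a measurement.

```c
#include <stdint.h>

/* Estimated time in ms when `strided_bytes` of the traffic only uses
   `useful_bytes` out of each `line_bytes` cache line, and
   `sequential_bytes` runs at the full bandwidth `gb_per_s`. */
static double estimate_ms(uint64_t sequential_bytes, uint64_t strided_bytes,
                          double gb_per_s, double useful_bytes,
                          double line_bytes) {
    double bandwidth = gb_per_s * 1e9;               /* bytes per second */
    double utilization = useful_bytes / line_bytes;  /* 4/64 = 0.0625 */
    double seconds = (double)sequential_bytes / bandwidth
                   + (double)strided_bytes / (bandwidth * utilization);
    return seconds * 1e3;
}
```

With 64 MiB read and 64 MiB written, treating all 128 MiB as penalized gives about 52ms, close to the 50ms quoted above. Penalizing only the strided half gives about 28ms.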

All that said it's clear that WASM is very far off from native levels of performance. ~5x slower isn't something to brag about. But hey maybe the test system was a potato, and the 500ms isn't as bad as it sounds.


You are correct. The code is using an inefficient cache access pattern, so most of the time is spent waiting.

You probably won't get 100x faster without SIMD, but 10x is certainly doable. Unfortunately, SIMD.js support was removed from Chrome and Firefox a while ago, even though SIMD is still not available in wasm to this day.


How would SIMD do anything to address the problem's fundamental anti-cache-friendly access patterns? You'd need to restructure the problem to be cache-friendly, but SIMD won't really be relevant to that.


You can use both at once. Usually, you'd have something like 64x64 tiles in cache and use 4x4 or 8x8 tiles for SIMD.
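A rough sketch of the cache-tiling half of that idea, in plain C. The function name and the 64-pixel tile edge are assumptions taken from the numbers above, and the inner loop is left scalar; a SIMD version would replace it with a 4x4 or 8x8 register transpose. The best tile size depends on the cache and on the image stride, so treat 64 as a starting point to tune.

```c
#include <stddef.h>
#include <stdint.h>

#define TILE 64  /* assumed tile edge in pixels; tune per machine */

/* Rotate a row-major width x height image 90 degrees clockwise,
   one TILE x TILE block at a time so that the source rows touched
   by a block are reused before they are evicted. */
void rotate90_tiled(const uint32_t *src, uint32_t *dst,
                    size_t width, size_t height) {
    for (size_t ty = 0; ty < width; ty += TILE) {
        size_t y_end = ty + TILE < width ? ty + TILE : width;
        for (size_t tx = 0; tx < height; tx += TILE) {
            size_t x_end = tx + TILE < height ? tx + TILE : height;
            /* Scalar kernel over one tile; edge tiles may be smaller. */
            for (size_t y = ty; y < y_end; y++) {
                for (size_t x = tx; x < x_end; x++) {
                    dst[y * height + x] = src[(height - 1 - x) * width + y];
                }
            }
        }
    }
}
```

The per-pixel mapping is the same as an untiled rotation; only the visiting order changes, so the output is identical.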


Or better yet, WebGL should be able to do this in a few ms on a GPU.


Or simply use the canvas API, which has super optimized graphics libraries behind it, rather than reinventing the wheel :)

But I get that this was really about how much wasm can help performance, as a percentage, versus JS. You could always write an “optimized” routine in each and compare those, and theoretically achieve something similar.


The article mentions why they couldn't use canvas for this: they are running this code in a worker, and canvas support in workers is not great in browsers so far.


Not only that, there's a nasty bug in Chrome that makes it unusable for our use-case https://bugs.chromium.org/p/chromium/issues/detail?id=906619


Ah, my bad for skimming. I thought most canvas stuff worked these days? (I recall many years ago, when I worked on such things, that fonts were the biggest problem, but also people generally wanting to be able to paint DOM elements in there as well.)


It is OffscreenCanvas, the variant that works in web workers, that has poor browser support.


In my experience, the canvas api is very slow and not well thought-out. For example, to create a native image object from raw pixels, you have to copy the pixels into an ImageData object, draw it to a canvas, create a data URL from the canvas and then load an image from that data URL.



