WebAssembly: TinyGo vs. Rust vs. AssemblyScript (ecostack.dev)
100 points by buradol on Nov 27, 2022 | 24 comments



Plain JS numeric code _can_ be really fast if you stick to typed arrays.

I tried the posted JS in Chrome on an aging thinkpad and each `testSort()` took as long as 125 seconds (32 seconds in Firefox). But when I replaced `Array` with `Float32Array` the runtime dropped to under 7 seconds (about 2 seconds in Firefox) -- in line with the wasm alternatives. Going further and filling an `Int32Array` with `(Math.random() * 2.0 - 1.0) * 100` (as in the AssemblyScript code) brought the Chrome runtime down to around 1.6 sec (still around 2 seconds in Firefox).
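
A minimal sketch of that change (the array length and harness details here are assumptions, not the article's exact code):

  // Fill a typed array the way the AssemblyScript version does
  const arr = new Int32Array(100_000)
  for (let i = 0; i < arr.length; i++) {
    arr[i] = (Math.random() * 2.0 - 1.0) * 100 // truncated to a 32-bit int on assignment
  }
  // TypedArray sort() compares numerically by default, unlike Array sort()
  arr.slice().sort()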


Also worth noting: default JS `Array.sort()` converts everything to strings and compares them lexically -- so 80 comes before 9. It's a little faster to sort them as numbers: `.sort((a, b) => a - b)` takes me around 40 seconds in Chrome (11 seconds in Firefox).
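
For example:

  [80, 9, 100].sort()                 // [100, 80, 9] -- compared as strings
  [80, 9, 100].sort((a, b) => a - b)  // [9, 80, 100] -- compared as numbers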


Thanks man, this is gold.


The Rust vs. Go comparison has two key differences:

- The Rust example uses 8-bit unsigned ints, whereas the Go example uses 32-bit signed ints.

- Rust's sort is stable by default whereas Go's is not.

If you tweak the Rust benchmark to use `i32` instead of `u8` and `sort_unstable` instead of `sort`, you should see ~3-4x faster performance.


Made a PR with the fixes. Rust is now 3 times faster than TinyGo, and the wasm is almost 3 times smaller (wasm+js is half the size), as expected.

https://github.com/Ecostack/wasm-rust-go-asc/pull/1

My first foray into wasm, so I probably missed some optimizations like wasm-opt.


Also I would assume different languages have different random() implementations which could contribute to the run time. So to make tests equal, you should not measure time to set up the array.
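
A rough sketch of what that would look like in the JS harness (this split is my own, not the article's code):

  // Set up the data outside the measured region...
  const arr = new Int32Array(100_000)
  for (let i = 0; i < arr.length; i++) {
    arr[i] = (Math.random() * 2.0 - 1.0) * 100
  }

  // ...and only time the copy + sort part
  const start = performance.now()
  for (let i = 0; i < 500; i++) {
    arr.slice().sort()
  }
  console.log(`${performance.now() - start} ms`)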


The Go version should also use `sort.Ints`. https://pkg.go.dev/sort#Ints


One can speed up the AssemblyScript implementation by using StaticArray and adding unchecked() around the indexing - this makes it some 25% faster.
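
Roughly like this (an AssemblyScript sketch of the idea, not the article's exact code):

  // StaticArray stores its elements inline (no separate backing buffer),
  // and unchecked() drops the bounds checks on each access
  const arr = new StaticArray<i32>(100000)
  for (let i = 0; i < arr.length; i++) {
    unchecked(arr[i] = <i32>((Math.random() * 2.0 - 1.0) * 100.0))
  }
  const temp = new StaticArray<i32>(arr.length)
  for (let i = 0; i < arr.length; i++) {
    unchecked(temp[i] = arr[i]) // unchecked() applies to the whole expression
  }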

Further, by using the smallest stub runtime (i.e. without a garbage collector) one can cut the file size roughly in half, making the smallest implementation even smaller (4.7kb -> 2.8kb unminified). Quite a lot of that code is the random initialization... if you do that in JS instead, it may just end up smaller than the plain JS implementation.

And still it's a flawed comparison: ASC is using a stable sorting algorithm (with a comparator function). It's difficult to compare approaches based on such small benchmarks; what's really being compared is the built-in sorting algorithm and the bundle-size overhead of the different languages.


Seems more like a test of the speed of the random number generator. Never mind that the JS version doesn’t use “slice” to copy the array…

Edit: they’re not even using the same data types in the arrays. Very misleading. I would caution against drawing any conclusions from this article.

Edit again: someone pointed out that the random number generation happens in an outer loop, so probably not a big impact. The point about slice still stands.


It's also not measuring rng speed, it's measuring memcpy and sort speed.


Not sure I follow. Math.random() is in testSort, so the time to generate the random numbers is part of the measurement (even though it almost certainly shouldn’t be).

Edit: my main point is that there are many flaws in this comparison, so I wouldn’t draw any conclusions from the measurements in the article. They’re pretty much meaningless.


I agree, but what I'm saying is, it only generates the numbers once and then sorts them 500 times. Yes it's a flawed measurement because it measures the generation time + 500 sorts, but the time to generate the numbers is probably minuscule compared to the sorting.

There are many more flaws; as you say, the biggest is the stable vs. unstable sort comparison, but it looks like the article author (not OP) fixed it half an hour ago and updated the article.


You’re right, it uses the same array 500 times and then runs an outer loop (with a new array each time) 5 times.


The next step in assessing the results would be a few less trivial examples, comparing how file sizes grow between languages: is the bulk of the observed "Rust tax" a one-time investment that won't grow for less trivial code, or will the size keep growing almost linearly, keeping the file-size ratios roughly the same?


OT, but I’ve been working with WASM and TinyGo and hit an unexpected roadblock: TinyGo lacks JSON support (because the native Go implementation uses reflection).

That meant I couldn’t find a way to serialize structs between the host and WASM-compiled functions/modules.

The initial implementation of what I wanted to do took me half an hour. Trying and testing serialization libraries (JSON, protobuf, msgpack) took me four days, and I couldn’t find one that worked (working on an Apple M1 made things more complicated). The one left to try is karmem, but I just had to move on and be productive.

I used goja instead to get unstuck but now I want to find a working solution :)


https://pkg.go.dev/github.com/CosmWasm/tinyjson#section-read...

Looks like it would be a reasonable solution?


Your Go uses `pdqsort` to sort 4-byte ints from 0 to 100, while Rust uses a stable sort (`sort_unstable` would be the equivalent of `pdqsort`) on single-byte integers from 0 to 255. Hardly a fair comparison.


Oh my, where to start... This little article shows everything wrong with software devs these days :) First of all, the implementations aren't doing the same thing, which is bad for a proper comparison; but others pointed that out already.

What bugs me the most, though, is the atrocious JavaScript implementation. On my aging machine, none of the browsers I tried it on (Edge, Opera, Chrome, and Firefox) - not a single one! - even ran the original version. All of them refused to run the script and timed out.

I made a very, very simple change and just used the proper tool for the job - TypedArrays:

  function testSort() {
    const length = 100_000
    const arr = new Uint8Array(length)
    for (let i = 0; i < arr.length; i++) {
        arr[i] = Math.random() * 255 // the original code produced float values here
    }
    const temp = new Uint8Array(length)
    for (let i = 0; i < 500; i++) {
        temp.set(arr, 0) // shorter than a manual for-loop and about 6+% faster
        temp.sort()
    }
  }
And wouldn't you know it, Firefox now runs this function. I did my testing on https://jsbench.me, put the function in the setup section and the actual call in test cases.

I added another function, this time using an Int32Array instead and random values between -1_000 and +1_000.

Just to be thorough and because I found it interesting, I also added a Float32Array version, this time using the result from Math.random() without any scaling.

This leads to an interesting observation: there seems to be a huge difference in optimisation between Firefox and Chromium-based browsers with TypedArrays:

       Test         | Performance                     | Browser
   -----------------+---------------------------------+-----------------
   testSort_uint8   | 0.75 ops/s±1.8% (~1.33s/call)   |  Edge
   testSort_uint8   | 29.33 ops/s±2.3% (~34ms/call)   |  Firefox
   -----------------+---------------------------------+-----------------
   testSort_int32   | 0.64 ops/s±1% (~1.56s/call)     |  Edge
   testSort_int32   | 1.44 ops/s±0.2% (~694ms/call)   |  Firefox
   -----------------+---------------------------------+-----------------
   testSort_f32     | 0.16 ops/s±0.6% (~6.25s/call)   |  Edge
   testSort_f32     | 1.49 ops/s±0.3% (~671ms/call)   |  Firefox
In Firefox, the results were *really* interesting. The dramatic difference between uint8 and int32 in particular is quite astonishing. Unlike the author, I found zero difference between Edge, Opera, and Chrome in terms of performance, which doesn't surprise me as they all use the same engine.

While Firefox's SpiderMonkey engine is faster in all scenarios, the uint8-version is faster by an order of magnitude, whereas V8 shows comparatively little difference (~15%) in the integer case and absolutely tanks performance with float values.

This last result in particular might explain why the original version both failed to start on my machine and was slow in the author's browsers.

I hope you found this as interesting as I did and maybe even find a place to apply these findings in your projects.

Cheers.


I love Go but I must confess I was a bit skeptical when I read the article:

1. In my own experience, Firefox is usually faster than Chrome at executing wasm.

2. I'm not a Rust programmer, but the performance of the Rust implementation is a bit surprising compared to TinyGo.

Still, TinyGo is very impressive. Very glad we have this project.


Was the test here on the author's understanding of data types and sort algorithms?



Does this still hold up if you don't use fixed size arrays?


A little off topic, but something I've been wondering:

In my small learning project using TinyGo, disabling the GC sped things up a good amount (unsurprisingly), and since the wasm part is in a web worker it seemed safe enough (terminating the worker cleared the allocations, as far as I could tell). It seems to work nicely for the project, basically "apply an image transform", but would this pattern be good in a more serious code base?


What makes the Rust version the fastest?



