Show HN: Simulating 20M Particles in JavaScript (dgerrells.com)
561 points by rustystump 64 days ago | 70 comments
Had some fun with shared array buffers over many months of free time.

Skip to the end to play around with the final app.

Open to ideas on how to simulate more whilst staying in js land.




Nice! I'd suggest embedding the simulation in the blog. I had to scroll up and down for a while before finding a link to the actual simulation.

(You might want to pick a value that runs reasonably well on old phones, or have it adjust based on frame rate. Alternatively just put some links at the top of the article.)

See https://ciechanow.ski/ (very popular on this website) for a world-class example of just how cool it is to embed simulations right in the article.

(Obligatory: back in my day, every website used to embed cool interactive stuff!)

--

Also, I think you can run a particle sim on GPU without WebGPU.

e.g. https://news.ycombinator.com/item?id=19963640


That's one of the best examples of an explanatory blog that I've ever seen. I wish this would become the standard for how information is shared - if it's worth sharing, it's worth making it easy to understand.


I have done a few blog posts with interactive doodads like this. It takes a lot (like really a _lot_) more time to do, but I think it's the right way to go. There is so much noise on the internet caused by people casting their 2 cents into the void.

Interactive thingywotsits may slow down individuals making posts, but there are a lot of individuals out there.


Not being a frontend dev, I have no idea how to even start making something like that.

Are there some frameworks that make interactive simulations like that easier to make, or do you just do it the hard way?


The "hard" way is often the simple way with these sort of things. What makes it easier is while building out your code, you make little pieces of UI to visualize what you're doing. Think of them like unit tests or test driven development. Then you can take those, clean them up a little and publish them.


p5.js is a great medium. I did a short series in this style - you can inspect it to see full source (non minified / obfuscated) with some comments here and there.

https://jason.today/falling-sand


This is an amazing series. Love the style and incremental examples.


Wow thank you!


Excellent, that does look quite approachable indeed. Thanks!


wow, that fluid sim is astounding


I do agree about embedding. I thought about embedding each version but was worried about having too many workers all going at once. I'll update the article to include the final version embedded at the end. Thanks for the feedback.

That blog is amazing. Each example is so polished. I love it.

edit: I tried adding an embedded version but the required headers didn't play well with other embeds. The older versions are all still stuck in codesandboxes.


An intersection observer can be used to turn the non visible parts off when out of view. https://developer.mozilla.org/en-US/docs/Web/API/Intersectio...
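
Rough sketch of the idea (sims, pause and resume are made-up names for whatever your embeds expose):

    const observer = new IntersectionObserver((entries) => {
      for (const entry of entries) {
        const sim = sims.get(entry.target);        // your own canvas -> sim lookup
        if (entry.isIntersecting) sim.resume();    // restart the rAF loop / workers
        else sim.pause();                          // idle everything while off-screen
      }
    }, { threshold: 0 });

    document.querySelectorAll('canvas.sim').forEach((el) => observer.observe(el));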


"Skip to the end to play around with the final app."


Demo on mobile [0], pretty incredible to play with.

[0] https://dgerrells.com/sabby


Woah, it works with multiple fingers! This is wild for pure JS. Interestingly, more fingers means more lag, I guess more stuff being sent between threads.


Wow, looks strangely organic, like lipid structures in primordial ooze.


on my phone firefox outperforms chrome! that's satisfying


Random question (genuine, I do not know if it's possible):

> I decided to have each particle be represented by 4 numbers an x, y, dx, and dy. These will each be 32-bit floating point numbers.

Would it be possible to encode this data into a single JS number (a 53-bit integer, given that MAX_SAFE_INTEGER is 2^53 - 1 = 9,007,199,254,740,991)? Or into the -3.4e38 to 3.4e38 range of the Float32Array used in the blog?

For example, I understand for the screen position you might have a 1000x1000 canvas, which can be represented with 0-1,000,000 numbers. Even if we add 10 sub-pixel divisions, that's still 100,000,000, which still fits very comfortably within JS.

Similar for speed (dx, dy), I see you are doing "(Math.random()*2-1)*10" for calculating the speed, which should go from -10 to +10 with arbitrary decimal accuracy, but I wonder if limiting it to 1 decimal would be enough for the simulation, which would be [-10.0, +10.0] and can also be converted to the -100 to +100 range in integers. That's about 200 values per component, or roughly 40,000 combinations to represent all of the possible (dx, dy) pairs.

If you put both of those together, that gives 40,000 * 100,000,000 = 4,000,000,000,000 (4T) numbers needed to represent the particles, which still fits within JS' MAX_SAFE_INTEGER. So it seems you might be able to fit all of the data for a single particle within a single JS number (it would need full float64 precision though, so not a Float32Array element)? Then you don't need the stride and can be a lot more sure about data consistency.

It might be that encoding/decoding the data into a single number costs more than the memory savings are worth though, which I don't know.
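
Something like this untested sketch, sticking to a 1000x1000 canvas, 0.1px positions and [-10.0, +10.0] velocities (it has to live in a plain Number or a Float64Array slot, since float32 doesn't have enough mantissa bits for integers this large):

    const POS_RANGE = 10_000;                  // 1000 px * 10 sub-pixel steps per axis
    const VEL_RANGE = 201;                     // -10.0..+10.0 in 0.1 steps
    const POS_CELLS = POS_RANGE * POS_RANGE;   // 100,000,000 position cells

    function pack(x, y, dx, dy) {
      const xi = Math.round(x * 10);           // 0..9999
      const yi = Math.round(y * 10);           // 0..9999
      const dxi = Math.round(dx * 10) + 100;   // 0..200
      const dyi = Math.round(dy * 10) + 100;   // 0..200
      const pos = yi * POS_RANGE + xi;         // 0..99,999,999
      const vel = dyi * VEL_RANGE + dxi;       // 0..40,400
      return vel * POS_CELLS + pos;            // ~4e12 max, well under 2^53
    }

    function unpack(p) {
      const pos = p % POS_CELLS;
      const vel = (p - pos) / POS_CELLS;
      return {
        x: (pos % POS_RANGE) / 10,
        y: Math.floor(pos / POS_RANGE) / 10,
        dx: ((vel % VEL_RANGE) - 100) / 10,
        dy: (Math.floor(vel / VEL_RANGE) - 100) / 10,
      };
    }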


You may be interested in bitfields and JS optimised vector operations; the @thi.ng universe has some useful implementations (also a WASM SIMD one):

[0]: https://github.com/thi-ng/umbrella/tree/develop/packages/vec...

[1]: https://github.com/thi-ng/umbrella/tree/develop/packages/vec...

[2]: https://github.com/thi-ng/umbrella/tree/develop/packages/bit...

[3]: https://github.com/thi-ng/umbrella/tree/develop/packages/sim...

There's also Structura, which implements a few performant data structures:

[4]: https://github.com/zandaqo/structurae/blob/master/README.md#...


I also did some experimenting with number packing and ended up creating a QuickSet implementation[0]. However, operating on TypedArrays turned out to be more performant, which is what I settled on in the end. I've collected some related packages here:

[0]: https://github.com/dleeftink/QuickSet

[1]: https://github.com/dleeftink/QuickSet?tab=readme-ov-file#see...

Of note is FastIntSet, which uses the technique you described, but I think is only able to store 4 unsigned integers as one JS value (I might be wrong).

[2]: https://github.com/TheLucifurry/fast-int-set/blob/main/src/s...

This notebook demonstrates an MVP bitpacking technique using BigInts (see the 'bitpack' and 'pack' cells):

[3]: https://observablehq.com/@dleeftink/array-bitbuffers

SimSIMD offers various similarity measures and quantisation levels for TypedArrays:

[4]: https://github.com/ashvardanian/simsimd#using-simsimd-in-jav...


As you stated, encoding/decoding would kill your performance.

Float16Array would immediately halve your memory requirements.

Another possibility would be to have separate precision arrays.

e.g. Float16Array for x,y and even Int8Array for dx/dy, but in both cases you would get some motion artifacts, especially for Int8 from the clamping and aliasing of dx/dy.
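
A sketch of that layout (Float16Array is still very new, so assume a recent engine or a polyfill; treating the Int8 velocities as tenths of a pixel is just an illustration):

    const N = 20_000_000;
    const pos = new Float16Array(new SharedArrayBuffer(N * 2 * 2)); // x, y: 4 bytes/particle
    const vel = new Int8Array(new SharedArrayBuffer(N * 2));        // dx, dy: 2 bytes/particle

    function step(i) {
      // Int8 velocities are clamped to [-128, 127]; this clamping/quantisation is
      // exactly where the motion artifacts mentioned above come from.
      pos[i * 2]     += vel[i * 2] / 10;
      pos[i * 2 + 1] += vel[i * 2 + 1] / 10;
    }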


This is really awesome!

I did have a question about this:

> Javascript does support an Atomics API but it uses promises which are gross. Eww sick.

With the exception of waitAsync[1], the Atomics APIs don't appear to use promises. I've used Atomics before and never needed to mess with any async/promise code. Is it using promises behind the scenes or is there something else I'm missing?

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
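
For reference, the basic calls are all synchronous, e.g.:

    const sab = new SharedArrayBuffer(4);
    const counter = new Int32Array(sab);

    // In any worker sharing `sab`:
    Atomics.add(counter, 0, 1);          // atomic increment, returns the old value
    const n = Atomics.load(counter, 0);  // atomic read

    // Only Atomics.waitAsync returns a promise; Atomics.wait blocks the calling
    // worker thread (and isn't allowed on the main thread in browsers).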

Edit: formatting



another tutorial that makes it easier to grasp what that is: https://www.youtube.com/watch?v=R81egpgDzbY


The videos look awesome but the "try it out here" codesandbox links don't work for me on MacOS Chrome desktop. I get 'Uncaught ReferenceError: SharedArrayBuffer is not defined' and some CORS errors: 'ERR_BLOCKED_BY_RESPONSE.NotSameOriginAfterDefaultedToSameOriginByCoep'.


You have to open the previews in a dedicated tab as codesandbox's inline editor blocks the header from being set. It also may get blocked if you are using a privacy focused browser.

I'll try to include embedded examples in the future.

You can try out the final version here https://dgerrells.com/sabby


Whoa this is like fine sand. Amazing!


Marvelous. I spent an hour understanding the code and playing with it. Here is a live implementation: https://particules.kapochamo.com/index.html


When I first opened this, I was stunned! It is really pretty, and I can't believe there are so many simulated particles in JS!


Awesome! I am impressed you dug that deep into the code.


thanks for that, i was missing out with the desktop!


You might check if chrome://tracing helps give more insight; I came across it here:

https://youtu.be/easvMCCBFkQ?t=114


Oh, man, can't wait to send it to the UI team who write dead slow React apps. JS is blazing fast. Especially if written well.


The problem is that idiomatic JS and blazing fast JS are diametrically opposed to each other; in practice the latter is more like a bad C dialect. You're not allowed to allocate GC objects in fast JS, but the language doesn't have good non-allocating alternatives. Nobody is actually going to make a complex JS app where all memory allocations are pointers into a giant ArrayBuffer; it's easier to just switch to WebAssembly at that point.


If JS had typed structs (like it has typed arrays) it would definitely be more convenient.

However, that's not where the problem starts. A lot of web sites are slow because they simply run too much code that doesn't need running in the first place and allocates objects that don't need to be allocated.

We don't need lower level constructs if we can simply start by removing cruft and be more wary of adding it. Go back to KISS/YAGNI.


JavaScript is probably the language that has seen the most human-hours spent on optimizations for the various engines.

Too bad we can't just rely on JS only and have to involve a bunch of DOM operations, which are usually the slow part of the UIs we create.


"Too bad we cant just rely on JS only and have to involve a bunch of DOM operations, which is usually the slow part of the UIs we create"

No? With WebGL and soon WebGPU, or in this case here by writing to an image buffer and just passing that to canvas, you haven't had to use the DOM for quite a while now.

(but then you don't get all the nice things HTML offers, like displaying and styling text etc)
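
The image buffer approach is roughly this (sketch, no bounds checking; the Uint32 view assumes a little-endian machine):

    const canvas = document.querySelector('canvas');
    const ctx = canvas.getContext('2d');
    const image = ctx.createImageData(canvas.width, canvas.height);
    const pixels = new Uint32Array(image.data.buffer);  // one 0xAABBGGRR value per pixel

    function render(particles) {            // particles = [x, y, dx, dy, ...]
      pixels.fill(0xff000000);              // opaque black background
      for (let i = 0; i < particles.length; i += 4) {
        const x = particles[i] | 0;
        const y = particles[i + 1] | 0;
        pixels[y * canvas.width + x] = 0xffffffff;   // white dot
      }
      ctx.putImageData(image, 0, 0);
    }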


+ built-in accessibility + extensions that do something with the DOM + ...

In reality, you're right, there are alternatives, but for basic web documents it kind of hurts more than it helps to use them.


Seriously impressive engineering OP, thanks for the awesome writeup too. Looks like you've got a ton of fans now, well earned!


Great article and very relevant for me since I'm building a game in JavaScript based on "falling sand" physics, which is all about simulating massive amounts of particles (think Noita meets Factorio - feel free to wishlist if you think it sounds interesting).

My custom engine is built on a very similar solution using SharedArrayBuffers but there are still many things in this article that I'm eager to try, so thanks!


Such a clever fellow.

How does one get this good at understanding hardware-level details like L1 caches and the software implications?

I graduated as an Electrical Engineer and moved into software for my career. Feel like I’m missing some skills.

Specifically how can I better understand and use:

- The Chrome profiler? It's scary to me currently.
- Graphics programming
- Optimisations?


About caches, the main important thing is to know they exist. Which you do know now :) The general idea of a cache is exactly how he explains it in the article, and is useful to know about as a general concept. Note that the very hardware-specific bit of info (that the M1 chip has a "chungus big" cache) isn't mentioned until very late in the article; I didn't know that either.

I'm not super skilled at the chrome profiler either, it seems to be suited better for certain tasks than others, but I might just be doing it wrong ...


Might want a strobe warning. At least for Firefox and Chromium on Linux on a desktop it strobes heavily in the starting state.


It depends on the display type. When run on something with low per pixel lighting it can flicker a bit due to how quickly the average light changes frame to frame. Anything with local dimming zones may struggle. I looked at ways to fix this but could not come up with anything other than running a blur filter, which ends up looking terrible.


"When run on something with low per pixel lighting it can flicker a bit due to how quickly the average light changes frame to frame"

Not sure I understand. The flicker isn't caused by the screen sometimes being drawn white (like I assumed), but just by my mobile's light settings?

Other simulations similar to this don't have this flicker on my device.

(still impressive work, genuine question to avoid this effect in my experiments)

And no matter the technical reasons, for some people this might be a serious health issue, so a warning might make sense in the current state.


Is the code available somewhere? I'd like to see the full code and run locally. It looks like the code sandbox isn't working anymore.


Nice article.

I did a somewhat similar experiment a while ago and managed to fit quite a lot of particles with a basic physics simulation.

https://github.com/Seb-C/gravity


Love this. Enjoyed riding your train of thought from challenge conception through each performance pass to the final form. Surprisingly fun to play around with this sim too. Looking forward to more posts!


Very cool, thank you for sharing.

Has anyone done similar experimentation and/or benchmarking on using webgpu for neural nets in JS?


Inspiring tutorial!

Does anyone know why/how it maintains state if you tab out? Does Chrome eventually try to clean up the cache or is it locked in?


Usually inactive tabs are just paused and their state saved.


requestAnimationFrame won't fire while you're tabbed out.


I've saved it to the Web Archive just in case; sadly, it doesn't work that way.


Anyone else having trouble with that web vscode he's using?


Yeah, no idea how to run the code. There are links to the final demo at the end, but everything else just links to this editor :/


This was prototyped on codesandbox before they nuked their product. Each link goes to a specific version, which you can test by running bun http.ts in the terminal to serve the content. I updated the article to include this info.

In the future I will keep everything self hosted to avoid this issue. I appreciate the patience.


if you want a self-hosted codepen there's this

https://github.com/google/playground-elements

not sure, based on the way it works, whether it supports SharedArrayBuffer


For security reasons you cannot use some of the features in this code without setting a specific header (the blog mentions this).

The sandbox has a button that's basically "Open a preview in a separate tab". If you click that, the header will be sent, and the demo will work.

If you only use the "in-editor" preview, the proper header will not be sent.

Agree not intuitive. Hope it helps, it was a super cool demo.
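
For anyone reproducing this locally, the headers in question are the cross-origin isolation ones, e.g. in a minimal Bun server (a sketch, since the article serves via bun http.ts):

    Bun.serve({
      port: 3000,
      fetch(req) {
        const path = new URL(req.url).pathname;
        return new Response(Bun.file('.' + (path === '/' ? '/index.html' : path)), {
          headers: {
            'Cross-Origin-Opener-Policy': 'same-origin',
            'Cross-Origin-Embedder-Policy': 'require-corp',
          },
        });
      },
    });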


so when do we get WebBLAS and WebFORTRAN?

kinda joking, kinda not.


Actively in-progress, actually. [0] Since about 2016.

[0] https://gws.phd/posts/fortran_wasm/


there's also numpy and scipy in the webassembly python distro (pyodide). but the "kinda not" part more refers to first class scientific/numerical computing support. it's possible, but the libraries are all disjoint or are webassembly ports, etc.


Pyodide uses f2c for that, as mentioned in the link, but it isn't great, and barely works. You won't get the expected speed out of BLAS that way.

Which is why the flang port the link is about is attempting to compile to the actual primitives.


i wonder if simd is working. that would be cool.


If I understand the build process correctly... it should be, on systems that support WebAssembly SIMD, like Chrome's V8.


yes, but does it actually work end to end and deliver meaningful speedups that make it useful?


super cool! i'm thinking webgpu might be usable for a speedup, not sure if webgl would be


A WebGL transform feedback shader would be 100% as performant as what you could write in WebGPU for this use case (independent particle updates).
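
The core of the update pass looks something like this untested sketch (shader compile checks and the render pass omitted; you ping-pong two position buffers each frame):

    const gl = document.createElement('canvas').getContext('webgl2');

    const vsSource = `#version 300 es
    in vec2 a_pos;
    in vec2 a_vel;
    out vec2 v_pos;
    void main() {
      v_pos = a_pos + a_vel * (1.0 / 60.0);  // integrate one frame
      gl_Position = vec4(0.0);               // unused, rasterization is discarded
    }`;

    const fsSource = `#version 300 es
    precision mediump float;
    out vec4 outColor;
    void main() { outColor = vec4(1.0); }`;

    function compile(type, src) {
      const s = gl.createShader(type);
      gl.shaderSource(s, src);
      gl.compileShader(s);
      return s;
    }

    const program = gl.createProgram();
    gl.attachShader(program, compile(gl.VERTEX_SHADER, vsSource));
    gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fsSource));
    gl.transformFeedbackVaryings(program, ['v_pos'], gl.SEPARATE_ATTRIBS); // before linking
    gl.linkProgram(program);
    gl.useProgram(program);

    const N = 1_000_000;
    const makeBuf = (dataOrSize, usage) => {
      const b = gl.createBuffer();
      gl.bindBuffer(gl.ARRAY_BUFFER, b);
      gl.bufferData(gl.ARRAY_BUFFER, dataOrSize, usage);
      return b;
    };
    const positions = new Float32Array(N * 2);   // fill with initial x, y
    const velocities = new Float32Array(N * 2);  // fill with dx, dy
    const posA = makeBuf(positions, gl.DYNAMIC_COPY);
    const posB = makeBuf(positions.byteLength, gl.DYNAMIC_COPY);
    const velBuf = makeBuf(velocities, gl.STATIC_DRAW);

    const aPos = gl.getAttribLocation(program, 'a_pos');
    const aVel = gl.getAttribLocation(program, 'a_vel');

    function step(readBuf, writeBuf) {
      gl.bindBuffer(gl.ARRAY_BUFFER, readBuf);
      gl.enableVertexAttribArray(aPos);
      gl.vertexAttribPointer(aPos, 2, gl.FLOAT, false, 0, 0);
      gl.bindBuffer(gl.ARRAY_BUFFER, velBuf);
      gl.enableVertexAttribArray(aVel);
      gl.vertexAttribPointer(aVel, 2, gl.FLOAT, false, 0, 0);

      gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, writeBuf);
      gl.enable(gl.RASTERIZER_DISCARD);          // update only, skip drawing
      gl.beginTransformFeedback(gl.POINTS);
      gl.drawArrays(gl.POINTS, 0, N);
      gl.endTransformFeedback();
      gl.disable(gl.RASTERIZER_DISCARD);
      gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, null);
    }

    // step(posA, posB) one frame, step(posB, posA) the next, drawing from
    // whichever buffer was just written.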


thanks! is there a minimal example you'd recommend looking at?


This is great work


super helpful!!! thanks for this!!



