Hacker News new | past | comments | ask | show | jobs | submit login
Understanding Clojure's Persistent Vectors (2013) (hypirion.com)
64 points by sendilkumarn on Dec 9, 2021 | hide | past | favorite | 14 comments



Persistent data structures are, in my opinion, underrated. Not so much for every day programming tasks, but specifically for code that resembles planning/searching.

Here is one library I've heard of https://immutable-js.com/ . I don't know of others.


I'm wondering if there is some inherit flaw in coding some data structures in Node/JS. Maybe this is just another case of the "Node tax"? Things like boxing, floating point number arithmetic, manual pop counts, array copying with bounds checks, etc would all go quite a far way to making for example the Map quite under performant by quite a significant factor especially with large collection sizes and hot loops. Have no idea what the Node runtime does/optimises for example the software version of PopCount encoded in the Immutable.js.

Having written persistent data structures in other lang's that are quite capable (170ns lookup 10,000,000 elements approx to give a rough guide) - yes it is an order slower than unordered hash mutable structures but often still faster than for example the mutable ordered ones I've tested with the advantage of still being immutable. This does benefit some scenarios where you want data sharing/cheap clones of data.


My experience with ImmutableJS was to remove it and see performance go up by a factor of 100-200.

https://github.com/mschaef/react-matchstick/commit/070802b69...

This was something of a worst case scenario (and it was five years ago, so presumably things have gotten better) but it still underscores the need to carefully consider these tools before adopting them. Do they really offer enough (to your application) to be worth the associated costs in readability, performance, etc.

If the goal is to prevent mutation, maybe there are better ways to do that. If the goal is to really accelerate, it's worth testing. (At the very least, Clojure's vectors seem unlikely to provide a benefit if you're working with vectors of len<32.)


That particular example seems to be a horrible misapplication of an ImmutableMap. That's not a map, that's just a record of values. (Well the 'board' member is a collection, but you get what I'm saying.)

I'm not surprised you saw huge speed up... I mean the original is looking up record fields by their name at runtime. That's bound to be extremely slow because it's basically impenetrable to the JIT optimizer.

In this case doing it the "immutable way" would be to create a data object, freeze it, instead only creating copies (themselves frozen) with individual fields changes as appropriate.

I do wonder how it would have performed if you'd removed the outer 'sx', 'sy', 'board' map layer and kept the 'board' as an ImmutableList (or whatever).


Immutable JS is from Facebook.

Mori is the 'original' Clojure data-structures in Java Script.

https://swannodette.github.io/mori/


Clojure is a language that really puts the "high" in high level. It is quite difficult to trace a Clojure expression back to what will happen on the CPU - and the innovations on things like vector storage are part of that.

It was a bold decision with the potential to cause pain, but Clojure's vectors are great fun to work with. The "novel" basic data structures get out of the way and generally cause more joy than pain. It is part of a fundamental strategy enabling a strongly immutable style which really pays off when it works in concert with the rest of the language.


I just spent the last few days implementing a better version of an "persistent list" data structure (heavily modelled on Clojure's vector) for a new programming language that I'm working on.

I did a quick survey of existing implementations in multiple languages and found all of them lacking. They are either overly complex, slow, or both. Even Clojure's vector, while being simple and very performant, is only usable as a stack, not as a queue, and therefore IMO inadequate as a "generic random-access array-like data structure" (akin to Python's list, i.e. "just use it don't worry about performance").

My version is about as fast as Clojure's vector, while implementing a "deque"-like interface. It's a bit more complex, but still significantly simpler than Scala's vectors (both Scala 2 and Scala 3). Cyclops (a Java-only persistent collection library) is so slow I didn't even bother finishing the benchmarks. I also compared my code to Rust's `im` (way more complex), C++'s `immer` (stack, not deque) and `immutable.js` (slower than ClojureScript).

I'll clean up the code and post it here.


voila.. took a bit longer, the code was "too slow"... turns out I was using the wrong branching factor!

https://github.com/tomprimozic/vector


I didn't dive too far into your repo, but it looks like you're doing something similar to Finger Trees [0].

Finger Trees as described in [1] are configurable to be several different kinds of data structures, but using them as a simple list gives you:

  amortized O(1) insert, remove, update at head and tail
  O(log n) insert, remove, update, worst case
  O(log n) concatenation of two lists
And you can traverse all N elements forward or backwards in O(1) per element.

I think it's a very nice data structure.

[0] https://en.wikipedia.org/wiki/Finger_tree

[1] https://www.staff.city.ac.uk/~ross/papers/FingerTree.pdf


I just tested the Fingertree implementation from org.functionaljava

Conclusion: unusuably slow (which kind-of makes sense, since it looks like the common implementation is a 2-3 fingertree - compared to a branching factor of 32 for Vector).


There are lots of implementation details that can make some version slower or faster.

In my own, I use 2..5 branching because a sequence of appends on the front or back will tend to create a tree with 3 pointers per node (when you're about to hit 6, you split. If you get to 1, merge with a neighbor, etc...). Three is close to e, which is optimal in some sense for some operations.

I'm not a Java guy. For my use, having 32 way branching is pretty expensive because I use reference counting for each of the nodes. Since you have a true garbage collector, you don't have that cost, and it wouldn't be difficult to make a 2-32 way implementation.

Anyways, it sounds like you're not impressed and not interested. That's fine :-)


There are several libraries that implement variations of deque for Clojure, but Clojure also allows transparent use of Java, so when necessary you can just use the Java deque, which I think is highly optimized.


Java's Deque is mutable (so, naturally, it will be much more performant). Also, I tried adding it to the benchmark but turns out that it doesn't support random access (for no good reason).


Great series of articles. I used this to write a persistent vector library in Nim: https://github.com/PMunch/nim-persistent-vector




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: