Elm 0.12.1: Fast immutable arrays (elm-lang.org)
88 points by yiransheng on May 2, 2014 | 25 comments



I think this means to say fast persistent arrays. A fast immutable array is trivial - just use an array!


I'm not sure what persistent means in this context, but what is meant by "fast" immutable array here is that the immutable array supports operations like insert and delete (returning new immutable objects) in near-constant time.


From Wikipedia [1]:

    a persistent data structure is a data structure that always preserves the previous version of itself when it is modified
So when you do an insert, you get back a new array with the inserted value, and the old array is (as far as anything external can observe) untouched.
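
To make that concrete, here's a tiny TypeScript sketch of just the semantics (nothing to do with Elm's actual implementation, which uses trees to avoid the full copy):

    // Copy-on-write "set": returns a new array and leaves the old one
    // untouched. This only shows the semantics; Elm's Array avoids the
    // O(n) copy by using a 32-way tree internally.
    function set<T>(xs: readonly T[], i: number, value: T): T[] {
      const copy = xs.slice();
      copy[i] = value;
      return copy;
    }

    const v0 = [1, 2, 3];
    const v1 = set(v0, 1, 99);
    console.log(v0);  // [1, 2, 3]  -- the old version persists
    console.log(v1);  // [1, 99, 3]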

An immutable array is just an array that never changes. A persistent array lets you do operations on immutable arrays which return new immutable arrays: each result is the old immutable array with the operation applied.

[Edit] As sgk284 pointed out elsewhere in the thread, my last paragraph mischaracterises the situation slightly.

An immutable array with any operations defined on it will necessarily be persistent, so it was misleading to say 'is _just_ an array that never changes'.

[1] http://en.wikipedia.org/wiki/Persistent_data_structure


Rich Hickey (re)popularized the usage of "persistent" to mean what you just described immutable as. That is, the previous version of the data structure "persists" after you add an element, and now you have two values.

To the people using that definition of persistent, "immutable" just means constant, i.e. no insertion or deletion at all.


That's not quite right. All immutable data structures are persistent, but not all persistent data structures strictly use immutability.

Immutability is a common way of achieving persistence, but fundamentally a persistent data structure simply preserves previous versions. This can be achieved through some mutability via things like fat nodes[1].

It turns out that when you use immutability, many other nice properties often fall out from that decision. You can absolutely insert and delete into an immutable data structure though. You'll simply get back a new structure.
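
For illustration, here's a toy fat-node array in TypeScript (my own sketch of the idea from [1], not anything Elm or Clojure actually does): the structure mutates internally, but every write is version-stamped, so old versions remain readable.

    // Each slot keeps a history of (version, value) pairs; reads ask
    // for the slot "as of" some version, so nothing is ever lost.
    type FatSlot<T> = { version: number; value: T }[];

    class FatArray<T> {
      private slots: FatSlot<T>[];
      private version = 0;

      constructor(initial: T[]) {
        this.slots = initial.map(v => [{ version: 0, value: v }]);
      }

      // Mutates internally, but returns a handle to the new version.
      set(i: number, value: T): number {
        this.version += 1;
        this.slots[i].push({ version: this.version, value });
        return this.version;
      }

      // Read slot i as it was at version `asOf`.
      get(i: number, asOf: number): T {
        const history = this.slots[i];
        for (let k = history.length - 1; k >= 0; k--) {
          if (history[k].version <= asOf) return history[k].value;
        }
        throw new Error("no such version");
      }
    }

    const a = new FatArray([1, 2, 3]);
    const v1 = a.set(1, 99);
    console.log(a.get(1, 0));   // 2   -- the old version is still visible
    console.log(a.get(1, v1));  // 99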

[1] http://en.wikipedia.org/wiki/Persistent_data_structure#Fat_N...


To give some more context, 'persistent data structures' is a standard term in the literature.

The use of efficient persistent maps in Clojure, and the fact that Rich Hickey explains them in most of the early Clojure introductory talks, have popularised the term outside of academia.


Fast(er) persistent arrays are easier than purely functional ones. See Baker's Shallow Binding Makes Functional Arrays Fast (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.3...). The problem with impure persistence is that it doesn't help concurrent accesses at all.


When I read elm I thought it was about an email client. I think that makes me officially very old.

I miss those email clients. They were much nicer than most of the email software which is considered modern today.


Couldn't they just use Mori?


Mori doesn't implement RRB-trees, thus doesn't support logarithmic concatenation.


Maybe mori could include core.rrb-vector[1]. Other interesting persistent data structures (by the same author) are persistent sorted maps and sets[2].

[1] https://github.com/clojure/core.rrb-vector [2] https://github.com/clojure/data.avl


Interesting that you ask this since I just created a new thread asking for advice and information on using mori in my JS code:

https://news.ycombinator.com/item?id=7689823


What is Mori?

ahh, found it. http://swannodette.github.io/mori/


Logarithmic is not constant.

The fact that Elm announces constant array access will not surprise anybody, as this is an inherent property of arrays. But they actually have to use radix trees for their arrays, and announcing that as constant is highly misleading. These guys should switch over to politics.

And this only holds for immutable arrays! So even without push, pop, shift, unshift they have to use trees for arrays.


Fully populating a 128-bit memory pool would take enough energy to boil the oceans[1]. With a branching factor of 32, addressing that takes a tree only 26 levels deep.

So the absolute worst case scenario - even if we get science fiction computing - is a small, constant number of indirections.
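
The arithmetic, for anyone who wants it (my own back-of-the-envelope numbers):

    // A branching factor of 32 consumes log2(32) = 5 address bits per
    // level, so even a full 128-bit address space needs only
    // ceil(128 / 5) = 26 levels of indirection.
    const bitsPerLevel = Math.log2(32);                      // 5
    const levelsFor128Bits = Math.ceil(128 / bitsPerLevel);  // 26
    console.log(levelsFor128Bits);                           // 26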

The whole purpose of this type of analysis is to get an intuitive approximation as input size grows. The graphs of log32(n) and 1 will be essentially indistinguishable, and most programmers have a much clearer intuition for constant time than for O(log32(n)).

It is not misleading, it is a simplification that is absolutely valid to make.

[1] https://blogs.oracle.com/bonwick/entry/128_bit_storage_are_y...


Agree with what you wrote. However, the article is misleading when it says that because the operations are "constant time in practice," they are "fast."


Unless the article has been edited, it doesn't say that[1].

What the article calls "constant time in practice" is necessary, but not sufficient.

As you say, it's not correct to say that CTIP means that you're fast - after all, you could make an HTTP request per array access and still maintain constant time.

But you can say that taking advantage of CTIP allows you to achieve speed.

In my opinion, it takes quite an uncharitable reading of the article for it to seem like they mean the former.

[1] (The closest section to your quote) "Elm now has fast immutable arrays. How can that be? Is there such a thing? Thanks to Christian Widera, the new Array library uses some very clever data structures that make common operations like get and set constant time in practice! We will get into the details later in this post, but the big takeaway is that you can have immutability and speed. "


> What the article calls "constant time in practice" is necessary, but not sufficient.

So non-CTIP data structures cannot be fast! There it is again!

This line in particular jumped out at me:

    One way to make these trees faster is to increase the “branching factor”.

And we are led to conclude that the 32-ary tree is faster than the binary tree. But there's necessarily a tradeoff here: a larger branching factor results in fewer node traversals, but more copying and wasted space on insertions and deletions.

So it's not at all obvious that a higher branch factor is faster. After all, in the limit as the branch factor approaches infinity, you recover naive arrays.
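
A rough cost model makes the tradeoff visible (my own sketch, not Elm's internals; assume reads touch one node per level, and a persistent write copies one node of size b per level):

    // Rough costs for a b-ary persistent tree holding n elements.
    function costs(n: number, b: number) {
      const depth = Math.ceil(Math.log(n) / Math.log(b));
      return {
        nodeVisitsPerGet: depth,       // fewer with a bigger b
        wordsCopiedPerSet: depth * b,  // more with a bigger b
      };
    }

    console.log(costs(1_000_000, 2));   // { nodeVisitsPerGet: 20, wordsCopiedPerSet: 40 }
    console.log(costs(1_000_000, 32));  // { nodeVisitsPerGet: 4, wordsCopiedPerSet: 128 }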

Basically I'm begging for benchmarks! It would be very interesting to see the performance effects of adjusting the branch factor.


Saying "constant time in practice" is notably different from saying "constant time".

I can always sort a 32-bit int array in worst-case linear time using counting sort, which in theory is better than the worst-case runtime of quicksort, O(n²). Is it better in practice? Of course not, the constant factor is just too high, both for time and memory.
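
For anyone who hasn't seen it, the sort in question is just tallying (a sketch; the killer is that for full 32-bit keys the counts array alone needs 2^32 slots):

    // Counting sort: linear in n, but needs one counter per possible
    // key value. For arbitrary 32-bit ints that's 2^32 counters
    // (16+ GB even at 4 bytes each), which is why nobody does this.
    function countingSort(xs: number[], maxKey: number): number[] {
      const counts = new Array(maxKey + 1).fill(0);
      for (const x of xs) counts[x] += 1;
      const out: number[] = [];
      for (let key = 0; key <= maxKey; key++) {
        for (let c = 0; c < counts[key]; c++) out.push(key);
      }
      return out;
    }

    console.log(countingSort([3, 1, 2, 1], 3));  // [1, 1, 2, 3]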


Much as I dislike it, not everyone uses 'array' to mean a collection with constant time indexing. Lots of people use it to mean an indexable collection.

The 'constant' vs. 'almost constant' has been covered by other comments.


Given the current rate of progress, what year would you predict that web browser applications use large enough arrays where that distinction makes a practical difference, and not just an academic one?


Why are Elm's arrays so much more complex than JavaScript's O(n) arrays?


It's about mutability vs immutability. I think Zach explains the whole idea of immutable arrays really well: http://www.infoq.com/presentations/julia-vectors-maps-sets


Elm's arrays are immutable. JS's are mutable.


slice/push/get/append can stand in for push/pop/shift/unshift. The immutability of the arrays is considered a feature, and JavaScript arrays are O(n) for the same (non-mutating) operations.
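
In other words, the non-mutating equivalents on plain JS arrays all pay a full copy (quick TypeScript sketch):

    // Non-mutating "updates" on a plain JS array each copy all n
    // elements, so using JS arrays persistently is O(n) per operation.
    const xs = [1, 2, 3];
    const pushed    = [...xs, 4];        // O(n) copy; xs unchanged
    const popped    = xs.slice(0, -1);   // O(n) copy
    const shifted   = xs.slice(1);       // O(n) copy
    const unshifted = [0, ...xs];        // O(n) copy
    console.log(xs);                     // [1, 2, 3]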



