Elm 0.12.1: Fast immutable arrays (elm-lang.org)
88 points by yiransheng on May 2, 2014 | 25 comments



I think this means to say fast persistent arrays. A fast immutable array is trivial - just use an array!


I'm not sure what persistent means in this context, but what is meant by "fast" immutable array here is that the immutable array supports operations like insert and delete (returning new immutable objects) in near-constant time.


From Wikipedia [1]:

    a persistent data structure is a data structure that always preserves the previous version of itself when it is modified
So when you do an insert, you get back a new array with the inserted value, and the old array is (as far as anything external can observe) untouched.
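
To make that concrete, here's a tiny TypeScript sketch of just the semantics (nothing to do with Elm's actual implementation, which uses trees to avoid the full copy):

    // Copy-on-write "set": returns a new array and leaves the old one
    // untouched. This only shows the semantics; Elm's Array avoids the
    // O(n) copy by using a 32-way tree internally.
    function set<T>(xs: readonly T[], i: number, value: T): T[] {
      const copy = xs.slice();
      copy[i] = value;
      return copy;
    }

    const v0 = [1, 2, 3];
    const v1 = set(v0, 1, 99);
    console.log(v0);  // [1, 2, 3]  -- the old version persists
    console.log(v1);  // [1, 99, 3]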

An immutable array is just an array that never changes. A persistent array lets you do operations on immutable arrays which return new immutable arrays: each result is the old immutable array with the operation applied.

[Edit] As sgk284 pointed out elsewhere in the thread, my last paragraph mischaracterises the situation slightly.

An immutable array with any operations defined on it will necessarily be persistent, so it was misleading to say 'is _just_ an array that never changes'.

[1] http://en.wikipedia.org/wiki/Persistent_data_structure


Rich Hickey (re)popularized the usage of "persistent" to mean what you just described immutable as. That is, the previous version of the data structure "persists" after you add an element, and now you have two values.

To the people using that definition of persistent, "immutable" just means constant, i.e. no insertion or deletion at all.


That's not quite right. All immutable data structures are persistent, but not all persistent data structures strictly use immutability.

Immutability is a common way of achieving persistence, but fundamentally a persistent data structure simply preserves previous versions. This can be achieved through some mutability via things like fat nodes[1].

It turns out that when you use immutability, many other nice properties often fall out from that decision. You can absolutely insert and delete into an immutable data structure though. You'll simply get back a new structure.
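
For illustration, here's a toy fat-node array in TypeScript (my own sketch of the idea from [1], not anything Elm or Clojure actually does): the structure mutates internally, but every write is version-stamped, so old versions remain readable.

    // Each slot keeps a history of (version, value) pairs; reads ask
    // for the slot "as of" some version, so nothing is ever lost.
    type FatSlot<T> = { version: number; value: T }[];

    class FatArray<T> {
      private slots: FatSlot<T>[];
      private version = 0;

      constructor(initial: T[]) {
        this.slots = initial.map(v => [{ version: 0, value: v }]);
      }

      // Mutates internally, but returns a handle to the new version.
      set(i: number, value: T): number {
        this.version += 1;
        this.slots[i].push({ version: this.version, value });
        return this.version;
      }

      // Read slot i as it was at version `asOf`.
      get(i: number, asOf: number): T {
        const history = this.slots[i];
        for (let k = history.length - 1; k >= 0; k--) {
          if (history[k].version <= asOf) return history[k].value;
        }
        throw new Error("no such version");
      }
    }

    const a = new FatArray([1, 2, 3]);
    const v1 = a.set(1, 99);
    console.log(a.get(1, 0));   // 2   -- the old version is still visible
    console.log(a.get(1, v1));  // 99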

[1] http://en.wikipedia.org/wiki/Persistent_data_structure#Fat_N...


To give some more context, 'persistent data structures' is a standard term in the literature.

The use of efficient persistent maps in Clojure, and the fact that Rich Hickey explains them in most of the early Clojure introductory talks, have popularised the term outside of academia.


Fast(er) persistent arrays are easier than purely functional ones. See Baker's Shallow Binding Makes Functional Arrays Fast (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.3...). The problem with impure persistence is that it doesn't help concurrent accesses at all.


When I read elm I thought it was about an email client. I think that makes me officially very old.

I miss those email clients. They were much nicer than most of the email software which is considered modern today.


Couldn't they just use Mori?


Mori doesn't implement RRB-trees, thus doesn't support logarithmic concatenation.


Maybe mori could include core.rrb-vector[1]. Other interesting persistent data structures (by the same author) are persistent sorted maps and sets[2].

[1] https://github.com/clojure/core.rrb-vector [2] https://github.com/clojure/data.avl


Interesting that you ask this since I just created a new thread asking for advice and information on using mori in my JS code:

https://news.ycombinator.com/item?id=7689823


What is Mori?

ahh, found it. http://swannodette.github.io/mori/


Logarithmic is not constant.

The fact that Elm announces constant array access will not surprise anybody, as this is an inherent property of arrays. But they actually have to use radix trees for their arrays, and announcing that as constant is highly misleading. These guys should switch over to politics.

And this only holds for immutable arrays! So even without push, pop, shift, unshift they have to use trees for arrays.


Fully populating a 128-bit memory pool would take enough energy to boil the oceans[1]. With a branching factor of 32, addressing that takes a tree only 26 levels deep.

So the absolute worst case scenario - even if we get science fiction computing - is a small, constant number of indirections.
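
The arithmetic, for anyone who wants it (my own back-of-the-envelope numbers):

    // A branching factor of 32 consumes log2(32) = 5 address bits per
    // level, so even a full 128-bit address space needs only
    // ceil(128 / 5) = 26 levels of indirection.
    const bitsPerLevel = Math.log2(32);                      // 5
    const levelsFor128Bits = Math.ceil(128 / bitsPerLevel);  // 26
    console.log(levelsFor128Bits);                           // 26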

The whole purpose of this type of analysis is to get an intuitive approximation as input size grows. The graphs of log32(n) and 1 will be essentially indistinguishable, and most programmers have a much clearer intuition for constant time than for O(log32(n)).

It is not misleading, it is a simplification that is absolutely valid to make.

[1] https://blogs.oracle.com/bonwick/entry/128_bit_storage_are_y...


Agree with what you wrote. However, the article is misleading when it says that because the operations are "constant time in practice," they are "fast."


Unless the article has been edited, it doesn't say that[1].

What the article calls "constant time in practice" is necessary, but not sufficient.

As you say, it's not correct to say that CTIP means that you're fast - after all, you could make an HTTP request per array access and still maintain constant time.

But you can say that taking advantage of CTIP allows you to achieve speed.

In my opinion, it takes quite an uncharitable reading of the article for it to seem like they mean the former.

[1] (The closest section to your quote) "Elm now has fast immutable arrays. How can that be? Is there such a thing? Thanks to Christian Widera, the new Array library uses some very clever data structures that make common operations like get and set constant time in practice! We will get into the details later in this post, but the big takeaway is that you can have immutability and speed. "


> What the article calls "constant time in practice" is necessary, but not sufficient.

So non-CTIP data structures cannot be fast! There it is again!

This line in particular jumped out at me:

    One way to make these trees faster is to increase the “branching factor”.

And we are led to conclude that the 32-ary tree is faster than the binary tree. But there's necessarily a tradeoff here: a larger branching factor results in fewer node traversals, but more copying and wasted space on insertions and deletions.

So it's not at all obvious that a higher branch factor is faster. After all, in the limit as the branch factor approaches infinity, you recover naive arrays.
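
A rough cost model makes the tradeoff visible (my own sketch, not Elm's internals; assume reads touch one node per level, and a persistent write copies one node of size b per level):

    // Rough costs for a b-ary persistent tree holding n elements.
    function costs(n: number, b: number) {
      const depth = Math.ceil(Math.log(n) / Math.log(b));
      return {
        nodeVisitsPerGet: depth,       // fewer with a bigger b
        wordsCopiedPerSet: depth * b,  // more with a bigger b
      };
    }

    console.log(costs(1_000_000, 2));   // { nodeVisitsPerGet: 20, wordsCopiedPerSet: 40 }
    console.log(costs(1_000_000, 32));  // { nodeVisitsPerGet: 4, wordsCopiedPerSet: 128 }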

Basically I'm begging for benchmarks! It would be very interesting to see the performance effects of adjusting the branch factor.


Saying "constant time in practice" is notably different from saying "constant time".

I can always sort a 32-bit int array in worst-case linear time using counting sort, which in theory is better than the worst-case runtime of quicksort, O(n²). Is it better in practice? Of course not, the constant factor is just too high, both for time and memory.
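
For anyone who hasn't seen it, the sort in question is just tallying (a sketch; the killer is that for full 32-bit keys the counts array alone needs 2^32 slots):

    // Counting sort: linear in n, but needs one counter per possible
    // key value. For arbitrary 32-bit ints that's 2^32 counters
    // (16+ GB even at 4 bytes each), which is why nobody does this.
    function countingSort(xs: number[], maxKey: number): number[] {
      const counts = new Array(maxKey + 1).fill(0);
      for (const x of xs) counts[x] += 1;
      const out: number[] = [];
      for (let key = 0; key <= maxKey; key++) {
        for (let c = 0; c < counts[key]; c++) out.push(key);
      }
      return out;
    }

    console.log(countingSort([3, 1, 2, 1], 3));  // [1, 1, 2, 3]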


Much as I dislike it, not everyone uses 'array' to mean a collection with constant time indexing. Lots of people use it to mean an indexable collection.

The 'constant' vs. 'almost constant' has been covered by other comments.


Given the current rate of progress, what year would you predict that web browser applications use large enough arrays where that distinction makes a practical difference, and not just an academic one?


Why are Elm's arrays so much more complex than JavaScript's O(n) arrays?


It's about mutability vs immutability. I think Zach explains the whole idea of immutable arrays really well: http://www.infoq.com/presentations/julia-vectors-maps-sets


Elm's arrays are immutable. JS's are mutable.


slice/push/get/append can stand in for push/pop/shift/unshift. The immutability of the arrays is considered a feature, and JavaScript arrays are O(n) for the same (non-mutating) operations.
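
In other words, the non-mutating equivalents on plain JS arrays all pay a full copy (quick TypeScript sketch):

    // Non-mutating "updates" on a plain JS array each copy all n
    // elements, so using JS arrays persistently is O(n) per operation.
    const xs = [1, 2, 3];
    const pushed    = [...xs, 4];        // O(n) copy; xs unchanged
    const popped    = xs.slice(0, -1);   // O(n) copy
    const shifted   = xs.slice(1);       // O(n) copy
    const unshifted = [0, ...xs];        // O(n) copy
    console.log(xs);                     // [1, 2, 3]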



