Hacker News new | past | comments | ask | show | jobs | submit login
Fast immutable vectors in C# (julesjacobs.github.io)
119 points by luu on Nov 11, 2014 | hide | past | favorite | 24 comments



Fast immutable collections are easy when your collections aren't actually immutable. http://www.reddit.com/r/programming/comments/2lyf8f/fast_imm...



Comparisons with ImmutableList<T> from Microsoft aren't necessarily valid, since ImmutableList<T> supports insertion and removal at arbitrary points. The implementation in the article is optimized for adding at the end.


That's why there are two separate methods on this class - Add (at the end) and Set (anywhere). Both outperform the corresponding ImmutableList<T>, according to the article.


Set is the corollary to SetItem on ImmutableList<T>, not Insert.


But Set can just overwrites the specified element while Insert will insert the element at the specified position by pushing the element and all following elements out of the way.


> But Set can just overwrites the specified element

Kind of: Set returns a new Vector with a changed state (the specified element overwritten).

But yes, this implement is very sparse: there's no way to add and remove elements in the middle, no way to even tell how many elements are in the list: https://news.ycombinator.com/item?id=8592178

Perhaps omitting them hides some different optimisation choices (i.e. they would be more expensive)

Maybe it's just a proof of concept?


Hm.. am I missing something? there's no Count property on the Vector interface.

If Lookup and Set take an index (int i), and you can't tell how many items there are in the vector, what happens when you specify an index way past the end to one of those?


That is really odd (unless I'm also missing something). It should be trivial to implement a count.


It should be trivial. The only reasonable explanation that I can think of is that it was't needed for a Proof of Concept.


"If there’s interest in a full featured production quality version, I may implement that."

That would be great.


But... this isn't actually a vector. It stands to reason that if you use a different data structure then you will get different timing properties, but you are not comparing like-for-like.


What is a vector? There is nothing that says a vector has to be a sequentially stored array. If you agree that the interface is a vector interface, then it "is" a vector.


That's pretty much the definition of it. It's what just about every language out there calls a vector.

If you want a blackbox object that has Lookup(), Add() and Set() but with no guarantees about the underlying algorithm, you should probably avoid calling it a vector...


> If you want a blackbox object that has Lookup(), Add() and Set() but with no guarantees about the underlying algorithm, you should probably avoid calling it a vector.

"Vector" here is an interface.

The equivalent, e.g. "interface ILinkedListVector" is considered bad practice, i.e. you do want an interface to be a back box and not leak implementation details. if it has the right operations, "Vector" (actually more like "IVector<T>") is an acceptable interface name.


A "list" can be a linked list, an array based list and so on.

Would a data structure stored in an array but not supporting the correct interface be a "vector" because the secret implementation is an array?


Queues and stacks are typically implemented using arrays. Should a queue or stack that uses a linked list (which is how you make them lock-free, or one way to make them immutable) be called something else then?


Right, except for the fact that they always are made to behave that way, and if I were using a language with a "vector" type which performed like a linked list i would probably stop using that language altogether.


What? The standard Clojure vector is like that. Nobody has a problem with it. It has good performance. Why should you care about the memory layout?

If you are not doing math on pointer, it should not matter.


It has nothing to do with pointers, but memory access patterns can have a huge performance impact. Iterate over a large array, and do the same over a large linked list. The latter cannot make use of the CPU cache in any meaningful way, for example.


I agree, memory access patterns can have a huge performacne impact. The cache is importent but its not as simple as you think.

Im not gone explain to you why it work, but see paper [1].

Its not about the clojure vector, but basiclly the same and he explains performance and he it is quite fast for lookup. Adding is also quite fast, both for immutable and the mutable case.

[1] http://infoscience.epfl.ch/record/169879/files/RMTrees.pdf


And the linked list can share storage between different lists which the straight array cannot. There are always benefits and drawbacks with different structures. "list" doesn't say what data structure is used, it only describes the semantics (an ordered collection of items). It's the same with a vector.


much of the point of immutable data structures is to re-use the storage of items. A mutable vector is usually just a continuous array for that reason (can't re-use so just as well to optimize for compactness and access). But an immutable vector isn't as natural to store in an array.

The opinion that vector must equal array is presumably because people have used mutable vectors in e.g Java or C++ and assumed the underlying data structure is the same in other contexts.

A mutable List is very natural to implement array based, but an immutable List is usually not. It's the same thing.


I have something similar in C++, works pretty well.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: