Fast immutable vectors in C#

0xffff2 · on Nov 11, 2014

Fast immutable collections are easy when your collections aren't actually immutable. http://www.reddit.com/r/programming/comments/2lyf8f/fast_imm...

corysama · on Nov 12, 2014

Fix commited: http://www.reddit.com/r/programming/comments/2lyf8f/fast_imm...

Bognar · on Nov 11, 2014

Comparisons with ImmutableList<T> from Microsoft aren't necessarily valid, since ImmutableList<T> supports insertion and removal at arbitrary points. The implementation in the article is optimized for adding at the end.

DenisM · on Nov 11, 2014

That's why there are two separate methods on this class - Add (at the end) and Set (anywhere). Both outperform the corresponding ImmutableList<T>, according to the article.

Bognar · on Nov 11, 2014

Set is the corollary to SetItem on ImmutableList<T>, not Insert.

danbruc · on Nov 11, 2014

But Set can just overwrites the specified element while Insert will insert the element at the specified position by pushing the element and all following elements out of the way.

SideburnsOfDoom · on Nov 12, 2014

> But Set can just overwrites the specified element

Kind of: Set returns a new Vector with a changed state (the specified element overwritten).

But yes, this implement is very sparse: there's no way to add and remove elements in the middle, no way to even tell how many elements are in the list: https://news.ycombinator.com/item?id=8592178

Perhaps omitting them hides some different optimisation choices (i.e. they would be more expensive)

Maybe it's just a proof of concept?

SideburnsOfDoom · on Nov 11, 2014

Hm.. am I missing something? there's no Count property on the Vector interface.

If Lookup and Set take an index (int i), and you can't tell how many items there are in the vector, what happens when you specify an index way past the end to one of those?

alkonaut · on Nov 11, 2014

That is really odd (unless I'm also missing something). It should be trivial to implement a count.

SideburnsOfDoom · on Nov 12, 2014

It should be trivial. The only reasonable explanation that I can think of is that it was't needed for a Proof of Concept.

_random_ · on Nov 11, 2014

"If there’s interest in a full featured production quality version, I may implement that."

That would be great.

joosters · on Nov 11, 2014

But... this isn't actually a vector. It stands to reason that if you use a different data structure then you will get different timing properties, but you are not comparing like-for-like.

alkonaut · on Nov 11, 2014

What is a vector? There is nothing that says a vector has to be a sequentially stored array. If you agree that the interface is a vector interface, then it "is" a vector.

joosters · on Nov 11, 2014

That's pretty much the definition of it. It's what just about every language out there calls a vector.

If you want a blackbox object that has Lookup(), Add() and Set() but with no guarantees about the underlying algorithm, you should probably avoid calling it a vector...

SideburnsOfDoom · on Nov 12, 2014

> If you want a blackbox object that has Lookup(), Add() and Set() but with no guarantees about the underlying algorithm, you should probably avoid calling it a vector.

"Vector" here is an interface.

The equivalent, e.g. "interface ILinkedListVector" is considered bad practice, i.e. you do want an interface to be a back box and not leak implementation details. if it has the right operations, "Vector" (actually more like "IVector<T>") is an acceptable interface name.

alkonaut · on Nov 12, 2014

A "list" can be a linked list, an array based list and so on.

Would a data structure stored in an array but not supporting the correct interface be a "vector" because the secret implementation is an array?

zamalek · on Nov 12, 2014

Queues and stacks are typically implemented using arrays. Should a queue or stack that uses a linked list (which is how you make them lock-free, or one way to make them immutable) be called something else then?

EpicEng · on Nov 12, 2014

Right, except for the fact that they always are made to behave that way, and if I were using a language with a "vector" type which performed like a linked list i would probably stop using that language altogether.

nickik · on Nov 12, 2014

What? The standard Clojure vector is like that. Nobody has a problem with it. It has good performance. Why should you care about the memory layout?

If you are not doing math on pointer, it should not matter.

ygra · on Nov 12, 2014

It has nothing to do with pointers, but memory access patterns can have a huge performance impact. Iterate over a large array, and do the same over a large linked list. The latter cannot make use of the CPU cache in any meaningful way, for example.

nickik · on Nov 13, 2014

I agree, memory access patterns can have a huge performacne impact. The cache is importent but its not as simple as you think.

Im not gone explain to you why it work, but see paper [1].

Its not about the clojure vector, but basiclly the same and he explains performance and he it is quite fast for lookup. Adding is also quite fast, both for immutable and the mutable case.

[1] http://infoscience.epfl.ch/record/169879/files/RMTrees.pdf

alkonaut · on Nov 12, 2014

And the linked list can share storage between different lists which the straight array cannot. There are always benefits and drawbacks with different structures. "list" doesn't say what data structure is used, it only describes the semantics (an ordered collection of items). It's the same with a vector.

alkonaut · on Nov 12, 2014

much of the point of immutable data structures is to re-use the storage of items. A mutable vector is usually just a continuous array for that reason (can't re-use so just as well to optimize for compactness and access). But an immutable vector isn't as natural to store in an array.

The opinion that vector must equal array is presumably because people have used mutable vectors in e.g Java or C++ and assumed the underlying data structure is the same in other contexts.

A mutable List is very natural to implement array based, but an immutable List is usually not. It's the same thing.

Ono-Sendai · on Nov 11, 2014

I have something similar in C++, works pretty well.