> Maybe it was because I was often writing multi-threaded shared-memory code in C++
I’ve been writing C/C++ for a living for about 25 years, often writing multi-threaded shared-memory code in C++ as well. However, it seems I’ve worked on very different projects, because I think in a very Rust-incompatible way.
Here’s an example. Quite often I need to compute long arrays of numbers, and the problem is parallel, like multiplication of large matrices. A good way to do that is slicing the output into blocks and computing different blocks on different CPU cores, using OpenMP or some other thread pool. Different CPU cores need concurrent write access to the same vector, which is illegal in Rust.
> But in practice, you often want to minimize the use of actual pointers
In practice I often want actual pointers, because graphs and trees are everywhere in computing. Many of them are mutable, like the DOM tree of the web page we’re visiting.
Pointer chasing is generally slow compared to arithmetic instructions, but much faster than the hash maps that can be used to implement the same thing. A hash map lookup chases at least one pointer, usually multiple (depending on the implementation), and before that it spends time computing the hash.
> they encourage making many small allocations
Pointers and allocations are orthogonal. It’s possible to design a pointer-based data structure like a tree or graph where the nodes are owned by containers, as opposed to by other nodes.
> Here’s an example. Quite often I need to compute long arrays of numbers, and the problem is parallel, like multiplication of large matrices. A good way to do that is slicing the output into blocks and computing different blocks on different CPU cores, using OpenMP or some other thread pool. Different CPU cores need concurrent write access to the same vector, which is illegal in Rust.
This is actually easier in Rust than in C++, because of par_iter_mut() [1] from Rayon.
(In any case, usually if you want to do that sort of thing quickly then you'd use ndarray which can use BLAS.)
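To make the point concrete, here is a minimal sketch of the pattern in safe Rust using only the standard library (`std::thread::scope` plus `chunks_mut`) rather than Rayon: each thread gets a disjoint mutable block of the same output vector, so the borrow checker accepts the concurrent writes without any locking. The work done per element is a stand-in, not a real matrix kernel.

```rust
use std::thread;

// Fill `out` in parallel: each spawned thread owns a disjoint
// mutable block of the same vector, so no synchronization is needed.
fn fill_blocks(out: &mut [f64], block: usize) {
    thread::scope(|s| {
        for (i, chunk) in out.chunks_mut(block).enumerate() {
            s.spawn(move || {
                for (j, x) in chunk.iter_mut().enumerate() {
                    // Stand-in for real work, e.g. one block of a
                    // matrix product.
                    *x = (i * block + j) as f64 * 2.0;
                }
            });
        }
    }); // the scope joins all threads before returning
}

fn main() {
    let mut v = vec![0.0; 8];
    fill_blocks(&mut v, 3);
    println!("{:?}", v); // [0.0, 2.0, 4.0, ..., 14.0]
}
```

Rayon’s `par_iter_mut()` (or `par_chunks_mut()`) is the ergonomic version of the same idea, with work-stealing instead of one thread per block.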
> Pointer chasing is generally slow compared to arithmetic instructions, but much faster than the hash maps that can be used to implement the same thing. A hash map lookup chases at least one pointer, usually multiple (depending on the implementation), and before that it spends time computing the hash.
Usually in Rust you use indices into arrays instead, which can be folded into the addressing mode on most architectures. If you really want stable keyed access, there’s the slotmap crate, which uses generational indices and involves no hashing at all.
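A minimal sketch of the indices-instead-of-pointers idiom: a `Vec` owns every node, and the “pointers” between nodes are plain `usize` indices into it. The type and method names here are illustrative, not from any particular crate.

```rust
// Index-based tree: the Vec owns all nodes; links are indices.
struct Node {
    value: i32,
    children: Vec<usize>, // indices of child nodes
}

struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    fn new() -> Self {
        Tree { nodes: Vec::new() }
    }

    // Append a node, optionally linking it under a parent.
    fn add(&mut self, value: i32, parent: Option<usize>) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { value, children: Vec::new() });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    // Sum of the subtree rooted at `id`, chasing indices rather than pointers.
    fn sum(&self, id: usize) -> i32 {
        let node = &self.nodes[id];
        node.value + node.children.iter().map(|&c| self.sum(c)).sum::<i32>()
    }
}

fn main() {
    let mut t = Tree::new();
    let root = t.add(1, None);
    let a = t.add(2, Some(root));
    t.add(3, Some(root));
    t.add(4, Some(a));
    println!("{}", t.sum(root)); // prints 10
}
```

All nodes live in one contiguous allocation, and dropping the `Tree` frees everything at once; no `Rc` cycles or unsafe code required.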
> usually if you want to do that sort of thing quickly then you'd use ndarray which can use BLAS
In C++ I’d usually use Eigen, because its expression templates save memory allocations and the bandwidth spent storing and reloading temporary matrices. It’s sometimes much faster than BLAS libraries with a C API. I’m not sure Rust has an equivalent.
> indices into arrays instead, which can be folded into the addressing mode on most architectures
For some applications of graphs and trees it’s useful to have polymorphic nodes. An example is a visual tree in a GUI: different nodes are instances of different classes. Array elements are all of the same type.
On AMD64 that’s only true when the size of the elements being addressed is 1/2/4/8 bytes; the SIB byte only has 2 bits for the scale. For any other element size, addressing these arrays requires multiplying (or, if you’re lucky, at least left-shifting) these integers.
Even when the elements are 8 bytes so the indexing can be merged, you need to either spend a register on the base address, or load it from memory with another instruction.
It’s relatively expensive to split or merge linked lists/trees/graphs stored that way. If the tree/graph is long lived, mutable, and changes a lot, eventually you might need to compact or even garbage collect these arrays.
> In C++ I’d usually use Eigen, because its expression templates save memory allocations and the bandwidth spent storing and reloading temporary matrices. It’s sometimes much faster than BLAS libraries with a C API. I’m not sure Rust has an equivalent.
That equivalent would be ndarray.
> For some applications of graphs and trees it’s useful to have polymorphic nodes. An example is a visual tree in a GUI: different nodes are instances of different classes. Array elements are all of the same type.
And in that case you can use Box (or Rc/Arc).
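For illustration, here is a sketch of a heterogeneous “visual tree” as a single array of trait objects. The `Widget` trait and the concrete types are invented for the example, not from any GUI crate.

```rust
// Heterogeneous nodes in one array via trait objects.
trait Widget {
    fn describe(&self) -> String;
}

struct Button { label: String }
struct Slider { value: f32 }

impl Widget for Button {
    fn describe(&self) -> String { format!("Button({})", self.label) }
}
impl Widget for Slider {
    fn describe(&self) -> String { format!("Slider({})", self.value) }
}

fn main() {
    // One Vec, differently-typed elements behind Box<dyn Widget>.
    let tree: Vec<Box<dyn Widget>> = vec![
        Box::new(Button { label: "OK".into() }),
        Box::new(Slider { value: 0.5 }),
    ];
    for w in &tree {
        println!("{}", w.describe());
    }
}
```

Dispatch goes through the vtable, and each element is its own heap allocation, which is the extra pointer chase discussed below.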
> Even when the elements are 8 bytes so the indexing can be merged, you need to either spend a register on the base address, or load it from memory with another instruction.
I've never seen this be a performance problem in practice; the cost of doing a shift and add is incredibly low compared to the cost of actually fetching the memory.
> It’s relatively expensive to split or merge linked lists/trees/graphs stored that way. If the tree/graph is long lived, mutable, and changes a lot, eventually you might need to compact or even garbage collect these arrays.
Which is the same thing modern thread-caching mallocs also have to do, except that compacting and garbage collecting is actually possible with the arena approach (not that I think it's terribly important either way).
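A simplified sketch of what compacting an index-based arena can look like: partition the live slots to the front, then rewrite the stored indices through an old-to-new remap table. The `Slot` shape and its single `next` link are illustrative.

```rust
// Compacting an index-based arena: drop dead slots and remap
// the surviving indices.
struct Slot {
    alive: bool,
    next: Option<usize>, // index of the next node, if any
}

fn compact(slots: &mut Vec<Slot>) {
    // Old index -> new index for every surviving slot.
    let mut remap = vec![usize::MAX; slots.len()];
    let mut new_len = 0;
    for i in 0..slots.len() {
        if slots[i].alive {
            remap[i] = new_len;
            slots.swap(new_len, i); // stable partition of live slots
            new_len += 1;
        }
    }
    slots.truncate(new_len);
    // Rewrite stored indices; links to dead slots become None.
    for s in slots.iter_mut() {
        if let Some(n) = s.next {
            s.next = if remap[n] == usize::MAX { None } else { Some(remap[n]) };
        }
    }
}

fn main() {
    let mut slots = vec![
        Slot { alive: true, next: Some(2) },
        Slot { alive: false, next: None },
        Slot { alive: true, next: None },
    ];
    compact(&mut slots);
    println!("{} live slots", slots.len()); // 2 live slots
}
```

The same pass generalizes to multiple link fields per node; a real GC would also need to decide liveness, which here is just a flag.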
It seems ndarray is conceptually similar to the C libraries; it doesn’t have expression templates.
When you write r = a * b * c with ndarray, you allocate, store, and then load a temporary array holding a * b. When you write r = a * b * c with Eigen, depending on the types it sometimes skips the temporary and instead computes the complete expression in one shot. For some use cases, this tactic yields a substantial performance win.
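The temporaries issue can be shown element-wise in plain Rust (real expression templates fuse whole matrix expressions, which is harder; this is only an analogy): the eager version materializes `a * b` in an intermediate `Vec`, while the fused version computes each output element in a single pass, the way an expression template would evaluate the whole right-hand side at the assignment.

```rust
// Eager: materializes a temporary Vec for a * b, then reloads it.
fn eager(a: &[f64], b: &[f64], c: &[f64]) -> Vec<f64> {
    let tmp: Vec<f64> = a.iter().zip(b).map(|(x, y)| x * y).collect();
    tmp.iter().zip(c).map(|(t, z)| t * z).collect()
}

// Fused: one pass, no intermediate allocation or extra memory traffic.
fn fused(a: &[f64], b: &[f64], c: &[f64]) -> Vec<f64> {
    a.iter()
        .zip(b)
        .zip(c)
        .map(|((x, y), z)| x * y * z)
        .collect()
}

fn main() {
    let (a, b, c) = (vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]);
    assert_eq!(eager(&a, &b, &c), fused(&a, &b, &c));
    println!("{:?}", fused(&a, &b, &c)); // [15.0, 48.0]
}
```

Rust’s iterator adapters give this fusion for free in the element-wise case; the hard part Eigen solves is doing it for expressions like matrix products, where naive fusion would recompute shared subterms.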
> Box (or Rc/Arc)
An array of boxes causes another pointer chase: one load to fetch the pointer, another to reach the payload.
All data structures are compromises. If you want something (such as the ability to interleave queries and updates), you lose something else (such as performance or space-efficiency). Instead of using a single general-purpose data structure, I've found it useful to have several specialized structures making different compromises, with efficient conversions between them.
When it comes to graphs, the naive hash map representation has its uses. But I more often use representations based on conceptual arrays. The representations could be row-based or column-based, the arrays could store bytes or structs, and the concrete arrays could be vectors or something similar to B+ trees. Not all combinations are relevant, but several of them are.
And then there are overlays. If one representation is otherwise ideal but it doesn't support a specific operation (such as mapping between graph positions and positions on certain paths), an overlay can fix that. Another overlay could use the graph to represent a subgraph induced by certain nodes. And another could represent the same graph after a transformation, such as merging unary paths into single nodes.
When you have several graph implementations and overlays, the interface has to be pretty generic. You probably want to use either node identifiers or opaque handles. The latter could be node identifiers, array offsets, or pointers, depending on the concrete graph.
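One way such a generic interface could look in Rust is a trait with an associated handle type, so each concrete graph picks indices, offsets, or pointers as its handles while generic code sees only opaque values. All names here are hypothetical, not from any existing crate.

```rust
// A generic graph interface with an opaque, per-implementation handle.
trait Graph {
    type Handle: Copy + Eq;

    fn roots(&self) -> Vec<Self::Handle>;
    fn neighbors(&self, h: Self::Handle) -> Vec<Self::Handle>;
}

// One concrete representation uses plain indices as handles.
struct AdjList {
    edges: Vec<Vec<usize>>, // edges[v] = neighbors of v
}

impl Graph for AdjList {
    type Handle = usize;

    fn roots(&self) -> Vec<usize> {
        (0..self.edges.len()).collect()
    }
    fn neighbors(&self, h: usize) -> Vec<usize> {
        self.edges[h].clone()
    }
}

// Generic code works for any representation or overlay.
fn degree<G: Graph>(g: &G, h: G::Handle) -> usize {
    g.neighbors(h).len()
}

fn main() {
    let g = AdjList { edges: vec![vec![1, 2], vec![2], vec![]] };
    println!("{}", degree(&g, 0)); // prints 2
}
```

An overlay would be another `Graph` impl wrapping a base graph and reinterpreting its handles, e.g. filtering neighbors to an induced subgraph.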