Eigen: A C++ template library for linear algebra (tuxfamily.org)
173 points by cpp_frog on April 4, 2022 | 84 comments



I've been using it for years, and I'm mostly happy with the library. The performance is awesome for very long vectors / large matrices.

It’s less than ideal for small things when the size is known at compile time. If one knows SIMD intrinsics, in some of these cases Eigen’s implementation can be outperformed by a factor of 2-4. Also, it’s very hard to mess with the RAM layout of some things (like sparse matrices): just too many layers of abstraction and too much template metaprogramming.

But still, out of the box the usability of Eigen is awesome. And until the code is written, debugged, integrated and benchmarked, it’s generally impossible to tell whether a particular algorithm is going to be a performance bottleneck. That’s why I’m mostly happy with the library.


Can confirm the poor performance on small matrices (less than 20x20). This is a problem particularly for robotics applications, where your entities are positions, velocities, etc.

https://stackoverflow.com/questions/58071344/is-eigen-slow-a...


Huh, interesting - this is news to me, as I use Eigen all the time/see it used all over for robotics. Is there a good replacement for robotics-specific operations/small matrices generally (I see some people mentioning DirectXMath?)? Or is the tradeoff just between spending the time and effort to write SIMD intrinsics yourself vs. lower performance but greater convenience with Eigen?

One advantage of Eigen's approach that I haven't seen mentioned here is that its templated design makes it easy to substitute custom scalar types for operations, which helps enable straightforward automatic differentiation and other such tools (e.g. I'm currently using Eigen to make a tracing JIT for computations like FK, etc. over scenegraphs).
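For anyone curious about the custom-scalar angle, here's a minimal sketch using Eigen's unsupported AutoDiff module (not the JIT mentioned above, just an illustration; names and seed values are arbitrary):

    #include <Eigen/Dense>
    #include <unsupported/Eigen/AutoDiff>

    int main() {
        // A scalar carrying a value plus two partial derivatives.
        typedef Eigen::AutoDiffScalar<Eigen::Vector2d> AD;

        Eigen::Matrix<AD, 2, 1> x;
        x(0) = AD(1.0, 2, 0);   // seed d/dx0
        x(1) = AD(3.0, 2, 1);   // seed d/dx1

        // Ordinary Eigen/scalar expressions propagate derivatives automatically.
        AD f = x(0) * x(1) + x(0);   // f = x0*x1 + x0
        // f.value() == 4, f.derivatives() == (x1 + 1, x0) == (4, 1)
        return 0;
    }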


Blaze [0] looks promising. There was a little discussion here [1] about its performance compared to Eigen, especially for small sizes.

[0] https://bitbucket.org/blaze-lib/blaze/src/master/

[1] https://bitbucket.org/blaze-lib/blaze/issues/266/blaze-vs-ei...


Thanks, this looks promising indeed! Especially because it similarly supports custom scalar types: https://bitbucket.org/blaze-lib/blaze/wiki/Vector%20and%20Ma...


One small downside of Blaze vs. Eigen for robotics is that Blaze seems to lack the highly convenient geometry types and operations that Eigen ships with. This wouldn't be difficult to build on top of Blaze (and could lead to some interesting performance comparisons), but it does mean some additional work is necessary to use Blaze in many robotics applications.


glm is also good for 3d math. It mimics the API of OpenGL shaders, so it's a good option if you already know how to write shaders (or are interested in learning).

https://github.com/g-truc/glm


> when your entities are positions, velocities, etc.

For use cases where FP32 precision is enough, I usually use the DirectXMath library https://github.com/Microsoft/DirectXMath for that. That thing is cross-platform in practice. Even when building things for ARM Linux, it’s easy to copy-paste the required pieces; NEON support is there.

When I need FP64 precision on PCs, I usually proceed without libraries, using AVX intrinsics.
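For the FP32 case, a minimal DirectXMath sketch (values and the dt are illustrative):

    #include <DirectXMath.h>
    using namespace DirectX;

    void integrate() {
        XMVECTOR pos = XMVectorSet(1.0f, 2.0f, 3.0f, 0.0f);
        XMVECTOR vel = XMVectorSet(0.1f, 0.0f, -0.2f, 0.0f);
        // pos + vel * dt, compiled to SSE/AVX/NEON depending on target macros
        XMVECTOR next = XMVectorAdd(pos, XMVectorScale(vel, 0.016f));
        XMFLOAT3 out;
        XMStoreFloat3(&out, next);
    }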


Another answer just debates whether that was a benchmarking issue.


Multiple other users confirmed the results, and that answer's author never followed up.


I don't agree with your policies.


It may have improved recently -- I haven't measured -- but serial Eigen seems mostly a little less performant at plateau than optimized BLAS GEMM for reals, and about half as good for complex in results I've seen for v3.3. For multiplication/convolution of sufficiently small dimension matrices on x86 (aarch64 in development) you probably want libxsmm; it can be used header-only -- at least for C -- if that matters. I might guess Eigen does relatively better on L1 and L2 than BLAS libraries.


In my case Eigen is handling some 3000x3000 and bigger matrices. For these scenarios, its performance is about 98% of BLAS libraries, which is more than enough when combined with the ease and practicality of Eigen. It also handles RAM placement, so it doesn’t get affected by memory fragmentation.

Their current performance page is here [0].

[0]: https://eigen.tuxfamily.org/index.php?title=Performance_moni...


[0] still looks bad for complex (CGEMM, I assume). It doesn't have comparisons there, but Eigen's claim to be faster than any free BLAS is surely wrong. (I wouldn't doubt it's similar if it has the right loop structure and prefetching, and a few percent is within the normal variance of HPC runs anyway, so not worth worrying about.) Unfortunately it isn't convenient for me, not being clever enough for C++, but I've nothing against any good free linear algebra implementation for a platform I'm using.

What's "RAM placement" in this context?


I can't comment for performance of Eigen concerning complex calculations, since my work lives in the real domain.

On the other hand, Eigen didn't claim to be faster than any BLAS during my interactions with the informational side of their website. Nowadays, I just get the latest version, update my code and go directly to their reference guide.

However, being used by both CERN (in LHC ATLAS) and TensorFlow, I guess they're no slouch in terms of performance (it also genuinely amazed my heavy-BLAS-fan professor at my jury when the performance numbers came up). Another plus is Eigen's ability to be used in CUDA kernels (though I haven't tried it yet).

In any case, they're insanely fast (almost BLAS fast in my experience) for what I'm doing, and really well optimized (it shows when you use -O3, especially on the solvers side). Also, it has neat capabilities like dynamic matrices and instant submatrix access (you can get a submatrix of a matrix, or patch in a submatrix, in a single call).

About RAM placement: from my experience, code becomes overly dependent on non-fragmented memory when allocating big matrices and/or vectors naively (i.e. in a contiguous manner), and execution fails as the node you're running on (or your computer) gets busier with other code. Eigen counters this by intelligently dividing bigger matrices into smaller chunks, both optimizing access time and reducing (or, in my experience, effectively eliminating) the risk of segmentation faults related to failed memory allocations due to not having enough contiguous space in memory.


> On the other hand, Eigen didn't claim to be faster than any BLAS during my interaction with informational side of their website.

It claims to be faster than any free BLAS in the FAQ (but talks about ATLAS and GotoBLAS there). Given how close GotoBLAS, in the guise of OpenBLAS, gets to peak performance for GEMM, that claim would be dubious even without having measured only ~2/3 of OpenBLAS DGEMM performance. I assume claims of speed are for when BLAS is actually providing it. (I'm too experienced to be susceptible to appeals to authority of HEP software!)

Thanks for explaining the memory thing. I'd hope HPC code wouldn't be subject to SEGVs due to memory allocation failing (for two reasons), but you typically want arrays on the stack and otherwise large pages to minimize TLB misses (per Goto).

I'm not intending to bash Eigen, happy if people can use performant free software.


That's interesting. I'd assume that a template library like Eigen would be most competitive for really small matrices and vectors of a known size -- it should have a fundamental advantage over something like MKL or BLIS, in that it can fully inline and unroll everything if it wants, right?


My guess is that it's a deliberate design decision. It could be that most users of the library don't care about small things.

About the template stuff, I think their main performance advantage over traditional BLAS is not even SIMD, it’s lazy evaluation. Expressions like x=a*b+c never compute the complete a*b matrix or vector. The a*b expression returns a small placeholder object on the stack, of a scary type with a couple of lines of template arguments in the type name. This way the complete expression runs without making temporary matrices; instead it streams data from all 3 arguments and only writes to memory once. And if `x` is of the correct size already, it doesn’t call malloc/free.
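A minimal sketch of what that fusion looks like from user code (sizes are illustrative; the coefficient-wise case is where the single-pass behaviour is easiest to see):

    #include <Eigen/Dense>

    void fused(const Eigen::VectorXf &a, const Eigen::VectorXf &b,
               const Eigen::VectorXf &c, Eigen::VectorXf &x) {
        // The right-hand side is an expression object; nothing is computed
        // until the assignment, which runs as one loop over the coefficients
        // with no temporary vector (and no allocation if x is already sized).
        x = a.cwiseProduct(b) + c;

        // auto exposes the placeholder type: a lightweight object on the stack.
        auto expr = a.cwiseProduct(b) + c;   // still unevaluated here
        x = expr;                            // evaluated on assignment
    }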


Yeah, lazy evaluation/fusing operations is I think their best feature. Fixed interface BLAS has really good performance, but I mean, something as simple as DAXPBY is an extension. Selecting what functionality to super-optimize is hard for a fixed interface, and involves non-technical stuff like figuring out what subroutines people actually want.

I'm sure someone has already looked at this, but I wonder if Eigen can just somehow nab kernels directly from BLIS, haha.


This is called "expression templates" https://en.wikipedia.org/wiki/Expression_templates


Thank you for this. I wrote a small linear algebra library using intrinsics in C#, and my code was beating Eigen at the n<50 level, but I figured it was just some error of mine, so I never had the balls to make the claim publicly.


Any idea about perf diff between this and GLM? In Computer graphics my use case is with fixed 4x4 matrices and vec4s. Thanks!


I never really used GLM, but Eigen was substantially slower than DirectXMath https://github.com/microsoft/DirectXMath for these things. Despite the name, 99% of that library is OS-agnostic; only a few small pieces (like the perspective projection matrix formula) are specific to Direct3D. When enabled with the corresponding macros, inline functions from that library normally compile into pretty efficient manually vectorized SSE, AVX or NEON code.

The only major issue is that DirectXMath doesn’t support FP64 precision.


Last time I tried Eigen, it sucked for 4x4 matrices.


Could this be fixed with some template specialisation?


Eigen makes use of these already


Eigen is the standard choice (for good reason!) for many linear algebra projects, especially in robotics, but there is a big downside users should be aware of before they choose it.

Eigen makes extensive use of expression templates in C++ to collapse complex operation sequences into streamlined and minimal calculations. This is generally ok, until you need a debug build. I've regularly seen debug builds of software using Eigen run 1000x to 10000x slower than the release build, which seriously complicates various debugging workflows. It also makes it a nightmare to run your test suite through valgrind, for example.

I've seen several engineers attempt (and fail) to create/link a release build of Eigen with a debug build of the rest of the program, to regain most of that speed while still allowing a decent amount of debuggability, but this is really hard due to all the aggressive inlining and heavy use of templates.

In my experience, I would happily accept a 2x or greater slowdown in linear algebra performance in release builds in exchange for a significant boost in debug execution speed. If you're starting a greenfield project, you should consider how important decent debug performance is before choosing Eigen by default.


You can create custom build configurations to set the optimizations and add debug info. Or just add instrumentation code (print debugging), which is more useful for debugging heavy numerical stuff most of the time anyway.

Running anything through valgrind or cachegrind will incur a slowdown of several orders of magnitude - that's inherent to how the tools work.

To your last point, just add -g to your compile flags and see how far you get.


I don't think you quite understand the problem. The problem wasn't turning on optimizations or getting debug info; the problem was getting Eigen-based linear algebra code to run at a reasonable speed when optimizations were disabled, because disabling them makes the surrounding logic easier to debug (or because you needed to run some analysis tool like Valgrind on a debug build).

For example, we had simulation test suites which would run ~30 seconds of real-world time. If we can run those sims at 10x real time, each only takes 3 seconds, great! But with optimizations turned off, heavy Eigen code would run 1000x slower, which means the sims now take nearly 3000 seconds to run when bug hunting in the non-Eigen related surrounding logic. This already makes the test suite virtually un-runnable in debug builds, but if you want to then run those binaries through Valgrind (which tacks on another 100-1000x performance tax) you're now talking about something which is completely intractable. You can't run optimized builds through Valgrind without potentially introducing false positives, so you can't even run Valgrind on only the optimized builds... you just can't run it at all because your linear algebra library is slowing everything down.

The core problem is Eigen's heavy use of expression templates, which the optimizer cuts through just fine. But without the optimizer turned on, it adds layers and layers and layers of unnecessary abstraction. Because these are all templates, C++ expands all this code at every call site, making it substantially less efficient and impossible to shim out by linking with an optimized build of Eigen itself.

The only approach I've seen get close to working was an attempt by an engineer to use explicit template instantiation to create a separate binary with all the necessary Eigen functions, which could be compiled separately with the optimizer turned on. However, this proved too nightmarish because, since Eigen can template on the dimensions of everything, you need to know up front exactly which instantiations of Eigen types the program uses. This might be doable by writing a clang plugin or something, but doing it by hand was just way too much work because the codebase was huge, so we couldn't justify spending more time on it.


I appreciate being made aware of this downside! Why does this happen with C++ debugging?


In general, in template-heavy libraries, what kills performance in debug builds is the lack of inlining with optimizations turned off. And templates traditionally need to be inlinable by having their full definition in a header, so one can't even build just the library with optimizations and debug info in isolation: the optimization has to happen where the templates are used, not in the library where they're defined (because there is no separately built library for the template). In other words, it's a real pain to get just the templates optimized while the rest of your code is left unoptimized for easier debugging.


This is a problem with C++ in general.

In debug mode it has no notion of performance whatsoever.


The first thing I wondered about was the license. Fortunately, it mostly uses the MPL2 license:

https://eigen.tuxfamily.org/index.php?title=Main_Page#Licens...

> Note that currently, a few features rely on third-party code licensed under the LGPL: constrained_cg. Such features can be explicitly disabled by compiling with the EIGEN_MPL2_ONLY preprocessor symbol defined. Furthermore, Eigen provides interface classes for various third-party libraries (usually recognizable by the <Eigen/*Support> header name). Of course you have to mind the license of the so-included library when using them.

> Virtually any software may use Eigen. For example, closed-source software may use Eigen without having to disclose its own source code. Many proprietary and closed-source software projects are using Eigen right now, as well as many BSD-licensed projects.
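For reference, opting out of the LGPL-licensed bits mentioned above is a single preprocessor define (documented behaviour; using a disabled feature then fails at compile time):

    // Restrict Eigen to MPL2-licensed code paths.
    #define EIGEN_MPL2_ONLY
    #include <Eigen/Dense>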


My only complaint is that with OpenMP, Eigen can be slower with SMT than without it. They even suggest telling it to use half as many threads as you have "cores", when "cores" means twice as many due to SMT.

Otherwise, we've seen an 8-10x performance increase in SolveSpace (CAD) in some situations after switching from home-grown matrix operations to Eigen.
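If it helps, the half-the-threads advice can also be applied from code rather than via OMP_NUM_THREADS; a minimal sketch, assuming an OpenMP-enabled build (the core count is illustrative -- query your actual topology):

    #include <Eigen/Core>

    int main() {
        // One thread per physical core instead of one per SMT thread.
        Eigen::setNbThreads(4);
        // ... Eigen-heavy work ...
        return 0;
    }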


This has been true for years; Intel CPUs can't efficiently perform math operations when hyperthreading is involved. Generally there is only a single shared FPU/AVX/SSE unit doing the math for two hyperthreads. Since the Eigen implementation can often keep that unit 100% busy, it makes no sense to try to run two threads at full tilt through the units.

I tested all this very heavily before Eigen had AVX-512 support. In that environment there might be some differences and I would suggest you benchmark both configurations.


> Generally there is only a single shared FPU/AVX/SSE unit doing the math over two hyperthreads.

Hyperthreading does in general share units (both ALU and others); that's what hyperthreading is.

Apart from that, it really depends on what operations you're doing; e.g., modern Intel CPUs have three ports that can issue a 256-bit FMA, each, every cycle.


>> Hyperthreading does in general share units (both ALU and others); that's what hyperthreading is.

Yes, I think the issue with Eigen is cache related. They apparently have optimizations that are aware of cache architecture and running 2 threads that share the same cache will screw that up, resulting in more misses. If this is the case, I'd prefer algorithms that are cache line size agnostic. It is still much faster than the simple hand-written code we had before!


I think this is generally true if your workload is SIMD/AVX heavy: two hyperthreads cannot execute these kinds of “heavy” instructions on a single core simultaneously.


Eigen-like code tends not to have many branch mispredicts or loads the prefetcher can't figure out, and if you also have enough calculations to use the full width of the core on a single thread, then there really isn't any potential throughput gain from SMT, but you still suffer cache contention from having two threads. It's really not Eigen's fault; it's the nature of SMT that it doesn't help in all cases.


One of the best things about Eigen is that it is the only linear algebra package I know of that easily supports using the same code for low, high, and arbitrary precision floating point numbers. I had some linear systems of equations that I needed to solve in grad school where the condition number was bad enough that double precision was not sufficient to solve the problem. I was able to very easily use double-double, quad-double [1], and arbitrary precision floating point [2] number implementations to solve these problems (see the sketch after the references). The matrices I used were not especially large, but I couldn't find any other existing packages that fit this use case.

[1] https://www.davidhbailey.com/dhbsoftware/

[2] http://www.holoborodko.com/pavel/mpfr/
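For anyone curious, swapping in the MPFR-backed scalar from [2] looks roughly like this (a minimal sketch, assuming Eigen's unsupported MPRealSupport module and the MPFR C++ wrapper are available; sizes and precision are illustrative):

    #include <unsupported/Eigen/MPRealSupport>
    #include <Eigen/Dense>
    using mpfr::mpreal;

    int main() {
        mpreal::set_default_prec(256);   // 256-bit mantissa

        typedef Eigen::Matrix<mpreal, Eigen::Dynamic, Eigen::Dynamic> MatrixXmp;
        typedef Eigen::Matrix<mpreal, Eigen::Dynamic, 1> VectorXmp;

        MatrixXmp A = MatrixXmp::Random(50, 50);
        VectorXmp b = VectorXmp::Random(50);
        // Same solver API as with double, just a different scalar type.
        VectorXmp x = A.fullPivLu().solve(b);
        return 0;
    }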


Probably won't ever catch Fortran, but using Eigen templates for reductions really opens the door for compile time optimizations; e.g. these are all reductions that do the same thing

    const float residual = (L.array() * P.array()).colwise().sum().square().mean();
    const float residual = L.cwiseProduct(P).array().colwise().sum().array().square().mean();
    const float residual = (L.transpose() * P).diagonal().array().square().mean();
The compiler can optimize using static information, e.g. these would all be handled differently for the following types

    Eigen::Matrix<float,3,3> L(3,3), P(3,3);
    Eigen::Matrix<float,3,Eigen::Dynamic> L(3,K), P(3,K);
    Eigen::Matrix<float,Eigen::Dynamic,Eigen::Dynamic> L(N,K), P(N,K);


The compile time loop fusion is also particularly nice.


Thanks -- I just learned something new. Since matrices are primitive types in Fortran, I assume these kinds of optimizations are more abundant in Fortran. I wonder if adding matrices / vectors / tensors as primitive types has even been entertained by the various C++ committees.


I can’t stress enough how this library is a saving grace. It would be incredibly difficult to port my python prototype code with numpy to C++ production without using Eigen.


God bless Eigen, I am using it to implement the state matrices of a Kalman Filter and it's a joy to use its APIs.
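For anyone curious what that reads like, a minimal sketch of a predict step (the constant-velocity model and all names here are illustrative, not from my actual filter):

    #include <Eigen/Dense>

    int main() {
        double dt = 0.1;                                          // time step
        Eigen::Vector2d x(0.0, 1.0);                              // state: position, velocity
        Eigen::Matrix2d P = Eigen::Matrix2d::Identity();          // state covariance
        Eigen::Matrix2d Q = 0.01 * Eigen::Matrix2d::Identity();   // process noise

        Eigen::Matrix2d F;                                        // constant-velocity transition
        F << 1, dt,
             0, 1;

        // Predict: x = F x,   P = F P F^T + Q
        x = F * x;
        P = F * P * F.transpose() + Q;
        return 0;
    }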


Eigen has one of, if not the, best linear algebra APIs I've ever seen. In particular, vectors are column vectors by default and you never need to touch row vectors (if I can editorialize, row vectors shouldn't exist at all), and vectors are not simply "n-by-1" matrices -- they're true vectors.


But they are nx1 matrices in Eigen; you can call the rows() and cols() methods to check that. However, the nice part is that, since Eigen knows the number of columns at compile time, it enables additional vector methods such as size(), which lets you handle them as true vectors.

And these vectors can still interact well with parts of the code that expect a matrix.
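For example (a minimal sketch):

    #include <Eigen/Dense>

    void demo() {
        Eigen::Vector3f v(1.0f, 2.0f, 3.0f);
        // v.rows() == 3, v.cols() == 1, v.size() == 3

        // ...and it composes directly with code expecting a 3x1 matrix:
        Eigen::Matrix3f M = Eigen::Matrix3f::Identity();
        Eigen::Vector3f w = M * v;
        (void)w;
    }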


Could you elaborate a bit more about the difference between Nx1 matrices and "true vectors"?


One has shape (N, 1) and one has shape (N). Always having a trailing 1 is a pain in many settings. Matlab has that “trailing 1” all the time and it’s a PITA.


Why do you say it's a pain? After working with numpy for the last two years, that's the part I miss most about matlab.


There are many reasons the "-by-1" doesn't make sense.

* If a vector is an n-by-1 array (a "matrix"), why not an n-by-1-by-1-by-1-by-1?

* What is a row vector in something like a function space? Is sin(t) a row or column vector?

* You _can_ think of a matrix as acting "to the right (on a column vector)" and "to the left (on a row vector)", but this disrupts the notion of a matrix as a linear function acting on a thing (since it can act on two things). It's much more consistent to define a matrix acting on a vector, m(x), independent of "direction" and then define the transformations that let you go in the opposite direction (take the transpose of m which is a whole new matrix).

* Are physical quantities like velocity a row or column vector? Convention dictates they're column vectors, so in what scenario would it become a row vector? If I have a velocity, v1, that's a row vector and another, v2, that's a column vector, is it appropriate to calculate the projection of v1 onto v2? Should the projection operation be "aware" of the orientations? If all vectors are the same ("column vectors"), this isn't a problem -- you can always define the projection operation without special cases.

* In matlab specifically, if you have an n-by-1 array `a` and a 1-by-n array `b` with the same elements, then `a(i) == b(i)` but `a` and `b` are not the same thing.


If you want a great video on the topic, see https://youtu.be/C2RO34b_oPM

Or equivalently https://jiahao.github.io/talk/2017-06-juliacon/

There are basically two self-consistent choices for the design of linear algebra libraries.


I implemented the same thing for my master's thesis project.


Eigen is really nice for getting code that looks like the underlying math and it also optimizes away unnecessary temporaries by using template expressions. However, when you mess up, you get a screen full of template errors that take some experience to understand and debug.

It makes me wish for a language + compiler where linear algebraic objects are first-class values.


While it's not exactly what you're asking for, I do find the LinearAlgebra (standard) library in Julia to be pretty fantastic: https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/


> It makes me wish for a language + compiler where linear algebraic objects are first-class values.

Like Fortran?


Fortran gives you first-class dense ND-arrays, but doesn't give you the flexibility to make other types of arrays (banded/blocked/etc.) feel first class.


This library is used heavily inside Tensorflow. It is “production ready”.


FWIW it's been used in production long before tensorflow existed.


Eigen is a great library.

A similar one to consider that can at times be slightly easier to use coming from a python background is armadillo: http://arma.sourceforge.net/


Slightly tooting my own horn here; In case anyone is interested in a (trivial) comparison between them I have a tiny example project implemented both with Eigen and Arma (And Fortran and Python/numpy and Julia, FWIW): https://gitlab.com/jabl/tb


I’m using Eigen extensively in a project of mine. It’s extremely fast, versatile, easy to use and reliable. I’m a fan, one might say.


I wonder how Eigen compares to xtensor, which was inspired by Numpy and has support for views, slicing, and broadcasting?

https://github.com/xtensor-stack/xtensor


Eigen is similar to xtensor, but with 1) Eigen being significantly more mature and used in production in simulation/research environments, and 2) Eigen having not just basic operations on arrays, but also a full suite of linear algebra operations comparable to a full BLAS library, including a full test suite. xtensor wins on having better integration with Python and other languages out of the box. There are also some performance limitations due to being inspired by numpy, particularly with respect to memory management.


It's not quite out of the box, but pybind11 has really nice support for eigen matrices and vectors and automatically converts them to/from numpy arrays.
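A minimal sketch of what that looks like (module and function names are made up):

    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>   // enables automatic NumPy <-> Eigen conversion
    #include <Eigen/Dense>

    // On the Python side this accepts and returns NumPy arrays.
    Eigen::MatrixXd scale(const Eigen::MatrixXd &m, double s) { return s * m; }

    PYBIND11_MODULE(example, mod) {
        mod.def("scale", &scale, "Multiply a matrix by a scalar");
    }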


I really like armadillo's syntax; it feels like you're doing R/Matlab.


That's not a positive for everyone; Matlab's API choices are a bit polarizing/confusing for people who don't use it.


Not widely publicized, but the benchmarking code is in the source. At one point I was running it on my specific target machines to get performance estimates in support of porting some large-ish CPU stuff from Matlab into C++.

The max performance was with Eigen calling Intel MKL, but it was a big plus to not need MKL licenses on every development machine.


For anyone still reading this: I'm now confused by Eigen reportedly being basically on a par with optimized L3 BLAS, which I assume doesn't just mean punting to optimized BLAS, which it seems it can do. I can't see any indication that you can do run-time dispatch on the micro-architecture, which I think is important; I saw very poor results until I examined CMakeLists.txt. Anyway, I cmade v3.4 with -DEIGEN_TEST_AVX512DQ=on, which seems most appropriate for my SKX system, and then did make blas. I ran my normal initial quick test using the OpenBLAS 0.3.15 dgemm benchmark, which reports a plateau of ~99000 MFlops. LD_PRELOADing libeigen_blas.so with the same binary under the same conditions (bound to a core, with the hpc-compute tuned profile) gave ~66000 MFlops.

What did I do wrong?


A great library, a cornerstone. Many big libraries build on top of it (e.g. OpenCV, Point Cloud Library).


Can you elaborate on how OpenCV is built on top of Eigen? From what I can google, it seems that OpenCV can interoperate with Eigen but is not built on top of it.


There are functions related to camera calibration and adjustment of images whose internals are built using Eigen vectors and matrices.


Are you referring to Eigen, the C++ library, or eigenvectors/values, the mathematical construct?


It supports operator overloading too, by the way. This makes it usable where many LAPACK wrappers like Numpy are not.


Is there anything like Eigen for data that's kept on the GPU? I've been looking at porting some performance-critical scientific computing code from Python to C++ and numpy -> Eigen seems like an obvious migration path, but it's harder to figure out what to do with cupy matrix operations.


There is Kompute.

It wraps Vulkan in a very thin layer, mostly to eliminate boilerplate.


Recently encountered Eigen. It seems awesome - easy to work with, with lots of options. mat.conjugate() * mat.transpose() seems to take a while for my stuff, but I think I just haven't found the right method (next is to look at mat.adjointInPlace()).


I failed to see which chips are supported. Assuming x86 and ARM and their vector/SIMD extensions are leveraged for performance, what about RISC-V's vector/SIMD? Is there a plan to add those in the future?


Ahh, the horror of getting this to compile last week in Python :(


BLAS vs Eigen

Go


> Go

Category error.

One is a standard, the other a library. BLAS covers a subset of what Eigen does, and Eigen can use BLAS/LAPACK routines directly from other libraries (e.g. MKL) for those things.
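For reference, the external-BLAS route is a compile-time opt-in via documented macros (you still have to link the chosen BLAS/MKL yourself):

    // Define before including Eigen to route suitable operations
    // to an external BLAS/LAPACK, or to Intel MKL.
    #define EIGEN_USE_BLAS
    // #define EIGEN_USE_MKL_ALL
    #include <Eigen/Dense>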


Eigen calls BLAS when appropriate, with no overhead, and the code looks close to the math. Eigen also supports small fixed-size matrices, where BLAS is not appropriate.


I believe (haven't looked at benchmarks lately) Eigen has trouble beating a good BLAS implementation like MKL or BLIS at doing BLAS stuff, but it is more expressive and has the ability to do lazy evaluation/fuse operations. Anyway since you can get it to call your favorite BLAS/LAPACK library, these are really complementary projects.


https://github.com/flame/blis/blob/master/docs/Performance.m... but ignore the SKX results with the old OpenBLAS there.



