Blaze: High Performance Vector/Matrix Arithmetic Library For C++ (bitbucket.org/blaze-lib)
84 points by optimalsolver on Sept 12, 2021 | 28 comments



At the risk of sounding dumb, I'll ask the obvious question: in what way is this different from myriad other similar efforts (e.g. Eigen, Blas, Atlas, etc...)?

Also: why is it that for projects like these, the very first 10 lines on the web site aren't an actual code example of what usage looks like?

I had to actually go dig in the test directory to find actual code.

https://bitbucket.org/blaze-lib/blaze/src/master/blazetest/s...
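For anyone else looking, here's roughly what basic usage looks like, a minimal sketch pieced together from the docs and tests, so treat the details as illustrative rather than authoritative:

    #include <blaze/Math.h>

    int main() {
        blaze::DynamicMatrix<double> A(3UL, 3UL, 1.0);   // 3x3 matrix, all elements 1.0
        blaze::DynamicVector<double> x{1.0, 2.0, 3.0};
        blaze::DynamicVector<double> y = A * x;          // expression-template matrix-vector product
        return y[0] > 0.0 ? 0 : 1;
    }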


There really aren't a myriad of similar efforts. Eigen is pretty much the only C++ linear algebra library. BLAS is a C library which is the de facto standard, and ATLAS is an alternative implementation of the BLAS API.

Other than this, there really isn't much more to pick from.


Hmm… BLAS started life as Fortran codes (said intentionally to sound like a native Fortran-er).

Saying BLAS is a C library skips over much of the depth and history of its FORTRAN roots. The work that went into achieving numeric stability of this stuff is astounding. Ditto with LAPACK.

https://en.m.wikipedia.org/wiki/Basic_Linear_Algebra_Subprog...


The original BLAS library on Netlib was and still is written in Fortran (F77 likely) but includes a C API called CBLAS that calls into the Fortran code. Alternatives like ATLAS or maybe Intel's MKL might be written in C/C++ or other languages.
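For context, the C interface looks like this; a minimal sketch, assuming cblas.h from Netlib BLAS, OpenBLAS, or similar is available and linked:

    #include <cblas.h>

    int main() {
        double A[4] = {1, 2, 3, 4};   // 2x2, row-major
        double B[4] = {5, 6, 7, 8};
        double C[4] = {0, 0, 0, 0};
        // C = 1.0*A*B + 0.0*C; the CBLAS wrapper forwards to the Fortran dgemm
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);
        return 0;
    }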


>There really aren't a myriad of similar efforts.

Fair enough, but still doesn't tell me what's different about this project.


Fair enough. Are there any notable improvements over Eigen? I couldn't see whether Blaze supports dynamic rank tensor operations (Eigen doesn't). Thanks.


Armadillo, MKL


Netlib BLAS/LAPACK, OpenBLAS, AMD libs, Intel MKL, Eigen, Armadillo, GLM, Blitz++, PETSc, Trilinos (and I am not even starting on more "specialized" ones such as sparse matrices).

There are maybe one hundred libraries in the field.

IMO all of them, and certainly any new/active one(s), should clearly differentiate (or state) their purpose/goals and what makes them different. Please note, I am not saying what makes them better but different.

I am also tired of reading/hearing "HPC/simd/parallel" without ANY benchmarks/timings to support such claims. Isn't it strange that software implementing mathematics makes claims (almost all of the time) without any measurements/proof?

Thanks for the info though, I might remember to try it in the future.

edit: the rant is not about Blaze, it is more of a general rant about similar libraries. I feel I have to mention that Blaze at least tries to address what I am complaining about, and I also saw that they have instructions on how to replicate their benchmarks on one's machine, which IMO is what every project should be doing.


I've historically only used glm for matrices/vectors. Even though glm is specifically for graphics and limited to 4x4 or smaller matrices, I would've liked to see a performance comparison.
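For reference, glm usage is about as small as it gets; a quick sketch, assuming the stock glm headers:

    #include <glm/glm.hpp>

    int main() {
        glm::mat4 M(1.0f);                     // 4x4 identity
        glm::vec4 v(1.0f, 2.0f, 3.0f, 1.0f);
        glm::vec4 r = M * v;                   // matrix-vector product, fixed 4x4 only
        return r.x > 0.0f ? 0 : 1;
    }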


Why exactly is this better than atlas / blas / any library using it, e.g. Eigen?


In my understanding, Eigen uses its internal routines unless you point it to external BLAS/LAPACK libraries[0]?

Eigen's internal structures and features don't map well to traditional BLAS/LAPACK structures anyway.

[0]: https://eigen.tuxfamily.org/dox/TopicUsingBlasLapack.html
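Per the linked page, you opt in with a compile-time macro and link the external BLAS yourself. A rough sketch (OpenBLAS, the file name, and the sizes here are just assumptions for illustration):

    #define EIGEN_USE_BLAS     // route supported operations to an external BLAS
    #include <Eigen/Dense>

    int main() {
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(512, 512);
        Eigen::MatrixXd B = Eigen::MatrixXd::Random(512, 512);
        Eigen::MatrixXd C = A * B;   // large products get dispatched to dgemm
        return C(0, 0) > 0 ? 0 : 1;  // keep the result live
    }

    // compile along the lines of: g++ -O3 demo.cpp -lopenblas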


My bad, should have looked it up before posting, thanks! AFAIK, LAPACK is highly architecture-adapted/optimized, so it makes sense to "decouple" the two, leaving it to the user to bring it in for highly performance-critical binaries.


You're welcome, no problem; we're just discussing here.

BLAS/LAPACK are very optimized, and it's recommended to compile on (or for) the target machine for best performance; however, they've been developed for a very long time and are somewhat old-fashioned in terms of ergonomics and internal flow.

OTOH, Eigen is very modern and very easy to optimize (just pass -O3 and the relevant -march/-mtune flags to gcc), and you're screaming along at ~98% of BLAS/LAPACK speed.

I've used it extensively in my Ph.D., and TensorFlow also uses Eigen. It's very easy and practical to use, and it's very, very fast. It makes abusing (ehrm, making full use of) your processor easy and strangely enjoyable.
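If it helps, here's a tiny example plus the kind of compile line meant above (the file name and flags are illustrative, not gospel):

    // g++ -O3 -march=native -mtune=native eigen_demo.cpp -o eigen_demo
    #include <Eigen/Dense>
    #include <iostream>

    int main() {
        Eigen::Matrix3d A;
        A << 1, 2, 3,
             4, 5, 6,
             7, 8, 10;
        Eigen::Vector3d b(3, 3, 4);
        Eigen::Vector3d x = A.partialPivLu().solve(b);  // solve Ax = b
        std::cout << x.transpose() << "\n";
        return 0;
    }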

Some older benchmarks and current performance monitoring pages can be found at https://eigen.tuxfamily.org/index.php?title=Benchmark


Edit: I misread the article.


I’m confused by you posting this link. The conclusion at the end of the post is that Eigen is the fastest library? Did I miss something?


From the end of the article:

>> Blaze is certainly more tuned for performance and uses nifty techniques under the hood like padding, explicit loop-unrolling and turning divisions into multiplications to create more opportunities for FMA. However, aggressively tuning for performance implies more specialisations, more overloads, more "SFINAE" and so on, and as a result it compiles slower than Eigen.
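To illustrate the division-to-multiplication point from that quote (my own rough sketch, not code from the article): dividing every element by a scalar can be turned into one division up front plus per-element multiplies, which the compiler can then fuse with an add into FMA instructions.

    #include <cstddef>

    // y[i] = x[i] / s + b, rewritten so the loop body is a multiply+add (FMA-friendly)
    void scale_and_add(double* y, const double* x, double s, double b, std::size_t n) {
        const double inv = 1.0 / s;      // single division hoisted out of the loop
        for (std::size_t i = 0; i < n; ++i)
            y[i] = x[i] * inv + b;       // candidate for a fused multiply-add
    }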


I see, good point. I'm not convinced by this summary of the benchmark; N is too small, and they ignore the fact that views are much slower in Blaze than in the others.

To be clear, I think it’s great to have multiple competitive C++ implementations out there.


Collaborators and I built a library for numerically studying the Hubbard model (a model of condensed matter physics) using Blaze. It's great.

HOWEVER, the lack of built-in GPU support is severely limiting, and we found the third-party blaze_cuda library very difficult to get to work. Since supercomputers are increasingly accelerator-focused, this limitation is a big enough impediment that we made the extremely nontrivial decision to start fresh, on top of PyTorch (where we know that there are years and years of development and support for accelerators ahead), instead.


Can one get state-of-the-art speeds for vector/matrix arithmetic from a Java application? Or is it necessary to use C++ to be "closer to the metal"?


Java really sucks for this kind of work. Auto-boxing makes it really easy to accidentally end up with boxed Objects, which will kill performance. Also, Java doesn't give you good access to bit manipulation, processor intrinsics, or other useful low-level tools.


Besides the upcoming vector library (already mentioned; in incubation in Java 17), Intel did some AVX auto-vectorization work, at least on OpenJDK.

Other than that, there are libraries that allow you to call CUDA from Java, like TVM and TornadoVM.


Depends on what version you're using - I think Java 17 has improved vector maths capabilities, or it might still be in the incubator (but usable). I haven't looked at the API to see what it gives you, though.


Hey, I've written those before. It's an interesting SIMD exercise if you've got that opportunity in the processor. The DSP versions can become a little odd looking.


Title gave me a flashback from when Element was Vector IM and their Matrix client was Riot.


Great idea to give it the same name as Google's build tool.


The blaze library was started in 2012. The build tool was released in 2015.


…which is called Bazel outside of Google.


... only if you don't look too deep into its source code



