Looks like a cool project.

It would be better to separate the benchmark results for big data technologies from those for single-machine DataFrame libraries.

Spark & Dask can perform computations on terabytes of data (thousands of Parquet files in parallel). Most of the other technologies in this article can only handle small datasets.
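For instance, here's a rough sketch of the kind of workload Dask is built for: reading a large directory of Parquet files lazily and aggregating across partitions in parallel (the path and column names are hypothetical):

    import dask.dataframe as dd

    # Hypothetical path: a directory containing thousands of Parquet files.
    # The read is lazy; partitions are processed in parallel across workers.
    df = dd.read_parquet("s3://bucket/events/*.parquet")

    # Nothing is materialized until .compute() is called.
    daily_counts = df.groupby("event_date").size().compute()

Most single-machine libraries would have to fit all of that data in memory at once.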

This is especially important for join benchmarking. There are different types of cluster computing joins (broadcast vs shuffle) and they should be benchmarked separately.
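As a sketch of the difference, in PySpark you can force one strategy or the other (table names and the join key are hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("join-benchmark").getOrCreate()

    facts = spark.read.parquet("s3://bucket/facts/")  # large fact table
    dims = spark.read.parquet("s3://bucket/dims/")    # small dimension table

    # Broadcast join: the small table is shipped to every executor,
    # so the large table is never shuffled across the network.
    broadcast_joined = facts.join(broadcast(dims), on="dim_id")

    # Shuffle (sort-merge) join: disable auto-broadcasting so both sides
    # are repartitioned by the join key before merging.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
    shuffle_joined = facts.join(dims, on="dim_id")

The two strategies have completely different cost profiles, so lumping them into one "join benchmark" number hides what's actually being measured.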




Yeah, nobody uses Spark because they want to; they use it because beyond a certain data size, there's nothing else.



