Pandas supports JOIN and GROUP BY operators, so are you saying that there is a gap between Apache Arrow and other mature dataframe libraries? If there is a gap, is there no plan to fix it in the standard Arrow API?
I understand the case for a SQL-like DSL and an optimizer for distributed queries (in-memory column stores, not so much). I'm trying to understand the value add of Polars. I don't mean to come across as critical; perhaps DataFusion is a poor implementation and you are being too polite to say so.
I also think that there is a C++/Arrow vs Rust/Arrow decision that has to be made. I associate PyArrow with the C++/Arrow library. Is Polars' Eager API a superset of the PyArrow API with the addition of JOIN/GROUPBY/other operators?
There is definitely a gap, and I don't think that Arrow tries to fill it. But I don't think it's wrong to have multiple implementations doing the same thing, right? We have PostgreSQL vs MySQL; both seem like valid choices to me.
A SQL-like query engine has its place. An in-memory DataFrame also has its place; I think the widespread use of pandas proves that. I just think we can do it more efficiently.
With regard to C++ vs Rust Arrow: the memory underneath is the same, so having an implementation in both languages only helps more widespread adoption IMO.
Thank you for your work! I've decided to kick the tires after reading your Python book. I think you underestimate the clarity of the API you have exposed, which, honestly, looks a fair bit saner than the tangled web that pandas is.