Having seen a lot of work come to grief because of the decision to use pandas, a...

c0balt · 2024-11-20T03:47:57 1732074477

Both DuckDB and especially Polars should also be mentioned here. Polars in particular is quite good, IME, if you want a pandas-like interface (and its interface is also saner).

ttyprintk · 2024-11-20T03:36:08 1732073768

Since DuckDB can read and write pandas DataFrames directly in memory, a team with varying pandas fluency can benefit from learning DuckDB.

adolph · 2024-11-20T18:14:07 1732126447

Since pandas 2.0, Apache Arrow is available as an optional backend alongside NumPy (NumPy is still the default). Arrow is also used by Polars, DuckDB, Ibis; the list goes on.

https://arrow.apache.org/overview/

Apache Arrow solves most discussed problems, such as improving speed, interoperability, and data types, especially for strings. For example, the new string[pyarrow] column type is around 3.5 times more efficient. [...] The significant achievement here is zero-copy data access, mapping complex tables to memory to make accessing one terabyte of data on disk as fast and easy as one megabyte.

https://airbyte.com/blog/pandas-2-0-ecosystem-arrow-polars-d...