Hacker News new | past | comments | ask | show | jobs | submit login

Having seen a lot of work come to grief because of the decision to use pandas, anything that’s not pandas has my vote. Pandas: if you’re not using it interactively, don’t use it at all. This advice goes double if your use case is “read a csv.” Standard library in Python has you covered there.



Both duckdb and especially polars should also be mentioned here. Polars in particular is quite good Ime if you want a pandas-alike interface (it additionally also has a more sane interface).


Since DuckDB can read and write Pandas from memory, a team with varying Pandas fluency can benefit from learning DuckDB.


Since Pandas 2, Apache Arrow replaced NumPy as the backend for Pandas. Arrow is also used by Polars, DuckDB, Ibis, the list goes on.

https://arrow.apache.org/overview/

Apache Arrow solves most discussed problems, such as improving speed, interoperability, and data types, especially for strings. For example, the new string[pyarrow] column type is around 3.5 times more efficient. [...] The significant achievement here is zero-copy data access, mapping complex tables to memory to make accessing one terabyte of data on disk as fast and easy as one megabyte.

https://airbyte.com/blog/pandas-2-0-ecosystem-arrow-polars-d...




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: