For anyone who's interested, Koalas is the Pandas API on top of Apache Spark: https://github.com/databricks/koalas

It works similarly to PySpark and scales to massive datasets (hundreds of terabytes). Koalas is probably the best bet if you're working on a massive dataset and want the Pandas API. Or you can simply use PySpark, which has a cleaner interface.
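To give a rough idea of what "the Pandas API on Spark" means in practice, here's a minimal sketch. The helper name `mean_fare_by_city` and the sample data are made up for illustration. The function is written against the Pandas-style DataFrame API only, so it runs with plain pandas below, and the assumption is that passing the Koalas module instead would run the same calls on Spark.

```python
import pandas as pd


def mean_fare_by_city(frames, rows):
    """Average fare per city, using only Pandas-style calls.

    `frames` is any module exposing a Pandas-compatible DataFrame
    (hypothetical parameter name, used here to swap backends).
    """
    df = frames.DataFrame(rows)
    # groupby / column selection / mean are part of the shared API surface
    return df.groupby("city")["fare"].mean().sort_index()


# Made-up sample data, small enough to check by hand
rows = {
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
    "fare": [10.0, 20.0, 30.0, 40.0],
}

# Plain pandas: everything happens in local memory
result = mean_fare_by_city(pd, rows)
print(result)

# Assumption: with Koalas and Spark installed, the same function should
# work on a Spark-backed DataFrame:
#   import databricks.koalas as ks
#   mean_fare_by_city(ks, rows)
```

The point of passing the module in is just to show that the calling code doesn't change; in a real project you'd more likely pick one import and stick with it.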



