A Beginner’s Guide to Optimizing Pandas Code for Speed (upside.com)
104 points by chasedehan on Nov 16, 2017 | 14 comments



This is a good point to plug my talks on 'Pandas from the Inside' (A), 'Big Pandas' (B) and 'Pandas 2.0' (C) that I presented at various PyData conferences over the last 18 months:

- PyData London 2016 - (A), 60 mins, videos online

- PyData Washington DC 2016 - (A), 90 mins, videos online

- PyData Amsterdam 2017 - (A) and (B), 3 hours

- PyData Berlin 2017 - (A) and (B), compressed 90 mins, video probably online

- PyCon UK 2017 - (A), (B) and (C), 2 hours

PDF slides for (A) are here: https://github.com/stevesimmons/pydata-ams2017-pandas-and-da...

Others are in my other repos on github: https://github.com/stevesimmons


Doesn't NumPy use SIMD instructions? There's no mention of that in the article.


It does under certain installations. Anaconda tends to favor Intel MKL on most x86 systems.
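One way to see what a given installation was built against is to look at NumPy's own build report. This is only a sketch: `build_report` is a made-up helper name, and the text printed by `np.show_config()` differs between NumPy versions and builds, so no particular output is assumed.

```python
import contextlib
import io

import numpy as np


def build_report() -> str:
    """Capture what np.show_config() prints and return it as a string.

    Hypothetical helper for illustration; the report's layout depends on
    the NumPy version and on how the package was built.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


if __name__ == "__main__":
    report = build_report()
    print(report)
    # A rough, case-insensitive check for an MKL-backed build.
    print("mentions MKL:", "mkl" in report.lower())
```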


Main performance tip is to find ways not to copy data generally:

- use .loc[]

- use inplace=True


Can you elaborate on using `.loc[]`? What is the defective approach it replaces?


I would assume using df[['colA', 'colB']] for projection/column selection?
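A small sketch of the two spellings, using a made-up frame with the column names from this guess. For plain column selection they return the same data; the case where `.loc[]` is usually recommended is assignment, where a single indexing step targets the original frame directly. Whether either form avoids a copy depends on the pandas version, so nothing is claimed about that here.

```python
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({"colA": [1, 2, 3], "colB": [4, 5, 6], "colC": [7, 8, 9]})

# Plain column selection: both spellings select the same rows and columns.
by_brackets = df[["colA", "colB"]]
by_loc = df.loc[:, ["colA", "colB"]]
same_selection = by_brackets.equals(by_loc)

# Conditional assignment in one indexing step, so it is applied to df
# itself rather than to an intermediate object.
df.loc[df["colA"] > 1, "colB"] = 0

if __name__ == "__main__":
    print(same_selection)
    print(df)
```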


Also, I would caution about using inplace=True. See: https://tomaugspurger.github.io/method-chaining.html (ctrl+F: "Inplace?")
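For comparison, here is a rough sketch of the two styles side by side on invented data: step-by-step mutation with `inplace=True`, and a method chain that returns a new object at each step. Both end up with the same frame; the chained form leaves the input untouched. This says nothing about which one copies less, which would need measuring on the pandas version in use.

```python
import pandas as pd

# Toy data, invented for illustration.
raw = pd.DataFrame({"b": [3, 1, 2], "a": [30, 10, 20]})

# Style 1: mutate a working copy step by step with inplace=True.
mutated = raw.copy()
mutated.sort_values("b", inplace=True)
mutated.rename(columns={"a": "value"}, inplace=True)
mutated.reset_index(drop=True, inplace=True)

# Style 2: method chaining; each call returns a new DataFrame.
chained = (
    raw.sort_values("b")
       .rename(columns={"a": "value"})
       .reset_index(drop=True)
)

if __name__ == "__main__":
    print(mutated.equals(chained))
    print(chained)
```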


There's something really offensive about attributing "premature optimization..." to xkcd, even as a joke


step 1 - don't use pandas


Pandas isn't perfect, but for small- to medium-size datasets, I haven't seen much that matches its performance, and I haven't seen anything that matches its combination of performance and ease of use.


As a person who deeply enjoys developing with Python, I'd have to reluctantly say R's tidyverse is a delight to use and, in my experience, often faster than pandas.


Ah, good to know. I haven't touched R in a while.

Does tidyverse fix up the mess that is string handling in R?


Sorry for the late reply. stringr does a pretty good job (though I like how Python handles strings better).


But they work for bamboo shoots. Tough to beat that



