
I think it's mostly a nod to the fact that R's data.table blows everybody else out of the water by such a ridiculously wide margin. It's like a factor of 2 faster than the next fastest...

So if you're writing a dataframe library as a hobby project, it's far less demotivating to use "all the other implementations" as your basis for comparison, at least initially.




I think a hobby project written in a general purpose language being the second fastest dataframe library is a hell of an accomplishment.


Sure, but data.table also fits those criteria.


Any idea what makes R's data.table so fast compared to the others?


I read somewhere else (another comment, I think) that it was a ground-up implementation taking a very performance-oriented approach.

Basically it seemed like they really got in the weeds to make it super fast.


R is from ~2000, while pandas started in 2011. Is it possible that the lack of compute power had an effect on the required performance characteristics?


data.table is basically a highly optimized C library

https://github.com/Rdatatable/data.table


That's somewhat like libvips which was started when a 486 was state of the art - fast forward and it's an image processing monster.


R is much older than 2000, it's from 1993.


And it’s an implementation of S, originally from Bell Labs in 1976


Thank you. My brief research turned up a list of versions that dated R 1.0 to 2000, but it appears that v0 lasted a good many years. Pandas was also in v0 for many years, so the like-for-like comparison is the better one to use.



