This, along with inferior metaprogramming affordances, is the biggest reason pan...

kaitai · on Dec 6, 2020

As someone who is now bouncing back & forth between Python and R on a weekly basis, I've been surprised (after making fun of R sometimes) how much I miss the piping when I leave R for Python. Pandas seems so inflexible by comparison, so nitpicky for little gain. I've been surprised again and again how much dplyr supports near-effortless fluency and productivity.
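For readers who only know one side: a dplyr pipeline such as `df %>% filter(x > 1) %>% group_by(g) %>% summarise(total = sum(x))` maps roughly onto a pandas method chain. This is a minimal sketch, assuming a made-up toy frame (the columns `g` and `x` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data; columns "g" and "x" are made up for illustration.
df = pd.DataFrame({"g": ["a", "a", "b", "b"], "x": [1, 2, 3, 4]})

# Rough pandas analogue of:
#   df %>% filter(x > 1) %>% group_by(g) %>% summarise(total = sum(x))
result = (
    df.query("x > 1")                   # filter rows, like filter()
      .groupby("g", as_index=False)     # group by g, keeping g as a column
      .agg(total=("x", "sum"))          # named aggregation, like summarise()
)
print(result)
```

Each step has to be a DataFrame method here, whereas `%>%` can thread any R function into the chain.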

Never really thought I'd be writing that in a public forum.

epistasis · on Dec 6, 2020

Personally, I think R as a language is absolutely beautiful. The libraries have tons of warts and many different styles, and there are perhaps too many object-oriented systems built on it. But the fact that you could even build multiple object-oriented systems points to how powerful the language is.

I think its functional orientation and the way that for loops are neglected make it seem completely insane to many people. But that describes a far smaller fraction of people today than it did in 2000. Back then, Java and C++ didn't even have lambdas. Since then, procedural languages have gained a lot of the "mind-breaking" functional features of Lisp-derived languages, and Python and JavaScript have become far more common. All the things that made R the language "weird" and unusable to the Java/C++ crowd have been adopted elsewhere.

smabie · on Dec 6, 2020

I'm not an R user, but you should try Julia for data analysis. It seems as flexible as R (maybe more), while also having blazing performance.

I do like Pandas' concept of row indices, which I know Julia (and, I believe, R) lacks.
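For context, a pandas row index lets you label rows and look them up by label rather than by position, and it also drives alignment when you do arithmetic between Series. A minimal sketch, assuming hypothetical ticker symbols as the row labels:

```python
import pandas as pd

# Hypothetical data: row labels are ticker symbols rather than 0..n-1.
prices = pd.DataFrame(
    {"close": [101.0, 55.5, 230.25]},
    index=pd.Index(["AAA", "BBB", "CCC"], name="ticker"),
)

# Label-based lookup via .loc instead of a positional row number.
bbb_close = prices.loc["BBB", "close"]

# Arithmetic between Series aligns on index labels, not on position.
shares = pd.Series({"CCC": 2, "AAA": 10})
value = prices["close"] * shares  # "BBB" has no match in shares, so it becomes NaN
print(bbb_close)
print(value)
```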

scottlocklin · on Dec 6, 2020

You should try using R; the real-world performance is actually better than Julia in every way that matters. Yes, Julia has a better data frame object; that's why we use data.table.

Also, Wes probably got the idea for row indices from R.

disgruntledphd2 · on Dec 6, 2020

The idea for row indices 100% comes from R, as it's been in S for longer than I've been alive ;)

ekianjo · on Dec 6, 2020

Tbh R's performance is only an issue when you deal with really big datasets. Most of the time R does just fine, and it has a lot more libraries than Julia can ever hope to have.

kaitai · on Dec 6, 2020

I have used Julia a bit and really enjoyed it. The only reason I do not use it for work is lack of libraries. I know I could 'be the change I want to see in the world' and contribute, but given the pace of things at work I cannot fit that in on the company dime at this time...

sundarurfriend · on Dec 6, 2020

Could you describe what kind of libraries you found lacking in Julia? I did get a feeling that lots of long-tail stuff was missing, when I was looking through Julia packages some time ago, but only in a vague "this doesn't seem that exhaustive" sense. Knowing what specifically has been found lacking would be useful.

smabie · on Dec 6, 2020

I think the obvious limitations of Python are a big reason, but probably not the main reason, why Pandas isn't orthogonal. Pandas is such an ungodly mess because it must be, in order to be even halfway efficient. When you do try to compose things, or even have the audacity to use a Python lambda or an if statement or whatever, you suddenly suffer a 100x slowdown in performance.

Julia doesn't have these problems, and I've found it so much nicer to use for data analysis. You can even call Python libs, if you really have to.
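To make the lambda point concrete: the same per-element computation can be written as a vectorized column expression, which runs inside NumPy's compiled loops, or as `.apply()` with a Python lambda, which calls back into the interpreter once per element. How large the gap is depends on the data size and the operation. This sketch, assuming a hypothetical column `x` of random floats, only checks that both styles agree and times them with `timeit`:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 200,000 random floats in column "x".
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.random(200_000)})

def vectorized(frame):
    # Whole-column arithmetic, executed in NumPy's compiled loops.
    return frame["x"] * 2 + 1

def with_lambda(frame):
    # Same arithmetic, but one Python function call per element.
    return frame["x"].apply(lambda v: v * 2 + 1)

# Both approaches produce the same values...
assert np.allclose(vectorized(df), with_lambda(df))

# ...but their run times differ; exact numbers depend on the machine.
t_vec = timeit.timeit(lambda: vectorized(df), number=3)
t_lam = timeit.timeit(lambda: with_lambda(df), number=3)
print(f"vectorized: {t_vec:.3f}s  apply+lambda: {t_lam:.3f}s")
```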

harryposner · on Dec 6, 2020

Pandas does have the .pipe() method [0], which allows you to put an arbitrary callable in a method chain, but it is a bit more cumbersome than in R.

[0] https://pandas.pydata.org/pandas-docs/stable/reference/api/p...
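A minimal sketch of `.pipe()`: `df.pipe(f, *args, **kwargs)` is equivalent to calling `f(df, *args, **kwargs)`, so any function that takes a DataFrame can sit in a method chain. The helper `add_ratio` and the columns `hits`/`tries` are hypothetical:

```python
import pandas as pd

# Hypothetical helper: any callable that takes a DataFrame can go in .pipe().
def add_ratio(frame, num, den, name="ratio"):
    # Vectorized column arithmetic; returns a new frame, leaving the input intact.
    return frame.assign(**{name: frame[num] / frame[den]})

df = pd.DataFrame({"hits": [3, 5, 8], "tries": [4, 10, 8]})

result = (
    df.pipe(add_ratio, "hits", "tries")    # same as add_ratio(df, "hits", "tries")
      .query("ratio > 0.6")
      .sort_values("ratio", ascending=False)
)
print(result)
```

In R the equivalent step would just be `df %>% add_ratio("hits", "tries")`, which is the extra ceremony the comment is referring to.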

smabie · on Dec 6, 2020

Except you can't actually use it, because it will kill the performance of your program.