I agree with what Hadley said in some ways. It takes a bit more time to get used to the [i, j, by] notation, and I personally feel it's unlike most R syntax. But I don't see that stopping me from using something as fast as data.table.
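For anyone unfamiliar with the notation, here is a minimal sketch of the general form (the table `DT` and its columns are made up for illustration):

```r
library(data.table)

# Toy table
DT <- data.table(id = c("a", "a", "b", "b"), x = 1:4)

# The general form is DT[i, j, by]:
#   i  -> which rows        (x > 1)
#   j  -> what to compute   (mean of x)
#   by -> grouped by what   (id)
DT[x > 1, .(mean_x = mean(x)), by = id]
# id "a" -> 2, id "b" -> 3.5
```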
ajinkyakale, "harder to learn" doesn't capture the fact that data.table provides many features that, for example, dplyr just doesn't. On top of that, it is fast and memory efficient.
Rolling joins, for example, are a slightly harder concept to grasp because most of us don't know what a "rolling" join is (unless you work regularly with time series).
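A rough sketch of the idea, using made-up toy tables: a rolling join matches each query time to the most recent prior observation instead of requiring an exact match.

```r
library(data.table)

# Prices observed at irregular times (toy data)
prices <- data.table(time = c(1, 5, 9), price = c(100, 105, 103))
setkey(prices, time)

# Times at which we want the prevailing price
queries <- data.table(time = c(2, 6, 10))
setkey(queries, time)

# roll = TRUE carries the last observation forward:
# time 2 -> 100, time 6 -> 105, time 10 -> 103
prices[queries, roll = TRUE]
```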
Aggregating while joining is hard to grasp not because the syntax is hard, but because the concept is inherently new. It lets us perform these operations in a more straightforward manner, and most people embrace it after investing some time to understand it.
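As a hedged sketch of what "aggregating while joining" looks like (table and column names here are hypothetical; `by = .EACHI` is the current idiom for grouping by each row of the table being joined):

```r
library(data.table)

# Toy data
sales  <- data.table(store = c("a", "a", "b", "b", "b"),
                     amount = c(10, 20, 5, 15, 25))
setkey(sales, store)

stores <- data.table(store = c("a", "b"))

# Join 'stores' into 'sales' and aggregate in a single step;
# by = .EACHI groups by each row of the joined table
sales[stores, .(total = sum(amount)), by = .EACHI]
# store "a" -> 30, store "b" -> 45
```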
Binary search based subsetting, e.g., DT[J(4:6)], is yet another new concept. One could stick to base R syntax and use vector scans to subset. But once you learn the difference between vector scans and binary search, you obviously don't want to vector scan. Now we could say that learning the difference between a "vector scan" and a "binary search" is really hard, but that'd be missing the point.
DT[x %in% 4:6] now internally uses binary search by constructing an index automatically! So you can keep using base R syntax.
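Putting the two side by side, a minimal sketch on made-up data (the automatic secondary index applies in recent versions, as noted above):

```r
library(data.table)

# Toy table
set.seed(1)
DT <- data.table(x = sample(1:10, 1e6, replace = TRUE), y = rnorm(1e6))

# Vector scan: checks every value of x
DT[x %in% 4:6]

# Binary search: sort once via the key, then subset in O(log n)
setkey(DT, x)
DT[J(4:6)]

# Newer versions build a secondary index on first use, so this
# base-R-style syntax also ends up using binary search
DT[x %in% 4:6]
```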
And dplyr doesn't have any of these features.
In short, a huge part of the "bit more time to get used to it" is due to data.table introducing concepts, unavailable in other tools/packages, that enable faster and more memory-efficient data manipulation. And I say this as a data.table user turned developer.
"harder to read after writing it" is very very subjective. I don't know what to say to that.
Not this again!
The fact that the article shows only the underdeveloped images (like the overcrowded vehicle and the snake charmer) makes me sad. These articles and TV shows have a completely wrong image of India in mind. You can argue why it's a big deal, but when you see the same stereotype running through all these articles, it hurts, mostly because it's so far from reality. I have not seen a snake charmer in my entire life, after living in the country for 25 years!
The problems he mentions might be true; I don't have any first-hand experience, but I bet they are common and hold true for the entire outsourcing business.
The data.table package by Matt Dowle definitely deserves a mention! It's fast, and I like the indexing functionality it provides. The benchmark timings are pretty impressive.
I should have mentioned you (arun_sriniv) as the co-developer of data.table! Thanks for all the hard work.
And yes, memory usage will be interesting, as that is the bottleneck when it comes to large datasets. I am working on something along those lines and will post soon :)
I use IJulia, which is based on Python notebooks. Try juliabox.org, a notebook (and more) offering from the Julia folks.
Then there are Juno and Julia Studio if you are inclined towards an RStudio-like interface.
You mentioned questions on scalability etc.; answers to those are very subjective. I went through a similar experience at one of the biggest tech giants out there. It went something like this: I was in the final face-to-face interviews and did well in all of them, IMHO. It was a big interview day at the company, and a lot of candidates had flown in from all over the globe. So, all interviews done, we went for lunch in the cafeteria, and before I started I was called in for another round of interviews (a couple of other guys were called too). To be honest, I hate lunch interviews; I think they're cruel! But anyway, I was asked about design, scalability, etc., and that was the last interview of the day. A few days later I got a rejection email from the HR folks. I was in the same boat as you are today ...