I agree with what Hadley said in some ways. It takes a bit more time to get used to the [i, j, by] notation, and I personally feel it's unlike most R syntax. But I don't see that stopping me from using something as fast as data.table.
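For anyone unfamiliar with the notation, here is a minimal sketch of the general form (the table `DT` and its columns are made up for illustration):

```r
library(data.table)

# Toy table
DT <- data.table(id = c("a", "a", "b", "b"), x = 1:4)

# The general form is DT[i, j, by]:
#   i  -> which rows        (x > 1)
#   j  -> what to compute   (mean of x)
#   by -> grouped by what   (id)
DT[x > 1, .(mean_x = mean(x)), by = id]
# id "a" -> 2, id "b" -> 3.5
```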
ajinkyakale, "harder to learn" doesn't capture the fact that data.table provides many features that, for example, dplyr just doesn't. On top of that, it is fast and memory efficient.
Rolling joins, for example, are a slightly harder concept to grasp because most of us don't know what a "rolling" join is (unless you work regularly with time series).
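A rough sketch of the idea, using made-up toy tables: a rolling join matches each query time to the most recent prior observation instead of requiring an exact match.

```r
library(data.table)

# Prices observed at irregular times (toy data)
prices <- data.table(time = c(1, 5, 9), price = c(100, 105, 103))
setkey(prices, time)

# Times at which we want the prevailing price
queries <- data.table(time = c(2, 6, 10))
setkey(queries, time)

# roll = TRUE carries the last observation forward:
# time 2 -> 100, time 6 -> 105, time 10 -> 103
prices[queries, roll = TRUE]
```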
Aggregating while joining is hard to grasp not because the syntax is hard, but because the concept is inherently new. It lets us perform these operations in a more straightforward manner, and most people embrace it after investing some time to understand it.
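As a hedged sketch of what "aggregating while joining" looks like (table and column names here are hypothetical; `by = .EACHI` is the current idiom for grouping by each row of the table being joined):

```r
library(data.table)

# Toy data
sales  <- data.table(store = c("a", "a", "b", "b", "b"),
                     amount = c(10, 20, 5, 15, 25))
setkey(sales, store)

stores <- data.table(store = c("a", "b"))

# Join 'stores' into 'sales' and aggregate in a single step;
# by = .EACHI groups by each row of the joined table
sales[stores, .(total = sum(amount)), by = .EACHI]
# store "a" -> 30, store "b" -> 45
```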
Binary search based subsetting, e.g., DT[J(4:6)], is yet another new concept. One could stick to base R syntax and use vector scans to subset. But once you learn the difference between vector scans and binary search, you obviously don't want to vector scan. Now we could say that learning the difference between a "vector scan" and a "binary search" is really hard, but that'd be missing the point.
DT[x %in% 4:6] now internally uses binary search by constructing an index automatically! So you can keep using base R syntax.
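Putting the two side by side, a minimal sketch on made-up data (the automatic secondary index applies in recent versions, as noted above):

```r
library(data.table)

# Toy table
set.seed(1)
DT <- data.table(x = sample(1:10, 1e6, replace = TRUE), y = rnorm(1e6))

# Vector scan: checks every value of x
DT[x %in% 4:6]

# Binary search: sort once via the key, then subset in O(log n)
setkey(DT, x)
DT[J(4:6)]

# Newer versions build a secondary index on first use, so this
# base-R-style syntax also ends up using binary search
DT[x %in% 4:6]
```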
And dplyr doesn't have any of these features.
In short, a huge part of the "bit more time to get used to it" is due to data.table introducing concepts, unavailable in other tools/packages, that enable faster and more memory-efficient data manipulation. And I say this as a data.table user turned developer.
"harder to read after writing it" is very very subjective. I don't know what to say to that.
Not this again!
The fact that the article shows only the underdeveloped images (like the overcrowded vehicle and the snake charmer) makes me sad. These articles and TV shows have a completely wrong image of India in mind. You can argue why it's a big deal, but when you see the same stereotype running through all these articles, it hurts, mostly because it's so far from reality. I have not seen a snake charmer in my entire life, after living in the country for 25 years!
The problems he mentions might be true; I don't have any first-hand experience, but I bet they are common and hold true for the entire outsourcing business.
The data.table package by Matt Dowle definitely deserves a mention! It's fast, and I like the indexing functionality it provides. The benchmark timings are pretty impressive.
I should have mentioned you (arun_sriniv) as the co-developer of data.table! Thanks for all the hard work.
And yes, memory usage will be interesting, as that is the bottleneck when it comes to large datasets. I am working on something along those lines and will post soon :)
I use IJulia, which is based on Python notebooks. Try juliabox.org, a notebook (and more) offering from the Julia folks.
Then there are Juno and Julia Studio if you are inclined towards an RStudio-like interface.
You mentioned questions on scalability etc.; answers to those are very subjective. I went through a similar experience at one of the biggest tech giants out there. It went something like this: I was in the final face-to-face interviews and did well in all of them, IMHO. It was a big interview day at the company, and a lot of candidates had flown in from all over the globe. So, all interviews done, we went for lunch in the cafeteria, and before I started I was called in for another round of interviews (a couple of other guys were called too). To be honest, I hate lunch interviews; I think they're cruel! But anyway, I was asked about design, scalability, etc., and that was the last interview of the day. A few days later I got a rejection email from the HR folks. I was in the same boat as you are today ...