R packages I wish I'd known about earlier (yhathq.com)
104 points by glamp on Feb 12, 2013 | 15 comments



For some reason, I find it harder to find what I'm looking for in R's help than in just about any other language. That's why you end up not knowing about useful packages for a very long time. Is that just me?


There's an R-specific search engine called Rseek: http://www.rseek.org/

It's still not perfect, because if you don't know the right keyword for a concept you might not get what you're looking for. But it's pretty good.


When looking for an R package that does something specific, I usually go to the R package list [1] and search for what I am looking for ("SQL" if I want an SQL connector, or "sparse" if I want something for sparse matrices). I generally end up finding a package that does what I need.

[1] http://cran.r-project.org/web/packages/available_packages_by...


It's not just you. This was a huge problem for me at first. When you're googling, add "R cran" to your query.

You'll get way better results.


This article should really just be called "Everyone should just follow Hadley's Github". Actually, someone should write that.


If only all such lists could include example code and visualizations, very nicely done. Thanks!


Thanks! If you like the visuals, check out Hadley's talk at Google: http://www.r-bloggers.com/engineering-data-analysis-with-r-a...


If you want to do any sort of stock analysis, quantmod is really great, as it bundles together a number of related packages for financial applications: http://www.quantmod.com/.
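
For a taste, here's a minimal sketch of the quantmod workflow (the "AAPL" ticker is just an example):

    library(quantmod)

    getSymbols("AAPL")   # pulls OHLC price data into an xts object named AAPL
    chartSeries(AAPL)    # candlestick chart with volume
    addBBands()          # overlay Bollinger bands on the chart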

If you have to deal with directed graphs, there's iGraph: http://igraph.sourceforge.net/.
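
A hedged sketch of building a directed graph, using the current igraph API (the edges are invented for illustration):

    library(igraph)

    # "-+" points the edge at the vertex on the right
    g <- graph_from_literal(A -+ B, B -+ C, A -+ C)
    degree(g, mode = "out")   # out-degree of each vertex
    plot(g)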

Related Blog Posts

http://www.r-chart.com/2010/06/stock-analysis-using-r.html

http://www.r-chart.com/2010/06/analyze-twitter-data-using-r....


ddply can be very slow. I strongly recommend getting your data into the form you want with something map-reduce-y, _then_ throwing it into R for analysis and graphing.
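
For readers who haven't seen it, the ddply pattern in question looks roughly like this (the data frame and column names here are made up):

    library(plyr)

    # split by carrier, summarise each group, combine the results
    ddply(flights, .(carrier), summarise,
          avg_delay = mean(delay, na.rm = TRUE))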


I never cease to be amazed at how people who work with data large enough to make tools like plyr slow assume that everyone works with data like that.

I have been using R daily for 7-8 years now and have only occasionally turned to something like data.table for performance reasons. "Big data" receives waaay more attention and hype than there are actual human beings working on data of that scale. I can assure you that for the vast majority of R users worldwide, plyr is plenty fast enough for their needs.


Sounds like you may have some experience that could assist me. I have six 3.6GB CSV files that I can't even get into R (reading them in just never finishes; the program freezes), much less manipulate the data. I've not used any map-reduce type tools before. Is there a tool I can use to leverage that and get the data into R?


data.table deserves a mention here too. It simplifies and speeds up data frame operations.

See ?data.table.
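
A hedged sketch of the basics (column names invented); fread() in particular helps with the giant-CSV problem mentioned above:

    library(data.table)

    dt <- fread("big_file.csv")   # much faster than read.csv on large files
    # grouped aggregation with data.table's DT[i, j, by] syntax
    dt[, list(avg = mean(value)), by = group]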


Absolutely. The improvements to code and performance are really spectacular.


Caret (http://caret.r-forge.r-project.org/) is another package that should make the list if you are doing machine learning in R.

Caret gives you a common interface to a huge list of classifiers. Plus, it has some nice functions for data preparation.
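
A minimal sketch with the built-in iris data (the method and resampling scheme are just illustrative; "rf" needs the randomForest package installed):

    library(caret)

    fit <- train(Species ~ ., data = iris, method = "rf",
                 trControl = trainControl(method = "cv", number = 5))
    print(fit)   # cross-validated accuracy for each tuning value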


The list of database connectors should also include RSQLite:

http://cran.r-project.org/web/packages/RSQLite/index.html
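
The DBI workflow is pleasantly uniform across these connectors; here's a hedged sketch with RSQLite (the file and table names are invented):

    library(RSQLite)

    con <- dbConnect(SQLite(), "example.db")
    dbWriteTable(con, "mtcars", mtcars)   # load a data frame into SQLite
    dbGetQuery(con, "SELECT cyl, AVG(mpg) FROM mtcars GROUP BY cyl")
    dbDisconnect(con)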

A really useful article though. I'm going to investigate randomForest.





