Microsoft R Client includes the ScaleR package (rx functions and XDF file type) for faster statistics with data up to memory size. This is larger data than R can handle, as computations are done out-of-memory. For no limits on data size, you can use Microsoft R Server, which is also available free as part of Dev Essentials http://blog.revolutionanalytics.com/2016/01/microsoft-r-open... , but only for development use. Microsoft R Client can also initiate computations (of unlimited size) on a remote Microsoft R Server.
The rx algorithms are not part of MSFT R Open, but they are severely restricted in this, so I kinda fail to see the point of having it. Perhaps someone from the team can chime in and explain the reasoning behind this?
I'm guessing it is primarily aimed at getting people used to the interfaces/differences/quirks of using the RevoR ScaleR functions.
Will it do so? I dunno. I'm a fan of simplification. If it's resource limited, I'd argue there's more motivation in just sticking with the popular base R implementations and not overcomplicating things or tying yourself to a proprietary product. SAS also offers a kind of "free" version that is cut down/restrictive, and I don't find I use that much either, because the restrictions (virtual machine, non-EG interface) make me rarely ever want to use it.
Both companies have the challenge of how to keep themselves relevant and reach people to teach their wares in a world where they similarly try to restrict access to their products. RevoR has some advantage in this sphere, given the availability and use of regular R, but there are implementation quirks and rough edges around the fact that they're trying to get R to do some things it really wasn't designed to do from the ground up.
Christ almighty, why not just use SAS at that point. Ross Ihaka was so disappointed by Revolution's value-subtracted offerings that he disowned the entire R project.
I think he was under the impression that using the GPL (instead of a BSD license) would avoid commercial forks. Personally I don't care, Gentleman doesn't care, and the Revolution people (some at least) are pretty good guys.
But leading with the above would be shitty click bait }:-)
They would avoid proprietary forks (not commercial, that's not the same thing) if they actually defended the damn GPL. Instead, they kept weakening it, kept trying to change the API from GPL to LGPL (and I think they actually did it) and encouraged non-free plugins on CRAN.
The license will need to do a better job of protecting work donated to the commons than GPL2 seems to have done.
All of the above can be done relatively easily by any individual. And like I said, the Revolution guys aren't all bad. It's really just RevoR that I find uncompelling. Even that wasn't always true, but Hadley and RStudio have buried everyone else as far as dev/write up/presentation integrated environments go. Plus I like Hadley and Winston and Joe better on a personal level. Nothing against David and his coworkers, they're great too. I just find it easy to connect as a developer and statistician with the RStudio guys.
I'm not sure what GP means by Ross "disowning the entire R project". Maybe it's a brain fart. But Ross does (if this is indeed him commenting on dbms2.com) want people to not use the Revolution Analytics fork:
I think so, at least for my use, but I'm at a large company and I don't see the licensing costs (nor do they directly come out of my budget). It's not to say R is bad, it just takes (me) longer to do similar work.
Of course, we're also a Teradata shop and being able to run R in-database is awfully tempting - but I doubt we'd save any money at that point (even if it's not my budget, I still have to justify it), and we'd have a ton of stuff to migrate. And we're not really performance bottlenecked right now.
There's a cost tradeoff there, but R has really been awesome in getting really solid statistical and data manipulation software in the hands of individuals and small businesses for free. For instance, I don't have SAS at home - my basement hacking uses R.
What I can say is that SAS allows you to get quicker at visualisations and analysis without actually knowing what you are doing. R requires you to think first and ask the questions you would like the data to answer.
SAS isn't better than R; the two just target different ideal use cases IMO.
R is far more flexible, but interpreted and has some really annoying properties that make production work/reliability really hard.
SAS has a quirky model of computation that limits it in some/many areas, but makes standard operations on rectangular data, and data munging/ETL on such data, a breeze. It's compiled. Macros are both a blessing and a curse.
I still think it's better at that use than RevoR, and some of the data munging in R is not great.
Plus, I admit that when I saw that RevoR has an rxDataStep or whatever, I was a bit "lolwut?"
This is the best answer I've seen, I wish I'd seen it before I wrote mine. Great explanations.
R/S was really meant as "glue" for FORTRAN and C linear algebra libraries (ask Chambers if you don't believe me, and look at how glmnet works). The other stuff is just to get data into a form that can be fed to LAPACK & friends.
You can burn at the stake for that kind of question.
Jokes aside, it's not even close for actual model building. SAS has vertical integrations which make it worthwhile for corporations: things like portfolio risk management, marketing optimization software, intranets, BI, etc. But R is unparalleled for model building.
Naturally, for anything interesting (eg fitting a lasso penalized GLM) SAS (and others) encourage you to simply call out to R. (I will now wait for someone to amuse me by bringing up the misleadingly named PROC in SAS that does L1 regularization... to linear models.)
I can use either, but personally I prefer R. Or Python. Or Scala. Or really anything that isn't SAS. I feel dirty after using it.
Again, it's sort of like COBOL: it's dying, but until it's dead, there's a lot of money to be made maintaining old macro libraries etc for dinosaur companies.
If anyone brings up the FDA... Just don't do that ok?
You may be waiting for a while. The thing that R has going for it is documentation. The thing it has going against it is the monstrosity of the language implementation.
Julia is nice if you're coming from Matlab. And it sure as hell is more efficient than R or Python. But the libraries just aren't there yet. I went to implement dropout regularization a while back (a year or two ago?) and there were just so many things that I take for granted in R and Python that were completely missing. I mean, yeah it's fun to write your own SGD implementation... Once... Per lifetime...
1. Usage: R has seen continual growth whereas SAS's market share has been on the decline for years. This is based on number of scholarly citations, Google Trends, number of books and blog posts with the software's name in the title, surveys, online forum references, sales volume, use in Kaggle competitions, and some other measures [0] [1]. This is consistent with my anecdotal observations in academia that R tends to be much more popular among young professors and grad students whereas SAS is mostly used by the old guard. Now, this doesn't directly speak to which is "better", but more researchers believe--and this belief is increasing--that R is better suited for their research.
2. Number of packages: While R usage uptake seems to be approximately constant, package growth appears exponential. Across all packages, R has approximately 150 times as many functions as SAS procs and in 2014 alone added more functions than the total number of SAS procs [2].
3. Package distribution: R has CRAN. I'm not aware of any centralized repository or standards for distributing packages developed by the SAS community.
4. Reproducibility: R is free, SAS is expensive and the license has to be renewed. The reproducibility crisis in the medical, biological, and social sciences is exacerbated by proprietary software that locks labs without the software out of replicating an analysis. This may not be a concern for business, but the cost should be.
5. Scalability and ease of use: R has its quirks and warts as a programming language, but trying to write anything but a small one-off script in SAS is really something else. This is just pure opinion (which I imagine is shared by many others, but I don't have any data to back it up), but try writing a simple FizzBuzz in both and then come back and tell me I'm wrong (I was going to just post examples of both but couldn't even find one for SAS in this massive list! [3]).
6. Data visualization: I also don't have any data for a comparison here, but data visualization is frequently touted as one of the strong points of R. The native plotting is easy and powerful, and then there's the legendary ggplot2.
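For what it's worth, on point 5, the R half of that FizzBuzz challenge fits in a handful of lines. This is only a minimal sketch: the function name `fizzbuzz` is my own choice rather than anything from the linked list, and I'm not attempting the SAS side here.

```r
# Hypothetical helper (name is illustrative): returns the FizzBuzz
# sequence for 1..n as a character vector.
fizzbuzz <- function(n = 100) {
  i <- seq_len(n)
  out <- as.character(i)            # default: the number itself
  out[i %% 3 == 0] <- "Fizz"        # multiples of 3
  out[i %% 5 == 0] <- "Buzz"        # multiples of 5
  out[i %% 15 == 0] <- "FizzBuzz"   # multiples of both; assigned last so it wins
  out
}

# Print the first 15 entries, one per line.
cat(fizzbuzz(15), sep = "\n")
```

The vectorised indexing is the idiomatic R approach; a `for` loop with `if`/`else` works too, just more verbosely.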
Companies have to understand that instead of mainly targeting the head buyers / CxOs in the companies, they have to target the people who really understand and work with their product. As big company investments have to be justified, CxOs can no longer buy what they want without justification.
They have to sell the quality of their products to the users which will then convince the CxOs who are going to do the investment.