The fastest Statistical Programming Language is …Javascript?

disgruntledphd2 · on April 28, 2012

Urrgh. This came through on my feed earlier today, and I left a comment on it. While javascript is fast (and can be used for many things), the real issue with using it for stats is the lack of libraries. More specifically, as far as I know it cannot interface with Fortran. That's a death knell for any statistical programming language, as it means no LAPACK, and no-one (sane) is going to rewrite all of those linear algebra libraries. So regardless of how fast it is, it's not going to make it as a stats language.

That being said, it does make it easier to develop statistically aware web-apps (a particular interest of mine), so that's definitely good.

friggeri · on April 28, 2012

See: https://github.com/NaturalNode/node-lapack (haven't tested it, just to say that it exists).

aufreak3 · on April 28, 2012

One relatively speedy route to getting solid numerics scriptability in JS is to do the heavy lifting in NaCl. An NaCl plugin for node would then let you use the same binaries on both the server and client.

icebraining · on April 28, 2012

What about using https://github.com/rbranson/node-ffi + clapack?

Not that I know what I'm talking about, I've never used Fortran.

olalonde · on April 28, 2012

> no-one (sane) is going to rewrite all of those linear algebra libraries

Is it because it would take a long time or because it's inherently hard?

jules · on April 28, 2012

It's easy enough to do a basic implementation, but getting good numerical stability and good performance is hard (and in Javascript, it's pretty much impossible with current implementations). See also the matrix multiplication benchmarks in the post: JS is 60x slower than Matlab, even though it's already using typed arrays. A naive triply nested for loop in C would probably perform similarly to the JS, which is to say a lot slower than something optimized for the characteristics of the processor (number of registers, vectorized floating point and cache sizes mostly, I'm not sure if Matlab is using multiple cores here).
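
For reference, the naive triply nested loop can be sketched roughly as below, next to a cache-blocked variant of the same arithmetic. This is only an illustration, not the code from the benchmark: the function names, the row-major `Float64Array` layout, and the block size of 32 are all assumptions, and a tuned BLAS does far more than this (register tiling, vector instructions, multiple cores).

```javascript
// Naive triple loop: C = A * B for n x n matrices, row-major in Float64Array.
function matmulNaive(A, B, n) {
  const C = new Float64Array(n * n);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let k = 0; k < n; k++) {
        // B is read with stride n here, which is unfriendly to the cache.
        sum += A[i * n + k] * B[k * n + j];
      }
      C[i * n + j] = sum;
    }
  }
  return C;
}

// Blocked variant: same result, but it works on bs x bs tiles so the
// working set is more likely to stay in cache. bs = 32 is an arbitrary guess.
function matmulBlocked(A, B, n, bs = 32) {
  const C = new Float64Array(n * n);
  for (let ii = 0; ii < n; ii += bs) {
    const iMax = Math.min(ii + bs, n);
    for (let kk = 0; kk < n; kk += bs) {
      const kMax = Math.min(kk + bs, n);
      for (let jj = 0; jj < n; jj += bs) {
        const jMax = Math.min(jj + bs, n);
        for (let i = ii; i < iMax; i++) {
          for (let k = kk; k < kMax; k++) {
            const a = A[i * n + k];
            for (let j = jj; j < jMax; j++) {
              C[i * n + j] += a * B[k * n + j];
            }
          }
        }
      }
    }
  }
  return C;
}
```

Whether the blocked version is actually faster depends on the engine and the matrix size, so it would need to be measured rather than assumed.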

bmuon · on April 28, 2012

Yes, latest versions of Matlab use multithreading.

tel · on April 28, 2012

There are a lot of them, they're all very picky, detailed inner loops, and they're already written and highly tested and optimized. People rewrite it all the time just to find that their versions are incomplete, slow, and buggy and nobody who wants to use LAPACK has patience for any of those three things.

regularfry · on April 28, 2012

It does mean that there might be mileage in a native tool wrapping LAPACK et al in v8.

tgflynn · on April 28, 2012

I don't know much about javascript implementations. Is there no foreign-function interface available? If you can interface with C you can interface with Fortran (with a little extra work).

TazeTSchnitzel · on April 28, 2012

There is no foreign-function interface, but you could use Emscripten to compile C libraries for it.

gruseom · on April 28, 2012

> So regardless of how fast it is, its not going to make it as a stats language

Why do you assume that JS can never be integrated with LAPACK etc.? That's hardly impossible.

sandGorgon · on April 28, 2012

Any opinion about F#/Mono, which seemingly does have BLAS/LAPACK support?

melling · on April 28, 2012

KickStarter project?

shocks · on April 28, 2012

Eh... I'm getting tired of this "x is faster than y" business. JavaScript might be fast in these examples, but that doesn't mean it's faster at everything. Comparing programming languages is fruitless. X might be faster than Y at Z, but that does not mean X is better suited than Y for all applications.

JavaScript is faster than MatLAB etc in these examples, but as mentioned already it's slower at matrix multiplication and I'm sure that's just one example.

Does JavaScript have tonnes of libs? Does it have type-checking? Does it have all those other things that I would be desperate for if I was performing important calculations? Can I distribute the computing easily? Etc, etc.

Let's stop comparing programming languages as if they're one tool to do one job. Different programming languages have different applications and are suited for different jobs.

Tichy · on April 28, 2012

The fascinating question is: why is Javascript fast? I suppose it is because of the competition between the browsers. Yay for competition!

arunoda · on April 28, 2012

Of course yes. Competition. And it has a good foundation and solid commercial backing with big fish companies.

plg · on April 28, 2012

What about straight C??? There are many great stats libraries in C (e.g. Apophenia). C is not a hipster language but maybe (like polaroid filters in Instagram) it's time for it to make a "retro" comeback. C kicks ass for speed. Obviously.

tern · on April 28, 2012

I'm waiting for somebody to make a CoffeeScript for C. No semicolons, comprehensions, syntactic sugar for function pointers, etc.

amalter · on April 28, 2012

http://golang.org/

plg · on April 28, 2012

PS http://apophenia.sourceforge.net/

gte910h · on April 28, 2012

Who the hell would use the relatively library-less javascript to do analysis?

Sorry, R, Matlab, Python, Syntax, Fortran have actual libraries for this stuff, JS, no.

xtracto · on April 28, 2012

Yeah, I find it kind of funny when people compare a general purpose programming language with statistical software. In R you have libraries for things like Approximate Bayesian Computation, parametric and non-parametric statistics, and even neural networks.

Sure, you could achieve the same with a general purpose PL, but you would have to implement everything from scratch.

gte910h · on May 1, 2012

Several general purpose languages (Fortran, Python, and Matlab) have very nice statistical programming packages at the current date.

mistercow · on April 28, 2012

JS does have a few (like jStat) but they're fairly young. Still, I don't think the article was suggesting that everybody should drop everything and jump on JS for statistics work. But it does raise the question of whether more focus should be put into the development of statistical libraries for JS.

lucian1900 · on April 28, 2012

PyPy is often faster than v8, so a more complete benchmark should include it.

wheaties · on April 28, 2012

Pypy should have direct access to lapack. No need to bring in Pandas. Lapack is just that fast.

jbooth · on April 28, 2012

Fastest for everything but the statistical parts. Not that someone couldn't write the bindings to C for server-side JS, but they haven't.

igorgue · on April 28, 2012

Does performance really matter? I'd rather have richer libraries (like R has) than performance. It's impossible to plot, for example, all your Apache logs or any other big data problem anyway; you just take a subset of the data and plot that, and for that you don't need a super fast language.

NonEUCitizen · on April 28, 2012

His table shows js is 40x slower on matrix multiplication.

driverdan · on April 28, 2012

If you look at the code it's not using WebWorkers. It's pretty unfair to compare single threaded vs multithreaded. I don't know how JS would perform with better code but it'd certainly be better than 40x slower.

batista · on April 28, 2012

And nearly the same or better in every other area.

platzhirsch · on April 28, 2012

Ergo, JavaScript isn't the fastest language for that matter, because matrix multiplication is too important.

friggeri · on April 28, 2012

Except that the matrix multiplication benchmark uses pure javascript (see the source here: https://github.com/JuliaLang/julia/blob/master/test/perf/per...). I wonder how JS would do with bindings to LAPACK.

And then there is a bias in those benchmarks, see for example the ones for quicksort: in Python they only time the duration of the sort itself, whereas (at least in Julia and JS) they time both the creation of the random array and the time needed to sort it.
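
To show what that difference looks like, here is a small sketch. It is not the benchmark's actual code, and the helper names are made up for illustration; it only contrasts timing the sort alone with timing array creation plus the sort.

```javascript
// Hypothetical helper: array of n pseudo-random doubles.
function randomArray(n) {
  const a = new Float64Array(n);
  for (let i = 0; i < n; i++) a[i] = Math.random();
  return a;
}

// Millisecond clock; falls back to Date.now() where `performance` is missing.
const now = typeof performance !== "undefined"
  ? () => performance.now()
  : () => Date.now();

// Times only the sort: the array is built before the clock starts.
function timeSortOnly(n) {
  const a = randomArray(n);
  const t0 = now();
  a.sort(); // typed arrays sort numerically by default
  return now() - t0;
}

// Times array creation and the sort together.
function timeCreateAndSort(n) {
  const t0 = now();
  const a = randomArray(n);
  a.sort();
  return now() - t0;
}
```

Comparing one language measured with the first function against another measured with the second would tilt the result, which is the bias I mean.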

simonster · on April 28, 2012

It's not just that. It's that there's no concept of a vector or matrix at all, and no operator overloading to allow these concepts to be introduced into the language in an idiomatic way. You could put these things into a bastardized JavaScript JIT, but that seems at least as awkward as Julia.

btilly · on April 28, 2012

http://coffeescript.org/ has demonstrated how to fix that problem.

If the underlying engine is fast, a more convenient syntax can be introduced.

simonster · on April 28, 2012

I thought about this a little bit, and I don't think this would be very trivial. CoffeeScript is designed to map easily onto JavaScript. A transcompiler that compiles JavaScript with matrix extensions to performant plain JavaScript would likely be significantly more complex than the CoffeeScript transcompiler.

Consider that you want to translate the matrix operation A * B into A.times(B). You have two options:

1) Figure out what's a matrix before runtime, using static type inference.
2) Translate the code into JavaScript that determines whether to treat the code as a matrix at runtime.

In the first case, you don't need a JIT at all. JITs exist largely because you can't do perfect type inference in dynamic languages. If you can do perfect type inference on all acceptable code (a la RPython), or if you require type annotations, you can compile straight to C or machine code.

In the second case, you take a speed hit of 25-50% on scalar operations for the guard, at least in modern versions of SpiderMonkey and V8 (see http://jsperf.com/cost-of-multiplication-via-function).

You can probably get acceptable performance out of combining static type inference with guards. My understanding is that this is what SpiderMonkey does internally. But at this point, it might be easier to integrate your functionality into an existing JIT than to write your transcompiler with type inference, particularly since you will have to implement matrix and vector ops inside the JS engine to achieve acceptable performance anyway.
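
To make the second option concrete, here is a rough sketch of what such a transcompiler might emit. The `Matrix` class, its `times` method, and the `mul` helper are hypothetical names invented for this example; no existing library is implied, and a real implementation would also have to handle cases like scalar times matrix.

```javascript
// Hypothetical minimal matrix type (row-major, Float64Array-backed).
class Matrix {
  constructor(rows, cols, data) {
    this.rows = rows;
    this.cols = cols;
    this.data = data || new Float64Array(rows * cols);
  }

  // Plain matrix product; throws if the inner dimensions disagree.
  times(other) {
    if (this.cols !== other.rows) throw new RangeError("dimension mismatch");
    const out = new Matrix(this.rows, other.cols);
    for (let i = 0; i < this.rows; i++) {
      for (let k = 0; k < this.cols; k++) {
        const a = this.data[i * this.cols + k];
        for (let j = 0; j < other.cols; j++) {
          out.data[i * other.cols + j] += a * other.data[k * other.cols + j];
        }
      }
    }
    return out;
  }
}

// What every source-level `a * b` would compile to: a guard that checks
// at runtime whether the left operand is a matrix.
function mul(a, b) {
  return a instanceof Matrix ? a.times(b) : a * b;
}

// Source:   y = A * B;      z = 2 * 3;
// Emitted:  y = mul(A, B);  z = mul(2, 3);
```

The `instanceof` check in `mul` is the guard whose cost shows up on every scalar multiplication.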

sycren · on April 28, 2012

or there is no dedicated library for matrix multiplication compared to the other languages..

cassandravoiton · on April 28, 2012

More to the point - who cares. All these languages are hopelessly slow. If performance matters do it in a performant language like C++, C or FORTRAN. If it does not matter - then it does not matter and so stop going on about it.

simonster · on April 28, 2012

No, it does matter. A lot of scientific computation is one-time-use code. What one cares about is the amount of time to write, execute, and debug the code. If it will take you much less time to write the code in a high-level language (which is usually the reason people use high-level languages), it may very well be worth the 2x performance hit from Julia, or even the larger performance hits of MATLAB and R. Additionally, when the amount of time spent performing vector and matrix operations greatly exceeds the amount of time spent in the interpreter, most of these languages will be as fast as C.

I write MATLAB code that takes 5 minutes to run on a regular basis. If I were to write it in C, I would lose productivity, because it would take much more than 5 minutes longer to write. If I were to write it in Julia, it would probably take about the same amount of time to write, but I would hypothetically have the results in a few seconds. That matters.