Drop-in GPU Acceleration of GNU Octave

jordigh · on Sept 11, 2014

Sigh, I wish people would address the problem here that nvidia's CUDA runtime is non-free. There is no OpenCL free implementation that I know of either, except Clover, but it currently merely translates the OpenCL to run on the CPU instead of the GPU. I think I've heard of some free CUDA implementations that were in similar embryonic stages.

Octave is a GNU package. GNU's purpose is to ensure that you can use free software. Running to tie Octave to flashy features like GPU acceleration without first pausing to fix the initial problem of non-free GPU acceleration is putting the horse before the cart. This works against Octave's goal, to provide a free alternative to Matlab, one that lets you understand and control your computations down to the hardware level. If we don't emphasise software freedom, then there is no need for Octave, since we already have Matlab. Indeed, it is this very freedom that nvidia is abusing here to accelerate Octave's BLAS libraries, a task that would be much more difficult with Matlab, where they don't have the source code.

I know that nobody wants to even think that this problem exists and even fewer people want to fix Clover because it's such a difficult task, but it's a task that we can't ignore.

Note also that because the GPU libraries are not a system library as defined by the GPL (they are not shipped with the OS), you can't even distribute GPU-accelerated Octave object code. We consider the Octave C++ API to fall well under the domain of the GPL's copyleft.

Personally, as a GNU Octave developer, I am very unhappy that nvidia is using Octave to advertise its hardware and non-free drivers. I am also unhappy that nvidia is luring users to use non-free software, acting against our goals. I reiterate Linus Torvalds's well-known sentiments against nvidia.

jacquesm · on Sept 11, 2014

While I appreciated your principled stance I believe it is more of a graded issue and one with several practical steps in between that are all 'freer' than the previous step.

- matlab + windows + CUDA

- octave + linux + CUDA

- octave + linux + Free OpenCL

So you're halfway there and it could be better but it could also be worse.

Writing a very high performance piece of software against a bunch of trade-secret grade hardware for which the manufacturer already provides a bunch of support for many people is not worth the trouble. As long as Nvidia supports their hardware using CUDA and OpenCL is not going to be brought up to that level this will likely continue.

A positive view would seem to me that this midway position will allow more people to move to Octave which in the longer run might free up some funds somewhere to tackle the problem you mention.

sjtrny · on Sept 12, 2014

Don't forget MATLAB runs on Linux too!

lightcatcher · on Sept 12, 2014

Forget non-free CUDA runtime, there are larger problems with the CUDA toolchain with respect to free software. The actual instructions executed by Nvidia's chips are pretty much undocumented. Nvidia provides great documentation for PTX, which is essentially assembly for a virtual GPU. All tools to convert PTX to SASS (the actual instructions ran on GPU) are proprietary and undocumented. Nvidia does this because it gives them much more freedom to modify the SASS language as well as the chips themselves. The existence of standards-free SASS code (and no open source compiler that can target SASS) makes truly open source GPU computing on Nvidia hardware impossible.

seiji · on Sept 11, 2014

All nvidia did was make an API compatible BLAS library with automatic GPU offloading. Other programs using BLAS functions then get automatically GPU accelerated. Nobody altered octave. They just LD_PRELOAD it in place. The same thing works with R too.

The goal is "have software that works," not "have software that retains ideological purity at the cost of advancement."

aw3c2 · on Sept 11, 2014

That's your goal. In my opinion our goal as society should absolutely be software that everyone can "run, copy, distribute, study, change and improve".

melling · on Sept 11, 2014

That's great. However, we'll all be dead and the job still won't be finished. I'm all for free software but the "Stallman method" is going to take a couple of lifetimes. Being a little practical will address the real problem of expensive software. I've been an Emacs user for 20 years but these days I find myself using Sublime and IntelliJ much more often. I think we're all better off if we also support good commercial and create a competitive environment

RichardFord · on Sept 12, 2014

There is no "our goal as society". But the vast majority of people that care about running Matlab, Octave don't share your goal.

wtallis · on Sept 12, 2014

FYI, Beignet (https://01.org/zh/beignet) is an open-source implementation for Intel GPUs that includes all the required features of OpenCL 1.1.

jordigh · on Sept 12, 2014

Thanks, I did not know about Beignet. This is a step in the right direction.

CGudapati · on Sept 12, 2014

Completely unrelated and Sorry for derailing but isn't it cart before the horse? I mean the horse is always in front of the cart. I am not a native English speaker. Am i missing anything?

graphene · on Sept 12, 2014

You are correct.

dredmorbius · on Sept 12, 2014

FYI, you're correct.

foxhill · on Sept 11, 2014

there is also pocl (http://portablecl.org/), which aims to one day target GPUs (although currently it does not).

the issue with free OpenCL implementations on the GPU is that the runtime will be closely coupled with the device driver, so short of open source drivers, there most likely wont be an OpenCL runtime that uses proprietary drivers.

that said, i believe both open source nvidia and radeon drivers support some level of OpenCL, although it's been a while since i've checked them..

nvidia will do what nvidia will do, they obviously have quite a lot of upstream swimming to do before they realise that CUDA is not the best approach to take.

SixSigma · on Sept 12, 2014

Releases software with plugin architecture.

Get's mad when people use it.

zurn · on Sept 12, 2014

The entire GPU programming landscape is so fragmented and unapproachable now. If they'd only exposed the processors directly and worked to unify behind a common compiler frontend... Imagine a world where a good GPGPU language was a standard GCC/LLVM frontend and all the GPU vendors worked in harmony to improve their back-ends in the upstream compiler. This would enable development of new cross-vendor GPU languages to compete with the ailing OpenCL.

(This is what Apple has done with Metal Shading Language btw, minus the open source part. And what Intel's Larrabee would have enabled.)

Even if you disregard the licensing, the union of NV + AMD + Intel capabilities (software stacks included) is so weak that it almost never worth the effort. The comically bad software from AMD and lack of hardware oomph from Intel means the only option is to require NVidia and build on their software support. This is passable for a narrow sector of activities that can tolerate vendor lock-in (Cuda), like short lived HPC projects.

All this has resulted in lack of open GPU programming languages. OpenCL is better than nothing, but even if the implementations were of usable quality, it's not a good compiler target for higher level languages and it's not a good language to write by hand.

The situation pretty much guarantees GPU computing stays in the fringes for the foreseeable future.

foxhill · on Sept 12, 2014

> The entire GPU programming landscape is so fragmented and unapproachable now.

it's fragmented, true, but it's not that bad.

> If they'd only exposed the processors directly and worked to unify behind a common compiler frontend...

they don't need to, LLVM-SPIR is supposed to enable this - compile kernels to IR and let the runtime JIT it into the GPUs required binary.

> And what Intel's Larrabee would have enabled.

intel did release the larrabee, sort of. it's called the Xeon Phi (and it isn't exactly great..)

> The comically bad software from AMD

AMD have a well deserved reputation, but things aren't that bad now.

> OpenCL is better than nothing, but even if the implementations were of usable quality

i don't know what problems you run into specifically, but most of the runtimes are definitely of usable quality. ironically, it's apple have the worst runtime (but even that isn't so bad).. think of that what you will!

things have improved in the OpenCL space, and they will continue to improve for the foreseeable future. AMD and intel are doing good here, and nvidia actually do support OpenCL. questions of performance portability aside, CL is a pretty good option for people wanting to run on GPUs.

zurn · on Sept 13, 2014

LLVM-SPIR doesn't have any production quality implementations for the popular hardware, nor are any announced. This nonwithstanding, it's only proof of the fragmentation as it's merely one of many vapourware competitors on this sector.

Xeon Phi is not Larrabee the GPU, it's a product of salvage & pivot from the project resulting in a HPC sidecar. (HPC sector I adressed in my original comment).

graphene · on Sept 12, 2014

AMD and ARM are working on something that sounds like what you describe: http://en.wikipedia.org/wiki/Heterogeneous_System_Architectu...

I believe nVidia has a similar effort going (can't remember the name), so it's still not a single agreed standard, but it's moving in that direction I feel.

lightcatcher · on Sept 12, 2014

nVidia's PTX is the most similar thing I can think of (and I know nVidia's GPGPU products pretty well right now). PTX is an ISA for a virtual machine, which makes it similar to HSAIL. PTX doesn't come with a memory model specification though.

However, I don't think either HSA or PTX is very important if nVidia, AMD, and Intel (wrt Xeon Phi's) don't start agreeing on standards. I haven't seen anything to convince me that nVidia wants to work within standards, and CUDA seems to have most of the market share right now, so I'm not too hopefully for portable heterogeneous computing in the near future.

To my grandparent poster: I don't think working with GPUs is that bad right now if you bind yourself to a vendor. In my experience, drinking the CUDA kool-aid (CU-aid?) isn't all that bad.

pandaman · on Sept 12, 2014

I think it's too early to expose processors. The GPU processors are still beyond a thick layer of fixed-function hardware. The good thing is that everybody is working to get rid of it and you can see how more and more HW is replaced with software.

For example, the memory access used to be entirely in the fixed function hardware and the processor could only index buffers that were set up elsewhere. Nowadays it's pretty common to have full access to memory from the processor itself.

In a few years we will get GPUs that will be able to run compute tasks entirely in software (the graphics will most likely remain fixed-function for much longer) and then exposing the GPU ISA will make much more sense.

osivertsson · on Sept 11, 2014

We choose GNU Octave, since it supports single precision which the GK104 we’ll be running performs best on.

Is single-precision computations useful in the context where GNU Octave is generally used?

valarauca1 · on Sept 11, 2014

Single vs Double precisions is fine.

Most of Matlab calculations are homework or simulations. Generally speaking the real world will add more errors then double floating points will reduce.

Honestly GNU Octave has been a surprisingly fast developing product. Matlab only got GPU acceleration 2 years ago (according to my buddy who uses Matlab) (which is very fast for a GNU project). A lot of academics are picking up on it, even professors recommending it.

I think the most glowing recommendation I heard was, "Well its free, and free goes along way when you're living on a research stipend."

wtallis · on Sept 11, 2014

Double-precision is a good default for a tool like Octave, but I agree that single-precision can be plenty accurate for a lot of uses. It's often just not worth the added time and code complexity to determine where single-precision should be used.

maaku · on Sept 11, 2014

Depends very much on the algorithm. Some are sensitive to rounding precision, others are not.

mjcohen · on Sept 11, 2014

Note that GNU Octave, the real thing, is available for Android along with a number of its toolboxes: https://play.google.com/store/search?q=octave&c=apps

bane · on Sept 11, 2014

For even more math fun, Maxima is also on Android.

https://play.google.com/store/apps/details?id=jp.yhonda&hl=e...

lightcatcher · on Sept 12, 2014

Single precision is about 2.5x faster than double precision on current GPUs (for matrix multiplication, which is compute dominated). It really depends on the application, but in my experience more often than not the only reason you are using GPUs is because you want to squeeze out every last ounce of performance. In these cases, single precision makes a lot of sense (assuming you're algorithm doesn't depend heavily on 64 bits of precision).

Within the neural nets community, single precision is almost always used (at least on GPUs).

foxhill · on Sept 11, 2014

it could very easily run double precision, but then they'd have to use their tesla cards to run it, and at several thousand dollars a piece, wont be quite as attractive to the user..

wtallis · on Sept 12, 2014

Consumer GPUs can do double-precision. At the high end, consumer parts usually (though not always) get crippled performance relative to their pro counterparts, but for the low and mid-range chips even the workstation cards have lackluster double-precision performance.

foxhill · on Sept 12, 2014

well, yes, technically correct, but a CPU has much better double precision performance than that.

1/24th (or 1/32th in maxwell) of single precision performance doesn't really sell the usefulness of CUDA.

gh02t · on Sept 12, 2014

Uh most recent Nvidia GPUs support double precision, albeit at a speed penalty.

And they are using a Tesla card in the article. It says they are using a K10.

foxhill · on Sept 12, 2014

of course they all can do double, but at 1/24th of the speed of single. (1/32th in maxwell)

they use a GK104 based tesla, i.e a "dual GPU" GTX680, so it has no real double precision. unless you consider 2x 95GFlops noteworthy.

cieyo9938y88 · on Sept 12, 2014

I dont know much, but I REMEMBER the finger gesture of Linus or the LINUX founder about no no vidia and the big hassles on no no vidia drivers for BSD and FreeBSD. Python, MATLAB and closed source?? - Using Haskell on toy examples can get FUNNY SUPRISE. FUNNY SURPRISE. FUNNY SURPRISE.

How about connecting the deep secret GPU to the no function Intel chip TSX transaction and one safe thread, while observing the side channels and effects for better GUESSING GAMES and understanding PSEUDO or fake random numbers?

codygman · on Sept 12, 2014

> Using Haskell on toy examples can get FUNNY SUPRISE.

Funny surprises such as?