Postgres on the GPU (postgresql.org)
397 points by Rickasaurus on April 23, 2013 | 77 comments



If I understood correctly, this module allows you to export data to, and then access (read-only), foreign tables architected specifically for faster copying of data to/from GPU, allowing you to speed up queries that benefit from GPU computation by 10-20x. Nice.

On the surface, this seems very similar to https://news.ycombinator.com/item?id=5592886 except it's nicely integrated with Postgres. Or am I missing something?

tmostak?


> this module allows you to export data to, and then access (read-only), foreign tables architected specifically for faster copying of data to/from GPU

That seems to be pretty much it, although the read-only part is a limitation of Postgres rather than a feature of the module.

Furthermore, it seems to dispatch queries intelligently, so you can run all queries against the FDW table with (I hope) minimal overhead: if a qualifier can't be compiled to a GPU kernel, the FDW falls back to running it as a normal on-CPU qualifier. That's a thoughtful touch.


PgOpenCL takes a different approach: instead of a Foreign Data Wrapper (FDW) that exposes tables, it's a language for writing Postgres functions, and that language is just OpenCL. http://www.slideshare.net/3dmashup/pgopencl

Alas, Tim's been talking about this for two years now, and as far as we know he's the only one who's ever seen the code.


Very cool. This should be upvoted more.


NVidia's CUDA only (for now?)

Can anyone explain why open-source projects embrace CUDA over OpenCL? As I understand it, OpenCL is a more generic API which could potentially be used with both CPUs and GPUs.


There are probably a lot of factors. I worked on CUDA code for around a year, and used to understand the landscape pretty well, but if I were to start a high-performance computing project today I'd probably take my lumps and go with OpenCL. There would be a lot of lumps.

Firstly, CUDA is just more mature; there is a very large and well-established set of libraries for a lot of common operations, there is a decent-sized community, and Nvidia even produces specialized hardware (Tesla cards) designed just for CUDA.

Second, all that generic-ness of OpenCL doesn't come for free. With Nvidia, you're working with just one architecture: CUDA cards. Optimizing your kernels is much easier. OpenCL is just generically parallel, so you could have any sort of crazy heterogeneous high-performance computing environment to fiddle with (any number of CPUs with different chipsets and any number of GPUs with different chipsets).

I haven't used OpenCL myself, but almost purely anecdotally I have heard many people say that CUDA is often slightly faster[1] and the code is easier to write.

TL;DR: CUDA sacrifices flexibility for ease of development and performance gains. OpenCL wants to be everything for everyone, and comes with the typical burdens.

[1]: Maybe this is a result of OpenCL being more generic and so harder to optimize.


So why would you go with OpenCL? Is it portability? For a while I thought that was worth it but I am having a hard time remaining convinced of that.


I've been working on a rather large computation library using OpenCL. OpenCL is useful for providing an abstraction over multiple device types. If you are only interested in producing highly-tuned parallel code to execute on NVidia hardware, I suggest sticking to CUDA for the above reasons.

I utilised the OpenCL programming interface to write code that would run the same kernel functions on CPU and/or GPU devices (using heuristics to trade off latency/throughput), which is something that is not possible, as far as I know, using the CUDA toolchain.
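
To give a rough idea of what that looks like on the host side, here is a minimal C sketch (illustrative only, not code from my actual library; make_context is a made-up helper) that asks the standard OpenCL API for a GPU device and falls back to a CPU device, so the same kernel source can be built for whichever one is available:

    #include <stdio.h>
    #include <CL/cl.h>

    /* Grab one device of the requested type (CL_DEVICE_TYPE_GPU or
       CL_DEVICE_TYPE_CPU) from the first platform and wrap it in a context. */
    static cl_context make_context(cl_device_type type, cl_device_id *dev_out)
    {
        cl_platform_id platform;
        cl_int err;

        clGetPlatformIDs(1, &platform, NULL);
        err = clGetDeviceIDs(platform, type, 1, dev_out, NULL);
        if (err != CL_SUCCESS)
            return NULL;          /* no device of that type on this platform */
        return clCreateContext(NULL, 1, dev_out, NULL, NULL, &err);
    }

    int main(void)
    {
        cl_device_id dev;
        char name[256];

        /* Simple heuristic: prefer the GPU, fall back to the CPU. */
        cl_context ctx = make_context(CL_DEVICE_TYPE_GPU, &dev);
        if (!ctx)
            ctx = make_context(CL_DEVICE_TYPE_CPU, &dev);
        if (!ctx) {
            fprintf(stderr, "no usable OpenCL device found\n");
            return 1;
        }

        clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);
        printf("kernels would run on: %s\n", name);

        /* clCreateProgramWithSource / clBuildProgram would follow here,
           using the same kernel source regardless of device type. */
        clReleaseContext(ctx);
        return 0;
    }

A real scheduler would of course weigh transfer latency against kernel throughput per workload rather than use a fixed GPU-first preference, but the point is that the device type is just a runtime parameter.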

TL;DR YMMV and horses for courses.


FYI regarding highly-tuned code -- an ex-ATI/AMD GPU core designer told me that the price you pay for writing optimized code in OpenCL versus the device-specific assembler is roughly 3x. Something to keep in mind if you're targeting a large enough system with OpenCL and you find spots that can't be pushed any faster.


Unlike previous versions, OpenCL 2.0 has been shown to be only about 30%[1] slower than CUDA and can approach comparable performance given enough optimisation.

Since I am working on code generation of kernels to perform dynamic tasks, I can't afford to write at the lowest level available. (I'm accelerating Python/Ruby routines, though, so OpenCL gives a significant bonus without much pain at all.)

[1] http://dl.acm.org/citation.cfm?id=2066955 (Sorry about the paywall, I access through University VPN)


OpenCL is a standards compliant compute API that is supported by Nvidia, AMD, Intel, IBM, Sony, Apple, and several other companies.

Nvidia is in the slow process of eventually discontinuing further CUDA support, and it is recommended to write new code in OpenCL only.


> Nvidia is in the slow process of eventually discontinuing further CUDA support, and it is recommended to write new code in OpenCL only.

[Citation needed]

Their OpenCL support is still limited to v1.1 (released in 2010), while just a few months ago they released a new major version of CUDA with tons of features nowhere to be seen in (any vendor's) OpenCL.


Yeah, you're going to have to back that up. CUDA is meant to be supported on all future nVidia cards.


How come, given that only CUDA has direct support for C++ and FORTRAN compilers that target the GPU?


Furthermore Python[1], Matlab[2], and F#[3], plus parallel device debuggers (TotalView, Allinea) and profilers (NVIDIA). There's a long way for OpenCL to catch up, if ever (because there might be a better standard coming further down the line).

[1] http://www.techpowerup.com/181585/NVIDIA-CUDA-Gets-Python-Su...

[2] http://www.mathworks.com/discovery/matlab-gpu.html

[3] https://www.quantalea.net/media/pdf/2012-11-29_Zurich_FSharp...


Portability, accessibility, etc. "Because it's hard" is never a good excuse to not do something.


> "Because it's hard" is never a good excuse to not do something.

Well that's certainly not true in the general case.


On the contrary, I'd argue that it's not true in specific cases.

"Because it's hard" is a cop-out.

"It's too hard to accomplish given constraint [X]" where X is a deadline, financial constraints, or other real/tangible resource limitations might be one thing. But if you're working on your own timeline on some sort of open-source project, or there is nothing external preventing you from acquiring the expertise/resources to conquer the hard problem, then "Because it's hard" is an absolutely shitty excuse to not do something.


I suggest you read "It's too hard" when written by other developers as, "It's too hard [given that I spend N hours a week on this and would rather actually accomplish something in the next two months than learn the 'right' API]." Or, "It's too hard [given various constraints that I'm not going to explain to you but are valid to me.]" It'll save you having to give speeches about shitty excuses.

That said, if it makes sense for your project, make it happen! :)


Even if the long-term goal is more portable GPU support, it still makes some sense to get a CUDA implementation up first if it is easier to get to. It allows real-world testing sooner, and they can always go to OpenCL later once they know more.


Just out of curiosity: how often did you see that happen (not only related to GPUs, but technology decisions overall)? In my (limited) experience the change at a later moment never happens, most of the time because management has a new idea/project which you have to attend to.


It's often a great excuse to do something else instead. If you can't get what you need done without the more difficult option, sure do it. But there's no sense in going down the harder path needlessly.

I'm not trying to convince you you don't need it or shouldn't do it, I was looking for a datapoint about what you find valuable in OpenCL.


Well, the portability can be a killer feature. I've been writing quite a bit of OpenCL code lately. I have an AMD GPU, so CUDA is a non-starter. I'll eventually replace the AMD card with an NVIDIA one, so it won't be as big of a problem, but my OpenCL code will still be fine then.


CUDA code is GPU only, OpenCL can run on both CPU and GPU. There are limits so it isn't all win, but it means you don't have to implement twice.


You go with OpenCL so that you can use AMD's Fusion processors, which will soon allow the GPU and CPU to share main memory.


You can do this already, and also with Intel's Ivy Bridge (at least on Windows).


It is the same story as OpenGL vs DirectX, it is all about the support the developers get from the vendors.


You're preaching to the choir. Looks like NEC funded this.

NVidia seems to be the preferred hardware for institutions/big companies. I'm not sure if this is because NVidia's architecture is better for supercomputers or if they're simply better at marketing to those types of customers.

NVidia funds a lot of academics in my space, and I've found academia to be very anti-open source for those reasons, which amuses me greatly.

Case in point: Matlab. Why is this taught in a world with Python/NumPy/Matplotlib?


Matlab seems like inertia/culture to me: it's the longtime de-facto standard in engineering. Since it's what everyone uses, it's got packages for everything, and papers will often come with prototype Matlab implementations. Roughly like the cultural position R holds in statistics. Matlab's hold on engineering is also bolstered by its widespread use in industry: students want to learn it, because it's what their future employers use, and professors / research scientists like to use it because it's what their industrial collaborators use.

In my area of CS (artificial intelligence) it seems considerably less popular. I don't really remember how to use it, since the last time I used it seriously was in some engineering (but not CS) courses in undergrad.


In addition, the MathWorks have so far managed not to screw up too badly and are keeping Matlab up to date. (They are definitely quite nice as an employer.)


I've heard this argument a lot (since 1999), but I'm not confident it holds true anymore. The free alternatives are so good.

You may be right regarding Professors, but that is also changing as they age out.


Could be; we don't use Matlab much in my own research area, so recent change could've happened under my radar. When I've occasionally had contact with engineers in industry, though, Matlab still seemed to be everywhere. The most recent two examples were someone doing DSP, and someone doing mechanical engineering, and both had all their stuff built on top of Matlab+Simulink.


Matlab knows how to control numerical precision and many algorithms produce the best results when running on that platform.

In the world of electrical engineering Matlab can do things that other packages can't.

From personal experience, I have spent many hours looking at the results of an atan2 function in C++ and Matlab and trying to get them to agree. After a day of work I was able to get them to agree by precisely controlling the rounding modes and using my own atan2 function. This was not fun, and I would rather give somebody 1k to take care of it for me.


> NVidia funds a lot of academics in my space, and I've found academia to be very anti-open source for those reasons, which amuses me greatly.

Wait, are you saying academia is anti-opensource because of nVidia funding?


I'm saying there is a systematic advantage to using proprietary technologies in academic research (companies have money, so you can write a grant and they will pay you $). Case in point, look at apps coming out of academia and you'll see a lot of WindowsPhone. It is because Microsoft gives away a ton of free phones (I have one on my desk at this moment) and Azure time.


Ok, that tracks better than it being something nVidia in particular did.

I don't see a serious problem with the scenario you describe, though. You're not really describing a hostile scenario, just an affinity for commercial software.

There is a bit of a problem of course; I find a lot of papers that describe how to do things with commercial technology that isn't in the budget. That hasn't been insurmountable for me in any way, but maybe others have had more serious problems with it.


The free alternatives are not so good for beginners. One of MATLAB's main strengths is the embedded editor + REPL, whereas the Python/NumPy/Matplotlib stack has too many moving parts. The MATLAB environment can be emulated with an IPython notebook or Emacs, but I don't believe that is easy enough for beginners.


Are you aware of Sage [1]? It's a Python-based, batteries-included, integrated maths system. It is actually more popular than Matlab around the lab here. Incidentally, we don't get much funding from corporations (but yes, we have licenses for Matlab, Maple and Mathematica for everyone, just somehow Sage is more popular).

[1]: http://sagemath.org/


One reason why lots of things typically use CUDA is because NVidia makes datacenter rackmounted GPU gear. So there is no need to run generically if that is the only available "production" hardware.


The fun part of parallel programming is getting things running on your GPU, parallelizing the algorithm, then tuning and optimizing the code. This is easier, faster, and more pleasant in CUDA with its mature tools and ecosystem. That's why open source projects often use CUDA.

The advantage of OpenCL is that it runs on more platforms (not just NVIDIA). The problem is that it's more complicated and more of a headache.

My advice to programmers is to start with CUDA and play around with your problem for a while. Time spent learning how GPUs work, what kinds of operations are efficient, and how to parallelize algorithms is not wasted if you switch to OpenCL later. Once you've made some progress then make an informed decision about whether you want to go to production with CUDA or OpenCL.
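
To be concrete about "learning how GPUs work": the first thing I'd have people do is dump the hardware parameters they'll end up tuning around. A small C sketch using the CUDA runtime's C API (assuming you link against libcudart; error checking omitted for brevity):

    #include <stdio.h>
    #include <cuda_runtime_api.h>   /* CUDA runtime C API; link with -lcudart */

    int main(void)
    {
        int i, count = 0;
        cudaGetDeviceCount(&count);

        for (i = 0; i < count; i++) {
            struct cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);

            /* These numbers drive most tuning decisions: how many SMs you
               need to keep busy, how wide a warp is, and how much shared
               memory each block of threads can use. */
            printf("device %d: %s\n", i, p.name);
            printf("  multiprocessors:       %d\n", p.multiProcessorCount);
            printf("  warp size:             %d\n", p.warpSize);
            printf("  shared memory / block: %lu bytes\n",
                   (unsigned long) p.sharedMemPerBlock);
            printf("  global memory:         %lu MB\n",
                   (unsigned long) (p.totalGlobalMem >> 20));
        }
        return 0;
    }

OpenCL has a direct equivalent in clGetDeviceInfo, so none of that intuition is lost if you switch later.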


Another question, is CUDA actually fast enough?

Because in bitcoin mining, it's always ATI/AMD and OpenCL, since ATI cards are like ten times faster than Nvidia cards. This is because of architecture differences.

Does it not affect this postgres table-scanning task? I wonder if they did any benchmarks.


AMD's advantage in Bitcoin mining was purely due to an architectural quirk: their shader cores supported bitwise rotation, but Nvidia's didn't. Bitwise rotation is a rare instruction outside of certain crypto algorithms (like SHA256!), so this really means very little for general-purpose performance.

http://www.extremetech.com/computing/153467-amd-destroys-nvi...


"AMD's advantage in Bitcoin mining was purely due to an architectural quirk"

False. I authored a Bitcoin miner utilizing this quirk (bit_align). I was also the first to leverage another instruction exclusive to AMD (bfi_int): https://bitcointalk.org/?topic=2949 bit_align "only" gave AMD a 1.7x advantage over Nvidia. The biggest perf gains (2x-3x!) came from the fact AMD has more execution units: https://en.bitcoin.it/wiki/Why_a_GPU_mines_faster_than_a_CPU... (I also authored this section of the wiki).
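
To make the two tricks concrete, in portable C they boil down to roughly this (an illustrative sketch, not the actual miner code):

    #include <stdint.h>

    /* 32-bit rotate right, used all over SHA-256. AMD's bit_align does this
       in a single instruction; Nvidia hardware of that era needs a
       shift/shift/or sequence. */
    static inline uint32_t rotr32(uint32_t x, unsigned n)
    {
        return (x >> n) | (x << (32 - n));
    }

    /* SHA-256 "choose": pick bits from f or g depending on e. AMD's bfi_int
       (bitfield insert) evaluates this in one instruction instead of three
       bitwise ops. */
    static inline uint32_t ch(uint32_t e, uint32_t f, uint32_t g)
    {
        return (e & f) ^ (~e & g);
    }

    /* One of the round functions, to show how quickly the rotates add up. */
    static inline uint32_t big_sigma1(uint32_t e)
    {
        return rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25);
    }

Multiply that by 64 rounds per hash and the per-instruction difference adds up fast.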


Scrypt too? Because Nvidia performance on scrypt is bad as well.


Don't use hashing/bitcoinmining/password cracking as your benchmark if that is not your workload.


This is because the hashing algorithm is highly dependent on the integer rotate-right instruction. AMD implements it in 1 clock cycle, Nvidia in 3. So it's a special case.


NVidia cards do not efficiently implement bit rotation, while AMD cards do, and it happens to be the core part of the SHA algorithm used for bitcoin. In general for an arbitrary task they're fairly close in performance.


AMD cards dominate Nvidia in pretty much all password hash bruteforcing algos, even those that do not rely on bit rotation (bit_align). See http://golubev.com/gpuest.htm for example. It is true that another instruction helps in more cases (bfi_int which I talked about at http://blog.zorinaq.com/?e=43) but in general, AMD cards have a lot more raw integer and floating point compute resources (execution units) than Nvidia cards.


They're also moving away from the VLIW design with each successive generation, partly to better support more flexible GPGPU approaches.


That is really interesting. I had to look it up.

https://en.bitcoin.it/wiki/Mining_hardware_comparison

The fastest Nvidia showing is the Tesla S2070, which is an $18k server with 8 GPUs! It can just barely keep up with a slightly overclocked single-GPU HD 7970.


I'm not a miner, so correct me if I'm wrong, but that seems like a bit of an apples-to-oranges comparison. Tesla cards (and the servers designed around them) are intended for specific use cases: mission-critical enterprise solutions and scientific HPC. As a result, they run slower processor and memory speeds in comparison to nVidia's own consumer products, use ECC memory, and are optimized for double-precision over single-precision performance. Mining with a Tesla is like gaming with a Quadro card.


There is nothing mission critical or 'enterprise' about Tesla/Fermi cards. You can crash them and lock up your whole machine. Even if you can reboot the OS the card may not respond and the rebooted OS won't see it, we sometimes have to physically shut the machine down to reset the Nvidia card. Nvidia is still a gaming company at heart and it's going to take a while for them to adjust to providing equipment that is meant to be reliable and not just fast.


There are matrix multiplication routines developed for CUDA; with OpenCL you would have to do everything from the ground up. So Nvidia gave everyone a head start for numerical computation, and that edge has snowballed ever since.
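
For instance, with cuBLAS (the v2 C API; handle creation and device allocation omitted, and dA/dB/dC here are just placeholder device pointers) a single-precision matrix multiply is one call:

    #include <cublas_v2.h>   /* link with -lcublas */

    /* C = A * B for column-major m x k, k x n, and m x n matrices already
       resident in device memory. */
    void gpu_sgemm(cublasHandle_t handle, int m, int n, int k,
                   const float *dA, const float *dB, float *dC)
    {
        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    m, n, k,
                    &alpha, dA, m,   /* lda */
                            dB, k,   /* ldb */
                    &beta,  dC, m);  /* ldc */
    }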


You can use http://viennacl.sourceforge.net/ for OpenCL.


Try programming in both CUDA and OpenCL and see which one you would choose.


Nvidia uses ECC RAM on their GPU compute cards, an important consideration for serious HPC computing.


No, they don't. They implement ECC by re-purposing some of the existing RAM to hold the parity data. Enabling ECC reduces the usable amount of memory, and also can hurt performance. It offers some improved reliability, but it's nothing like a real server-grade memory system.


CUDA is a lot more mature and easier to program for. OpenCL is likely the future, but it takes more work to set up, and if you have Nvidia cards it is harder to optimize.


Nvidia software has always been much better than AMD's. AMD on Linux is a complete disaster. So if you work on GPUs it's natural to go Nvidia.


> AMD on Linux is a complete disaster.

So is Nvidia. Both companies produce absolutely horrible drivers, and not just for Linux either: tons of bluescreens, crashes and other Windows instability issues are video driver bugs. That is what happens when the sole concern is speed and stability is totally ignored.


I have to agree with the grandparent. AMD's graphics drivers for Linux are a complete disaster. NVidia's are only a partial disaster. And sometimes that's the best you get.


From my experience porting CUDA code to OpenCL code, CUDA is much cleaner and more succinct since it is able to assume a lot about the underlying hardware.


Brilliant!

Perhaps it's just my scars showing, but I'm concerned about database system stability with active GPU hardware added to the box.

I probably wouldn't add the GPU hardware to a master but rather do the queries that would benefit from it on a streaming replica, where an occasional kernel panic won't be so severe.

Regardless, looking forward to trying it.


It's read only...


I think that he is referring to the tendency of GPU drivers to crash. Even if the DB is read-only from the GPUs' standpoint, if the DB goes down because of faulty drivers it's still a problem.


On the flip side, if this is sufficiently adopted, it could present motivation to driver developers, and thus improved drivers. Perhaps this and the Linux gaming movement could mean some symbiosis for driver development.

Ignoring Windows as I guess I don't really take Windows servers too seriously.


Postgres on the GPU just screams HSA http://hsafoundation.com and, more importantly, AMD and their upcoming Kaveri processor, which will significantly reduce the latency of GPU calculations since the CPU and GPU will share the same cache.


Really, really cool!

I've toyed with the idea of doing pattern matching (and graph rewriting) on the GPU before but this looks like it's much more advanced than I thought was feasible.

I'm surprised they went with CUDA instead of OpenCL though. CUDA is proprietary Nvidia technology and does not work on non-Nvidia devices.


All the examples seem to use numbers (integers and floats). It would be interesting to see if it can work efficiently with variable-width strings, which is the main workload that I encounter. But even if not, I see the value working with lots of data.


GPUs are good at calculating stuff (Nvidia's in particular at floats). I do not see a reason to do text stuff on the GPU.


GPUs have been used for some kinds of fuzzy string matching [1] and worked well thanks to their huge memory bandwidth.

[1] http://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorith...


The query sample is interesting: it seems to find objects (locations) near a given point. But there is a CPU-only solution with 1000x faster response times, which is to index the data properly with PostGIS. So where before you threw more CPU/RAM at a problem when you did not know how to make it faster in a smart way, now you throw more GPU at it. Still, there can certainly be cases where a GPU-based solution is a better alternative than traditional approaches like specialized indexes.


I notice the GPU load time is about 53ms. This is a discrete graphics card so I do wonder how an integrated APU will affect this. I can imagine the overhead there being virtually nil.

The long term trend is for the GPU to merge with the CPU, so I think we'll see more of this in the future.


This is cool; however, Postgres could probably achieve higher performance without the GPU as well if they added CPU-side concurrency for certain types of operations (e.g. aggregation, sorting, etc.). That would be a killer feature.


Interesting; I'm also currently working with GPUs.


I don't understand why people keep writing new code for legacy hardware that most people don't own.

It's written in CUDA, not OpenCL. Stop that.


Yeah, why do people keep solving the problems they have rather than the problems you wish they had? Bastards.



