Postgres on the GPU (postgresql.org)
397 points by Rickasaurus on April 23, 2013 | 77 comments



If I understood correctly, this module allows you to export data to, and then access (read-only), foreign tables architected specifically for faster copying of data to/from GPU, allowing you to speed up queries that benefit from GPU computation by 10-20x. Nice.

On the surface, this seems very similar to https://news.ycombinator.com/item?id=5592886 except it's nicely integrated with Postgres. Or am I missing something?

tmostak?


> this module allows you to export data to, and then access (read-only), foreign tables architected specifically for faster copying of data to/from GPU

That seems to be pretty much it, although the read-only part is a limitation of Postgres rather than a feature of the module.

Furthermore, it seems to dispatch queries intelligently, so you can run all queries against the FDW table with (I hope) minimal overhead: if a qualifier can't be compiled to a GPU kernel, the FDW falls back to running it as a normal on-CPU qualifier. That's a thoughtful touch.


PgOpenCL takes a different approach: instead of a Foreign Data Wrapper (FDW) that exposes tables, it's a language for writing Postgres functions, and that language is just OpenCL. http://www.slideshare.net/3dmashup/pgopencl

Alas, Tim's been talking about this for two years now, and as far as we know he's the only one who's ever seen the code.


Very cool. This should be upvoted more.


NVidia's CUDA only (for now?)

Can anyone explain why open-source projects embrace CUDA over OpenCL? As I understand it, OpenCL is a more generic API which could potentially be used with both CPUs and GPUs.


There are probably a lot of factors. I worked on CUDA code for around a year, and used to understand the landscape pretty well, but if I were to start a high-performance computing project today I'd probably take my lumps and go with OpenCL. There would be a lot of lumps.

Firstly, CUDA is just more mature; there is a very large and well-established set of libraries for a lot of common operations, there is a decent-sized community, and Nvidia even produces specialized hardware (Tesla cards) designed just for CUDA.

Second, all that generic-ness of OpenCL doesn't come for free. With Nvidia, you're working with just one architecture: CUDA cards. Optimizing your kernels is much easier. OpenCL is just generically parallel, so you could have any sort of crazy heterogeneous high-performance computing environment to fiddle with (any number of CPUs with different chipsets and any number of GPUs with different chipsets).

I haven't used OpenCL myself, but almost purely anecdotally I have heard many people say that CUDA is often slightly faster[1] and the code is easier to write.

TL;DR: CUDA sacrifices flexibility for ease of development and performance gains. OpenCL wants to be everything for everyone, and comes with the typical burdens.

[1]: Maybe this is a result of OpenCL being more generic and so harder to optimize.


So why would you go with OpenCL? Is it portability? For a while I thought that was worth it but I am having a hard time remaining convinced of that.


I've been working on a rather large computation library using OpenCL. OpenCL is useful for providing an abstraction over multiple device types. If you are only interested in producing highly-tuned parallel code to execute on NVidia hardware, I suggest sticking to CUDA for the above reasons.

I utilised the OpenCL programming interface to write code that would run the same kernel functions on CPU and/or GPU devices (using heuristics to trade off latency/throughput), which is something that is not possible, as far as I know, using the CUDA toolchain.
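
To give a rough idea of what that looks like on the host side, here is a minimal C sketch (illustrative only, not code from my actual library; make_context is a made-up helper) that asks the standard OpenCL API for a GPU device and falls back to a CPU device, so the same kernel source can be built for whichever one is available:

    #include <stdio.h>
    #include <CL/cl.h>

    /* Grab one device of the requested type (CL_DEVICE_TYPE_GPU or
       CL_DEVICE_TYPE_CPU) from the first platform and wrap it in a context. */
    static cl_context make_context(cl_device_type type, cl_device_id *dev_out)
    {
        cl_platform_id platform;
        cl_int err;

        clGetPlatformIDs(1, &platform, NULL);
        err = clGetDeviceIDs(platform, type, 1, dev_out, NULL);
        if (err != CL_SUCCESS)
            return NULL;          /* no device of that type on this platform */
        return clCreateContext(NULL, 1, dev_out, NULL, NULL, &err);
    }

    int main(void)
    {
        cl_device_id dev;
        char name[256];

        /* Simple heuristic: prefer the GPU, fall back to the CPU. */
        cl_context ctx = make_context(CL_DEVICE_TYPE_GPU, &dev);
        if (!ctx)
            ctx = make_context(CL_DEVICE_TYPE_CPU, &dev);
        if (!ctx) {
            fprintf(stderr, "no usable OpenCL device found\n");
            return 1;
        }

        clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);
        printf("kernels would run on: %s\n", name);

        /* clCreateProgramWithSource / clBuildProgram would follow here,
           using the same kernel source regardless of device type. */
        clReleaseContext(ctx);
        return 0;
    }

A real scheduler would of course weigh transfer latency against kernel throughput per workload rather than use a fixed GPU-first preference, but the point is that the device type is just a runtime parameter.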

TL;DR YMMV and horses for courses.


FYI regarding highly-tuned code -- an ex-ATI/AMD GPU core designer told me that the price you pay for writing optimized code in OpenCL versus the device-specific assembler is roughly 3x. Something to keep in mind if you're targeting a large enough system with OpenCL and you find spots that can't be pushed any faster.


Unlike previous versions, OpenCL 2.0 has been shown to be only about 30%[1] slower than CUDA and can approach comparable performance given enough optimisation.

Since I am working on code generation of kernels to perform dynamic tasks, I can't afford to write at the lowest level available. (I'm accelerating Python/Ruby routines, though, so OpenCL gives a significant bonus without much pain at all.)

[1] http://dl.acm.org/citation.cfm?id=2066955 (Sorry about the paywall, I access through University VPN)


OpenCL is a standards compliant compute API that is supported by Nvidia, AMD, Intel, IBM, Sony, Apple, and several other companies.

Nvidia is in the slow process of eventually discontinuing further CUDA support, and it is recommended to write new code in OpenCL only.


> Nvidia is in the slow process of eventually discontinuing further CUDA support, and it is recommended to write new code in OpenCL only.

[Citation needed]

Their OpenCL support is still limited to v1.1 (released in 2010), while just a few months ago they released a new major version of CUDA with tons of features nowhere to be seen in (any vendor's) OpenCL.


Yeah, you're going to have to back that up. CUDA is meant to be supported on all future nVidia cards.


How come, given that only CUDA has direct support for C++ and FORTRAN compilers that target the GPU?


Furthermore Python[1], Matlab[2], and F#[3], plus parallel device debuggers (TotalView, Allinea) and profilers (NVIDIA). There's a long way for OpenCL to catch up, if ever (because there might be a better standard coming further down the line).

[1] http://www.techpowerup.com/181585/NVIDIA-CUDA-Gets-Python-Su...

[2] http://www.mathworks.com/discovery/matlab-gpu.html

[3] https://www.quantalea.net/media/pdf/2012-11-29_Zurich_FSharp...


Portability, accessibility, etc. "Because it's hard" is never a good excuse to not do something.


> "Because it's hard" is never a good excuse to not do something.

Well that's certainly not true in the general case.


On the contrary, I'd argue that it's not true in specific cases.

"Because it's hard" is a cop-out.

"It's too hard to accomplish given constraint [X]" where X is a deadline, financial constraints, or other real/tangible resource limitations might be one thing. But if you're working on your own timeline on some sort of open-source project, or there is nothing external preventing you from acquiring the expertise/resources to conquer the hard problem, then "Because it's hard" is an absolutely shitty excuse to not do something.


I suggest you read "It's too hard" when written by other developers as, "It's too hard [given that I spend N hours a week on this and would rather actually accomplish something in the next two months than learn the 'right' API]." Or, "It's too hard [given various constraints that I'm not going to explain to you but are valid to me.]" It'll save you having to give speeches about shitty excuses.

That said, if it makes sense for your project, make it happen! :)


Even if the long-term goal is more portable GPU support, it still makes some sense to get a CUDA implementation up first if it is easier to get to. It allows real-world testing sooner, and they can always go to OpenCL later once they know more.


Just out of curiosity: how often did you see that happen (not only related to GPUs, but technology decisions overall)? In my (limited) experience the change at a later moment never happens, most of the time because management has a new idea/project which you have to attend to.


It's often a great excuse to do something else instead. If you can't get what you need done without the more difficult option, sure do it. But there's no sense in going down the harder path needlessly.

I'm not trying to convince you you don't need it or shouldn't do it, I was looking for a datapoint about what you find valuable in OpenCL.


Well, the portability can be a killer feature. I've been writing quite a bit of OpenCL code lately. I have an AMD GPU, so CUDA is a non-starter. I'll eventually replace the AMD card with an NVIDIA one, so it won't be as big of a problem, but my OpenCL code will still be fine then.


CUDA code is GPU only, OpenCL can run on both CPU and GPU. There are limits so it isn't all win, but it means you don't have to implement twice.


You go with OpenCL so that you can use AMD's Fusion processors, which will soon allow the GPU and CPU to share main memory.


You can do this already, and also with Intel's Ivy Bridge (at least on Windows).


It is the same story as OpenGL vs DirectX, it is all about the support the developers get from the vendors.


You're preaching to the choir. Looks like NEC funded this.

NVidia seems to be the preferred hardware for institutions/big companies. I'm not sure if this is because NVidia's architecture is better for supercomputers or if they're simply better at marketing to those types of customers.

NVidia funds a lot of academics in my space, and I've found academia to be very anti-open source for those reasons, which amuses me greatly.

Case in point: Matlab. Why is this taught in a world with Python/NumPy/Matplotlib?


Matlab seems like inertia/culture to me: it's the longtime de-facto standard in engineering. Since it's what everyone uses, it's got packages for everything, and papers will often come with prototype Matlab implementations. Roughly like the cultural position R holds in statistics. Matlab's hold on engineering is also bolstered by its widespread use in industry: students want to learn it, because it's what their future employers use, and professors / research scientists like to use it because it's what their industrial collaborators use.

In my area of CS (artificial intelligence) it seems considerably less popular. I don't really remember how to use it, since the last time I used it seriously was in some engineering (but not CS) courses in undergrad.


In addition, the MathWorks have so far managed not to screw up too badly and are keeping Matlab up to date. (They are definitely quite nice as an employer.)


I've heard this argument a lot (since 1999), but I'm not confident it holds true anymore. The free alternatives are so good.

You may be right regarding Professors, but that is also changing as they age out.


Could be; we don't use Matlab much in my own research area, so recent change could've happened under my radar. When I've occasionally had contact with engineers in industry, though, Matlab still seemed to be everywhere. The most recent two examples were someone doing DSP, and someone doing mechanical engineering, and both had all their stuff built on top of Matlab+Simulink.


Matlab knows how to control numerical precision and many algorithms produce the best results when running on that platform.

In the world of electrical engineering Matlab can do things that other packages can't.

From personal experience, I have spent many hours looking at the results of an atan2 function in C++ and Matlab and trying to get them to agree. After a day of work I was able to get them to agree by precisely controlling the rounding modes and using my own atan2 function. This was not fun, and I would rather give somebody 1k to take care of it for me.


> NVidia funds a lot of academics in my space, and I've found academia to be very anti-open source for those reasons, which amuses me greatly.

Wait, are you saying academia is anti-opensource because of nVidia funding?


I'm saying there is a systematic advantage to using proprietary technologies in academic research (companies have money, so you can write a grant and they will pay you $). Case in point, look at apps coming out of academia and you'll see a lot of WindowsPhone. It is because Microsoft gives away a ton of free phones (I have one on my desk at this moment) and Azure time.


Ok, that tracks better than it being something nVidia in particular did.

I don't see a serious problem with the scenario you describe, though. You're not really describing a hostile scenario, just an affinity for commercial software.

There is a bit of a problem of course; I find a lot of papers that describe how to do things with commercial technology that isn't in the budget. That hasn't been insurmountable for me in any way, but maybe others have had more serious problems with it.


The free alternatives are not so good for beginners. One of MATLAB's main strengths is the embedded editor + REPL, whereas the Python/NumPy/Matplotlib stack has too many moving parts. The MATLAB environment can be emulated with an IPython notebook or Emacs, but I don't believe that is easy enough for beginners.


Are you aware of Sage [1]? It's a Python-based, batteries-included, integrated maths system. It is actually more popular than Matlab around the lab here. Incidentally, we don't get much funding from corporations (but yes, we have licenses for Matlab, Maple and Mathematica for everyone, just somehow Sage is more popular).

[1]: http://sagemath.org/


One reason why lots of things typically use CUDA is because NVidia makes datacenter rackmounted GPU gear. So there is no need to run generically if that is the only available "production" hardware.


The fun part of parallel programming is getting things running on your GPU, parallelizing the algorithm, then tuning and optimizing the code. This is easier, faster, and more pleasant in CUDA with its mature tools and ecosystem. That's why open source projects often use CUDA.

The advantage of OpenCL is that it runs on more platforms (not just NVIDIA). The problem is that it's more complicated and more of a headache.

My advice to programmers is to start with CUDA and play around with your problem for a while. Time spent learning how GPUs work, what kinds of operations are efficient, and how to parallelize algorithms is not wasted if you switch to OpenCL later. Once you've made some progress then make an informed decision about whether you want to go to production with CUDA or OpenCL.
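
To be concrete about "learning how GPUs work": the first thing I'd have people do is dump the hardware parameters they'll end up tuning around. A small C sketch using the CUDA runtime's C API (assuming you link against libcudart; error checking omitted for brevity):

    #include <stdio.h>
    #include <cuda_runtime_api.h>   /* CUDA runtime C API; link with -lcudart */

    int main(void)
    {
        int i, count = 0;
        cudaGetDeviceCount(&count);

        for (i = 0; i < count; i++) {
            struct cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);

            /* These numbers drive most tuning decisions: how many SMs you
               need to keep busy, how wide a warp is, and how much shared
               memory each block of threads can use. */
            printf("device %d: %s\n", i, p.name);
            printf("  multiprocessors:       %d\n", p.multiProcessorCount);
            printf("  warp size:             %d\n", p.warpSize);
            printf("  shared memory / block: %lu bytes\n",
                   (unsigned long) p.sharedMemPerBlock);
            printf("  global memory:         %lu MB\n",
                   (unsigned long) (p.totalGlobalMem >> 20));
        }
        return 0;
    }

OpenCL has a direct equivalent in clGetDeviceInfo, so none of that intuition is lost if you switch later.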


Another question, is CUDA actually fast enough?

Because in bitcoin mining, it's always ATI/AMD and OpenCL, since ATI cards are like ten times faster than Nvidia cards. This is because of architecture differences.

Does it not affect this postgres table-scanning task? I wonder if they did any benchmarks.


AMD's advantage in Bitcoin mining was purely due to an architectural quirk: their shader cores supported bitwise rotation, but Nvidia's didn't. Bitwise rotation is a rare instruction outside of certain crypto algorithms (like SHA256!), so this really means very little for general-purpose performance.

http://www.extremetech.com/computing/153467-amd-destroys-nvi...


"AMD's advantage in Bitcoin mining was purely due to an architectural quirk"

False. I authored a Bitcoin miner utilizing this quirk (bit_align). I was also the first to leverage another instruction exclusive to AMD (bfi_int): https://bitcointalk.org/?topic=2949 bit_align "only" gave AMD a 1.7x advantage over Nvidia. The biggest perf gains (2x-3x!) came from the fact AMD has more execution units: https://en.bitcoin.it/wiki/Why_a_GPU_mines_faster_than_a_CPU... (I also authored this section of the wiki).
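
To make the two tricks concrete, in portable C they boil down to roughly this (an illustrative sketch, not the actual miner code):

    #include <stdint.h>

    /* 32-bit rotate right, used all over SHA-256. AMD's bit_align does this
       in a single instruction; Nvidia hardware of that era needs a
       shift/shift/or sequence. */
    static inline uint32_t rotr32(uint32_t x, unsigned n)
    {
        return (x >> n) | (x << (32 - n));
    }

    /* SHA-256 "choose": pick bits from f or g depending on e. AMD's bfi_int
       (bitfield insert) evaluates this in one instruction instead of three
       bitwise ops. */
    static inline uint32_t ch(uint32_t e, uint32_t f, uint32_t g)
    {
        return (e & f) ^ (~e & g);
    }

    /* One of the round functions, to show how quickly the rotates add up. */
    static inline uint32_t big_sigma1(uint32_t e)
    {
        return rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25);
    }

Multiply that by 64 rounds per hash and the per-instruction difference adds up fast.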


Scrypt too? Because Nvidia performance on scrypt is bad as well.


Don't use hashing/bitcoinmining/password cracking as your benchmark if that is not your workload.


This is because the hashing algorithm is highly dependent on the integer rotate-right instruction. AMD implements it in 1 clock cycle, Nvidia in 3. So it's a special case.


NVidia cards do not efficiently implement bit rotation, while AMD cards do, and it happens to be the core part of the SHA algorithm used for bitcoin. In general for an arbitrary task they're fairly close in performance.


AMD cards dominate Nvidia in pretty much all password hash bruteforcing algos, even those that do not rely on bit rotation (bit_align). See http://golubev.com/gpuest.htm for example. It is true that another instruction helps in more cases (bfi_int which I talked about at http://blog.zorinaq.com/?e=43) but in general, AMD cards have a lot more raw integer and floating point compute resources (execution units) than Nvidia cards.


They're also moving away from the VLIW design with each successive generation, partly to better support more flexible GPGPU approaches.


That is really interesting. I had to look it up.

https://en.bitcoin.it/wiki/Mining_hardware_comparison

The fastest Nvidia showing is the Tesla S2070, which is an $18k server with 8 GPUs! It can just barely keep up with a slightly overclocked single-GPU HD 7970.


I'm not a miner, so correct me if I'm wrong, but that seems like a bit of an apples-to-oranges comparison. Tesla cards (and the servers designed around them) are intended for specific use cases: mission-critical enterprise solutions and scientific HPC. As a result, they run slower processor and memory speeds in comparison to nVidia's own consumer products, use ECC memory, and are optimized for double-precision over single-precision performance. Mining with a Tesla is like gaming with a Quadro card.


There is nothing mission critical or 'enterprise' about Tesla/Fermi cards. You can crash them and lock up your whole machine. Even if you can reboot the OS the card may not respond and the rebooted OS won't see it, we sometimes have to physically shut the machine down to reset the Nvidia card. Nvidia is still a gaming company at heart and it's going to take a while for them to adjust to providing equipment that is meant to be reliable and not just fast.


There are matrix multiplication routines developed for CUDA; with OpenCL you would have to do everything from the ground up. So Nvidia gave everyone a head start for numerical computation, and that edge has snowballed ever since.
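
For instance, with cuBLAS (the v2 C API; handle creation and device allocation omitted, and dA/dB/dC here are just placeholder device pointers) a single-precision matrix multiply is one call:

    #include <cublas_v2.h>   /* link with -lcublas */

    /* C = A * B for column-major m x k, k x n, and m x n matrices already
       resident in device memory. */
    void gpu_sgemm(cublasHandle_t handle, int m, int n, int k,
                   const float *dA, const float *dB, float *dC)
    {
        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    m, n, k,
                    &alpha, dA, m,   /* lda */
                            dB, k,   /* ldb */
                    &beta,  dC, m);  /* ldc */
    }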


You can use http://viennacl.sourceforge.net/ for OpenCL.


Try programming in both CUDA and OpenCL and see which one you would choose.


Nvidia uses ECC RAM on their GPU compute cards, an important consideration for serious HPC computing.


No, they don't. They implement ECC by re-purposing some of the existing RAM to hold the parity data. Enabling ECC reduces the usable amount of memory, and also can hurt performance. It offers some improved reliability, but it's nothing like a real server-grade memory system.


CUDA is a lot more mature and easier to program for. OpenCL is likely the future, but it takes more work to set up, and if you have Nvidia cards it is harder to optimize.


Nvidia software has always been much better than AMD's. AMD on Linux is a complete disaster. So if you work on GPUs it's natural to go Nvidia.


> AMD on Linux is a complete disaster.

So is Nvidia. Both companies produce absolutely horrible drivers, and not just for Linux either: tons of bluescreens, crashes and other Windows instability issues are video driver bugs. That is what happens when the sole concern is speed and stability is totally ignored.


I have to agree with the grandparent. AMD's graphics drivers for Linux are a complete disaster. NVidia's are only a partial disaster. And sometimes that's the best you get.


From my experience porting CUDA code to OpenCL code, CUDA is much cleaner and more succinct since it is able to assume a lot about the underlying hardware.


Brilliant!

Perhaps it's just my scars showing, but I'm concerned about database system stability with active GPU hardware added to the box.

I probably wouldn't add the GPU hardware to a master but rather do the queries that would benefit from it on a streaming replica, where an occasional kernel panic won't be so severe.

Regardless, looking forward to trying it.


It's read only...


I think that he is referring to the tendency of GPU drivers to crash. Even if the DB is read-only from the GPUs' standpoint, if the DB goes down because of faulty drivers it's still a problem.


On the flip side, if this is sufficiently adopted, it could present motivation to driver developers, and thus improved drivers. Perhaps this and the Linux gaming movement could mean some symbiosis for driver development.

Ignoring Windows as I guess I don't really take Windows servers too seriously.


Postgres on the GPU just screams HSA http://hsafoundation.com and, more importantly, AMD and their upcoming Kaveri processor, which will significantly reduce the latency of GPU calculations since the CPU and GPU will share the same cache.


Really, really cool!

I've toyed with the idea of doing pattern matching (and graph rewriting) on the GPU before but this looks like it's much more advanced than I thought was feasible.

I'm surprised they went with CUDA instead of OpenCL though. CUDA is proprietary Nvidia technology and does not work on non-Nvidia devices.


All the examples seem to use numbers (integers and floats). It would be interesting to see if it can work efficiently with variable-width strings, which is the main workload that I encounter. But even if not, I see the value working with lots of data.


GPUs are good at calculating stuff (Nvidia's in particular at floats). I do not see a reason to do text stuff on the GPU.


GPUs have been used for some kinds of fuzzy string matching [1] and worked well thanks to their huge memory bandwidth.

[1] http://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorith...


The query sample is interesting: it seems to find objects (locations) near a given point. But there is a CPU-only solution with 1000x faster response times, which is to index the data properly with PostGIS. So where before you threw more CPU/RAM at a problem when you did not know how to make it faster in a smart way, now you throw more GPU at it. Still, there can certainly be cases where a GPU-based solution is a better alternative than traditional approaches like specialized indexes.


I notice the GPU load time is about 53ms. This is a discrete graphics card so I do wonder how an integrated APU will affect this. I can imagine the overhead there being virtually nil.

The long term trend is for the GPU to merge with the CPU, so I think we'll see more of this in the future.


This is cool; however, Postgres could probably achieve higher performance without the GPU as well if they added CPU-side concurrency for certain types of operations (e.g. aggregation, sorting, etc.). That would be a killer feature.


Interesting; I'm also currently working with GPUs.


I don't understand why people keep writing new code for legacy hardware that most people don't own.

It's written in CUDA, not OpenCL. Stop that.


Yeah, why do people keep solving the problems they have rather than the problems you wish they had? Bastards.



