JITing PostgreSQL using LLVM [pdf] (anarazel.de)
112 points by grzm on Feb 3, 2018 | 33 comments



AFAIK, there are only a handful of DBMSs that do complete query compilation with LLVM:

* MemSQL (http://highscalability.com/blog/2016/9/7/code-generation-the...)

* Tableau/TUM HyPer (https://blog.acolyer.org/2016/05/23/efficiently-compiling-ef...)

* CMU Peloton (http://db.cs.cmu.edu/papers/2017/p1-menon.pdf)

* Vitesse (https://www.youtube.com/watch?v=PEmVuYjhQFo)

I think Greenplum was talking about doing this too, but that was about a year ago (http://engineering.pivotal.io/post/orca-profiling/)

Most systems just compile the predicates (Impala, SparkSQL).

A lot of companies are talking about adding this now. The performance gains are significant.


That's pretty misleading. All systems I am aware of essentially try to compile query-specific code and avoid re-compiling runtime code that doesn't vary between queries (e.g. if you look at the HyPer paper, that's exactly what they describe).

Compiling everything is questionable. There's not much point re-compiling runtime code or code outside of the hot path, it's expensive and doesn't bring any benefit.

E.g. things that aren't beneficial to compile per query include:

* Loops over a column of the same datatype, with no query-specific branches (e.g. decoding a column of integers)

* Other static code, e.g. some hash table operations

* Outer loops that don't execute frequently

* Rarely executed code, e.g. error handling

There are two general designs that let you compile only the necessary things. 1) the runtime calls into compiled code for hot loops vs 2) the compiled code drives the query and calls into the runtime. A lot of the systems you mentioned do the second, but Impala does the first, which seems to be the source of some misunderstanding. Also there were some cases where hot loops in earlier versions of Impala weren't compiled, but that's changed - generally we try to ensure that all hot loops are compiled.
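To make the contrast concrete, here's a toy sketch (purely illustrative C++, not any engine's actual API): in design 1 the interpreted runtime stays in charge and only hands the hot loop to a compiled kernel, while in design 2 the generated code drives the query and calls back into static runtime helpers.

    // Design 1 (Impala-style): the runtime drives, a compiled kernel is plugged in.
    using FilterKernel = int (*)(const int *col, int nrows, unsigned char *selvec);

    int scan_design1(const int *col, int nrows, unsigned char *selvec,
                     FilterKernel jitted_filter) {
        // Setup, I/O and memory management stay in static runtime code...
        return jitted_filter(col, nrows, selvec);  // ...only the tight loop is generated.
    }

    // Design 2 (HyPer-style): the generated code drives, the runtime is called into.
    extern "C" void runtime_emit_tuple(void *qstate, int value);  // static, precompiled

    // The JIT emits something morally equivalent to this, per query:
    void compiled_query_design2(void *qstate, const int *col, int nrows) {
        for (int i = 0; i < nrows; i++)
            if (col[i] > 42)                         // query-specific predicate baked in
                runtime_emit_tuple(qstate, col[i]);  // shared runtime code, never recompiled
    }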

I think generally the optimal design wouldn't be complete query compilation, but rather something more like a traditional JIT that selectively compiles parts of the query.

Source: work on Impala's query compilation


FWIW, Impala compiles more than just predicates. See, for instance, https://github.com/apache/impala/blob/e98d2f1c0af270930cd8a5...


SQL Server does in-memory compilation as well.


AFAIK, SQL Server's Hekaton engine only compiles the queries inside of stored procedures and UDFs. They do not compile every query that comes in from the client:

https://docs.microsoft.com/en-us/sql/relational-databases/in...


Just like Oracle, which uses an intermediate representation originally designed for Ada compilers.


SQL query compilation in LLVM and no mention of HyperDB??

Edit: down votes... hah


Which "HyperDB" are you talking about? I already listed Tableau's HyPer. Here are the other "hyper" systems that I know about and they don't do LLVM compilation:

* HyperDex (http://hyperdex.org/)

* HyperGraphDB (http://www.hypergraphdb.org/)

* HyperSQL (http://hsqldb.org/)

* Hypertable (http://hypertable.org/)


The one from the groundbreaking 2011 VLDB paper on SQL query compilation with LLVM: http://hyper-db.de


That's the same as Tableau's HyPer. From the page you linked to:

"The commercial license for HyPer, a spin-off of TUM, has been acquired by Tableau Software"


I stand corrected


For me this is a great demonstration of how SSDs have changed the performance landscape. Exactly this idea was discussed at P2D2 10 or so years ago and generally dismissed as too much work for negligible performance gain, because disk I/O would still dominate. Today I believe this makes sense.

Edit: and it's probably not only about SSDs but also about cheap DRAM. In most deployments I know of, most queries only touch indexes and tuples that are already in RAM (and in fact the whole pg_base is often smaller on disk than the server's RAM).


I think even 10 years ago that argument was less often true than people might have thought. A RAID of a few rotational disks, with multiple controllers, could easily deliver a few hundred MB/s back then, so it was easy to become CPU-bottlenecked even then.

The pg community often said so as well, particularly because it was not commonly used in a major analytical capacity...


Also, DRAM hasn't gotten that much cheaper in 10 years. In 2008 you could already get reasonably priced commodity x86 servers with 128 GB RAM, so for most DB applications you could keep all your working set cached and only worry about writes on your storage layer.

https://blogs-images.forbes.com/jimhandy/files/2011/12/DRAM-...

http://thememoryguy.com/wp-content/uploads/2016/05/2016-05-0...

https://en.wikipedia.org/wiki/List_of_Dell_PowerEdge_Servers...


I don't recall the exact discussion from P2D2, but AFAIK the reasoning was more along the lines "There are other bottlenecks that we need to address first." That is, issues that would either limit the JIT gains or issues with better cost/benefit ratio (measured in developer time).


The most interesting part I found about this is that the decision to do JIT compilation is part of the cost-based optimization process. It's a nice way to make PG versatile enough to be performant for both OLTP and OLAP queries.
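As a rough, purely illustrative sketch of how a planner-cost threshold could gate JIT (names and thresholds are made up, not the actual patch's API):

    // Illustrative only: gate JIT compilation on the planner's total cost estimate.
    struct PlannedStmt { double total_cost; };

    static double jit_above_cost = 100000;         // below this, interpretation wins
    static double jit_inline_above_cost = 500000;  // inlining only pays off for big plans

    enum { JIT_EXPRESSIONS = 1 << 0, JIT_INLINE = 1 << 1 };

    static int decide_jit_flags(const PlannedStmt *stmt) {
        int flags = 0;
        if (stmt->total_cost < jit_above_cost)
            return flags;              // cheap OLTP query: compile time wouldn't amortize
        flags |= JIT_EXPRESSIONS;      // compile expression evaluation for this query
        if (stmt->total_cost >= jit_inline_above_cost)
            flags |= JIT_INLINE;       // also inline operator bitcode, which costs more
        return flags;
    }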


Thanks. Honestly, I went there primarily because it is trivial to implement. I guess it might be one of the "laziness is good" cases ;)

I think it's good enough to do so on a per query basis in the first version, but after that it will probably need to become more sophisticated.


What I learned from it:

* You really need to add a C++ compiler to your configure.ac. I can see some tricks coming where clang++ compiles the LLVM glue work to bitcode and ships that, but I fear it will be too big. And then you need clang.

* ORC (lazy compilation is back) can finally do again what the legacy JIT could do: native JIT on demand via a stub, compiling into modules and adding functions to modules on the fly.

But the new LLVM stuff is exciting: saving the old headers as bitcode headers, then merging them and doing expensive inline optimizations with them in the background. Remember, the legacy JIT was dropped when it couldn't support cross-JITing to the foreign architectures needed for lldb. ORC JIT can now do what MCJIT did, cross-arch JITing, and it got lazy compilation back. The module abstraction with its resolver quirks is still there, but it's still just a simple interface to the compiler and linker libraries. It's still extremely awkward to use, only via the C++ interface, as you have to mix in all the compiler, linker, and resolver classes with lambdas for your desired behavior. But at least it's functional again.
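For anyone who hasn't touched it, here is a minimal sketch of what the C++ side can look like with the newer LLJIT convenience wrapper (illustrative only; exact types and signatures move around between LLVM releases):

    #include "llvm/ExecutionEngine/Orc/LLJIT.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/TargetSelect.h"

    using namespace llvm;
    using namespace llvm::orc;

    int main() {
        InitializeNativeTarget();
        InitializeNativeTargetAsmPrinter();

        // Build a trivial module: int add1(int x) { return x + 1; }
        auto ctx = std::make_unique<LLVMContext>();
        auto mod = std::make_unique<Module>("query", *ctx);
        auto *i32 = Type::getInt32Ty(*ctx);
        auto *fn = Function::Create(FunctionType::get(i32, {i32}, false),
                                    Function::ExternalLinkage, "add1", mod.get());
        IRBuilder<> b(BasicBlock::Create(*ctx, "entry", fn));
        b.CreateRet(b.CreateAdd(fn->getArg(0), ConstantInt::get(i32, 1)));

        // Hand the module to ORC and look up the compiled entry point.
        auto jit = cantFail(LLJITBuilder().create());
        cantFail(jit->addIRModule(ThreadSafeModule(std::move(mod), std::move(ctx))));
        auto addr = cantFail(jit->lookup("add1"));   // compiled on first lookup
        auto *add1 = addr.toPtr<int (*)(int)>();     // recent LLVM; older releases return a symbol
        return add1(41) == 42 ? 0 : 1;
    }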


> * You really need to add a C++ compiler to your configure.ac.

Indeed, I initially tried very hard to avoid doing so, but it turned out to be infeasible.

> I see some tricks coming up with clang++ compiling the llvm gluework to bitcode and shipping this, but I fear it will be too big. And then you need clang.

Not sure I understand what you're proposing here? You mean to avoid needing a C++ compiler? That wouldn't work, as the generated bitcode is not architecture-independent.

If you instead mean that the installed version will contain bitcode of its own source, yes, that's the plan to facilitate inlining (& specialization). It's not that big, and can be located in a separate package.

> * ORC (lazy compilation is back) can finally do again what the legacy jit could do: native jit on demand with a stub, into modules and adding functions to modules on the fly.

I use ORC JIT, but its lazy stuff isn't particularly interesting for my use case. Lazy JITing is done a layer above LLVM. There's already one pointer indirection anyway, and adding another indirection via a stub isn't useful...
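(For context, a hypothetical sketch of what "a layer above LLVM" means: callers already go through one function pointer, so swapping the interpreter for JITed code is just a pointer store. Names are illustrative, not the actual PostgreSQL structures.)

    // Hypothetical sketch of lazy JITing one layer above LLVM.
    struct ExprState;
    using ExprEvalFn = long (*)(ExprState *state, const void *row);

    long eval_interpreted(ExprState *state, const void *row) {
        /* walk the expression tree... */
        return 0;
    }

    struct ExprState {
        ExprEvalFn evalfunc = eval_interpreted;  // starts out interpreted
    };

    long eval_expr(ExprState *state, const void *row) {
        return state->evalfunc(state, row);  // the existing indirection; no extra ORC stub
    }

    void install_jitted(ExprState *state, ExprEvalFn compiled) {
        state->evalfunc = compiled;  // flipped lazily, once compilation has finished
    }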

> It's still extremely awkward to use, only via the C++ interface, as you have to mixin all the compiler, linker, resolver classes with lambda's for your wanted behavior. But at least it's functional again.

Yea, I'm not a big fan of the ORC APIs. Additionally, there still isn't any native debugger / profiler integration - I don't quite know how people are using it without those...


> If you instead mean that the installed version will contain bitcode of its own source, yes, that's the plan to facilitate inlining (& specialization). It's not that big, and can be located in a separate package.

Yes, that was my idea.

> Additionally natively there still isn't any debugger / profiler integration - I don't quite know how people are using it without those...

That's the best thing about this postgresql llvmjit project. He added nice patches for debugger and profiler integration. This is needed locally only for the devs, so not a blocker.


> He added nice patches for debugger and profiler integration.

He is me ;)


I wonder how well the SQL/query tree model fits into Graal's JITing system. It feels like they should map nearly perfectly in theory.


Holy Moly, that is an interesting thought!


I love reading the threads on -hackers with all of this development, it's really amazing stuff. I hope I get to see it make it into core soon!


For those curious, here's (one of?) the threads 'Tostino is referring to:

https://www.postgresql.org/message-id/flat/CA%2BTgmoYqvQAYLt...


Great stuff!

It's interesting to see LLVM taking over the world not only in ahead-of-time compilation, but also in JIT.

There are other JIT compiler frameworks out there, like GNU Lightning, but only LLVM seems to have any traction.


> There are other JIT compiler frameworks out there, like GNU Lightning, but only LLVM seems to have any traction.

I looked at a few libraries before deciding on LLVM. Lightning isn't that interesting for postgres' use case for a few reasons. The biggest issue is that it can't really be used to implement inlining of operators defined in C - there's no equivalent of LLVM bitcode generated from C that can then be inlined. Secondly, it doesn't include, afaict, much of an optimizer. That's not great for postgres' use case.

The inlining issue imo makes it really hard to compete with LLVM, even though there's quite some space for a code generator and optimizer much more tuned towards emission speed.
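The bitcode route looks roughly like this sketch (the file name and function are made up; parseIRFile and Linker::linkModules are the real LLVM entry points, and the .bc would be produced ahead of time with clang -emit-llvm):

    #include "llvm/IR/Module.h"
    #include "llvm/IRReader/IRReader.h"
    #include "llvm/Linker/Linker.h"
    #include "llvm/Support/SourceMgr.h"

    // Link pre-built operator bitcode into the per-query module so the optimizer
    // can inline the operator bodies into the generated expression code.
    bool inline_operator_bitcode(llvm::Module &query_mod, llvm::LLVMContext &ctx) {
        llvm::SMDiagnostic err;
        // int_ops.bc: made-up name for bitcode compiled from the operators' C source.
        std::unique_ptr<llvm::Module> ops = llvm::parseIRFile("int_ops.bc", err, ctx);
        if (!ops)
            return false;
        // linkModules() returns true on error, hence the negation.
        return !llvm::Linker::linkModules(query_mod, std::move(ops));
    }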


Two great projects combining their magical powers, just amazing!


Cloudera Impala also uses LLVM to JIT chunks of the query plan into machine code.


It makes a ton of sense for Impala/Presto: OLAP tends to be about throughput (long queries over lots of data), which benefits from this kind of JIT compilation since the compilation overhead is amortized.

Really excited to see it make its way into Postgres.


Presto generates Java bytecode which is then turned into machine code by the JVM JIT.


Don't all DB servers "JIT" and then cache the query plan?


Executing machine code can still be more performant than interpreting a cached execution plan.



