PGX: Write Postgres extensions in Rust instead of C

jarym · on July 13, 2020

Guys, this is really really cool. Been looking to learn Rust and it just so happens I need to write a Postgres extension in the near future so two birds and one stone.

Btw - Also, I've seen ZomboDB and while I've not used it I must say it looks EXCELLENT. So congrats on that too.

zombodb · on July 13, 2020

blush

manojlds · on July 14, 2020

Wow it does look great. Going to try it out!

gavinray · on July 13, 2020

This is really cool! The only other similar thing I know of is WASMer's extension (which only supports Rust atm I believe):

https://github.com/wasmerio/postgres-ext-wasm

One question though: If you built and compiled an extension with this, how would you deploy it on top of a Postgres Docker image?

I see instructions for installing and testing locally, via the cargo command, but couldn't find anything on this.

Does it spit out a static asset that you put in the project folder, and copy over during the build pipeline? Something like:

  FROM postgres:12-alpine
  COPY ./my-rust-extension /some/container/directory
  RUN install-command-here

zombodb · on July 13, 2020

That's a great question, and one probably best answered over on pgx's GitHub page.

But! I plan on adding a command to "cargo-pgx" to package up the extension for you into a directory structure (or tarball, maybe).

The idea is that you'd just run: cargo pgx package

And it would just build a --release library, and create the proper directory structure (based on what pg_config) says for the shared library and the associated .sql.

I actually need this ASAP for supporting ZomboDB proper, so... Coming Soon!

gavinray · on July 13, 2020

Awesome, look forward to it =D

zombodb · on July 20, 2020

https://github.com/zombodb/pgx/releases/tag/v0.0.7

!boom!

faizshah · on July 13, 2020

Interesting, it makes coding a postgres extension seem delightfully simple. I have been looking for something to try using rust on! Thanks for posting this.

zombodb · on July 13, 2020

Thank you!

If you decide to jump into it, definitely let us know any pain points you have.

It takes a bit of time to work out the kinks in a thing like this.

rukittenme · on July 13, 2020

Just out of curiosity, what sort of requirements compel a person to write a postgres extension? I know of things like PostGIS but I'm not well versed enough in this world to know the totality of the problem.

zombodb · on July 13, 2020

Author of `pgx` here.

We developed pgx so that we could rewrite "ZomboDB" (https://github.com/zombodb/zombodb) in Rust. ZDB is a custom "Index Access Method" for Postgres. Think btree index, but stored in Elasticsearch.

So that's definitely a thing.

Other ideas might be custom data analytics/transformation/processing engines that you'd prefer to run in the database instead of externally.

Custom data types to represent, for example, street addresses or genetic information.

The only limit is yourself! ;)

boulos · on July 13, 2020

> The only limit is yourself! ;)

For folks who don't get the joke, please visit zombo.com before Flash is finally dead :).

mumblemumble · on July 13, 2020

Alternatively, there's https://html5zombo.com

manojlds · on July 14, 2020

For folks who still didn't get it, like me: https://en.wikipedia.org/wiki/Zombo.com

Dowwie · on July 13, 2020

Have you evaluated Tantivy/Toshi as possible alternative to ES? I only ask because of your native use of Rust. PGX seems like a more robust offering than pg-extend-rs. This is a great addition to the community!

Also, v.0.0.3 doesn't create much confidence about stability! I'll keep an eye on the project and wish the team the best.

zombodb · on July 13, 2020

re: v0.0.3 -- sure. I just published it last night.

We've been working on it since November last year, and have now fully ported ZomboDB to it.

It's proving out nicely, but keep in mind that Postgres' internals are infinitely complex. Getting safe wrappers around all its "things" is going to take a very very long time.

I'd rather get something that seems very stable now, and continue to iterate on it over time.

de6u99er · on July 14, 2020

I think the comment was less about your code but rather about your v0.0.3 version number. I guess a v0.99 would imply more stability.

That's why lots of companies have switched to date based versioning, similar to Ubuntu and others.

zombodb · on July 13, 2020

I've looked into them. It seems they're designed to work within a single process, and it's not quite clear to me how sharing the underlying data files across postgres backends (even with proper Postgres locking) would work.

That's not say they aren't good frameworks. I'm sure they are. It just seems like they're designed for different use cases.

That said, I have other ideas on this front that I can't talk about today. ;)

rukittenme · on July 13, 2020

> The only limit is yourself! ;)

You have no idea how significant of a limit that is... I'm kidding.

Thanks for the response! And zombodb looks very cool. I might bring it up since we have a couple of heavy analytics endpoints right now.

derefr · on July 13, 2020

The most common reason to write a PG extension is to add native support for a structure/data type that PG doesn't already have native support for.

For example, despite Postgres having a (base-10000) numeric/decimal type, there are certain things that can be done much more cheaply in base-256 (e.g. ser/des from bit-packed and/or hex strings), and thus pgmp (https://github.com/dvarrazzo/pgmp) exists as a Postgres libgmp binding.

There are also "data service" type extensions—exposing some other data "into" Postgres so it can participate in a query. PG's own pg_stat_statements extension is an example. Postgres's support for Foreign Data Wrappers has mostly obviated writing these as extensions, though; it's much easier (and safer for your RDBMS-side data!) to just write these as standalone daemons that Postgres mounts as FDW servers.

hans_castorp · on July 14, 2020

A foreign data wrapper is an extension.

To install it, you run "create extension my_fdw"

derefr · on July 14, 2020

Yes, a given Foreign Data Wrapper—that is, the module of functionality that describes the connection strategy for some remote resource, like an ORM "adapter" module—is a Postgres extension. To get PG to speak a new protocol, you need to write a new FDW extension.

But a Foreign Data Wrapper server is just an instance of that class. You don't need to write a Postgres extension, just to use an existing Foreign Data Wrapper.

And, crucially, many of these FDWs are written to be backed by standard (or at least well-documented) network protocols.

Which means that, a large percentage of the time, people thinking "I'll take this third-party thing and integrate its data into Postgres" these days, don't write a Foreign Data Wrapper, but rather choose one of the existing network protocols with an existing Foreign Data Wrapper that supports it, and make their third-party thing speak that protocol, so that they (and anyone else who likes) can mount it as a server using the existing FDW.

Specifically, many systems that want to be "Postgres-compatible" these days (CockroachDB, Materialize, etc.), speak the Postgres wire protocol as if they were Postgres itself. This is not (mainly) so that real clients can talk to them through a libpq binding (as they tend to have their own, more idiomatic client ABI); but rather so that actual Postgres instances can mount them using `postgres_fdw`, the FDW that assumes the other end is another Postgres server and uses libpq to talk to it.

gshulegaard · on July 13, 2020

This is a sort of broad question, but if I were to try and answer I would say that if there is anything you want to be handled by the database you would gravitate towards an extension.

PostGIS is a great example since one of the most well known it additions it makes is a datatype, but that's not the only thing it adds. A datatype on it's own wouldn't be that useful without Spatial Indexing (https://postgis.net/workshops/postgis-intro/indexing.html) or spatial utilities to make spatial joins more expressive (https://postgis.net/workshops/postgis-intro/joins_exercises....).

Rich data types aren't the only candidates for extensions, automated partition management (https://github.com/pgpartman/pg_partman), data sharding (https://www.citusdata.com/), or even database cron scheduling (https://github.com/citusdata/pg_cron) are also good examples of things that are well suited to be extensions.

To be honest, I have a hard time imagining alternative paths where some of this functionality _isn't_ an extension. You might imagine an external daemon for things like pg_cron and pg_partman, but if PostGIS wasn't an extension you would probably be using a different database/tool (or a fork) if you had geospatial requirements. It's worth noting that Citus was a fork of PostgreSQL before it was refactored to be an extension.

I know this isn't a direct answer to your question, but hopefully you find it somewhat useful.

Edit: I totally forgot to add 1 more thing about extensions. When functionality is packaged as an extension there is a much higher likelihood you can mix and match them. For example, I currently run a PostgreSQL cluster using Citus, with pg_partman for automatic time-based partitioning, pg_cron regularly scheduling partition creation, and PostGIS for geospatial datatypes. You could extrapolate the various ways you might use this kind of setup...one that jumps to my mind is a scalable GPS history that could be queried by time period as well as by geographic region.

zombodb · on July 13, 2020

Yeah, "rich data types" is a good point.

pgx provides a #[derive(PostgresType)] macro that lets you represent any normal Rust struct (that serde_cbor can (de)serialize) as a Postgres type. You write no code.

It even generates all the boring SQL boilerplate for you.

I plan on putting together an example about this and doing a twitch stream this week to discuss in detail.

harikb · on July 13, 2020

I think “extension” covers any custom code (even functions that need to import a c library), not just major db features. I have had to do it recently [1] to customize partitioning function to be the way I like. It would be great to do it in a friendlier language

[1] https://harikb.github.io/posts/postgres-11-hash-partitioning...

Hendrikto · on July 13, 2020

> It would be great to do it in a friendlier language

Whether Rust is "friendlier" than C depends a lot on your definition of what that means.

zombodb · on July 13, 2020

You're not wrong, but barring bugs in `pgx` (of which I'm sure there are plenty right now), at least Rust gives you compile-time guarantees around not crashing.

And when running inside your database process, that's a huge win.

zombodb · on July 13, 2020

As a follow-up. Here's what Postgres' "generate_series" function looks like with pgx: https://github.com/zombodb/pgx/blob/0803bd3aa2f8ef2bd80d2e03...

And here's what you'd have to do to implement it in C: https://github.com/postgres/postgres/blob/dad75eb4a8d5835ecc...

zombodb · on July 13, 2020

One more follow-up...

The top one is pgx, the bottom is Postgres. So there's a little room for improvement here with pgx, but that's okay for a v0.0.3 release.

    test=# select count(*) from srf.generate_series(1, 10000000);
    Time: 1552.115 ms (00:01.552)

    test=# select count(*) from generate_series(1, 10000000);
    Time: 1406.357 ms (00:01.406)

anarazel · on July 13, 2020

The largest part of the time executing the above query isn't inside the function, so this isn't that a material comparison. The reason for that is that SRFs in FROM to be materialized into a tuplestore, which isn't free:

    postgres[607045][1]=# SELECT count(*) FROM generate_series(1, 10000000);
    ┌──────────┐
    │  count   │
    ├──────────┤
    │ 10000000 │
    └──────────┘
    (1 row)

    Time: 1249.224 ms (00:01.249)

    postgres[607045][1]=# SELECT count(*) FROM (SELECT generate_series(1, 10000000)) s;
    ┌──────────┐
    │  count   │
    ├──────────┤
    │ 10000000 │
    └──────────┘
    (1 row)

    Time: 460.206 ms

For mostly historical reasons SRFs in the target list can use the non-materializing SRF query protocol, but SRFs in the FROM list can't.

Any chance you could show the timings for the pgx version of the second query?

zombodb · on July 13, 2020

Sure! Top is Postgres, bottom is pgx, after running each 5 times...

    test=# SELECT count(*) FROM (SELECT generate_series(1, 10000000)) s;
      count   
    ----------
     10000000
    (1 row)

    Time: 399.630 ms
    test=# SELECT count(*) FROM (SELECT srf.generate_series(1, 10000000)) s;
      count   
    ----------
     10000000
    (1 row)

    Time: 478.194 ms

Thanks for the reply. I'm not surprised there's room for optimization in pgx, especially in the Iterator-->SRF path.

edit: PS, thanks for the idea of putting what I assume is the backend pid in psql's prompt. I need to go figure out how to do that right now!

edit edit: hmm, I guess that's not the PID.

anarazel · on July 13, 2020

Thanks for the update.

> edit: PS, thanks for the idea of putting what I assume is the backend pid in psql's prompt. I need to go figure out how to do that right now!

Here's my ~/.psqlrc:

    andres@awork3:~/src/postgresql$ cat ~/.psqlrc
    \set QUIET 1

    \pset pager always
    \set VERBOSITY verbose
    \pset null '(null)'
    \set COMP_KEYWORD_CASE upper
    \pset linestyle unicode
    \pset border 2
    \set PROMPT1 '%/[%p][%l]%x%R%# '
    \set PROMPT2 '%/[%p][%l]%x%R%# '
    \set PROMPT3 'c:%/[%p][%l]%x%R%# '
    \set HISTCONTROL ignoredups
    \set HISTSIZE 100000

    \timing on
    \set QUIET 0

dwaltrip · on July 14, 2020

In the function definition of `generate_series`, what is going on with the `default!` macro?

That isn't some sort of way of actually getting real default param values is it? I thought that wasn't possible in Rust.

db48x · on July 13, 2020

That is certainly nicer.

chc · on July 13, 2020

Does Rust give you compile-time guarantees around not crashing? I thought any function could panic without so much as a peep from the compiler.

zombodb · on July 13, 2020

Fair. With pgx, however, Rust "panic!"s are translated into standard Postgres "ERROR"s, such that instead of crashing, only the current transaction aborts.

So while you're pretty much correct in general, pgx handles it in the way a PG extension author would expect.

derefr · on July 13, 2020

One might say that Rust is friendlier downstream, to the ops people running the compiled binary :)

lmm · on July 14, 2020

Really? I struggle to imagine a case for claiming that C is friendlier.

petergeoghegan · on July 14, 2020

The ecosystem of preexisting C extensions to generalize from, for one

Tostino · on July 13, 2020

It just depends on the type of problem you'd like to solve.

I've got a whole bunch of ETL helper functions and patterns i've written for my business that could be packaged up as an extension if I were so inclined for example. Those are all SQL or PL/PGSQL functions, so no need for native code there though...

For PostGIS, a whole lot of that code is just SQL or PL/PGSQL functions...but there are quite a few C functions as well, to take care of the heavy lifting which is less efficient to implement in SQL or PL/PGSQL.

fanf2 · on July 13, 2020

A few years ago I wrote an experimental Logical Decoding plugin that translates database transactions to DNS UPDATE messages. I was super impressed by how nice it was coding to PostgreSQL's plugin APIs in C. I'm currently not sure if a C plugin is the best approach for this particular task: it might be better to use one of the more popular Logical Decoding to JSON plugins and munge the JSON into DNS UPDATEs in something more high-level than C.

cdaringe · on July 13, 2020

We have to write a Json logger versus the default log format

zombodb · on July 13, 2020

`pgx` would let you do that.

You could hook Postgres "emit_log_hook" and probably just use serde to xform the provided "ErrorData" pointer right to json and ship it off where ever you want.

(edit: typeos)

darksaints · on July 14, 2020

This has been on my back burner for about a year now...glad someone has gotten to it. It already looks full featured. I've been thinking that with the right library subset (just like libcore or no_std), it would be a very safe possibility for a trusted language...accessible to users that are not superusers. That would be extremely valuable to RDS users that would love to use something fast, as opposed to the standard trusted languages like pl/python.

zombodb · on July 14, 2020

I've noodled the idea of a pl/pg_rust, but the whole "Cargo.toml" and downloading dependencies from the internet seems bad.

I see what you're saying about no_std tho. Could be doable.

touisteur · on July 14, 2020

Recent similar thing in the Ada world: https://github.com/AdaCore/gnatcoll-db/tree/master/pgxs

fnord123 · on July 13, 2020

Also of interest: rpgffi - https://github.com/posix4e/rpgffi

"R(Rust) PG(Postgresql) FFI (Foreign Function Interface)"

zombodb · on July 13, 2020

That is interesting.

pgx does a lot of that too, but for only pg10/11/12. And then it post-processes the bindings and builds a "common.rs" for symbols that are identical across versions.

That way you can target all three versions at once, and then you can use Rust's #[cfg(feature=)] directive to opt into support for specific versions.

pgx has also done a little bit of work to mask differences between the three versions so they appear as part of "common.rs" anyways.

I wanted a framework that let me know, at development time, if I was trying to do a thing that isn't supported (or the same) across various PG versions.

3ace · on July 14, 2020

At first I thought it is a Postgresql driver for Go that has the same name

https://github.com/jackc/pgx

Snelius · on July 14, 2020

Yes, it's always mess like sqlx.

giovannibonetti · on July 14, 2020

It would be awesome if someone with some Rust knowledge could make an extension to create a compressed string type like Antirez Smaz [1].

I know Postgres compresses TEXT types by default, but only when the data exceeds TOAST_TUPLE_THRESHOLD (default 2 kB)

Some examples on the project page [1] show reduction of size between 10% and more than 50% for some small strings (less than 100 bytes).

[1] https://github.com/antirez/smaz

stubish · on July 14, 2020

I think the existing C API would be a better fit, and the work would likely be a fairly simple modification to the citext extension found in contrib section of the PostgreSQL source if anyone is game.

zombodb · on July 14, 2020

I'd be inclined to agree.

That said, it'd be easy to prototype with pgx, assuming that compression library exists in the Rust ecosystem too?

giovannibonetti · on July 14, 2020

I guess this compression library should do the trick [1], but I haven't tested it yet.

[1] https://docs.rs/compress/0.1.2/compress/entropy/ari/index.ht...

tcyrus · on July 13, 2020

Is there any way to make custom types using pgx?

zombodb · on July 13, 2020

Yes there is. It's not documented/example'd yet tho.

There's a derive macro called #[derive(PostgresType)]. Combine that with serde's Serialize, Deserialize, and you're gtg.

I'm going to be working on more docs and twitch streams over this week.

ficklepickle · on July 13, 2020

Typo in the title: oc->of

dang · on July 13, 2020

Fixed. Thanks!

YalsXHAN · on July 13, 2020

[flagged]

speedgoose · on July 13, 2020

JavaScript is still pretty strong these days.