Guys, this is really really cool. Been looking to learn Rust, and it just so happens I need to write a Postgres extension in the near future, so: two birds with one stone.
Btw, I've also seen ZomboDB, and while I've not used it, I must say it looks EXCELLENT. So congrats on that too.
That's a great question, and one probably best answered over on pgx's GitHub page.
But! I plan on adding a command to "cargo-pgx" to package up the extension for you into a directory structure (or tarball, maybe).
The idea is that you'd just run:
    cargo pgx package
And it would just build a --release library and create the proper directory structure (based on what pg_config says) for the shared library and the associated .sql.
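Something like this hypothetical layout (the extension name "myext" is made up; the target paths are the standard ones pg_config reports):

    myext/
      myext.so         -- installs to $(pg_config --pkglibdir)
      myext.control    -- installs to $(pg_config --sharedir)/extension/
      myext--1.0.sql   -- installs to $(pg_config --sharedir)/extension/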
I actually need this ASAP for supporting ZomboDB proper, so... Coming Soon!
Interesting, it makes coding a postgres extension seem delightfully simple. I have been looking for something to try using rust on! Thanks for posting this.
Just out of curiosity, what sort of requirements compel a person to write a postgres extension? I know of things like PostGIS but I'm not well versed enough in this world to know the totality of the problem.
We developed pgx so that we could rewrite "ZomboDB" (https://github.com/zombodb/zombodb) in Rust. ZDB is a custom "Index Access Method" for Postgres. Think btree index, but stored in Elasticsearch.
So that's definitely a thing.
Other ideas might be custom data analytics/transformation/processing engines that you'd prefer to run in the database instead of externally.
Custom data types to represent, for example, street addresses or genetic information.
Have you evaluated Tantivy/Toshi as a possible alternative to ES? I only ask because of your native use of Rust. pgx seems like a more robust offering than pg-extend-rs. This is a great addition to the community!
Also, v0.0.3 doesn't inspire much confidence in stability! I'll keep an eye on the project and wish the team the best.
re: v0.0.3 -- sure. I just published it last night.
We've been working on it since November last year, and have now fully ported ZomboDB to it.
It's proving out nicely, but keep in mind that Postgres' internals are infinitely complex. Getting safe wrappers around all its "things" is going to take a very very long time.
I'd rather get something that seems very stable now, and continue to iterate on it over time.
I've looked into them. It seems they're designed to work within a single process, and it's not quite clear to me how sharing the underlying data files across postgres backends (even with proper Postgres locking) would work.
That's not to say they aren't good frameworks. I'm sure they are. It just seems like they're designed for different use cases.
That said, I have other ideas on this front that I can't talk about today. ;)
The most common reason to write a PG extension is to add native support for a structure/data type that PG doesn't already have native support for.
For example, despite Postgres having a (base-10000) numeric/decimal type, there are certain things that can be done much more cheaply in base-256 (e.g. ser/des from bit-packed and/or hex strings), and thus pgmp (https://github.com/dvarrazzo/pgmp) exists as a Postgres libgmp binding.
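As a rough illustration of the ser/des point (my sketch, not pgmp's actual code; hex_to_base256 is a made-up helper): a base-256 digit is just a byte, so decoding a hex string is a direct two-characters-per-digit mapping, with no arbitrary-precision arithmetic involved:

    // Each pair of hex characters maps directly onto one base-256 digit (a byte).
    // Producing base-10000 digits from the same string would require genuine
    // arbitrary-precision multiplication and division.
    fn hex_to_base256(hex: &str) -> Option<Vec<u8>> {
        if hex.len() % 2 != 0 {
            return None; // require an even number of hex digits
        }
        (0..hex.len())
            .step_by(2)
            .map(|i| u8::from_str_radix(hex.get(i..i + 2)?, 16).ok())
            .collect()
    }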
There are also "data service" type extensions—exposing some other data "into" Postgres so it can participate in a query. PG's own pg_stat_statements extension is an example. Postgres's support for Foreign Data Wrappers has mostly obviated writing these as extensions, though; it's much easier (and safer for your RDBMS-side data!) to just write these as standalone daemons that Postgres mounts as FDW servers.
Yes, a given Foreign Data Wrapper—that is, the module of functionality that describes the connection strategy for some remote resource, like an ORM "adapter" module—is a Postgres extension. To get PG to speak a new protocol, you need to write a new FDW extension.
But a Foreign Data Wrapper server is just an instance of that class. You don't need to write a Postgres extension just to use an existing Foreign Data Wrapper.
And, crucially, many of these FDWs are written to be backed by standard (or at least well-documented) network protocols.
Which means that, a large percentage of the time, people who think "I'll take this third-party thing and integrate its data into Postgres" don't write a Foreign Data Wrapper these days. Instead, they choose an existing network protocol that already has a Foreign Data Wrapper, and make their third-party thing speak that protocol, so that they (and anyone else who likes) can mount it as a server using the existing FDW.
Specifically, many systems that want to be "Postgres-compatible" these days (CockroachDB, Materialize, etc.), speak the Postgres wire protocol as if they were Postgres itself. This is not (mainly) so that real clients can talk to them through a libpq binding (as they tend to have their own, more idiomatic client API); but rather so that actual Postgres instances can mount them using `postgres_fdw`, the FDW that assumes the other end is another Postgres server and uses libpq to talk to it.
This is a sort of broad question, but if I were to try to answer it, I'd say: if there's anything you want handled by the database itself, you'd gravitate toward an extension.
To be honest, I have a hard time imagining alternative paths where some of this functionality _isn't_ an extension. You might imagine an external daemon for things like pg_cron and pg_partman, but if PostGIS wasn't an extension you would probably be using a different database/tool (or a fork) if you had geospatial requirements. It's worth noting that Citus was a fork of PostgreSQL before it was refactored to be an extension.
I know this isn't a direct answer to your question, but hopefully you find it somewhat useful.
Edit: I totally forgot to add one more thing about extensions. When functionality is packaged as an extension, there's a much higher likelihood you can mix and match. For example, I currently run a PostgreSQL cluster using Citus, with pg_partman for automatic time-based partitioning, pg_cron regularly scheduling partition creation, and PostGIS for geospatial datatypes. You can extrapolate the various ways you might use this kind of setup... one that jumps to my mind is a scalable GPS history that could be queried by time period as well as by geographic region.
pgx provides a #[derive(PostgresType)] macro that lets you represent any normal Rust struct (that serde_cbor can (de)serialize) as a Postgres type. You write no code.
It even generates all the boring SQL boilerplate for you.
I plan on putting together an example about this and doing a twitch stream this week to discuss in detail.
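In the meantime, a minimal sketch of the shape of it (StreetAddress is a made-up example type):

    use pgx::*;
    use serde::{Deserialize, Serialize};

    // An ordinary Rust struct; the derive generates the Postgres type
    // boilerplate, with serde handling the (de)serialization.
    #[derive(PostgresType, Serialize, Deserialize)]
    pub struct StreetAddress {
        street: String,
        city: String,
        zip: String,
    }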
I think “extension” covers any custom code (even functions that need to import a C library), not just major db features. I had to do this recently [1] to customize a partitioning function to work the way I like. It would be great to do it in a friendlier language.
You're not wrong, but barring bugs in `pgx` (of which I'm sure there are plenty right now), at least Rust gives you compile-time guarantees around not crashing.
And when running inside your database process, that's a huge win.
The top one is pgx, the bottom is Postgres. So there's a little room for improvement here with pgx, but that's okay for a v0.0.3 release.
    test=# select count(*) from srf.generate_series(1, 10000000);
    Time: 1552.115 ms (00:01.552)

    test=# select count(*) from generate_series(1, 10000000);
    Time: 1406.357 ms (00:01.406)
The largest part of the time executing the above query isn't spent inside the function, so this isn't that meaningful a comparison. The reason is that SRFs in FROM have to be materialized into a tuplestore, which isn't free.
Fair. With pgx, however, Rust "panic!"s are translated into standard Postgres "ERROR"s, such that instead of crashing, only the current transaction aborts.
So while you're pretty much correct in general, pgx handles it in the way a PG extension author would expect.
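For example, a minimal sketch using pgx's #[pg_extern] attribute:

    use pgx::*;

    #[pg_extern]
    fn must_be_positive(i: i32) -> i32 {
        if i <= 0 {
            // raised to Postgres as an ERROR: the current transaction
            // aborts, but the backend process keeps running
            panic!("expected a positive number, got {}", i);
        }
        i
    }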
It just depends on the type of problem you'd like to solve.
I've got a whole bunch of ETL helper functions and patterns I've written for my business that could be packaged up as an extension, for example, if I were so inclined. Those are all SQL or PL/PGSQL functions, though, so no need for native code there...
For PostGIS, a whole lot of that code is just SQL or PL/PGSQL functions... but there are quite a few C functions as well, to take care of the heavy lifting that would be less efficient to implement in SQL or PL/PGSQL.
A few years ago I wrote an experimental Logical Decoding plugin that translates database transactions to DNS UPDATE messages. I was super impressed by how nice it was coding to PostgreSQL's plugin APIs in C. I'm currently not sure if a C plugin is the best approach for this particular task: it might be better to use one of the more popular Logical Decoding to JSON plugins and munge the JSON into DNS UPDATEs in something more high-level than C.
You could hook Postgres' "emit_log_hook" and probably just use serde to transform the provided "ErrorData" pointer right to JSON and ship it off wherever you want.
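A very rough sketch of that idea, assuming the bindgen-generated pg_sys module exposes the raw hook (ship_json here is a stand-in for whatever transport you'd actually use):

    use pgx::*;

    // Install the hook when the library is loaded.
    #[pg_guard]
    pub extern "C" fn _PG_init() {
        unsafe {
            pg_sys::emit_log_hook = Some(my_log_hook);
        }
    }

    unsafe extern "C" fn my_log_hook(edata: *mut pg_sys::ErrorData) {
        if (*edata).message.is_null() {
            return;
        }
        let message = std::ffi::CStr::from_ptr((*edata).message).to_string_lossy();
        ship_json(&serde_json::json!({ "message": message }));
    }

    // Stand-in: replace with however you actually want to ship the JSON.
    fn ship_json(payload: &serde_json::Value) {
        eprintln!("{}", payload);
    }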
This has been on my back burner for about a year now...glad someone has gotten to it. It already looks full featured. I've been thinking that with the right library subset (just like libcore or no_std), it would be a very safe possibility for a trusted language...accessible to users that are not superusers. That would be extremely valuable to RDS users that would love to use something fast, as opposed to the standard trusted languages like pl/python.
pgx does a lot of that too, but only for pg10/11/12. And then it post-processes the bindings and builds a "common.rs" for symbols that are identical across versions.
That way you can target all three versions at once, and you can use Rust's #[cfg(feature = ...)] attribute to opt into support for specific versions.
pgx has also done a little bit of work to mask differences between the three versions so they appear as part of "common.rs" anyway.
I wanted a framework that let me know, at development time, if I was trying to do a thing that isn't supported (or the same) across various PG versions.
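A small sketch of what that gating looks like in practice (feature names follow pgx's pg10/pg11/pg12 convention; the function bodies are made up):

    // Compiled only when targeting Postgres 12.
    #[cfg(feature = "pg12")]
    fn uses_table_am_api() {
        // pg12-only code paths, e.g. the table access method API
    }

    // Compiled for pg11 and pg12, but left out of a pg10 build.
    #[cfg(any(feature = "pg11", feature = "pg12"))]
    fn uses_covering_indexes() {
        // relies on the INCLUDE-column index support added in pg11
    }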
I think the existing C API would be a better fit, and the work would likely be a fairly simple modification to the citext extension found in the contrib section of the PostgreSQL source, if anyone is game.