For being "boring and stable" technology, Postgres 16 includes 200 features, which is on par with each of prior years releases. If you're looking for the full set of features it's available here - https://www.postgresql.org/docs/16/release-16.html.
Personal favorites on the list include:
- load_balance_hosts, a libpq improvement that lets you load-balance connections across multiple Postgres instances.
- Logical replication on standbys
- pg_stat_io, a new view that exposes I/O statistics.
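For illustration, the new load-balancing option can go straight into a libpq connection string (hostnames and database name here are hypothetical):

```shell
# libpq picks a host at random per connection attempt (PG 16+ clients)
psql "host=pg1.example.com,pg2.example.com dbname=mydb load_balance_hosts=random"
```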
IMO the biggest reason folks use pgbouncer is not load balancing (which it can do, -ish) but connection pooling. Postgres connections are expensive for the db server (one process per connection, not one thread), so if you have, say, thousands of web application pods, you need pgbouncer or similar as a proxy to multiplex those thousands of connections down onto a more manageable number (~200). So no, not really.
(EDIT: if you don't know this already - the _establishment_ of connections is also super expensive. So another reason to pgbounce is to keep connections persistent if you have app servers that are constantly opening and closing conns, or burst-open conns, or the like. Even if the total conns to pg doesn't go super high, the cost of constantly churning them can really hurt your db)
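As a concrete illustration of the multiplexing setup described above (names and numbers are made up, not a recommendation), a minimal pgbouncer.ini might look like:

```ini
[databases]
; route "appdb" to the local Postgres server
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction
max_client_conn = 5000    ; thousands of app connections accepted
default_pool_size = 200   ; multiplexed onto ~200 server connections
```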
Out of curiosity, if the problem of connections being expensive is solvable by PGBouncer-style connection multiplexing, why doesn't Postgres just do that by itself?
If it were easy to do well in-core, I think we'd do it immediately. Unfortunately the interesting pooling modes in pgbouncer also break a few things - which would likely not be acceptable in core postgres. Avoiding such breakage requires non-trivial architectural adjustments.
Historically connection state and "process state" have been tightly coupled, for good server-side pooling they have to be divorced. While good pooling is doable with the current process model (passing the client file descriptor between processes using SCM_RIGHTS), it's much harder with processes than with threads - this is one of the reasons I think we will eventually need to migrate to threads.
Eventually I want to get to a point where we have a limited number of "query execution workers" that handle query execution, utilized by a much larger number of client connections (which do not each get a dedicated thread). Obviously there's a long way to go to get there. Ah, the fun of working on a complicated application with a ~35 year history.
There also are use cases for pgbouncer that cannot be addressed on the server-side - one important one is to run pgbouncer on "application servers", to reduce the TCP+TLS connection establishment overhead and to share connections between application processes / threads. That can yield very substantial performance gains - completely independent of server side pooling support.
Because pgbouncer's transaction-based pooling, which is what the previous poster was referring to, breaks a few postgres features. This is fine for most applications, but not all. See the table on https://www.pgbouncer.org/features.html
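To make the tradeoff concrete, these are the kinds of session-level features that become unreliable under transaction pooling, since consecutive transactions can land on different server connections (see the pgbouncer table for the full list):

```sql
PREPARE q AS SELECT $1::int;  -- prepared statements live on one server conn
SET work_mem = '256MB';       -- session-level SET doesn't follow you around
LISTEN my_channel;            -- LISTEN/NOTIFY needs a stable connection
SELECT pg_advisory_lock(42);  -- session-level advisory locks too
```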
Yeah, good callout. I'd be totally fine with a server mode, or connection option, that opts you into pooling but then disables the incompatible features. You choose pooled mode if you need it, and the tradeoffs are documented.
I believe there has been talk and perhaps progress towards building it into PG, but I can't find the dev thread right now, and I don't know what the status of that is.
Interestingly enough, this is what Oracle does AFAIK. They are also process-per-conn & have an optional sidecar proxy thingy that you can run on your Oracle host to do the pooling. I would rather it be built more tightly into the rdbms, but that's not a terrible solution.
Postgres forks an OS process for each connection, which is relatively heavy weight. Oracle has a similar architecture to Postgres, and they solved it with a "multi-threaded listener". MySQL, in contrast, uses threads, which makes connections lighter weight.
It's fundamentally a question of how the connection listener communicates with the rest of the database, e.g., using shared memory or some other IPC mechanism, work queues, etc. Having too many connections results in problems with concurrent access and lock contention independent of how heavyweight the actual listening process is.
That's pretty much been my take as well. I've been looking at CockroachLabs (CockroachDB hosted cloud), and it's been kind of hard to find any detail on whether pgbouncer is beneficial there, since I'm planning to connect via node_compat from Cloudflare Workers. I've kind of decided to punt the issue and connect directly for now in development, then closer to the prod release do some tests with/without pgbouncer at each cloud location for the larger prod cluster instances.
Not the same service(s) as PG even if it's the same protocol, so I know it's really beneficial for connection queueing WRT my scenario for PG, but no idea on the CDB side.
I've done some work in that space, which was mainly to build an alternative view materialization scheme[0] in PlPgSQL that reifies the internals of CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW but with the following properties:
- the materialized view is a plain table, so:
  - you can write to it from triggers
  - you can have triggers on it
- refreshing a view records the deltas in a history table (which is useful as a poor person's logical replication scheme)
- you can mark a view as needing a refresh
Then in the application I have hand-coded triggers to either update the view's materialization directly or to mark the view as needing a refresh. A background job can asynchronously refresh views as needed.
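A minimal sketch of the "mark as needing refresh" half of that pattern (table, trigger, and view names are hypothetical, not from the linked scheme):

```sql
-- Tracks which materialized tables need an asynchronous refresh.
CREATE TABLE view_refresh_queue (
    view_name  text PRIMARY KEY,
    marked_at  timestamptz NOT NULL DEFAULT now()
);

-- Trigger function on a source table: mark the dependent view dirty.
CREATE FUNCTION mark_orders_summary_stale() RETURNS trigger AS $$
BEGIN
    INSERT INTO view_refresh_queue (view_name)
    VALUES ('orders_summary')
    ON CONFLICT (view_name) DO UPDATE SET marked_at = now();
    RETURN NULL;  -- AFTER trigger, return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_mark_stale
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH STATEMENT EXECUTE FUNCTION mark_orders_summary_stale();
```

The background job then just polls view_refresh_queue and refreshes (and clears) whatever it finds there.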
I've also spent some time thinking about the AST form of view queries that PG stores and how one might automatically generate triggers on source tables that update the materialization or mark it as needing a refresh.
As you note, many queries can be very difficult to transform into queries that compute incremental deltas. Moreover, even where it's possible to do that, the time it takes to execute the delta computation might be unacceptably long. For example, if you have a recursively nested grouping schema and you want to maintain a view of the expanded transitive closure of that data, then removing a large group from another might require thousands of row deletions from the materialized view, and that might make the transaction take much too long in a UI -- the obvious thing to do here is to say "sorry, that kind of update takes a while to propagate, but your transaction will complete quickly", so just mark the view as needing a refresh and refresh it asynchronously.
Yeah, that's one of my favourite characteristics of PostgreSQL: if a feature makes it into a release, you can be VERY confident that it works well even across weird corner-cases.
You can use https://github.com/twosigma/postgresql-contrib/blob/master/m... and write triggers to update your views or mark them as needing asynchronous refreshes. It's not as automatic as what you'd like, but it's a lot better than having to keep waiting.
> - Subqueries in the FROM clause can omit aliases
This is great. It never made any sense to me that this was required. For people who are unaware: say you want to explore a table; a natural way of doing it might be
select *
from the_table
order by some_metric desc
limit 10
so you'd think you can do the same for queries like
select *
from (
select blah blah blah the rest of the query
) a
order by some_metric desc
limit 10
but you need to put the alias 'a' there to placate Postgres, even though it's never actually used, which never made any sense to me.
It does the second you want to join to the results of that subquery...this is an optimization that recognizes that the alias is never used and thus can be omitted. Nice QOL feature.
"parallel execution of the string_agg and array_agg aggregate functions" caught my eye - that feels like one of those minor improvements that might enable some pretty interesting new applications.
Have to say, the SQL/JSON support in Postgres (jsonb) is probably the single biggest killer feature IMO. I can't believe MS (still) doesn't have an implementation (their SQL-to-JSON functions are dramatically worse by comparison).
Nice to see the continued advancement and progression all around.
Yes... many, many internal deployments definitely use MS-SQL. StackOverflow.com uses MS-SQL, for a prominent public example. There are client libraries via ODBC or directly in many languages and platforms, including Node.js (mssql/tedious) and Rust.
As to where: AWS offers Windows, as do many other cloud providers, including a significant portion of Azure VMs. Not to mention that MS-SQL and SQL Edge both run on Linux. IIRC, Azure Cloud SQL is also MS-SQL running a non-Windows version. There are also Linux/x86_64 Docker images.
Aside: if you're willing to write a big check, MS-SQL replication is far easier than pretty much anything else to set up and configure (UI-based flows or scripted). While I personally advocate for PostgreSQL, I've used and mostly like MS-SQL fine.
pg_upgrade is not an option on many systems if you are installing with the package manager as it requires both old and new binaries to be simultaneously present.
There is pg_upgrade and logical replication. I have used both and they both work excellently. Logical replication can be a bit of a hassle to set up though, especially if the database is huge.
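For reference, a side-by-side pg_upgrade run looks roughly like this (paths are hypothetical, Debian-style, and depend on your packaging):

```shell
# Run as the postgres OS user, with both clusters shut down.
/usr/lib/postgresql/16/bin/pg_upgrade \
  --old-bindir=/usr/lib/postgresql/11/bin \
  --new-bindir=/usr/lib/postgresql/16/bin \
  --old-datadir=/var/lib/postgresql/11/main \
  --new-datadir=/var/lib/postgresql/16/main \
  --link   # hard-link files instead of copying, for a much faster upgrade
```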
AWS has gotten much better in recent years. Pg15 is available on Aurora as of last month. That's about 6-7 months after it was released, a significant improvement over years past.
Yup, but the lack of access to beta versions also means that we will basically get access to PostgreSQL 16 in RDS by next year around this time.
Hey, I helped to write the original article, and I also work at AWS.
I'm happy to let you know that PostgreSQL 16 Beta 1 is available in the Amazon RDS Database Preview Environment (https://aws.amazon.com/about-aws/whats-new/2023/05/postgresq...). I hope you get a chance to test, I'm very eager to hear any and all feedback about PostgreSQL 16.
Bunch of interesting loosenings of restrictions on logical replication. Allowing standbys to also have subscribers: great for fan-out! Bunch of perf improvements (honestly a bunch of it is removing rather onerous limitations I didn't know about!). At the end there even seems to be a developer flag for allowing bidirectional replication too.
I am personally very excited to see that work on direct IO has started. It is a huge undertaking, so I do not expect anything production-ready any time soon, but it's nice to see that some code landed.
Yeah- we've been working on that for quite a while, and it'll take a few more years to finish it. I hope we can get some important parts into 17 though.
FWIW, there have been smaller prerequisites merged into 15 already, and 16 has a number of improvements that are part of that work. E.g. the more scalable relation extension (making COPY scale much better), and the related buffer mapping changes, come from the AIO effort.
Personally I think the feature is asynchronous IO, and direct IO support is part of that :)
PG 11 is end-of-life, right? So you won't be getting new security updates.
What's worse, libraries will stop supporting it over time, so you'll find it difficult to stay up to date in other places in your stack.
If I were running PG 11 in production, I'd be looking to upgrade sooner rather than later. Being on older-but-stable versions is good risk management; being on an end-of-life version is not.
We roll onto new stuff. So soon-ish the 11 will be replaced with a 14 or 15. And then later the 13 will roll up to whatever is good, stable and tested in our environment. And our roll-up plan has never seen an old PG lose client library access. PG has never ever, in 20+ years been the thing that holds us back. I also think risk-management has many more (important) factors than EOL.
Edit: quick check shows we also have PG9 and PG10 in some Live roles too.
Whenever you spend a day doing updates, fixing small issues, and also gaining new features and performance, just remember there is someone out there running some super old db nobody wants to touch again, where the migration of that old thing will take weeks!