Improving Postgres Connection Scalability: Snapshots (microsoft.com)
142 points by grzm on Oct 27, 2020 | 65 comments



I do not understand why people aren't clamouring against postgres's connection model.

I might be missing something but as I understand it, Postgres chose to couple the concepts of a connection and a unit of concurrency for simplicity of implementation. However this means that even if your server can theoretically handle 5000 concurrent reads you will never get there because opening 5000 connections isn't practical. You'll likely hit a memory limit.

Why is this generally accepted as ok? Decoupling concurrency from connections seems possible via pipelining commands over a single connection. Is it just too late for a project as mature as Postgres?


> Decoupling concurrency from connections seems possible via pipelining commands over a single connection. Is it just too late for a project as mature as Postgres?

This is commonly done using http://www.pgbouncer.org/

While it's not built-in, it's a really common solution and can sustain a high number of client connections. It comes with some issues though (no server-side prepared statements). And yes, it's not exactly the same as pipelining on the server itself, since it's still limited by the number of bouncer connections, but there's usually enough idle time that you can sacrifice some latency for throughput.
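
For a sense of what that looks like from the application side - a minimal sketch, with made-up host and credentials, assuming pgbouncer listening on its usual default port 6432:

    import psycopg2

    # Connect to pgbouncer rather than directly to Postgres (5432).
    # Under transaction pooling, avoid session state such as server-side
    # prepared statements: the backing connection may change per transaction.
    conn = psycopg2.connect(host="127.0.0.1", port=6432,
                            dbname="app", user="app")
    with conn.cursor() as cur:
        cur.execute("SELECT now()")
        print(cur.fetchone())
    conn.close()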

There's also https://www.pgpool.net/mediawiki/index.php/Main_Page but I'm not familiar with the details.


I always see people recommending external pools like this here on HN. I'm wondering a bit about that as many frameworks/libraries that use Postgres already implement an internal connection pool. Is this recommendation generally assuming that multiple applications will access the server? Or are there other reasons to prefer an external pooler over internal pooling in your application?


They serve somewhat different purposes. For example, application-internal connection pools won't shield the database server from a rapid scale-out of application instances: each new instance brings its own pool, so total connections still grow as (instance count × pool size).

I've always considered the #1 purpose of internal pools to be minimising latency, by moving the database connection handshakes outwith the critical path.


> I always see people recommending external pools like this here on HN. I'm wondering a bit about that as many frameworks/libraries that use Postgres already implement an internal connection pool. Is this recommendation generally assuming that multiple applications will access the server? Or are there other reasons to prefer an external pooler over internal pooling in your application?

It's pretty common to want both, particularly for larger installations. If you have more than one or two application servers the number of connections held by the "internal" pools can become pretty large. With naive configuration you often end up with each of the internal pools being substantially over-sized, due to handling some momentary imbalance between the application servers.

In particular, if your workload allows you to use transaction (or even statement) level pooling, you can get very large boosts by having those two levels of pooling. The application pools ensure the connection overhead is not incurred, while the external pooler ensures the total number of connections is manageable.
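
A sketch of what those two levels can look like from a Python application (the names, sizes, and pgbouncer host are all made up for illustration):

    from sqlalchemy import create_engine, text

    # Level 1: application-side pool, so request handlers skip the
    # connection handshake. It points at pgbouncer, not at Postgres.
    engine = create_engine(
        "postgresql+psycopg2://app:secret@pgbouncer-host:6432/app",
        pool_size=5,         # steady-state connections held by this node
        max_overflow=5,      # burst headroom before callers block
        pool_pre_ping=True,  # transparently replace dead connections
    )

    # Level 2: pgbouncer (transaction pooling) multiplexes all nodes'
    # connections onto a small, fixed number of real server backends.
    with engine.connect() as conn:
        print(conn.execute(text("SELECT 1")).scalar())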


> Is this recommendation generally assuming that multiple applications will access the server

That's very often a reason to use external pools, yes. Or just think of a single web application that runs on multiple nodes to distribute the load on the application servers. You either carefully configure each connection pool, or you use an external pooler to which each node connects (rather than directly to the database).


Even if you configure it carefully, you run into the limit where (app servers × max internal pool size) exceeds your db connection capacity.

Every time you add another instance, you're not adding a "happy path number" of connections. You're adding up to the internal max, because at some point you will get spammed with real or bot traffic.

It's easier to aggregate the connections than carefully balance that setup every time you make infra changes or scale up. And you need some headroom for "oh, this point release of the library spawns a thread now and gets its own connection pool... oops".


>>> I'm wondering a bit about that as many frameworks/libraries that use Postgres already implement an internal connection pool.

That's precisely the problem.

The usual misconfigured python/ruby application will throw 16 connections at the database.

The model for these languages is to deploy one process per core, so on an 8-core box there will be 8 concurrent processes running. 8 processes * 16 pool = 128 connections total.

Multiply by N parallel servers and that's even more connections.

There are issues with large numbers of connections in PostgreSQL, hence people put pgbouncer in front of the database to mitigate them.


It's, IMO, for crap languages that don't have proper threading support. Every language with a decent native threading implementation has its own connection pool in the client or as an add-on library.


It is a common solution, but losing track of where connections are from can be a pretty big price to pay.

I am painfully aware of this due to a situation where a rogue machine with an out-of-date version of software kept making unwanted updates. It literally took months of off-and-on searching to locate the machine. And the most irritating thing was that Postgres internally contains all of the tools that we needed to trace where connections came from... which is how we knew that it was connecting through pgbouncer.


Dumb q: pgbouncer didn't have logs? There was no way to modify the connection or request to add identifying info? Genuinely curious...


It had logs, but they were not useful.

However, reading the docs, it looks like my complaint is dated. In version 1.6 (released in 2015) they added the application_name_add_host parameter, which appends the client IP and port to application_name. This allows bad queries to be tracked back to the source machine.
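
The application-side variant of the same idea is setting application_name yourself - a small sketch (the connection details and label are made up):

    import psycopg2

    # Label this process's connections; the label is visible server-side.
    conn = psycopg2.connect(host="db", dbname="app", user="app",
                            application_name="billing-worker-17")

    # Any session can then trace connections back to their source:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT pid, application_name, client_addr, state
            FROM pg_stat_activity
            WHERE datname = current_database()
        """)
        for row in cur.fetchall():
            print(row)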

I wish I had had that feature available to me when I needed it...


This actually doesn't solve the underlying issue that I'm referencing. While this lets you share a pool between multiple processes, it does not change the high memory requirements on Postgres to support a highly concurrent workload. One connection (~10 MB) can still only be doing one thing at a time. For most of the lifetime of a request, the connection is idle but locked.


That might have been a serious problem in 1995, but surely throwing 64-128 GB of RAM at a Postgres box isn't that uncommon?

I'm also curious how you got 10 MB per connection (if I'm reading your parenthetical correctly). I've seen a PG box with a few hundred active connections sitting happily at just a gig of RAM. Obviously WHAT you do with those connections may well change that, but that isn't pg-wire overhead.


It's uncommon to throw 128 GB at a Postgres box. Remember that the world has moved to the cloud, where you pay a hefty fee per GB.

It's entirely possible to throw gigabytes of RAM at it as long as the company is willing to foot the bill. You're going into thousands of dollars per month depending on how big the instances are and how many replicas we're talking about.


One connection is not actually 10 MB. There is a great article explaining the difference, on Linux, between reported memory usage and actual physical usage. I'll edit if I manage to find it.

Edit: anarazel actually posted it in his reply.


I'll guess: money. Postgres is decades old and was designed when the internet was smaller. Doing a large, fundamental change like this requires an already experienced person (or maybe more than one) to devote a lot of time designing and implementing some solution. This time costs money. So some company must be willing to employ or pay some people to work full-time on this for months. Anyone qualified to work on this should be very expensive, so full costs to pay experts for months of their time would be in the ~$100-200k level. Much outside the donate-a-cup-of-coffee-each-month range, and outside of any small startup's budget, too.

This suggests that the various companies employing people to work on Postgres-related stuff (like Microsoft, perhaps due to their purchase of Citus) have more lucrative work they'd rather do instead of improve this at the design level.

This problem is now perhaps larger than open source is designed to handle because of how expensive it is to fix. Very few people (zero in this case) are willing to freely donate months of their life to improve a situation that will enrich other companies.

Regarding the difficulty of doing this: the blog post here describes how the concurrency and transaction safety model is related to connections, so any connection-related work must also be aware of transaction safety (very scary).


> This suggests that the various companies employing people to work on Postgres-related stuff (like Microsoft, perhaps due to their purchase of Citus) have more lucrative work they'd rather do instead of improve this at the design level.

Well I - the author of this post, employed by MS - did just spend quite a while working on improving connection scalability. And, as I outlined in the precursor blog post, improving snapshot scalability is practically a prerequisite for changing the connection model to handle significantly larger numbers of connections...

Trying to fix things like this in one fell swoop instead of working incrementally tends to not work in my experience. It's much more likely to succeed if a large project can be chopped up into individually beneficial steps.


Napkin time... 5 TB RAM would still leave 1 GB per connection. Getting enough CPU and IO in a box for 5000 concurrent queries would be much harder than the RAM. I think core counts for conventional x86 servers top out around 128-256 cores (not counting multi box custom interconnect systems that appear to software as SSI)
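
Spelling the napkin math out (pure arithmetic, not a measured per-connection overhead):

    ram = 5 * 1024**4                    # 5 TiB in bytes
    connections = 5000
    print(ram / connections / 1024**3)   # ~1.02 GiB per connection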


It's extremely rare for workloads that need high connection counts to utilize every connection to the extent that they're practically never idle. To the contrary, usually the majority are idle - latency alone leads to that, given the fast queries such workloads commonly have. Not to speak of the applications holding those connections usually also doing other stuff than sending pipelined queries (including just waiting for incoming requests themselves).


The question was originally about 5k concurrent reads. But even for 4500 connections idle you'd still want 500 cores and a monster IO subsystem? (There are middlewares to handle the idle connection pooling more efficiently though.)


> But even for 4500 connections idle you'd still want 500 cores and a monster IO subsystem?

Entirely depends on your workload:

These days it's very common to have the hot working set fit into the database server's buffer cache - in which case there will be little IO. And even in cases where you have a lot of IO - a decent NVMe SSD can do several 100k IOPS (with some doing more than 1M IOPS), with per IO latencies in the mid two digit microsecond range.

And because common queries can be processed within fractions of a millisecond, often 500 busy connections can be handled by a few cores.


> And because common queries can be processed within fractions of a millisecond, often 500 busy connections can be handled by a few cores.

If even 500 (out of 4500) connections are busy, what's the catch? Isn't this just a lot of scheduled CPU work trying to run concurrently when it would be faster if queued? Fast NVMe IO, or the fact that the queries are a few ms each, are just factors that make the workload more CPU-intensive (vs IO-bound); the scenario would be more plausible if you had slow disks :)

I guess being bottlenecked by the network can be one scenario if your DB and app aren't close and/or your queries involve bulk data like GIS shapes...


Think you're mixing up parallelism and concurrency. Other databases with connections that support pipelining can easily get to high concurrency even with a single core. The key is the cpu can kick off 100 queries to a single connection and just wake up when a response is available.


> Think you're mixing up parallelism and concurrency. Other databases with connections that support pipelining can easily get to high concurrency even with a single core. The key is the cpu can kick off 100 queries to a single connection and just wake up when a response is available.

Which RDBMS implement a protocol with tagged queries? I don't think that's common. From my, quite fallible, memory, the DB protocols supporting pipelining all just do in-order query execution. As I mentioned before, postgres does support that.

The TCP connection piece of a database connection isn't really expensive - it's the transaction and query execution state that is. These days (it used to be different) it's not that hard to write an application handling 10s-100s of thousands of TCP connections, as long as most of those aren't continually sending data.

Obviously that's where the process-per-connection limitations for postgres come in - having several 100k processes isn't really a good option. Not even just because of the process overhead, but because even just the context switch and process scheduler overheads become more and more debilitating.

But even if postgres' connection model were switched to many-connections-per-process/thread - you still need to have the per-connection state somewhere; obviously transactional semantics need to continue to work. And the per-connection transaction state is where the snapshot scalability limitation the article is talking about was. So the snapshot scalability issue needed to be fixed before a connection model switch really made sense.


But where does this kind of concurrency without corresponding parallelism make sense? There's no point in trying to do thousands of concurrent reads if there's no available parallelism on the same scale, especially while keeping transaction / isolation-level snapshots open; that's just an overloaded system. At that point you want backpressure, with maybe some queuing.


> I do not understand why people aren't clamouring against postgres's connection model.

There are people wanting to change that - including me, the author of the blog post. I explained in an earlier blog post ([1]) why I chose to work on making snapshots more scalable at this time:

> Lastly, there is the aspect of wanting to handle many tens of thousands of connections, likely by entirely switching the connection model. As outlined, that is a huge project / fundamental paradigm shift. That doesn’t mean it should not be tackled, obviously.

> Addressing the snapshot scalability issue first thus seems worthwhile, promising significant benefits on its own.

> But there’s also a more fundamental reason for tackling snapshot scalability first: While e.g. addressing some memory usage issues at the same time, as switching the connection model would not at all address the snapshot issue. We would obviously still need to provide isolation between the connections, even if a connection wouldn’t have a dedicated process anymore.

> However this means that even if your server can theoretically handle 5000 concurrent reads you will never get there because opening 5000 connections isn't practical. You'll likely hit a memory limit.

It's quite possible to have 5000 connections, even leaving poolers aside. When using huge_pages=on, a connection has an overhead of < 2MiB ([2]). Obviously 10GiB isn't peanuts, but it's also not a crazy amount.

> Why is this generally accepted as ok?

Postgres is an open source project. It's useful in a lot of cases. It's not in some others - partially due to non-fundamental limitations. There's a fairly limited set of developers - we can only work on so many things at a time...

> Decoupling concurrency from connections seems possible via pipelining commands over a single connection.

Could you expand on what you mean here?

> Is it just too late for a project as mature as Postgres?

No. It's entirely doable to decouple processes and connections. It however definitely is a large project, with some non-trivial prerequisites.

[1] https://techcommunity.microsoft.com/t5/azure-database-for-po...

[2] https://blog.anarazel.de/2020/10/07/measuring-the-memory-ove...

EDIT: formatting woes


There is a much better reference somewhere (possibly from Ingres times, or later), but here is Stonebraker describing how PostgreSQL ended up with connection-per-process in 1986:

> DBMS code must run as a separate process from the application programs that access the database in order to provide data protection. The process structure can use one DBMS process per application program (i.e., a process-per-user model [STON81]) or one DBMS process for all application programs (i.e., a server model). The server model has many performance benefits (e.g., sharing of open file descriptors and buffers and optimized task switching and message sending overhead) in a large machine environment in which high performance is critical. However, this approach requires that a fairly complete special-purpose operating system be built. In contrast, the process-per-user model is simpler to implement but will not perform as well on most conventional operating systems. We decided after much soul searching to implement POSTGRES using a process-per-user model architecture because of our limited programming resources. POSTGRES is an ambitious undertaking and we believe the additional complexity introduced by the server architecture was not worth the additional risk of not getting the system running. Our current plan then is to implement POSTGRES as a process-per-user model on Unix 4.3 BSD.

(THE DESIGN OF POSTGRES, https://dsf.berkeley.edu/papers/ERL-M85-95.pdf )

There is another reference directly related to Postgres or PostgreSQL that made it even more clear, I expect it was probably later on. In effect it indicated someone involved in the project had strong intentions of getting to adding threading "real soon now". I'll update the comment if I figure out where that's from.

Threads were still a research thing by the mid 80s, so its absence from such an old design is easy to understand. Pthreads wasn't even standardized until 1996, although several unices (e.g. SunOS) already had popular implementations long before that.


You might be thinking of

> In POSTGRES they are run as subprocesses managed by the POSTMASTER. A last aspect of our design concerns the operating system process structure. Currently, POSTGRES runs as one process for each active user. This was done as an expedient to get a system operational as quickly as possible. We plan on converting POSTGRES to use lightweight processes available in the operating systems we are using. These include PRESTO for the Sequent Symmetry and threads in Version 4 of Sun/OS.

From: The implementation of POSTGRES - Michael Stonebraker, Lawrence A. Rowe and Michael Hirohama

Hat tip to Thomas Munro. I think he pointed this quote out to me in the past.


Thanks for doing this work! Postgres is amazing and this area is the one place where eg. MySQL has a clear lead.

> Postgres is an open source project. It's useful in a lot of cases. It's not in some others - partially due to non-fundamental limitations.

For me at least, postgres has the problem that it's too useful. It has so many clear advantages over the alternatives that it often makes sense to choose postgres over more specialized tools.

For example, even if I was primarily storing JSON data, postgres is still a good choice, because it offers better consistency guarantees than most document databases, and has a more powerful query model.

The end result is that the few limitations remaining (eg. the connection model) are felt all the more strongly: I cannot simply say "ah I need lots of connections, I'll use MySQL here" without also giving up on all sorts of other features that I would normally use. (Although MySQL is still improving)


I wasn't criticizing Postgres as much as I was a bit confused why I don't hear more people talking about this issue.

I think poolers confuse the issue because they solve the problem of allowing multiple processes to share the same pool, they don't increase throughput to the server.

The 5000-connection example was probably missing some context. I actually was using 10 MB per connection as the overhead, which makes for a decently expensive server without a lot of throughput.

By pipelining what I mean is allowing multiple requests to be in flight on the connection, each tagged with an ID that gets queued on the server. When a worker process gets around to processing it, it'll send back a response with the same ID. This increases driver complexity but also makes it so you need very few connections per client.
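
A toy sketch of that scheme (the wire format and server here are hypothetical - this is not the Postgres protocol): each request carries an ID, responses may come back in any order, and the matching waiter is woken by ID:

    import asyncio, itertools

    class MultiplexedConnection:
        # One socket, many in-flight queries, matched up by request ID.
        def __init__(self, reader, writer):
            self.reader, self.writer = reader, writer
            self.ids = itertools.count()
            self.pending = {}  # request ID -> Future for its response

        async def query(self, sql):
            qid = next(self.ids)
            fut = asyncio.get_running_loop().create_future()
            self.pending[qid] = fut
            # Hypothetical framing: "<id>\t<sql>\n"
            self.writer.write(f"{qid}\t{sql}\n".encode())
            await self.writer.drain()
            return await fut  # wakes when dispatch() sees our ID

        async def dispatch(self):
            # Single reader task: route each response to its waiter.
            while line := await self.reader.readline():
                qid, _, payload = line.decode().rstrip("\n").partition("\t")
                self.pending.pop(int(qid)).set_result(payload)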


> Could you expand on what you mean here?

Not sure what OP was thinking about, but for instance SQL Server supports something called “multiple active result sets” (MARS) on a single connection.

https://docs.microsoft.com/en-us/sql/relational-databases/na...


FWIW: Oracle also uses one process per connection on Unix/Linux. I think since Oracle 12 this can be changed during installation, but it's still the default.

When using connection pools, this isn't really such a problem in the majority of the cases.


Oracle doesn't handle TCP connections, the Oracle Net Listener (a separate process) is responsible for that. It supports connection pooling.

I'm no expert on how this scales compared to Postgres as it is hard to find benchmarks, thanks to Oracle licensing terms.


In Oracle it's been an option for a long time (since at least 10g, I think earlier) - shared vs. dedicated servers in their terminology - and in my experience it's pretty rare to use shared servers.


Shared servers represent a built-in connection pool. They still use one process for each connection.

Oracle uses a thread model in Windows (one "orcle" process, each connection is a thread), but not on Unix/Linux.


May be worth noting that pgbouncer is one method of dealing with large connection counts


It's kinda bad, but fine if you have low latency from servers to DB.

And only if your other servers are very careful about not holding transactions open. That's a big one. You can't be sloppy and hang a connection open with transactions running. This is bad for slow languages where every transaction takes 10+ ms.

And it's bad if your language doesn't support connection pools/threading.

In practice that means you can get full performance with fast threaded languages like Java, C#, Go, Rust that can use connection pooling. But performance will suffer if the language is slow or doesn't have good enough threading to do connection pooling.

So yeah, it's a bad-ish design that works fine if you're using a fast threaded language. If you're trying to squeeze max performance out of Ruby or something you're going to be disappointed in other ways anyway
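
For reference, the in-client pool being described - a minimal sketch using psycopg2's bundled thread-safe pool (sizes and connection details are made up):

    from psycopg2.pool import ThreadedConnectionPool

    pool = ThreadedConnectionPool(minconn=2, maxconn=16,
                                  host="db", dbname="app", user="app")

    conn = pool.getconn()       # borrow an already-open connection
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            print(cur.fetchone())
    finally:
        pool.putconn(conn)      # return it; don't hold transactions open
    pool.closeall()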


I don't know anything about Postgres in this regard, but one common reason for not pipelining requests is fairness: You don't want one client to be able to starve others out by being very noisy.


The postgres protocol does support pipelining of queries - and it can be a huge boon in latency sensitive workloads. They'll get processed in-order on the server side, with results being sent back while the next query is being processed.

The biggest weakness around pipelining is that a fair number of drivers don't support it yet, including the C client interface that is part of postgres (there's a patch being reviewed right now adding it there). The common jdbc driver, .net, and a few other popular ones do support it though.
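
As an illustration of what driver-side support can look like - a sketch assuming psycopg 3's pipeline mode (one way a driver can expose it; the connection string is made up):

    import psycopg

    with psycopg.connect("dbname=app") as conn:
        with conn.pipeline():
            # Both queries are sent without waiting for the first result;
            # the server executes them in order and streams results back.
            cur1 = conn.execute("SELECT 1")
            cur2 = conn.execute("SELECT 2")
        # Results are available once the pipeline block exits.
        print(cur1.fetchone()[0], cur2.fetchone()[0])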


It would be amazing to see Postgres do this over TDS.


I've not seen any real efforts to add support for different protocols to postgres. Do you really think just adding protocol-level support for e.g. TDS would be useful, if the SQL dialect still was postgres's? If so - why?

While I am employed by MS, I just work on PG, and I have long before starting at MS. So I just have the open source hacker's perspective on this, not any MS perspective.


A contribution to Postgres from Microsoft - quite the reality check.

This is an awesome thing. Would love to see more and more open source contributions from Microsoft.


Microsoft purchased CitusDB a while back, so you can expect many many contributions to Postgres to be from Microsoft now.


I'm deeply glad MS is contributing, not controlling, given the declining state of MSSQL. Maybe it'll motivate them to pull their socks up over their own flagship product.


There is a reason why the MSSQL npm module is called "tedious".


What is declining in MSSQL? The last 3 years were an absolute game-changer for my use cases: the dockerized Linux option for Linux testing, very helpful columnstore improvements, and a quite impressive HA story. In fact there are only two major drawbacks that I can tell: no command-line tool like pg_dump, and the awe-inspiring price of the license (though I hear it's still less than Oracle).


Start here: https://www.mssqltips.com/sqlservertip/3074/use-caution-with... Many are fixed, but they should never have been there. Some are still extant.

There's plenty more low-grade shit in what was once a good product.

I haven't used the features you discuss, but then I need basic functionality to work. I've run into stupid performance bugs with trivial CTEs, and I found a link showing a CTE error which I could duplicate (Postgres got it right, MSSQL didn't). Can't find the link, sorry.


The cost, god the cost has got so much higher. I love SQL Server but am being forced off it, especially as you need enterprise to do queryable replicas etc.


Personally I'm not fond of Microsoft embracing anything I use.


That's a very unfortunate point of view to take, stuck in the mists of time. Wouldn't it be better to be forward-looking instead?


In general I don't think $enterprise hearting open source is a good thing in the long run. Even if there are accidental benefits, the incentives are just not aligned.

I also find Microsoft to be one of the most evil, invasive and user-hostile tech companies out there despite their recent PR deluge. My opinion is pretty much set in stone at this point, but I welcome anyone to cheerlead them if they so wish.


Would love to hear more regarding Microsoft being evil, invasive and user hostile. I've certainly consumed the last ~5 years of PR propaganda, and am not aware of the bad stuff?


I fail to see how Windows is not the epitome of user abuse, granted I only occasionally have to use it nowadays so I might be off. For starters, migrating users to Windows 10 via dark UI patterns and even forced upgrades. Generally abusing the update channel and forcing updates, users be damned. Enabling and re-enabling telemetry. Adding and re-adding Microsoft services people did not ask for. Ads in the Windows start menu. Intercepting installation of Firefox/Chrome.

In other news Microsoft just recently extended GitHub with certifications after embracing it https://news.ycombinator.com/item?id=24855453

Personally I also dislike their sales tactics, at uni I had my fair share of Sponsored by MS classes, using .NET and Visual Studio.

Just search for Microsoft at the bottom of this page really.


Embrace (GP) => Extend (looking forward) => Extinguish (We wanted the best, but it turned out like always)


I find it quite a mental leap to go from "Microsoft improved connection scalability in Postgres" to "Microsoft is going to extinguish Postgres, arguably the most popular open-source database in existence".


Times have changed, economic incentives have not. Linux, Postgres, et al will survive but have no doubt that Microsoft will try to extinguish them.

EEE is not some discredited strategy from Gates era Microsoft that magically vanished with a few executive changes, it is a fundamental defense mechanism of umbrella enterprises driven largely by a bunch of little decisions by self interested actors at all levels of the hierarchy. Gates' Microsoft helped give it a name with a catchy alliteration, but Nadella's Microsoft will define what that phrase means in the cloud era.


Fool me once, shame on you.

Fool me twice, shame on me.

Fool me thrice, you might be Microsoft.


Maybe this will be more like Chrome than IE. Lightning fast and powerfully invasive


'This post details the improvements I recently contributed to Postgres 14 (to be released Q3 of 2021)'

I think this timeframe - committing something that gets released a year later - adds to the rock-stable nature of Postgres.


TBH, early this year I tried pretty hard to get these changes into PG 13 (released a few weeks ago). Missed the mark by a week or two. So this amount of time between complex commits and release isn't quite the norm.

Still would have been a while obviously - feature freeze was April 7th, I think.


Solid work. I have seen a few proposed patches or improvements to Postgres's handling of connections, BUT, as in other situations, a small number of core maintainers seem to shoot these improvements down pretty hard, so they end up not going anywhere.

There's an on-going debate about whether this should be inside Postgres or handled externally (like pgBouncer and co. do). I am firmly in the camp that it ought to be handled internally.

Maybe with the pedigree of Microsoft/Citus contributing this we might finally see something change.


PostgreSQL should stop creating a separate PROCESS per connection. This is just a waste of resources.


It definitely improves resilience against crashing worker processes though.


That benefit while pretty useful is probably not worth the cost if PostgreSQL had been built today, but changing from processes to threads is no small task.



