
Let's see pgsql deal with 15,000 client connections.



It’s really shocking how wide the difference is here between Postgres and MySQL.

One of the PlanetScale guys said on a podcast that at GitHub every MySQL instance is doing something like 50k+ connections, while if you need more than 100 connections in Postgres you already need PgBouncer.


Why is it "shocking"?

It's just an architectural decision Postgres made a long time ago: spawn a process per connection.

It's a tradeoff like most decisions.

Back in the day, MySQL had huge issues with scaling to multi-core systems (maybe they've fixed it by now, I haven't used MySQL in a long time), while Postgres never had this problem.

When designing an application around Postgres you just have to take into account that:

1. opening a new connection is rather expensive, so you should have a connection pool

2. opening too many connections is problematic, so you should keep the size of your pool rather small and return connections back into the pool ASAP

That's it.

It's not that hard in practice.
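For illustration, a minimal sketch of both points with SQLAlchemy (the DSN and pool sizes are made up, just to show the shape):

    from sqlalchemy import create_engine, text

    # A small, fixed-size pool; connections are opened lazily and reused.
    engine = create_engine(
        "postgresql+psycopg2://app:secret@db/appdb",  # hypothetical DSN
        pool_size=5,       # keep the pool small
        max_overflow=5,    # a few extra connections allowed under load
    )

    def handle_request():
        # The context manager returns the connection to the pool on exit,
        # so it is only held for the duration of the actual work.
        with engine.connect() as conn:
            return conn.execute(text("SELECT 1")).scalar()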


MySQL never had a problem with scaling to multi-core systems; Linux, on the other hand, did have a problem, and its problems became everyone's problems. In the early 2000s the answer to multi-core Linux issues was to buy a system running one of the commercial UNIXes. By 2006 or so this became a non-issue for Linux (for the most part).

Postgres uses an old design model which predates threads; I have no idea if they have made any progress in updating their design. In the past I have heard the core devs talk about how difficult it would be to do this.

Oracle Database moved to a hybrid process/thread model at some point; this is the superior solution (I have no idea if it was done well or not, but from the standpoint of getting better leverage of CPU vs. IO, this is the way).

If the PG devs had enough time/money, I am sure they would move towards a hybrid model where the focus would be on processor affinity with IO being all tied to events.


Years back (around 10), MySQL did have multiple instances of false sharing [0].

Unrelated to MySQL: I do consider using Redis in any capacity a blunder (it's likely OK for Node.js users, I suppose).

[0]: http://dimitrik.free.fr/MySQL_Connect_2013/MySQL_Perf-Connec...


Why would you open 50k+ connections if they can't all be active at once anyway? Unless you have 50k+ cores and IO beefy enough not to get overwhelmed by that.

You can have as many connections as you want, but you'll have to trade them for lower work_mem values, which hurts performance. Traditional advice is to keep it below 500 per PostgreSQL instance (I'd say per physical host).
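Back-of-the-envelope, with illustrative numbers (and optimistically assuming work_mem is used only once per backend, though it can be allocated once per sort/hash operation):

    # Illustrative only: memory ceiling if every backend runs a single sort.
    max_connections = 500
    work_mem_mb = 16
    print(f"{max_connections * work_mem_mb} MB")  # 8000 MB, before shared_buffers etc.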

I've run dozens of microservices handling thousands of requests per second with a total connection limit of around 200, most of which was still unused, all without any server-side pooler.


Because people run large numbers of frontends and workers that create a significant number of connections. It doesn't matter if they are all active.


Why would you want every "frontend" to keep an open connection all the time?

> it doesn't matter if they are all active

It does: if the connection is inactive (doesn't hold an open transaction), you should close it or return it to the pool.


So you are suggesting you close a connection between queries?


Between queries in the same transaction? No

Between transactions? Yes, absolutely

In fact, many libraries do it automatically.

For example, the SQLAlchemy docs explicitly say [0]:

> After the commit, the Connection object associated with that transaction is closed, causing its underlying DBAPI connection to be released back to the connection pool associated with the Engine to which the Session is bound.

I expect other reasonably sane libs for working with transactional databases do the same.

So, if you are doing pooling correctly, you can only run out of available connections if you want to have a lot of long-running transactions.

So, why would you want each of your 50k frontends to keep an open transaction simultaneously?

[0] https://docs.sqlalchemy.org/en/20/orm/session_basics.html#co...
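As a rough sketch of what that release-on-commit behavior looks like (the connection string is hypothetical):

    from sqlalchemy import create_engine, text
    from sqlalchemy.orm import Session

    engine = create_engine("postgresql+psycopg2://app:secret@db/appdb")  # hypothetical DSN

    with Session(engine) as session:
        session.execute(text("SELECT 1"))
        session.commit()
        # After commit() the underlying DBAPI connection is released back to
        # the engine's pool; the next statement checks out a connection again
        # and starts a new transaction.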


Because there's overhead in making a connection, authenticating, setting the default parameters on the connection, etc. I've never seen a framework that closed DB connections between requests.

Of course, the better design is to write a nonblocking worker that can run async requests on a single connection, and not need a giant pool of blocking workers, but that is a major architecture plan that can't be added late in a project that started as blocking worker pools. MySQL has always fit well with those large blocking worker pools. Postgres less so.
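A minimal sketch of that nonblocking style with asyncio and asyncpg (DSN, sizes, and the query are made up): many in-flight requests share one event loop and a tiny pool instead of a big pool of blocking workers.

    import asyncio
    import asyncpg

    async def handle_request(pool, n):
        # Hundreds of these coroutines can be in flight in one process; each
        # holds a connection only while its query is actually running.
        async with pool.acquire() as conn:
            return await conn.fetchval("SELECT $1::int * 2", n)  # stand-in for a real query

    async def main():
        pool = await asyncpg.create_pool(
            "postgresql://app:secret@db/appdb",  # hypothetical DSN
            min_size=1,
            max_size=5,
        )
        print(await asyncio.gather(*(handle_request(pool, i) for i in range(100))))

    asyncio.run(main())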


As I said, you can return the connection to the connection pool.

From the perspective of keeping the number of open connections low it doesn't really matter if you close it or return to the pool, because in either case the connection becomes available to other clients.


I might not be understanding what you're pointing out here. It sounds to me like SQLAlchemy is talking about a pool of connections within one process, in which case releasing back to that pool does not close that process's connection to the database. The parent comment is talking about one connection per process with 50k processes. My comment was that you don't need that many processes if each process can handle hundreds of web requests asynchronously.

If you are saying that a connection pool can be shared between processes without pgbouncer, that is news to me.


Of course, you're right, it is not possible to share a connection pool between processes without pgbouncer.

> Parent comment is talking about one connection per process with 50k processes.

It is actually not clear what the parent comment was talking about. I don't know what exactly they meant by "front ends".


The most common design for a Web app on Linux in the last 20 years is to have a pool of worker processes, each single-threaded and ready to serve one request. The processes might be Apache ready to invoke PHP, or mod_perl, or a pool of Ruby on Rails, Perl, or Python processes receiving the requests directly. Java tends to use threads instead of processes. I've personally never needed to go past about 100 workers, but I've talked to people who scale up to thousands, and they happen to be using MySQL. I've never used pgbouncer, but I understand that's the tool to reach for rather than configuring Pg to allow thousands of connections.


Just use connection pooling, which is available point-and-click as AWS RDS Proxy.



