Minimal downtime major PostgreSQL version upgrades with pg_easy_replicate (github.com/shayonj)
306 points by shayonj on June 20, 2023 | 62 comments



I did this the "hard way" a year ago. This blog post came in handy:

https://jstaf.github.io/posts/pglogical/

pglogical is magic. Keeping an eye on this project for the future, looks great.


> Binary replication only works with DBs of the same major version. If you want to replicate to a different version, well… you can’t.

Is this true? I thought they fixed that at some point after PG11.


You're thinking of logical replication being added to the core. https://www.postgresql.org/docs/current/logical-replication....

WAL streaming is still only supported between the same major versions and AFAIA that isn't going to change anytime soon.


Love pglogical. I was thinking, it'd be nice to orchestrate this setup using pglogical too. Would attempt it if there is interest. Especially with bi-directional replication setup.


Nice UX!

I checked the code and noticed some things that you might want to address in the future (I've done major version upgrades via logical replication myself several times using a script similar to this).

The "default" way of bringing the target DB in sync with the source one (CREATE PUBLICATION on the source then CREATE SUBSCRIPTION on the target) only works for moderately sized databases. Even on ~50GB I noticed that it may take hours to catch up with the origin. There are a couple of solutions:

1. Drop all indices except for the primary keys before performing the initial copy, and then restore the indices after all tables have been copied and the replication stream has started.

2. Instead of copying into a blank database, you could create a logical dump using pg_dump with snapshot isolation, restore it on the target DB using pg_restore, create a subscription in the disabled state, advance the logical replication slot to the LSN at which you created the logical dump, and then enable the subscription. This way your target database only has to process the delta between the time you created the dump and the current time (a rough sketch of this flow appears after this list). I have a snippet illustrating how we did it: https://gist.github.com/take-five/1dab3a99c8636a93fc69f36ff9....

3. We found that pg_dump + pg_restore is still very slow on larger databases (~1TB) and it's untenable to wait several hours. Remember that while the dump is being restored, the source database is accumulating WAL that it will later have to send to the target DB. If left unchecked for 12-18 hours, that can lead to running out of disk space. This was unacceptable for us, so instead of creating the target DB from a logical dump, we created a copy of the source database from an AWS RDS snapshot, upgraded it using pg_upgrade and then set up logical replication (similar to how it's described in the Percona blog: https://www.percona.com/blog/postgresql-logical-replication-...). Something like this can probably be achieved with ZFS snapshots.
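For anyone following along, here's a rough sketch of the snapshot-based flow from point 2 above (connection strings, slot/publication names and placeholders in angle brackets are made up; the gist linked above has the exact steps we used):

    # 1. source: create the publication
    psql "$SOURCE" -c "CREATE PUBLICATION upgrade_pub FOR ALL TABLES;"

    # 2. source: create a logical slot over a replication connection and export a
    #    snapshot; keep this session open and idle while the dump runs
    psql "host=<source> dbname=<db> replication=database" \
      -c "CREATE_REPLICATION_SLOT upgrade_slot LOGICAL pgoutput EXPORT_SNAPSHOT;"
    #    -> note the snapshot_name and consistent_point (LSN) it returns

    # 3. dump as of that snapshot and restore into the target
    pg_dump -Fc --snapshot=<snapshot_name> -f dump.bin "$SOURCE"
    pg_restore -d "$TARGET" -j 8 dump.bin

    # 4. target: create the subscription against the existing slot, disabled and
    #    without re-copying the data
    psql "$TARGET" -c "CREATE SUBSCRIPTION upgrade_sub
      CONNECTION '<source conninfo>' PUBLICATION upgrade_pub
      WITH (create_slot = false, slot_name = 'upgrade_slot',
            enabled = false, copy_data = false);"

    # 5. target: advance the subscription's replication origin to the dump LSN,
    #    then enable it so only the delta is replayed
    psql "$TARGET" -c "SELECT pg_replication_origin_advance('pg_' || oid, '<consistent_point>')
                         FROM pg_subscription WHERE subname = 'upgrade_sub';"
    psql "$TARGET" -c "ALTER SUBSCRIPTION upgrade_sub ENABLE;"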

--

Otherwise, it was very similar to how you programmed your library.

Recently, I saw an article here on HN on the same topic of migrating Postgres databases using logical replication. There was a very nice idea to set up _reverse_ logical replication after switchover so that it's possible to roll back quickly in case something breaks.
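The reverse-replication idea boils down to publishing from the new primary and subscribing from the old one right after switchover, roughly (names and hosts are placeholders):

    # old primary keeps receiving changes from the new one, so you can switch back
    psql "$NEW_PRIMARY" -c "CREATE PUBLICATION rollback_pub FOR ALL TABLES;"
    # copy_data = false because the old DB already has everything up to switchover
    psql "$OLD_PRIMARY" -c "CREATE SUBSCRIPTION rollback_sub
      CONNECTION 'host=<new-primary> dbname=<db>' PUBLICATION rollback_pub
      WITH (copy_data = false);"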

EDIT: formatting


These are great shouts! Thank you so much for sharing. One of the operations I ran with this was ~500GB and yeah, it takes a few hours to catch up. For us it wasn't an issue, especially since we were on AWS Aurora. That said, I can totally see it not being feasible on other database engines/systems. I considered the idea of dropping all the indices and re-creating them once the catch-up is complete; I'd like pg_easy_replicate to support that. It should make the initial COPY a lot more swift. On a 200GB DB it cut the total time down by 90%.
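In case it's useful, the manual version of that index trick is roughly this (simplified to the public schema; only plain, non-constraint-backed indexes are touched):

    # save the definitions of plain (non-constraint-backed) indexes, then drop them
    psql "$TARGET" -At -c "
      SELECT i.indexdef || ';'
        FROM pg_indexes i
       WHERE i.schemaname = 'public'
         AND NOT EXISTS (SELECT 1 FROM pg_constraint c
                         WHERE c.conindid = format('%I.%I', i.schemaname, i.indexname)::regclass);
    " > recreate_indexes.sql

    psql "$TARGET" -At -c "
      SELECT format('DROP INDEX %I.%I;', i.schemaname, i.indexname)
        FROM pg_indexes i
       WHERE i.schemaname = 'public'
         AND NOT EXISTS (SELECT 1 FROM pg_constraint c
                         WHERE c.conindid = format('%I.%I', i.schemaname, i.indexname)::regclass);
    " | psql "$TARGET"

    # ... run the initial COPY / let replication catch up ...

    # recreate the dropped indexes once replication is streaming
    psql "$TARGET" -f recreate_indexes.sql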

Re: bi-directional replication, you read my mind :). https://github.com/shayonj/pg_easy_replicate/blob/8df94aa93f...

These two features are top of my list to support in pg_easy_replicate


Check out pgcopydb. It uses COPY under the hood in parallel and is a lot faster than pg_dump/pg_restore.
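If it helps, basic usage is roughly this (flags from memory, so double-check against the pgcopydb docs):

    # clone source into target with parallel COPY and parallel index builds
    pgcopydb clone \
      --source "postgres://user@old-host/db" \
      --target "postgres://user@new-host/db" \
      --table-jobs 8 --index-jobs 4
    # there's also a --follow mode that keeps applying changes via logical decoding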


There is also a new (open source) Postgres extension that claims to provide multi-master replication based on logical replication, BDR and pglogical:

https://pgedge.github.io/spock/

which can also be used for online upgrades.

However, it only supports Postgres 15 and the (not yet released) 16.


Cool, that's good info. Hadn't come across it before. :)

The GitHub repo page for it is here, for anyone else interested in taking a look: https://github.com/pgEdge/spock


spock and pgedge are cool, but they are NOT open source (and they try hard to make it look like they are).

It is under the Confluent license (with s/confluent/pgedge/), basically a source-available-but-Amazon-cannot-use-it kind of license.


The fastest way to switch over is probably adding an iptables rule on the old server to DNAT to the new server, unless you have a load balancer or other network middleware like Kubernetes' clusterIP services (which use iptables or IPVS internally).
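Roughly something like this on the old host (addresses are placeholders; DNAT in PREROUTING only affects traffic coming from other machines):

    # redirect incoming Postgres traffic to the new server
    iptables -t nat -A PREROUTING -p tcp --dport 5432 \
      -j DNAT --to-destination 10.0.0.42:5432
    # masquerade the forwarded traffic so replies route back through this box
    iptables -t nat -A POSTROUTING -p tcp -d 10.0.0.42 --dport 5432 -j MASQUERADE
    # forwarding has to be enabled for this to work
    sysctl -w net.ipv4.ip_forward=1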

Waiting for a rolling restart or DNS cache expiration could take a while, during which the app is talking to a read-only database, if I understand correctly.


This is why you always give two ip addresses to such servers. One for the server, and one for the service.


I could definitely have done with that idea a couple of weeks back!


Cool idea!


The title says "zero downtime" but the GitHub tagline says "minimal downtime"

Zero... data loss maybe?


Was just going to comment this. It's specifically:

> This ensures zero data loss and minimal downtime for the application.


Be careful if you have sequences as those are not replicated. https://www.jacobreed.dev/blog/migrate-postgres-instances
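One way to carry the sequence values over manually looks roughly like this (connection strings are placeholders; pg_sequences needs PG 10+):

    # generate setval() calls from the source and apply them on the target
    psql "$SOURCE" -At -c \
      "SELECT format('SELECT setval(%L, %s, true);',
                     format('%I.%I', schemaname, sequencename), last_value)
         FROM pg_sequences
        WHERE last_value IS NOT NULL;" | psql "$TARGET"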


I will update the readme to mention this - but the lib takes care of updating sequences as part of the data catch-up: https://github.com/shayonj/pg_easy_replicate/blob/13ad6af51f...


Is there an alternative for MySQL? We want to upgrade from 5.7 to 8 pretty soon, so it would be nice to have the appropriate toolkit.


MySQL is easy. You just open the database with the new version and the upgrade happens automatically.


When you do this, does it need to process the whole database before the server can start processing transactions, or is it more of a "upgrade in the background on the fly" kind of thing?


Have done minimal downtime major version upgrades using standard replication, switching between two "master" servers. In the five minute range.


This is also a spot where having failover strategies and circuit breakers can help. Microservices make you think about these things but you don’t have to wait for that forcing function to think about them.

When the database goes down, you have to do something else. Could be the server crashed or could just be a Postgres upgrade. Five minutes is just about the right amount of time for an open circuit to do its job.


Have you done this with PostgreSQL on RDS?

Because when I last tried it, the replica has to be the same major version (WAL streaming requires it) and then upgrading it to the latest version can take 10+ hours.

Definitely nowhere near 5 minutes downtime.


You'd want logical replication instead of physical replication. Logical rep doesn't stream the WAL as-is, it streams a representation of the statement that can be replayed on the destination.

It looks like this uses the pglogical plugin, but there's also a built-in decoder, I think called "pgoutput".


Self hosted postgres, not RDS.


Looks very interesting!

> The switch is made by putting the user on the source database in READ ONLY mode, so that it is not accepting any more writes, and waiting for the flush lag to be 0. It is up to the user to kick off a rolling restart of your application containers or a DNS failover (more on these below in strategies) after the switchover is complete, so that your application isn't sending any read/write requests to the old/source database.

This does mean that there will be a "downtime" during which the DB is in read-only mode, right? I understand this period can be really small, but that depends on how long the app deployment takes.
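For context, "putting the user in READ ONLY mode" generally boils down to something like this (role name is hypothetical; I don't know the exact statements the tool runs):

    # new transactions from this role become read-only...
    psql "$SOURCE" -c "ALTER ROLE app_user SET default_transaction_read_only = on;"
    # ...but existing sessions keep their old setting, so long-lived connections
    # may need to be terminated to pick it up
    psql "$SOURCE" -c "SELECT pg_terminate_backend(pid)
                         FROM pg_stat_activity
                        WHERE usename = 'app_user' AND pid <> pg_backend_pid();"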


Thanks for taking a look!

That's right, it was a typo. Meant zero data loss and minimal downtime. There are some strategies in the readme, like using weighted DNS failover, to reduce the downtime even further without requiring an application deployment.


A simple LB like HAProxy works fine for this and similar cases. In a nutshell, HAProxy executes checks every n seconds (in your case these can be read-only-mode checks) and disables that upstream when the checks fail.

We use it in front of MySQL/PG+Patroni/Redis and it works fine. We only tested at low load though: up to 5k qps for the PG case and up to 80k rps for the Redis case.


This can be done with pgbouncer and you can suspend the database so pgbouncer "holds" connections. If the application has high enough timeouts, it will look like it's just taking longer than normal to connect.
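The PgBouncer flow is roughly this, via its admin console (database name and port are placeholders):

    # pause the database: clients block on new queries instead of erroring out
    psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c "PAUSE mydb;"
    # ... point the [databases] entry at the new server, then ...
    psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c "RELOAD;"
    psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c "RESUME mydb;"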


For that specific case, and if you can afford PgBouncer, I agree with you.

On a wider scale, HAProxy adds value: auto-tracking of sync/async replicas, replica lag windows, falling back to the master if all replicas are lagging/down, and so on.


Much appreciated. I was actually looking into a few options for this exact same thing. I will give this a shot right away.


Thanks for giving it a try. Please don't hesitate to open bugs/feature requests.


I guess this is pretty similar to what heroku does for pg updates?

If using this tool for upgrades with minimal downtime, if I understand right, you need to orchestrate your db callers to switch over to the new primary at the right point, when it's ready? Tips for getting that to happen at just the right point to minimize downtime?


Great question: you can watch the stats command. It returns a JSON every second which has a switchover_completed_at field. It is updated as soon as the switchover is complete.

UPDATE: Some more reading material here: https://github.com/shayonj/pg_easy_replicate#switchover-stra...
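Roughly, the idea is to poll the stats output until switchover_completed_at shows up, e.g. (simplified; see the readme for the full command and options):

    # wait until the stats JSON reports the switchover as complete
    until pg_easy_replicate stats | jq -e '.switchover_completed_at != null' > /dev/null; do
      sleep 1
    done
    # ...then kick off the rolling restart / DNS failover from here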


Very nice, thanks.


Connect through a single pgbouncer instance and use pgbouncer to orchestrate the switchover by hot reloading/updating the config


Interesting, how does this compare to something like pg_auto_failover in a two- or three-node PostgreSQL setup?


Let's say I have a Rails app and I want to do a major postgres version upgrade using this tool.

I see one of the requirements is that 'Both databases should have the same schema'. How should I manually load the schema to the new database (running on port 5433 for instance)?


Yeah I think the documentation is a bit ambiguous here. Postgres "schemas" have a very specific meaning, different from what most people think of when they hear the word. It's more like a table namespace.

https://www.postgresql.org/docs/current/ddl-schemas.html


I have historically had a “drop-create script” that could handle building me empty tables. Typically split up so that I can create tables, import (test) data from a snapshot or synthetic source, then enable constraints and indexes.

So even if a tool like that needed a little boost at the beginning to get going, I’d have something that already took care of that.


That’s a great question: internally what we have done is when preparing the new empty database, we also run a “bundle exec rake db:schema:load” against it. Depending on your setup, it can be hard to do.

I am exploring some options like loading schema from a schema.sql file or something along the lines of what pg_dump does for schema export.

Would love to hear any ideas.


The easy answer would be: on some non-production copy of the Rails app, point at the new db in database.yml, and simply run `rails db:create; rails db:schema:load`, or some variant Rails command like `rails db:prepare`.

Rather than this ad-hoc thing, you could also try to create a new rails "environment" for this use, I guess.


The postgres documentation for logical replication suggests using "pg_dump --schema-only" as a starting point.
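Something like this, with placeholder connection URLs:

    # copy just the schema (tables, indexes, sequences; no data) to the new cluster
    pg_dump --schema-only --no-owner "$SOURCE_URL" | psql "$TARGET_URL"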

I am sure the pg_easy_replicate tool brings something to the table (probably replication lifecycle management), but reading the docs, the bare Postgres replication looks just as easy, if not easier, than the easy replicate tool. I am going to try it on my toy db and see how it goes.

https://www.postgresql.org/docs/13/logical-replication-restr...


This looks like a script that basically just runs all the necessary SQL for you. It looks pretty similar to an internal Python tool I came up with to do the same thing (basically just scripting out setting up logical rep and waiting for it to catch up, then executing switchover commands)


That's right! It's mostly a CLI orchestration tool (mentioned in the readme), with some logging, stats and safety checks so that the user/engineer can be mostly hands-off during the process. Perhaps even automate the entire process by wiring up different pg_easy_replicate commands together.

We do that with Tines, where we can be fully hands-off (zero-touch ops) and are able to kick off upgrades with switchover for tens of DBs (one at a time for now :D).


    Requirements
    ...
    Ruby 2.7 and later
Damn. This looked interesting right up until the Ruby requirement. :(


How is this valid criticism? This is an excellent use of a scripting language.


I would prefer to not have to install a new language on the server. There's Python and Perl already; you would certainly cover nearly all Linux installations with these two.

But it's a nice tool anyway.


I would prefer if it were written in Rust or Python. But this tool is very useful BTW.


What do you want them to use? Node.js?


Almost anything other than Ruby. Bad previous experiences with it have completely turned me off the entire ecosystem.

If it was "Ruby or Node.js" though, I'd just pick neither. ;)


Would it make sense to use this for backup/redundancy as well?

If not, what would better approaches be


For backup there are tools like Barman and pgBackRest. If Postgres is virtualized, it's potentially faster to take disk snapshots & use PITR.

For redundancy (cluster management where you have hot standbys) there are tools like Patroni or Stolon.

I don't have experience with Stolon, but Patroni can help orchestrate backup tools for you (you still need to pick the tool and do some config).


Looks cool! Does it work with postgres dbs in AWS RDS?


Yep it does. We have used it with AWS Aurora. You will need to update the parameter group settings to ensure logical replication is enabled.
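Concretely, that means setting rds.logical_replication in the (cluster) parameter group, roughly like this (parameter group name is a placeholder; the parameter is static, so it needs a reboot to take effect):

    aws rds modify-db-parameter-group \
      --db-parameter-group-name my-pg-params \
      --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"
    # for Aurora, the equivalent is modify-db-cluster-parameter-group on the
    # cluster parameter group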


Does it work with 9.6 version?


I know I am going to get fairly polarized responses on this one...

Too bad it is written in Ruby.


If you had included any content or justification in your comment, it might have gone differently. "I don't like software written in Ruby" is not interesting to anyone. No polarization.


What if it's written in Assembly (which is hard to read, maintain, ...)?


I like code written in Assembly. No problem with that. If it's written in Assembly in the first place (not disassembled from another language) it's usually not that hard to read...


Based.



