PostgreSQL pg_rewind
pg_rewind makes it possible to efficiently bring an old primary in sync with a new primary without having to perform a full base backup. This works by looking in the Write Ahead Log to see which pages have been modified, and only copying across those pages.
In this example, we have a primary (running on port 5530) and a standby subscribing to it (on port 5531):
# SELECT * FROM pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 11609 usesysid | 16384 usename | rep_user application_name | standby1 client_addr | 127.0.0.1 client_hostname | client_port | 38434 backend_start | 2015-03-29 00:11:55.243319+00 backend_xmin | state | streaming sent_location | 0/C81BB40 write_location | 0/C81BB40 flush_location | 0/C81BB40 replay_location | 0/C81BB40 sync_priority | 0 sync_state | async
Now we'll promote the standby:
$ pg_ctl promote -D standby1 server promoting
And we'll make some changes on this instance:
$ psql -p 5531 postgres # CREATE TABLE x (content text); CREATE TABLE # INSERT INTO x SELECT 'test' FROM generate_series(1,1000); INSERT 0 1000
Now we'll stop old primary and use pg_rewind to re-synchronise it:
$ pg_ctl stop -D primary waiting for server to shut down.... done server stopped $ pg_rewind -D primary --source-server='host=localhost port=5531' -P connected to remote server The servers diverged at WAL position 0/C81BB40 on timeline 1. Rewinding from last common checkpoint at 0/2000060 on timeline 1 reading source file list reading target file list reading WAL in target Need to copy 274 MB (total source directory size is 290 MB) 281142/281142 kB (100%) copied creating backup label and updating control file Done!
And we'll make some changes to get it to subscribe to the new primary:
$ cd primary $ mv recovery.{done,conf} $ vi recovery.conf # edited to set host info to point to port 5531 in this case $ vi postgresql.conf # as our example instances are running on the same server, we'll just change the port so it doesn't conflict
Then start the new standby (old primary):
$ pg_ctl start -D primary
Let's see if it's successfully caught up:
$ psql -p 5531 postgres # connect to the new primary # SELECT * FROM pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 11837 usesysid | 16384 usename | rep_user application_name | standby1 client_addr | 127.0.0.1 client_hostname | client_port | 49148 backend_start | 2015-03-29 00:22:39.047657+00 backend_xmin | state | streaming sent_location | 0/C8559B0 write_location | 0/C8559B0 flush_location | 0/C8559B0 replay_location | 0/C855978 sync_priority | 0 sync_state | async
And see if the test data from the new primary is on the new standby:
$ psql -p 5530 postgres # connect to the new standby # SELECT COUNT(*) FROM x; count ------- 1000 (1 row)
All synchronised.
PostgreSQL 9.5 – ADD PG_REWIND, FOR RE-SYNCHRONIZING A MASTER SERVER AFTER FAILBACK
PostgreSQL 9.5 – ADD PG_REWIND, FOR RE-SYNCHRONIZING A MASTER SERVER AFTER FAILBACK
d pg_rewind, for re-synchronizing a master server after failback. Earlier versions of this tool were available (and still are) on github
So, we have a situation, where we have master and slave, some kind of replication, and we have to failover. This is trivial – just promote slave to standalone, and we're good. But we don't have slave then. Normal procedure is to setup slave on old master from scratch, using full data dir sync. But with non-trivial datasizes, it can take long time.pg_rewind should solve the problem by just working on xlogs.Let's see if it will really work that way. For added difficulty I will try to make it so that old master still did *some* work after the failover – this work should be “rolled back" after it will be set to slave from newly elected master.To test it we will need two PostgreSQL's, but first I'll start with just master:With this in place, I have master running, on port 5435. It's very under-configured, but I'm mostly fine with it, just need some small changes so:I set the wal size so small, so that it will be simple to show how the situation works when all necessary wal files are no longer in pg_xlog/ directory.With this in place, I just need to allow replication connections:And of course, restart Pg to make all the setting come live:Now, I need some test data to work on, and show that rewinding history works. So, I'll make a simple table:This generates this, very simple, table:Where id is integer 1..1000000, payload is just long string to make the table and updates heavier, and last_updated is timestamptz that we will update.To update the fields I will use:The where clause is complicated, but I wanted to update semi-random row, but make it so that the same script ran on slave will update different rows. I will put slave on port 5436, so master will always update even rows (with i being divisible by 2), and slave will be updating odd rows.I repeat such update 10 times in a transaction, and as soon as one transaction is done, I start another.After a minute or so, I can see:Now, with this in place, I can setup slave (the updater process still goes on):To have slave working, I will need to change port, and make sure it knows how to recover xlogs from archive, so:With this done, I start slave:and wait for slave to catchup with replication. For this I used this query:Repeating it every 2 seconds, I got:I see that replication is working steadily (the sent/write lags happen every so often, but are immediately cleared). So we're good.Now. for the actual pg_rewind test. I will promote slave to standalone/master:After it got promoted (the update script on old master is still running), we are starting to see differences:Old master is still updating, but new master stopped updated when the promotion happened. So now, I'll stop the updating script, and rerun it on new master…Now, we're getting newer values from 5436, but also please note that newest even rows are older than on old master:And now it's time to bring old master to become slave.First, I need to stop old master:Now I run pg_rewind:This is OK. I purposely made master keep as little xlogs as possible, so now I need to copy some back. Quick inspection shows:So I'll just copy all xlogs from 9B to B5:With this in place, I rerun pg_rewind:This took ~ 1 second. Now – after running pg_rewind, state of /var/tmp/master is like if I had run pg_basebackup from “slave" (current master), including bad port (5436), and no sensible recovery.conf. Let's fix that:After this was done, I should be able to start new slave:After couple of seconds I can see that old master, current slave, is replicating – first from wal archive:Couple more minutes, and streaming kicks-in:Of course, on both server, the “newest" rows will be odd – updated on slave:but the interesting thing is – what will be the newest even rows now?Before pg_rewind old master has newer values than new master. And now:Same thing. So, true to its name, pg_rewind removed changes from old master that didn't make it to new master. And that's just great. Thanks to all involved, pg_rewind will clearly be very useful.
Comments
Post a Comment