PostgreSQL Hot Standby Steps
First, determine the mount points of the Postgresql data directories and the WAL (Write Ahead Log) directories. For performance, the $PGDATA and pg_xlog directories should be on separate volumes.
In the examples below, they are defined as follows:
For Example:
In the examples below, they are defined as follows:
- $PGDATA is the "master" cluster, located at /pgdata/9.3/data, at (for example) 192.168.100.0
- The "slave" (aka "hot standby") will be (for example) located at /pgdata/9.3/data, at (for example) 192.168.100.1
- The WAL staging area on the slave will be /pgdata/WAL_Archive. This is where the WAL segments are sent to from the archive_command on the master. Ideally on a separate volume from $PGDATA.
- The master cluster is up and running normally.
- There is no slave cluster yet.
- You are doing all operations as the "postgres" OS user (not root!)
- ssh between both servers is working
- You are using at least PostgreSQL 9.2
- pg_hba.conf and postgresql.conf are located at $PGDATA/. If not, change the instructions below to match your location.
- You are setting up the slave on a separate server.
- If the servers are not on the same network (eg. different colocations), use the "-z" flag in the "archive_command" rsync on the master to compress before transferring.
- These commands are for Linux, but with some substitution of commands should work on Windows too.
mkdir -p /pgdata/WAL_Archive
chown postgres:postgres /pgdata/WAL_Archive
Step 2.On the master, edit the $PGDATA/postgresql.confwal_level = hot_standby
archive_mode = on ## /pgdata/WAL_Archive is a staging directory on the slave
archive_command = 'rsync -W -az %p postgres@$SLAVE_IP_HERE:/pgdata/WAL_Archive/'
max_wal_senders = 5
wal_keep_segments = 5000 # If you have the room, to help the pg_basebackup
# not fail due to WAL segments being removed from master.
# For clusters will very little traffic, 100 is probably fine
Step 3.On the master, create the replication role, which will be copied to the slave via pg_basebackup. Set a password for the "replication" rolepsql -U postgres -d postgres -c "CREATE USER replication WITH replication ENCRYPTED PASSWORD 'changeme' LOGIN"
Step 4.Modify the master $PGDATA/pg_hba.conf and enable the replication user for the IP of the slave# TYPE DATABASE USER ADDRESS METHOD
#hostssl replication replication $SLAVE_IP_HERE/32 md5
host replication replication $SLAVE_IP_HERE/32 md5
Step 5.Restart the master cluster to pick up the changes to the postgresql.conf. This is done as the cluster superuser.For Example:
pg_ctl -D $PGDATA restart -m fast
## The master cluster MUST be restarted before the pg_basebackup command is executed.
Step 6.On the slave, from $HOME, issue the pg_basebackup command to start setting up the hot standby from the master.## --host=IP_OF_MASTER -> The master's IP
## --pgdata=$PGDATA -> The slave's $PGDATA directory
## --xlog-method=stream -> Opens a second connection to the master to stream the WAL segments rather than pulling them all at the end
## --password will prompt for the replication role's password
## Without compression, "stream" gets the changes via the same method as Streaming Replication time
pg_basebackup --pgdata=$PGDATA --host=IP_OF_MASTER --port=5432 --username=replication --password --xlog-method=stream --format=plain --progress --verbose
## Alternate version with compression, note "--xlog --gzip --format=tar"
#time pg_basebackup --pgdata=$PGDATA --host=IP_OF_MASTER --port=5432 --username=replication --password --xlog --gzip --format=tar --progress --verbose
Step 7.On the slave, after pg_basebackup has successfully completed, edit $PGDATA/postgresql.confhot_standby = on #off # "on" allows queries during recovery
max_standby_archive_delay = 15min # max delay before canceling queries,
# set to hours if backups will be taken from here
max_standby_streaming_delay = 15min # max delay before canceling queries
hot_standby_feedback = on #off
Step 8.On the slave, create $PGDATA/recovery.conf:standby_mode = on
## To promote the slave to a live database, issue "touch /tmp/promote_db"
## Warning: If multiple slaves share the same /tmp/ directory,
## then the trigger file must be named uniquely, else multiple slaves
## could attempt to be promoted in the presence of the trigger file.
trigger_file = '/tmp/promote_db_slave'
## Host can be the master's IP or hostname
primary_conninfo = 'host=IP_OF_MASTER port=5432 user=replication password=CHANGEME'
## Log the standby WAL segments applied to a standby.log file
## TODO: Add the standby.log to a log rotator
## The paths must be explicitly defined, including the path to pg_archivecleanup
restore_command = 'cp /pgdata/WAL_Archive/%f "%p" 2>>/pgdata/9.3/data/pg_log/standby.log'
## XXX: If multiple slaves share the staging WAL directory,
## do not use pg_archivecleanup as WAL segments could be removed
## before being applied to other slaves.
archive_cleanup_command = '/usr/pgsql-9.3/bin/pg_archivecleanup /pgdata/WAL_Archive %r'
## On hot standby clusters, set to 'latest' to switch to the newest timeline in the archive
recovery_target_timeline = 'latest'
Step 9 .The slave should be ready to be started now. Start the cluster on the slave with whatever method works best.pg_ctl -D $PGDATA start
Step 10.The following are the commands I run when recreating a standby over and over for test. I add them to a script called /root/recreate_standby.sh. Runs as root, but that is not necessary - if you run as postgres, removed the "sudo su - postgres -c" command.#!/bin/bash
## This script runs on the standby.
## Executed as root, else remove the "sudo - postgres -c" commands.
## Assumes you have a valid recovery.conf saved at
## $PGDATA/../recovery.conf.bkp
export PGDATA=/path/to/data/dir ## Must be set correctly
export PGPORT=5432
export MASTER=192.168.x.x ## IP or host entry for the master Postgresql server
export PGBIN=/usr/pgsql-9.3/bin
service postgresql-9.3 stop -m immediate
if [ $? != 0 ]; then
service postgresql-9.3 start
echo "Could not shut down PostgreSQL. Aborting."
exit 1
fi
rm -rf $PGDATA
if [ $? != 0 ]; then
echo "Could not remove the PostgreSQL $PGDATA dir. Aborting."
exit 1
fi
## If the replication role is not set to "trust" in the master's
## pg_hba.conf file, the password will need to be passed into the command below,
## and "--no-password" will need to be removed or revised to be "--password"
su - postgres -c "$PGBIN/pg_basebackup --pgdata=$PGDATA --host=$MASTER --port=$PGPORT --username=replication --no-password --xlog-method=stream --format=plain --progress --verbose"
su - postgres -c "cp -p $PGDATA/../recovery.conf.bkp $PGDATA/recovery.conf"
service postgresql-9.3 start
su - postgres -c "$PGBIN/pg_isready -U postgres -p $PGPORT -t2"
while [ $? != 0 ]; do
echo "Sleep 1 second, check if slave is up yet. If not, sleep again."
sleep 1;
su - postgres -c "$PGBIN/pg_isready -U postgres -p $PGPORT -t2"
done
su - postgres -c "$PGBIN/psql -d postgres -U postgres -qXc 'select pg_is_in_recovery() as is_pg_in_recovery'"
exit 0
Comments
Post a Comment