PostgreSQL Replication
Now today with versions 9.X of PostgreSQL, we call it “streaming replication”, and allows you to setup multiple standby servers using both synchronous and asynchronous slaves. Any type of multi-master replication, however, still requires that you use one of the other independent replication options.
Configure Simple Master / Slave Streaming Replication in PostgreSQL in 9.X
Inasmuch as I recently had to setup a simple PostgreSQL master slave configuration recently, I thought that I jot down the notes and share them here to help others that needed to perform the same task someday. These steps assume that you installed your PostgreSQL similar to steps in the previous article: CentOS Install PostgreSQL. The slave server will be overwritten. Any data stored on it will be gone.
Step 1 – Configure SSH Keys for the PostgreSQL User
When the WAL logs get switched out, we typically will want to be able to copy them over to our slave so that we have less risk of losing data in the event of a disaster. Rather than manually copying these files, we can automate the task. However, to automate the copying of these files, we need to create an SSH key on the master server that we will copy over to the slave so that later the master can copy its WAL files over after each switch. Because we already wrote an article on creating ssh keys here at uptimemadeeasy.com, I will refer you to steps 2-5 of the article automate-file-copy-machines-using-rsync-over-ssh to accomplish this first task. Follow those steps, creating the ssh keys as the postgres user on the master server and then copy the key over to the postgres user’s $HOME/.ssh/authorized_keys file on the slave server. When all of this is done, come back here and we will begin the next steps.
Step 2 – Create a PostgreSQL Replication User
Now that we have our ssh keys figured out, our next step is to create a userid that the slave servers can use to contact PostgreSQL on this server. This is pretty simple using the create user syntax:
create user <username> replication encrypted password '<password-here>';
Now, we need to edit the pg_hba.conf file to allow this new replication user access to the master PostgreSQL server. I did this by adding this line to my file:
host replication rep my.ip.add.ress/32 md5
where my.ip.add.ress is the actual ip address of the slave server. After editing this pg_hba.conf file, I had to restart PostgreSQL to make it take.
Step 3 – Changes to the Master Server’s PostgreSQL.conf File
Now it is time to progress to the postgresql.conf file configuration. We have a list of configuration options that we need to change. I will list all of my changes in one spot and then discuss them a bit below:
listen_addresses = '*' wal_level = hot_standby wal_keep_segments = 10 archive_mode = on archive_command = '/usr/bin/rsync -qa %p postgres@prizm2.1on1.com:/data/pgsql/9.3/archive/%f' max_wal_senders = 1 hot_standby = on
Let’s go through each of these:
listen_addresses lists all of the addresses that you want to have your PostgreSQL listen on.listen_addresses lists all of the addresses that you want to have your PostgreSQL listen on.listen_addresses lists all of the addresses that you want to have your PostgreSQL listen on.listen_addresses lists all of the addresses that you want to have your PostgreSQL listen on.
Once you have your postgresql.conf file set on the master, restart the PostgreSQL server.
Name
|
Description
|
listen_addresses
|
Lists all of
the addresses that you want to have your PostgreSQL listen on. You can
set this to a comma separated list or use wildcards. Be careful using
wildcards as changing this value will allow different hosts to connect to it.
Using this option as well as a firewall such as iptables is a good
idea.
|
wal_keep_segments
|
This is the
minimum number of 16MB WAL files to retain in the pg_xlog directory.
More files may exist to handle a large checkpoint. If your slave
server frequently loses connection and has to catch up on lots of data,
setting this to a higher number can help. Remember that the you can
also set your WAL files to be rsynced to the slave server using the
archive_command option.
|
wal_level
|
Can be set to
one of these options to determine the level of wal logging done:
minimal, archive, or hot_standby. Must be set to archive or
preferably hot_standby for streaming replication as minimal does not log
enough information for replication.
|
archive_mode
|
Setting
archive_mode to on enables the WAL logs.
|
archive_command
|
if
archive_mode is set to “on” then this shell command is executed whenever a
WAL file segment (default is 16MB) is completed. In the case above, we
have it copying with rsync the file to the slave server. There are two
variables that can be used in your shell command: %f – filename or %p –
path.
|
max_wal_senders
|
Default is
Zero. Max_wal_senders is the number of concurrent connections
allowed from the slave servers. wal sender connections
count against the total number of connections configured in max_connections.
Be sure to have at least one connection configured in max_wal_senders
per configured slave.
|
hot_standby
|
default is
off. Determines if you can connect and run queries during times that
your server is in archive recovery or standby mode. This is helpful to
replication slaves.
|
4 – Changes to the Slave Server (PostgreSQL.conf and recovery.conf Files)
Next, we go to the slave server and configure its postgresql.conf file and the recovery.conf file.
Edit the postgresql.conf file and set:
hot_standby = on
Now, we create the recovery file in the postgresql data directory. On my repo installed postgresql server, I found a sample recovery file at: /usr/pgsql-9.3/share/recovery.conf.sample. This file tells PostgreSQL how to perform either an archive recovery of a database or how to connect to the master server in the event of replication like we are doing.
standby_mode = on
primary_conninfo = ‘host=pg1.uptimemadeeasy.com port=5432 user=rep password=<my rep user’s password>’
trigger_file = ‘/tmp/postgresql.trigger.5432’
primary_conninfo = ‘host=pg1.uptimemadeeasy.com port=5432 user=rep password=<my rep user’s password>’
trigger_file = ‘/tmp/postgresql.trigger.5432’
standby_modedescription
Step 5 – Materialize the Slave
Parameter
|
Description
|
standby_mode
|
When enabled,
standby_mode causes PostgreSQL to work as a standby in a replication
configuration
|
primary_conninfo
|
Specifies the
connection information used to connect to the master server. Example:
primary_conninfo = ‘host=host.domain.com port=5432 user=userid password
= password’
|
trigger_file
|
While running
as a standby, the slave will continue to restore the XLOG records from the
primary server. The trigger file is what is used to “trigger” a server to
leave standby mode and to begin being a primary server. You can
configure this option by specifying the trigger file to look for.
|
Before we can finalize the replication we need to get an initial copy of the database onto the slave machine. We can do this pretty easily by running the typical binary backup process (pg_start_backup and pg_stop_backup), but instead of backing up the PostgreSQL files, we are copying them to the slave server using rsync and the ssh key that we generated in step 1 above.
$ psql -c “select pg_start_backup(‘mybackup’);” $ rsync -cva --inplace --exclude=*pg_xlog* /data/pgsql/9.3/data/ postgres@pg2.1on1.com:/data/pgsql/9.3/data/ $ psql -c "select pg_stop_backup();"
Restart everything and hope for the best!
This is where we are hoping that we have everything working and, of course, everything really should be up and working once we startup the slave server.
Start the Slave Server
# service postgresql-9.3 start
Watch the log file as it starts up to make sure that there are no errors. So, I will tail the log being written today:
# tail -f /data/pgsql/9.3/data/pg_log/postgresql-Tue.log < 2014-03-24 23:15:47.670 MDT >LOG: started streaming WAL from primary at 5D9/87000000 on timeline 1 < 2014-03-24 23:17:20.233 MDT >LOG: consistent recovery state reached at 5D9/921BBF70 < 2014-03-24 23:17:20.234 MDT >LOG: database system is ready to accept read only connections
Great! Looks like everything is working. We can verify it is working by creating a test table and inserting data to see if they make it into the slave.
On the Master:
postgres=# create database joke; postgres=# \c joke; joke=# create table joketable ( jokeint int, jokechar varchar(40) ); CREATE TABLE joke=# insert into joketable values ( 1, '2 Guys walk into a bar...' ); INSERT 0 1
On the Slave:
joke=# select * from joketable; jokeint | jokechar ---------+--------------------------- 1 | 2 Guys walk into a bar... (1 row)
So, it appears to be working as our database, table, and row data made it to the slave.
Repairing Broken PostgreSQL Streaming Replication
So, I have already broken my PostgreSQL replication. Don’t stress! Take a deep breath. The best thing is that your slave is just a copy of the master’s data. We can just rebuild the slave. Here are the steps that I take to repair my slave when it is broken:
First – Shutdown the Slave and Copy the recovery.conf, postgresql.conf, and Any Other File You Wish to Keep, Then Rename the Data Directory
Shutdown the slave:
# service postgresql-9.3 stop
Make a copy of these important files:
$ cd /data/pgsql/9.3/data $ cp recovery.conf $HOME $ cp postgresql.conf $HOME $ mv $ cd /data/pgsql/9.3/data $ cd /data/pgsql/9.3/data-old
Second, Rematerialize Your Slave From the Master’s Copy
On the Master perform these steps again:
$ psql -c “select pg_start_backup(‘mybackup’);” $ rsync -cva --inplace --exclude=*pg_xlog* /data/pgsql/9.3/data/ postgres@pg2.1on1.com:/data/pgsql/9.3/data/ $ psql -c "select pg_stop_backup();"
Third, Copy Back the Recovery.conf and the Postgresql.conf Files and Restart the Slave
Remember when we backed up these files? We will put them back in place.
$ cp $HOME/recovery.conf /data/pgsql/9.3/data $ cp $HOME/postgresql.conf /data/pgsql/9.3/data
Startup the Slave PostgreSQL
# /sbin/service postgresql-9.3 start
Verify the logs to make sure that it is working.
Other Helpful PostgreSQL Streaming Replication Materials
pg_current_xlog_insert_location()# select
pg_current_xlog_insert_location();
pg_current_xlog_insert_location ——————————— 5DB/93301F20 (1 row) |
master
|
Get the current transaction log insert location
|
pg_current_xlog_location()# select
pg_current_xlog_location();
pg_current_xlog_location
————————– 5DB/93B442A8 (1 row) |
master
|
Get the current transaction log write location.
|
pg_switch_xlog()# select
pg_switch_xlog();
pg_switch_xlog —————- 5DB/93B6AC60 (1 row) |
master
|
Force PostgreSQL to switch to a new transaction log file.
|
pg_is_in_recovery()# select
pg_is_in_recovery();
pg_is_in_recovery ——————- t (1 row) |
master(returns f)slave (returns t)
|
This is a boolean return type. t for true, f for false. This
can help you identify if you are on the slave or master as the slave will
return t.
|
pg_last_xlog_receive_location# select
pg_last_xlog_receive_location();
pg_last_xlog_receive_location ——————————- 5DB/B9FC4880 (1 row) |
slave
|
Find the last transaction log location received that was synced to disk
through streaming replication. Will return a null if not on slave or if
replication is not enabled.
|
pg_last_xlog_replay_location# select pg_last_xlog_replay_location();
pg_last_xlog_replay_location —————————— 5DB/B9FD61 |
slave
|
Find the last transaction log location replayed during a recovery.
Returns NULL if replication is not enabled.
|
pg_last_xact_replay_timestamp()# select
pg_last_xact_replay_timestamp();
pg_last_xact_replay_timestamp ——————————- 2014-03-25 15:34:29.867547-06 (1 row) |
slave
|
Find the timestamp of the last transaction replayed during the recovery.
Reflects the time from the WAL record commit or abort for the
transaction.
|
pg_is_xlog_replay_paused()# select
pg_is_xlog_replay_paused();
pg_is_xlog_replay_paused ————————– f (1 row) |
slave
|
This is a boolean that returns true if the replay has been paused.
Let’s you know if your slave has been paused or not.
|
pg_xlog_replay_pause()# select
pg_xlog_replay_pause();
pg_xlog_replay_pause ———————-(1 row) |
slave
|
Will pause a slave if run. Shows a NULL, but still works –Be
careful.
|
pg_xlog_replay_resume()
|
slave
|
Will restart the slave when run. Shows a NULL, but still works to
start.
|
Here are some other great Links for you that should be useful:
Comments
Post a Comment