Drop Down MenusCSS Drop Down MenuPure CSS Dropdown Menu

Monitoring PostgreSQL Streaming Replication Manually

The people using PostgreSQL and the Streaming Replication feature seem to ask many of the same questions:
  1. How best to monitor Streaming Replication?
  2. What is the best way to do that?
  3. Are there alternatives, when monitoring on Standby, to using the pg_stat_replication view on Master?
  4. How should I calculate replication lag-time, in seconds, minutes, etc.?
In light of these commonly asked questions, I thought a blog would help. The following are some methods I’ve found to be useful.

Monitoring is critical for large infrastructure deployments where you have Streaming Replication for:
  1. Disaster recovery
  2. Streaming Replication is for High Availability
  3. Load balancing, when using Streaming Replication with Hot Standby
PostgreSQL has some building blocks for replication monitoring, and the following are some important

functions and views which can be use for monitoring the replication:
1. pg_stat_replication view on master/primary server.
This view helps in monitoring the standby on Master. It gives you the following details:
pid:              Process id of walsender process
usesysid:         OID of user which is used for Streaming replication.
usename:          Name of user which is used for Streaming replication
application_name: Application name connected to master
client_addr:      Address of standby/streaming replication
client_hostname:  Hostname of standby.
client_port:      TCP port number on which standby communicating with WAL sender
backend_start:    Start time when SR connected to Master.
state:            Current WAL sender state i.e streaming
sent_location:    Last transaction location sent to standby.
write_location:   Last transaction written on disk at standby
flush_location:   Last transaction flush on disk at standby.
replay_location:  Last transaction flush on disk at standby.
sync_priority:    Priority of standby server being chosen as synchronous standby
sync_state:       Sync State of standby (is it async or synchronous).

Example
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+---------------------------------
pid              | 1114
usesysid         | 16384
usename          | repuser
application_name | walreceiver
client_addr      | 172.17.0.3
client_hostname  |
client_port      | 52444
backend_start    | 15-MAY-14 19:54:05.535695 -04:00
state            | streaming
sent_location    | 0/290044C0
write_location   | 0/290044C0
flush_location   | 0/290044C0
replay_location  | 0/290044C0
sync_priority    | 0
sync_state       | async

2. pg_is_in_recovery() : Function which tells whether standby is still in recovery mode or not.
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
3. pg_last_xlog_receive_location: Function which tells location of last transaction log which was streamed by Standby and also written on standby disk.
postgres=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
0/29004560
(1 row)
4. pg_last_xlog_replay_location: Function which tells last transaction replayed during recovery process. e.g is given below:
postgres=# select pg_last_xlog_replay_location();
pg_last_xlog_replay_location
------------------------------
0/29004560
(1 row)

5. pg_last_xact_replay_timestamp: This function tells about the time stamp of last transaction which was replayed during recovery. Below is an example:
postgres=# select pg_last_xact_replay_timestamp();
 pg_last_xact_replay_timestamp
----------------------------------
15-MAY-14 20:54:27.635591 -04:00
(1 row)
  • Above are some important functions/views, which are already available in PostgreSQL for monitoring the streaming replication.
  • So, the logical next question is, “What’s the right way to monitor the Hot Standby with Streaming Replication on Standby Server?”
If you have Hot Standby with Streaming Replication, the following are the points you should monitor:

1. Check if your Hot Standby is in recovery mode or not:
For this you can use pg_is_in_recovery() function.

2.Check whether Streaming Replication is working or not.
And easy way of doing this is checking the pg_stat_replication view on Master/Primary. This view gives information only on master if Streaming Replication is working.

3. Check If Streaming Replication is not working and Hot standby is recovering from archived WAL file.
For this, either the DBA can use the PostgreSQL Log file to monitor it or utilize the following functions provided in PostgreSQL 9.3:
pg_last_xlog_replay_location();
pg_last_xact_replay_timestamp();

4. Check how far off is the Standby from Master.

There are two ways to monitor lag for Standby.
  i. Lags in Bytes: For calculating lags in bytes, users can use the pg_stat_replication view on the master with the function pg_xlog_location_diff function. Below is an example:
pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location)
which gives the lag in bytes.

ii. Calculating lags in Seconds. The following is SQL, which most people uses to find the lag in seconds:
SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
             THEN 0
           ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
      END AS log_delay;
Including the above into your repertoire can give you good monitoring for PostgreSQL.
I will in a future post include the script that can be used for monitoring the Hot Standby with PostgreSQL streaming replication
This is a small tip to view the replication status of slave postgresql servers. The simplest way to see the replication state on a master server is executing this query:

select client_addr, state, sent_location, write_location,
        flush_location, replay_location from pg_stat_replication;
This query can output can be like this:
postgres=# select client_addr, state, sent_location,write_location, flush_location, replay_location, sync_priority from pg_stat_replication;
 client_addr  |   state   | sent_location | write_location | flush_location | replay_location | sync_priority
--------------+-----------+---------------+----------------+----------------+-----------------+---------------
 100.195.33.25 | streaming | 10/984C0878   | 10/984C0878    | 10/984C0878    | 10/984C0878     |             0
(1 row)
If the slave is up in hot standby mode, you can tell the time in seconds the delay of transactions applied on the slave with this query:
postgres=#  select now() - pg_last_xact_replay_timestamp() AS replication_delay;
 replication_delay
-------------------
 00:00:08.057668
(1 row)
In a very busy database, with many writes per second, this number will remain fairly accurate. However, in a system where there are few writes, the "replication_delay" will continually grow because the last replayed transaction timestamp isn't increasing (this is generally the same limitation as MySQL's SHOW SLAVE STATUS output).


5.view pg_stat_activity
The pg_stat_replication view will contain one row per WAL sender process, showing statistics about replication to that sender's connected standby server. Only directly connected standbys are listed; no information is available about downstream standby servers.
Column
Type
Description
pid
integer
Process ID of a WAL sender process
usesysid
oid
OID of the user logged into this WAL sender process
usename
name
Name of the user logged into this WAL sender process
application_name
text
Name of the application that is connected to this WAL sender
client_addr
inet
IP address of the client connected to this WAL sender. If this field is null, it indicates that the client is connected via a Unix socket on the server machine.
client_hostname
text
Host name of the connected client, as reported by a reverse DNS lookup of client_addr. This field will only be non-null for IP connections, and only when log_hostname is enabled.
client_port
integer
TCP port number that the client is using for communication with this WAL sender, or -1 if a Unix socket is used
backend_start
timestamp with time zone
Time when this process was started, i.e., when the client connected to this WAL sender
state
text
Current WAL sender state
sent_location
text
Last transaction log position sent on this connection
write_location
text
Last transaction log position written to disk by this standby server
flush_location
text
Last transaction log position flushed to disk by this standby server
replay_location
text
Last transaction log position replayed into the database on this standby server
sync_priority
integer
Priority of this standby server for being chosen as the synchronous standby
sync_state
text
Synchronous state of this standby server


Comments

Popular posts from this blog

How to find the server is whether standby (slave) or primary(master) in Postgresql replication ?

7 Steps to configure BDR replication in postgresql

How to Get Table Size, Database Size, Indexes Size, schema Size, Tablespace Size, column Size in PostgreSQL Database

Ora2PG - Oracle/MySQL to Postgres DB migration Version 20.0

PostgreSQL Introduction