PostgreSQL repmgr + VIP + logical replication 高可用性與邏輯同步架構安裝實做版本 PostgreSQL 14.2 repmgr 5.3 (3) repmgr Cluster設定

以下將介紹 repmgr Cluster 設定

Step 1. 建立repmgr replication superuser account以及repmgr同步資料庫

postgres=# create user repmgr superuser password 'password';
CREATE ROLE
postgres=# create database repmgr owner repmgr;
CREATE DATABASE

Step 2. 調整~/.pgpass內容

[pgadm@pgrep1 ~]$ cat ~/.pgpass 
pgrep1:5699:postgres:pgadm:password
pgrep2:5699:postgres:pgadm:password
pgrep1:5699:repmgr:repmgr:password
pgrep2:5699:repmgr:repmgr:password

Step 3. 調整pg_hba.conf能讓repmgr連線，順序由上往下、由左往右很重要 (兩邊主機的連線都要填上)

[pgadm@pgrep1 ~]$ vim $PGDATA/pg_hba.conf
local   all             all                                     trust
host    all             repmgr         172.30.0.21/32           trust
host    all             repmgr         172.30.0.22/32           trust
...
local   replication     all                                     trust
host    replication     repmgr          172.30.0.21/32          trust
host    replication     repmgr          172.30.0.22/32          trust

Step 4. 設定完後重讀即可 (不需重啟)

[pgadm@pgrep1 ~]$ pg_ctl reload

Step 5. 在另外一台測試連線

[pgadm@pgrep2 ~]$ psql 'host=pgrep1 user=repmgr dbname=repmgr connect_timeout=2'
psql (14.2)
Type "help" for help.

repmgr=# \q

Step 6. 在節點1，建立repmgr.conf設定內容，檔案存放於 /pgdata/repmgr.conf，也可以存放到 /etc/repmgr.conf

[pgadm@pgrep1 ~]$ vim /pgdata/repmgr.conf

node_id=1
node_name='pgrep1'
conninfo='host=pgrep1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/pgdata/dbdata'
replication_user='repmgr'
replication_type='physical'
repmgr_bindir='/pgbin/pghome_1/bin'
pg_bindir='/pgbin/pghome_1/bin'
log_level='info'
log_file='/tmp/repmgr.log'
connection_check_type=ping
reconnect_attempts=3
reconnect_interval=10
monitoring_history=yes
monitor_interval_secs=5
failover='automatic'
promote_command='/pgdata/dbdata/repmgr_promote.sh'
follow_command='/pgdata/dbdata/repmgr_follow.sh %n'

Step 7. 在節點1，建立script /pgdata/dbdata/repmgr_follow.sh

#! /bin/bash
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: follow $1" >> /tmp/repmgr.log 
/pgbin/pghome_1/bin/repmgr standby follow -f /pgdata/repmgr.conf --upstream-node-id=$1 --log-to-file

Step 8. 在節點1，建立script /pgdata/dbdata/repmgr_promote.sh

#!/bin/bash

REPMGRLOG_PATH=/tmp/repmgr.log
IFCONFIG_IF=ens192

#failover
function failover(){
   echo "failover Start" >> $REPMGRLOG_PATH
   /pgbin/pghome_1/bin/repmgr standby promote -f /pgdata/repmgr.conf --log-to-file >> $REPMGRLOG_PATH
   echo "failover Success." >> $REPMGRLOG_PATH
}

#remove VIP
function delete_vip(){
   /bin/ssh -t pgadm@pgrep2 "/bin/sudo /usr/sbin/ip addr del 172.30.0.23/24 dev " $IFCONFIG_IF  >> $REPMGRLOG_PATH
}

#add VIP
function add_vip(){
   /bin/sudo /usr/sbin/ip addr add 172.30.0.23/24 dev $IFCONFIG_IF  >> $REPMGRLOG_PATH
}

#check connection using telnet, if failed then failover
result=`echo -e "\n" | telnet pgrep2 5699 2>/dev/null | grep Connected | wc -l`

#telnet normal then pass
#if connection failed then start failover.
if [ $result -eq 1 ]; then
#      echo "Database Normal." >> $REPMGRLOG_PATH 
      exit 0
else
      echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: remove VIP start." >> $REPMGRLOG_PATH
      delete_vip
      echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: remove VIP finish." >> $REPMGRLOG_PATH
      echo "Connection fail, failover start." >> $REPMGRLOG_PATH 
      echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: promote start" >> $REPMGRLOG_PATH 
      failover    
      echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: promote finish" >> $REPMGRLOG_PATH
fi

#check database when standby conevert to primary

standby_flg=`psql -p 5699 -U repmgr -h localhost -At -c "SELECT pg_is_in_recovery();"`
if [ ${standby_flg} == 'f' ]; then
    echo "database switch to primary. delete VIP"
    echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: remove VIP start." >> $REPMGRLOG_PATH 
      delete_vip
    echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: remove VIP finish." >> $REPMGRLOG_PATH
    echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: add VIP start." >> $REPMGRLOG_PATH 
      add_vip
    echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: add VIP finish." >> $REPMGRLOG_PATH 
    exit 0

elif [ ${standby_flg} == 't' ]; 
    then 
     echo e "`date +%F\ %T`: database switch failed, current database is standby.!\n" >>$REPMGRLOG_PATH
     exit 0 
fi

Step 9. 在節點2，建立repmgr.conf設定內容

[pgadm@pgrep2 ~]$ vim /pgdata/repmgr.conf

node_id=2
node_name='pgrep2'
conninfo='host=pgrep2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/pgdata/dbdata'
replication_user='repmgr'
replication_type='physical'
repmgr_bindir='/pgbin/pghome_1/bin'
pg_bindir='/pgbin/pghome_1/bin'
log_level='debug'
log_file='/tmp/repmgr.log'
connection_check_type=ping
reconnect_attempts=3
reconnect_interval=10
monitoring_history=yes
monitor_interval_secs=5
failover='manual'
promote_command='/pgdata/repmgr_promote.sh'
follow_command='/pgdata/repmgr_follow.sh %n'

Step 10. 在節點2，建立script /pgdata/dbdata/repmgr_follow.sh

#! /bin/bash
echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: follow $1" >> /tmp/repmgr.log 
/pgbin/pghome_1/bin/repmgr standby follow -f /pgdata/repmgr.conf --upstream-node-id=$1 --log-to-file

Step 11. 在節點2，建立script /pgdata/dbdata/repmgr_promote.sh

#!/bin/bash

REPMGRLOG_PATH=/tmp/repmgr.log
IFCONFIG_IF=ens192

#failover
function failover(){
   echo "failover Start" >> $REPMGRLOG_PATH
   /pgbin/pghome_1/bin/repmgr standby promote -f /pgdata/repmgr.conf --log-to-file >> $REPMGRLOG_PATH
   echo "failover Success." >> $REPMGRLOG_PATH
}

#remove VIP
function delete_vip(){
   /bin/ssh -t pgadm@pgrep1 "/bin/sudo /usr/sbin/ip addr del 172.30.0.23/24 dev " $IFCONFIG_IF  >> $REPMGRLOG_PATH
}

#add VIP
function add_vip(){
   /bin/sudo /usr/sbin/ip addr add 172.30.0.23/24 dev $IFCONFIG_IF  >> $REPMGRLOG_PATH
}

#check connection using telnet, if failed then failover
result=`echo -e "\n" | telnet pgrep1 5699 2>/dev/null | grep Connected | wc -l`

#telnet normal then pass
#if connection failed then start failover.
if [ $result -eq 1 ]; then
#      echo "Database Normal." >> $REPMGRLOG_PATH 
      exit 0
else
      echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: remove VIP start." >> $REPMGRLOG_PATH
      delete_vip
      echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: remove VIP finish." >> $REPMGRLOG_PATH
      echo "Connection fail, failover start." >> $REPMGRLOG_PATH 
      echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: promote start" >> $REPMGRLOG_PATH 
      failover    
      echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: promote finish" >> $REPMGRLOG_PATH
fi

#check database when standby conevert to primary

standby_flg=`psql -p 5699 -U repmgr -h localhost -At -c "SELECT pg_is_in_recovery();"`
if [ ${standby_flg} == 'f' ]; then
    echo "database switch to primary. delete VIP"
    echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: remove VIP start." >> $REPMGRLOG_PATH 
      delete_vip
    echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: remove VIP finish." >> $REPMGRLOG_PATH
    echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: add VIP start." >> $REPMGRLOG_PATH 
      add_vip
    echo "["`date "+%Y-%m-%d %H:%M:%S"`"]: add VIP finish." >> $REPMGRLOG_PATH 
    exit 0

elif [ ${standby_flg} == 't' ]; 
    then 
     echo e "`date +%F\ %T`: database switch failed, current database is standby.!\n" >>$REPMGRLOG_PATH
     exit 0 
fi

Step 12. 註冊Primary Database

[pgadm@pgrep1 ~]$ repmgr -f /pgdata/repmgr.conf primary register
INFO: connecting to primary database...
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path="
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered

Step 13. 確認Cluster狀態

[pgadm@pgrep1 ~]$ repmgr -f /pgdata/repmgr.conf cluster show
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path="
 ID | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                      
----+--------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------
 1  | pgrep1 | primary | * running |          | default  | 100      | 1        | host=pgrep1 user=repmgr dbname=repmgr connect_timeout=2

Step 14. 在節點2，Clone Primary Database (測試)

[pgadm@pgrep2 ~]$ repmgr -h pgrep1 -U repmgr -d repmgr -f /pgdata/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/pgdata/dbdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=pgrep1 user=repmgr dbname=repmgr
DETAIL: current installation size is 32 MB
INFO: "repmgr" extension is installed in database "repmgr"
DEBUG: 1 node records returned by source node
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path="
WARNING: target data directory appears to be a PostgreSQL data directory
DETAIL: target data directory is "/pgdata/dbdata"
HINT: use -F/--force to overwrite the existing data directory
DEBUG: upstream_node_id determined as 1
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
  /pgbin/pghome_1/bin/pg_basebackup -l "repmgr base backup"  -D /pgdata/dbdata -h pgrep1 -p 5699 -U repmgr -X stream 
INFO: all prerequisites for "standby clone" are met

Step 15. 在節點2，Clone Primary Database (正式執行)

[pgadm@pgrep2 ~]$ repmgr -h pgrep1 -U repmgr -d repmgr -f /pgdata/repmgr.conf standby clone
NOTICE: destination directory "/pgdata/dbdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=pgrep1 user=repmgr dbname=repmgr
DETAIL: current installation size is 32 MB
DEBUG: 1 node records returned by source node
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path="
DEBUG: upstream_node_id determined as 1
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/pgdata/dbdata"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /pgbin/pghome_1/bin/pg_basebackup -l "repmgr base backup"  -D /pgdata/dbdata -h pgrep1 -p 5699 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /pgdata/dbdata start
HINT: after starting the server, you need to register this standby with "repmgr standby register"

Step 16. 啟動Standby服務

[pgadm@pgrep2 ~]$ pg_ctl start -D $PGDATA
waiting for server to start....2022-03-21 13:41:50.923 CST [18049] LOG:  redirecting log output to logging collector process
2022-03-21 13:41:50.923 CST [18049] HINT:  Future log output will appear in directory "/pgdata/pglog".
 done
server started

Step 17. 確認Streaming Replication狀態 (Primary)

postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 57278
usesysid         | 16385
usename          | repmgr
application_name | pgrep2
client_addr      | 172.30.0.22
client_hostname  | 
client_port      | 53436
backend_start    | 2022-03-21 13:41:53.330463+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/5000358
write_lsn        | 0/5000358
flush_lsn        | 0/5000358
replay_lsn       | 0/5000358
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async
reply_time       | 2022-03-21 13:49:07.501854+08

Step 18. 確認Streaming Replication狀態 (Standby)

postgres=# \x 
Expanded display is on.
postgres=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 18055
status                | streaming
receive_start_lsn     | 0/5000000
receive_start_tli     | 1
written_lsn           | 0/5000358
flushed_lsn           | 0/5000358
received_tli          | 1
last_msg_send_time    | 2022-03-21 13:51:00.128178+08
last_msg_receipt_time | 2022-03-21 13:50:57.754794+08
latest_end_lsn        | 0/5000358
latest_end_time       | 2022-03-21 13:43:59.154199+08
slot_name             | 
sender_host           | pgrep1
sender_port           | 5699
conninfo              | user=repmgr passfile=/home/pgadm/.pgpass channel_binding=disable connect_timeout=2 dbname=replication host=pgrep1 port=5699 application_name=pgrep2 fallback_application_name=repmgrclu sslmode=disable sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any

Step 19. 註冊Standby Database

[pgadm@pgrep2 pgdata]$ repmgr -f /pgdata/repmgr.conf standby register
INFO: connecting to local node "pgrep2" (ID: 2)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep2 fallback_application_name=repmgr options=-csearch_path="
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path="
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "pgrep2" (ID: 2) successfully registered

Step 20. 確認Cluster狀態

[pgadm@pgrep2 pgdata]$ repmgr -f /pgdata/repmgr.conf cluster show
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path="
 ID | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                      
----+--------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------
 1  | pgrep1 | primary | * running |          | default  | 100      | 1        | host=pgrep1 user=repmgr dbname=repmgr connect_timeout=2
 2  | pgrep2 | standby |   running | pgrep1   | default  | 100      | 1        | host=pgrep2 user=repmgr dbname=repmgr connect_timeout=2

參考來源：

REPMGR高可用+VIP方案

PostgreSQL repmgr + VIP + logical replication 高可用性與邏輯同步架構安裝實做版本 PostgreSQL 14.2 repmgr 5.3 (1) 架構介紹與前置作業

PostgreSQL repmgr + VIP + logical replication 高可用性與邏輯同步架構安裝實做版本 PostgreSQL 14.2 repmgr 5.3 (2) PostgreSQL、repmgr 安裝與 PostgreSQL 資料庫設定

PostgreSQL repmgr + VIP + logical replication 高可用性與邏輯同步架構安裝實做版本 PostgreSQL 14.2 repmgr 5.3 (3) repmgr Cluster設定

PostgreSQL repmgr + VIP + logical replication 高可用性與邏輯同步架構安裝實做版本 PostgreSQL 14.2 repmgr 5.3 (4) repmgr switchover/failover 與 VIP 權限設定

PostgreSQL repmgr + VIP + logical replication 高可用性與邏輯同步架構安裝實做版本 PostgreSQL 14.2 repmgr 5.3 (5) repmgr auto failover

PostgreSQL repmgr + VIP + logical replication 高可用性與邏輯同步架構安裝實做版本 PostgreSQL 14.2 repmgr 5.3 (6) logical replication

PostgreSQL repmgr + VIP + logical replication 高可用性與邏輯同步架構安裝實做版本 PostgreSQL 14.2 repmgr 5.3 (7) Config log rotation & systemd & selinux