上一篇promote腳本有使用到ip addr del/add的功能,因此需要以sudo的方式去執行
調整sudoers
Step 1. 每台節點都要調整sudoers
[root@pgrep1 ~]# visudo /etc/sudoers pgadm ALL = NOPASSWD: /usr/sbin/ip
Step 2. 測試ip操作正不正常
[pgadm@pgrep1 ~]$ sudo ip addr add 172.30.0.23/24 dev ens192 [pgadm@pgrep1 ~]$ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 ... 2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 00:50:56:ba:6d:2d brd ff:ff:ff:ff:ff:ff inet 172.30.0.21/24 brd 172.30.0.255 scope global noprefixroute ens192 valid_lft forever preferred_lft forever inet 172.30.0.23/24 scope global secondary ens192 valid_lft forever preferred_lft forever inet6 fe80::250:56ff:feba:6d2d/64 scope link noprefixroute valid_lft forever preferred_lft forever 3: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 ...
repmgr swicthover 操作
Step 1. dry-run 測試執行 (需要在Standby端執行)
[pgadm@pgrep2 ~]$ repmgr -f /pgdata/repmgr.conf standby switchover --dry-run NOTICE: checking switchover on node "pgrep2" (ID: 2) in --dry-run mode INFO: SSH connection to host "pgrep1" succeeded INFO: able to execute "repmgr" on remote host "pgrep1" INFO: 1 walsenders required, 10 available INFO: demotion candidate is able to make replication connection to promotion candidate INFO: 0 pending archive files INFO: replication lag on this standby is 0 seconds NOTICE: attempting to pause repmgrd on 2 nodes INFO: would pause repmgrd on node "pgrep1" (ID: 1) INFO: would pause repmgrd on node "pgrep2" (ID: 2) NOTICE: local node "pgrep2" (ID: 2) would be promoted to primary; current primary "pgrep1" (ID: 1) would be demoted to standby INFO: following shutdown command would be run on node "pgrep1": "/pgbin/pghome_1/bin/pg_ctl -D '/pgdata/dbdata' -W -m fast stop" INFO: parameter "shutdown_check_timeout" is set to 60 seconds INFO: prerequisites for executing STANDBY SWITCHOVER are met
Step 2. 執行 switchover (需要在Standby端執行)
[pgadm@pgrep2 ~]$ repmgr -f /pgdata/repmgr.conf standby switchover --log-level=debug DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep2 fallback_application_name=repmgr options=-csearch_path=" NOTICE: executing switchover on node "pgrep2" (ID: 2) DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path=" DEBUG: remote_command(): ssh -o Batchmode=yes -q -o ConnectTimeout=10 pgrep1 /pgbin/pghome_1/bin/repmgr -f /pgdata/repmgr.conf -L debug --version >/dev/null 2>&1 && echo "1" || echo "0" DEBUG: remote_command(): ssh -o Batchmode=yes -q -o ConnectTimeout=10 pgrep1 /pgbin/pghome_1/bin/repmgr -f /pgdata/repmgr.conf -L debug --version 2>/dev/null DEBUG: "repmgr" version on "pgrep1" is 50301 DEBUG: remote_command(): ssh -o Batchmode=yes -q -o ConnectTimeout=10 pgrep1 test -f /pgdata/repmgr.conf && echo 1 || echo 0 DEBUG: remote_command(): ssh -o Batchmode=yes -q -o ConnectTimeout=10 pgrep1 /pgbin/pghome_1/bin/repmgr -f /pgdata/repmgr.conf -L debug node check --data-directory-config --optformat -LINFO 2>/dev/null DEBUG: remote_command(): ssh -o Batchmode=yes -q -o ConnectTimeout=10 pgrep1 /pgbin/pghome_1/bin/repmgr -f /pgdata/repmgr.conf -L debug node check --replication-config-owner --optformat -LINFO 2>/dev/null DEBUG: remote_command(): ssh -o Batchmode=yes -q -o ConnectTimeout=10 pgrep1 /pgbin/pghome_1/bin/repmgr -f /pgdata/repmgr.conf -L debug node check --remote-node-id=2 --replication-connection DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path=" DEBUG: remote_command(): ssh -o Batchmode=yes -q -o ConnectTimeout=10 pgrep1 /pgbin/pghome_1/bin/repmgr -f /pgdata/repmgr.conf -L debug node check --terse -LERROR --archive-ready --optformat DEBUG: lag is 0 NOTICE: attempting to pause repmgrd on 2 nodes DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path=" DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep2 fallback_application_name=repmgr options=-csearch_path=" DEBUG: pausing repmgrd on node "pgrep1" (ID: 1) DEBUG: pausing repmgrd on node "pgrep2" (ID: 2) NOTICE: local node "pgrep2" (ID: 2) will be promoted to primary; current primary "pgrep1" (ID: 1) will be demoted to standby NOTICE: stopping current primary node "pgrep1" (ID: 1) DEBUG: remote_command(): ssh -o Batchmode=yes -q -o ConnectTimeout=10 pgrep1 /pgbin/pghome_1/bin/repmgr -f /pgdata/repmgr.conf -L debug node service --action=stop --checkpoint DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path=" NOTICE: issuing CHECKPOINT on node "pgrep1" (ID: 1) DETAIL: executing server command "/pgbin/pghome_1/bin/pg_ctl -D '/pgdata/dbdata' -W -m fast stop" INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout") DEBUG: ping status is: PQPING_REJECT DEBUG: sleeping 1 second until next check INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout") DEBUG: ping status is: PQPING_NO_RESPONSE DEBUG: remote_command(): ssh -o Batchmode=yes -q -o ConnectTimeout=10 pgrep1 /pgbin/pghome_1/bin/repmgr -f /pgdata/repmgr.conf -L debug node status --is-shutdown-cleanly NOTICE: current primary has been cleanly shut down at location 0/1F000028 DEBUG: local node last receive LSN is 0/1F0000A0, primary shutdown checkpoint LSN is 0/1F000028 NOTICE: promoting standby to primary DETAIL: promoting server "pgrep2" (ID: 2) using pg_promote() NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete DEBUG: setting node 2 as primary and marking existing primary as failed NOTICE: STANDBY PROMOTE successful DETAIL: server "pgrep2" (ID: 2) was successfully promoted to primary DEBUG: executing: /pgbin/pghome_1/bin/repmgr -f /pgdata/repmgr.conf -L debug --no-wait -d \'user=repmgr connect_timeout=2 dbname=repmgr host=pgrep2\' node rejoin > /tmp/node-rejoin.1648446114.log 2>&1 && echo "1" || echo "0" DEBUG: remote_command(): ssh -o Batchmode=yes -q -o ConnectTimeout=10 pgrep1 /pgbin/pghome_1/bin/repmgr -f /pgdata/repmgr.conf -L debug --no-wait -d \'user=repmgr connect_timeout=2 dbname=repmgr host=pgrep2\' node rejoin > /tmp/node-rejoin.1648446114.log 2>&1 && echo "1" || echo "0" WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 2 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 3 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 4 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 5 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 6 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 7 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 8 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 9 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 10 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 11 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 12 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 13 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 14 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 15 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 16 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 17 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 18 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 19 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 20 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 21 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 22 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 23 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 24 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 25 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 26 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 27 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 28 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 29 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 30 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 31 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 32 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 33 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 34 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 35 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 36 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 37 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 38 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 39 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 40 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 41 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 42 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 43 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 44 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 45 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 46 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 47 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 48 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 49 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 50 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 51 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 52 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 53 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 54 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 55 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" INFO: waiting for node "pgrep1" (ID: 1) to connect to new primary; 56 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: checking for record in node "pgrep2"'s "pg_stat_replication" table where "application_name" is "pgrep1" WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 57 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 58 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 59 of max 60 attempts WARNING: node "pgrep1" not found in "pg_stat_replication" DEBUG: sleeping 1 second waiting for node "pgrep1" (ID: 1) to connect to new primary; 60 of max 60 attempts ERROR: node "pgrep2" (ID: 2) promoted to primary, but demotion candidate "pgrep1" (ID: 1) did not connect to the new primary DETAIL: check the PostgreSQL log file on demotion candidate "pgrep1" (ID: 1) DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path=" WARNING: node "pgrep1" not found in "pg_stat_replication" NOTICE: switchover is incomplete DETAIL: node "pgrep2" is now primary but node "pgrep1" is not attached as standby DEBUG: unpausing repmgrd on node "pgrep1" (ID: 1) DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep1 fallback_application_name=repmgr options=-csearch_path=" DEBUG: unpausing repmgrd on node "pgrep2" (ID: 2) DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=pgrep2 fallback_application_name=repmgr options=-csearch_path=" NOTICE: STANDBY SWITCHOVER has completed with issues HINT: see preceding log message(s) for details
Step 3. 檢查Cluster狀態會有些問題,如果有此問題,則需要重啟Old Primary
[pgadm@pgrep1 ~]$ repmgr -f /pgdata/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+--------+---------+----------------------+----------+----------+----------+----------+--------------------------------------------------------- 1 | pgrep1 | primary | ! running as standby | | default | 100 | 7 | host=pgrep1 user=repmgr dbname=repmgr connect_timeout=2 2 | pgrep2 | standby | ! running as primary | | default | 100 | 8 | host=pgrep2 user=repmgr dbname=repmgr connect_timeout=2 WARNING: following issues were detected - node "pgrep1" (ID: 1) is registered as primary but running as standby - node "pgrep2" (ID: 2) is registered as standby but running as primary
重啟Old Primary
[pgadm@pgrep1 ~]$ pg_ctl stop -D $PGDATA waiting for server to shut down.... done server stopped [pgadm@pgrep1 ~]$ pg_ctl start -D $PGDATA waiting for server to start....2022-03-28 13:48:36.342 CST [324277] LOG: redirecting log output to logging collector process 2022-03-28 13:48:36.342 CST [324277] HINT: Future log output will appear in directory "/pgdata/pglog". done server started [pgadm@pgrep1 ~]$ repmgr -f /pgdata/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+--------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------- 1 | pgrep1 | standby | running | pgrep2 | default | 100 | 7 | host=pgrep1 user=repmgr dbname=repmgr connect_timeout=2 2 | pgrep2 | primary | * running | | default | 100 | 8 | host=pgrep2 user=repmgr dbname=repmgr connect_timeout=2
Note: switchover 不會觸發promote的動作
Step 4. 手動執行 follow 看同步是否跟上
[pgadm@pgrep1 ~]$ repmgr -f /pgdata/repmgr.conf standby follow NOTICE: attempting to find and follow current primary INFO: timelines are same, this server is not ahead DETAIL: local node lsn is 0/23005968, follow target lsn is 0/23005968 NOTICE: setting node 1's upstream to node 2 NOTICE: STANDBY FOLLOW successful DETAIL: standby attached to upstream node "pgrep2" (ID: 2)
0 留言