postgresql高可用之repmgr自动failover -m6米乐安卓版下载

多米爸比 2020-02-05

2983

repmgr搭建参考上一篇 https://www.modb.pro/db/15358

$ vi /opt/repmgr.conf
failover='automatic'
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10 
#如果主节点失败，则要进行切换的工作，并记录到日志
promote_command='/opt/pgsql/bin/repmgr standby promote -f /opt/repmgr.conf --log-to-file'
#如果有连接到主节点的其他从节点，会根据命令连接到新主的节点
follow_command='/opt/pgsql/bin/repmgr standby follow -f /opt/repmgr.conf --log-to-file --upstream-node-id=%n'

$ vi data6000/postgresql.conf
shared_preload_libraries = 'repmgr'

$ /opt/pgsql/bin/repmgrd -f /opt/repmgr.conf
--通过事件记录查看repmgrd的开启情况
[postgres@ha1 ~]$  /opt/pgsql/bin/repmgr -f /opt/repmgr.conf cluster event --event=repmgrd_start
 node id | name  | event         | ok | timestamp           | details                                               
--------- ------- --------------- ---- --------------------- --------------------------------------------------------
 1       | node1 | repmgrd_start | t  | 2020-02-02 22:33:30 | monitoring cluster primary "node1" (id: 1)            
 2       | node2 | repmgrd_start | t  | 2020-02-02 22:33:25 | monitoring connection to upstream node "node1" (id: 1)
--查看repmgr和repmgrd进程
[postgres@ha1 ~]$ ps f -u postgres
  pid tty      stat   time command
13124 pts/0    s      0:00 -bash
13389 pts/0    r      0:00  \_ ps f -u postgres
13191 ?        s      0:00 /opt/pgsql/bin/repmgrd -f /opt/repmgr.conf
 3247 ?        ss     0:00 /opt/pg12/bin/postgres -d /opt/data6000
 3272 ?        ss     0:00  \_ postgres: logger   
 3680 ?        ss     0:00  \_ postgres: checkpointer   
 3681 ?        ss     0:00  \_ postgres: background writer   
 3682 ?        ss     0:00  \_ postgres: walwriter   
 3683 ?        ss     0:00  \_ postgres: autovacuum launcher   
 3684 ?        ss     0:00  \_ postgres: archiver   
 3685 ?        ss     0:00  \_ postgres: stats collector   
 3686 ?        ss     0:00  \_ postgres: logical replication launcher  
13184 ?        ss     0:00  \_ postgres: repmgr repmgr 192.168.99.202(41551) idle
13189 ?        ss     0:00  \_ postgres: repmgr repmgr 192.168.99.200(49369) idle
13289 ?        ss     0:00  \_ postgres: walsender repmgr 192.168.99.202(41735) streaming 0/17003308
[postgres@ha2 ~]$ ps f -u postgres
  pid tty      stat   time command
 3926 pts/0    s      0:00 -bash
 4056 pts/0    r      0:00  \_ ps f -u postgres
 3950 ?        s      0:00 /opt/pgsql/bin/repmgrd -f /opt/repmgr.conf
 3241 ?        ss     0:00 /opt/pg12/bin/postgres -d /opt/data6000
 3259 ?        ss     0:00  \_ postgres: logger   
 3260 ?        ss     0:00  \_ postgres: startup   recovering 000000070000000000000017
 3674 ?        ss     0:00  \_ postgres: checkpointer   
 3675 ?        ss     0:00  \_ postgres: background writer   
 3676 ?        ss     0:00  \_ postgres: stats collector   
 3948 ?        ss     0:00  \_ postgres: repmgr repmgr 192.168.99.202(62092) idle
 4002 ?        ss     0:00  \_ postgres: walreceiver   streaming 0/17003308

[postgres@ha1 ~]$ /opt/pgsql/bin/pg_ctl -d /opt/data6000/ stop
waiting for server to shut down.... done
server stopped
--查看备库ha2日志，可以看到1 分钟后，无法连接到主库ha1，程序已经开始了切库的服务
[postgres@ha2 ~]$ cat /opt/data6000/repmgr.log
...
[2020-02-03 11:25:34] [info] node "node2" (id: 2) monitoring upstream node "node1" (id: 1) in normal state
[2020-02-03 11:25:36] [warning] unable to ping "host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2"
[2020-02-03 11:25:36] [detail] pqping() returned "pqping_no_response"
[2020-02-03 11:25:36] [warning] unable to connect to upstream node "node1" (id: 1)
[2020-02-03 11:25:36] [info] checking state of node 1, 1 of 6 attempts
[2020-02-03 11:25:36] [warning] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.99.200 port=6000 fallback_application_name=repmgr"
[2020-02-03 11:25:36] [detail] pqping() returned "pqping_no_response"
[2020-02-03 11:25:36] [info] sleeping 10 seconds until next reconnection attempt
[2020-02-03 11:25:46] [info] checking state of node 1, 2 of 6 attempts
[2020-02-03 11:25:46] [warning] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.99.200 port=6000 fallback_application_name=repmgr"
[2020-02-03 11:25:46] [detail] pqping() returned "pqping_no_response"
[2020-02-03 11:25:46] [info] sleeping 10 seconds until next reconnection attempt
[2020-02-03 11:25:56] [info] checking state of node 1, 3 of 6 attempts
[2020-02-03 11:25:56] [warning] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.99.200 port=6000 fallback_application_name=repmgr"
[2020-02-03 11:25:56] [detail] pqping() returned "pqping_no_response"
[2020-02-03 11:25:56] [info] sleeping 10 seconds until next reconnection attempt
[2020-02-03 11:26:06] [info] checking state of node 1, 4 of 6 attempts
[2020-02-03 11:26:06] [warning] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.99.200 port=6000 fallback_application_name=repmgr"
[2020-02-03 11:26:06] [detail] pqping() returned "pqping_no_response"
[2020-02-03 11:26:06] [info] sleeping 10 seconds until next reconnection attempt
[2020-02-03 11:26:16] [info] checking state of node 1, 5 of 6 attempts
[2020-02-03 11:26:16] [warning] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.99.200 port=6000 fallback_application_name=repmgr"
[2020-02-03 11:26:16] [detail] pqping() returned "pqping_no_response"
[2020-02-03 11:26:16] [info] sleeping 10 seconds until next reconnection attempt
[2020-02-03 11:26:26] [info] checking state of node 1, 6 of 6 attempts
[2020-02-03 11:26:26] [warning] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.99.200 port=6000 fallback_application_name=repmgr"
[2020-02-03 11:26:26] [detail] pqping() returned "pqping_no_response"
[2020-02-03 11:26:26] [warning] unable to reconnect to node 1 after 6 attempts
[2020-02-03 11:26:26] [info] 0 active sibling nodes registered
[2020-02-03 11:26:26] [info] primary and this node have the same location ("default")
[2020-02-03 11:26:26] [info] no other sibling nodes - we win by default
[2020-02-03 11:26:26] [notice] this node is the only available candidate and will now promote itself
[2020-02-03 11:26:26] [info] promote_command is:
  "/opt/pgsql/bin/repmgr standby promote -f /opt/repmgr.conf --log-to-file"
[2020-02-03 11:26:26] [notice] redirecting logging output to "/opt/data6000/repmgr.log"
[2020-02-03 11:26:26] [notice] promoting standby to primary
[2020-02-03 11:26:26] [detail] promoting server "node2" (id: 2) using pg_promote()
[2020-02-03 11:26:26] [notice] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2020-02-03 11:26:27] [notice] standby promote successful
[2020-02-03 11:26:27] [detail] server "node2" (id: 2) was successfully promoted to primary
[2020-02-03 11:26:27] [info] 0 followers to notify
[2020-02-03 11:26:27] [info] switching to primary monitoring mode
[2020-02-03 11:26:27] [notice] monitoring cluster primary "node2" (id: 2)
--ha2查看集群状态，ha2已经promote成功
[postgres@ha2 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf cluster show
 id | name  | role    | status    | upstream | location | priority | timeline | connection string                                                        
---- ------- --------- ----------- ---------- ---------- ---------- ---------- ---------------------------------------------------------------------------
 1  | node1 | primary | - failed  |          | default  | 100      | ?        | host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 14       | host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2
warning: following issues were detected
  - unable to connect to node "node1" (id: 1)

[postgres@ha1 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf node rejoin -d'host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2'
info: local node 1 can attach to rejoin target node 2
detail: local node's recovery point: 0/20000028; rejoin target node's fork point: 0/200000a0
notice: setting node 1's upstream to node 2
warning: unable to ping "host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2"
detail: pqping() returned "pqping_no_response"
notice: starting server using "/opt/pgsql/bin/pg_ctl  -w -d '/opt/data6000' start"
notice: node rejoin successful
detail: node 1 is now attached to node 2
--ha1查看集群状态
[postgres@ha1 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf cluster show
 id | name  | role    | status    | upstream | location | priority | timeline | connection string                                                        
---- ------- --------- ----------- ---------- ---------- ---------- ---------- ---------------------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 13       | host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 14       | host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 
 --备库ha1执行checkpoint确保时间基线一致
 [postgres@ha1 ~]$ /opt/pgsql/bin/psql -p6000 -upostgres
postgres=# delete from t;
error:  cannot execute delete in a read-only transaction
postgres=# checkpoint;
checkpoint
postgres=# \q
[postgres@ha1 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf cluster show
 id | name  | role    | status    | upstream | location | priority | timeline | connection string                                                        
---- ------- --------- ----------- ---------- ---------- ---------- ---------- ---------------------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 14       | host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 14       | host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2
从上面可以看出通过node rejoin将刚才已经关闭的主库再次上线，并以备库的身份连到新主，加入集群成功。

[postgres@ha2 ~]$ /opt/pgsql/bin/psql -p6000 -upostgres
postgres=# insert into t values(5,'from node2');
insert 0 1
postgres=# select * from t;
 id |             info              
---- -------------------------------
  1 | 2020-02-02 21:11:51.66735 08
  2 | 2020-02-02 22:05:21.757823 08
  3 | 2020-02-02 22:07:03.546043 08
  4 | 2020-02-03 11:03:42.050227 08
  5 | from node2
(5 rows)
--关闭主库ha2
[postgres@ha2 ~]$ /opt/pgsql/bin/pg_ctl -d /opt/data6000/ stop
--1分钟后查看备库ha1,可以看到已经切换为主库
[postgres@ha1 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf cluster show
 id | name  | role    | status    | upstream | location | priority | timeline | connection string                                                        
---- ------- --------- ----------- ---------- ---------- ---------- ---------- ---------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 15       | host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | - failed  |          | default  | 100      | ?        | host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2
warning: following issues were detected
  - unable to connect to node "node2" (id: 2)
--此时ha2先不直接使用node rejoin
--ha1主库插入一些数据，并执行checkpoint
[postgres@ha1 ~]$ /opt/pgsql/bin/psql -p6000 -upostgres
postgres=# insert into t values(6,'from node1');
insert 0 1
postgres=# select * from t;
 id |             info              
---- -------------------------------
  1 | 2020-02-02 21:11:51.66735 08
  2 | 2020-02-02 22:05:21.757823 08
  3 | 2020-02-02 22:07:03.546043 08
  4 | 2020-02-03 11:03:42.050227 08
  5 | from node2
  6 | from node1
(6 rows)
postgres=# checkpoint;
checkpoint
--ha2手工启动服务
[postgres@ha2 ~]$ /opt/pgsql/bin/pg_ctl -d /opt/data6000/ start &
[postgres@ha2 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf cluster show
 id | name  | role    | status               | upstream | location | priority | timeline | connection string                                                        
---- ------- --------- ---------------------- ---------- ---------- ---------- ---------- ---------------------------------------------------------------------------
 1  | node1 | standby | ! running as primary |          | default  | 100      | 15       | host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running            |          | default  | 100      | 14       | host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2
warning: following issues were detected
  - node "node1" (id: 1) is registered as standby but running as primary
--ha2插入一些数据，并执行checkpoint
[postgres@ha2 ~]$ /opt/pgsql/bin/psql -p6000 -upostgres
postgres=# insert into t values(7,'from node2');
insert 0 1
postgres=# select * from t;
 id |             info              
---- -------------------------------
  1 | 2020-02-02 21:11:51.66735 08
  2 | 2020-02-02 22:05:21.757823 08
  3 | 2020-02-02 22:07:03.546043 08
  4 | 2020-02-03 11:03:42.050227 08
  5 | from node2
  7 | from node2
(6 rows)
--ha2关闭服务
[postgres@ha2 ~]$ /opt/pgsql/bin/pg_ctl -d /opt/data6000/ stop
--ha2执行node rejoin
[postgres@ha2 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf node rejoin -d'host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2'
error: this node cannot attach to rejoin target node 1
detail: rejoin target server's timeline 15 forked off current database system timeline 14 before current recovery point 0/22000028
hint: use --force-rewind to execute pg_rewind
[postgres@ha2 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf node rejoin -d'host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind
notice: pg_rewind execution required for this node to attach to rejoin target node 1
detail: rejoin target server's timeline 15 forked off current database system timeline 14 before current recovery point 0/22000028
error: --force-rewind specified but pg_rewind cannot be used
detail: "wal_log_hints" is set to "off" and data checksums are disabled
从上面可以看出node rejoin操作失败，因为两边库的数据已经不一致。

[postgres@ha1 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf primary unregister --node-id=2
info: node "node2" (id: 2) was successfully unregistered
[postgres@ha1 ~]$ vi /opt/data6000/postgresql.conf
wal_log_hints=on
[postgres@ha1 ~]$ /opt/pgsql/bin/pg_ctl -d /opt/data6000/ restart &
[postgres@ha2 ~]$ rm -rf /opt/archive6000/*
[postgres@ha2 ~]$ rm -rf /opt/data6000/*
[postgres@ha2 ~]$ /opt/pgsql/bin/repmgr -h 192.168.99.200 -p6000 -u repmgr -d repmgr -f /opt/repmgr.conf standby clone
[postgres@ha2 ~]$ vi /opt/data6000/postgresql.conf 
primary_conninfo='host=192.168.99.200 port=6000 user=repuser'
[postgres@ha2 ~]$ /opt/pgsql/bin/pg_ctl -d /opt/data6000/ start &
[postgres@ha2 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf standby register
[postgres@ha2 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf cluster show
 id | name  | role    | status    | upstream | location | priority | timeline | connection string                                                        
---- ------- --------- ----------- ---------- ---------- ---------- ---------- ---------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 15       | host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 100      | 15       | host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 
 [postgres@ha1 ~]$ > /opt/data6000/repmgr.log
 [postgres@ha2 ~]$ > /opt/data6000/repmgr.log
 
 [postgres@ha1 ~]$ /opt/pgsql/bin/repmgrd -f /opt/repmgr.conf
 [postgres@ha2 ~]$ /opt/pgsql/bin/repmgrd -f /opt/repmgr.conf

[postgres@ha1 ~]$ /opt/pgsql/bin/psql -p6000 -upostgres
postgres=# insert into t values(101,'from node1 '||now());
insert 0 1
postgres=# select * from t;
 id  |                   info                   
----- ------------------------------------------
   1 | 2020-02-02 21:11:51.66735 08
   2 | 2020-02-02 22:05:21.757823 08
   3 | 2020-02-02 22:07:03.546043 08
   4 | 2020-02-03 11:03:42.050227 08
   5 | from node2
   6 | from node1
 100 | from node1 2020-02-03 12:20:27.444394 08
 101 | from node1 2020-02-03 12:33:26.365621 08
(8 rows)
--关闭主库ha1
[postgres@ha1 ~]$ /opt/pgsql/bin/pg_ctl -d /opt/data6000/ stop
--1分钟后查看备库ha2,可以看到已经切换为主库
[postgres@ha2 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf cluster show
 id | name  | role    | status    | upstream | location | priority | timeline | connection string                                                        
---- ------- --------- ----------- ---------- ---------- ---------- ---------- ---------------------------------------------------------------------------
 1  | node1 | primary | - failed  |          | default  | 100      | ?        | host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 16       | host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2
warning: following issues were detected
  - unable to connect to node "node1" (id: 1)
--ha2新主库插入一些数据，并执行checkpoint
[postgres@ha2 ~]$ /opt/pgsql/bin/psql -p6000 -upostgres
postgres=# insert into t values(200,'from node2 '||now());
insert 0 1
postgres=# select * from t;
 id  |                   info                   
----- ------------------------------------------
   1 | 2020-02-02 21:11:51.66735 08
   2 | 2020-02-02 22:05:21.757823 08
   3 | 2020-02-02 22:07:03.546043 08
   4 | 2020-02-03 11:03:42.050227 08
   5 | from node2
   6 | from node1
 100 | from node1 2020-02-03 12:20:27.444394 08
 101 | from node1 2020-02-03 12:33:26.365621 08
 200 | from node2 2020-02-03 12:39:04.007665 08
(9 rows)
postgres=# checkpoint;
checkpoint
--ha1手工启动服务
[postgres@ha1 ~]$ /opt/pgsql/bin/pg_ctl -d /opt/data6000/ start &
[postgres@ha1 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf cluster show
 id | name  | role    | status               | upstream | location | priority | timeline | connection string                                                        
---- ------- --------- ---------------------- ---------- ---------- ---------- ---------- ---------------------------------------------------------------------------
 1  | node1 | primary | * running            |          | default  | 100      | 15       | host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby | ! running as primary |          | default  | 100      | 16       | host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2
warning: following issues were detected
  - node "node2" (id: 2) is registered as standby but running as primary
--ha1插入一些数据，并执行checkpoint
[postgres@ha1 ~]$ /opt/pgsql/bin/psql -p6000 -upostgres
postgres=# insert into t values(300,'from node1 '||now());
insert 0 1
postgres=# select * from t;
 id  |                   info                   
----- ------------------------------------------
   1 | 2020-02-02 21:11:51.66735 08
   2 | 2020-02-02 22:05:21.757823 08
   3 | 2020-02-02 22:07:03.546043 08
   4 | 2020-02-03 11:03:42.050227 08
   5 | from node2
   6 | from node1
 100 | from node1 2020-02-03 12:20:27.444394 08
 101 | from node1 2020-02-03 12:33:26.365621 08
 300 | from node1 2020-02-03 12:42:21.438444 08
(9 rows)
--ha1关闭服务
[postgres@ha1 ~]$ /opt/pgsql/bin/pg_ctl -d /opt/data6000/ stop
--ha1执行node rejoin
[postgres@ha1 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf node rejoin -d'host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind
notice: pg_rewind execution required for this node to attach to rejoin target node 2
detail: rejoin target server's timeline 16 forked off current database system timeline 15 before current recovery point 0/27000028
notice: executing pg_rewind
detail: pg_rewind command is "/opt/pgsql/bin/pg_rewind -d '/opt/data6000' --source-server='host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2'"
pg_rewind: servers diverged at wal location 0/260000a0 on timeline 15
pg_rewind: rewinding from last common checkpoint at 0/26000028 on timeline 15
pg_rewind: done!
notice: 0 files copied to /opt/data6000
notice: setting node 1's upstream to node 2
warning: unable to ping "host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2"
detail: pqping() returned "pqping_no_response"
notice: starting server using "/opt/pgsql/bin/pg_ctl  -w -d '/opt/data6000' start"
warning: unable to ping "host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2"
detail: pqping() returned "pqping_reject"
notice: node rejoin successful
detail: node 1 is now attached to node 2
--ha1查看集群状态
[postgres@ha1 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf cluster show
 id | name  | role    | status    | upstream | location | priority | timeline | connection string                                                        
---- ------- --------- ----------- ---------- ---------- ---------- ---------- ---------------------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 15       | host=192.168.99.200 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 16       | host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2
--ha1执行checkpoint或者服务restart,timeline保持一致
--验证数据
[postgres@ha1 ~]$ /opt/pgsql/bin/psql -p6000 -upostgres
postgres=# select * from t;
 id  |                   info                   
----- ------------------------------------------
   1 | 2020-02-02 21:11:51.66735 08
   2 | 2020-02-02 22:05:21.757823 08
   3 | 2020-02-02 22:07:03.546043 08
   4 | 2020-02-03 11:03:42.050227 08
   5 | from node2
   6 | from node1
 100 | from node1 2020-02-03 12:20:27.444394 08
 101 | from node1 2020-02-03 12:33:26.365621 08
 200 | from node2 2020-02-03 12:39:04.007665 08
(9 rows)
postgres=# delete from t;
error:  cannot execute delete in a read-only transaction
[postgres@ha2 ~]$ /opt/pgsql/bin/psql -p6000 -upostgres
postgres=# select * from t;
 id  |                   info                   
----- ------------------------------------------
   1 | 2020-02-02 21:11:51.66735 08
   2 | 2020-02-02 22:05:21.757823 08
   3 | 2020-02-02 22:07:03.546043 08
   4 | 2020-02-03 11:03:42.050227 08
   5 | from node2
   6 | from node1
 100 | from node1 2020-02-03 12:20:27.444394 08
 101 | from node1 2020-02-03 12:33:26.365621 08
 200 | from node2 2020-02-03 12:39:04.007665 08
(9 rows)
postgres=# insert into t values(400,'from node2 '||now());
insert 0 1

从上面测试可以看出加上wal_log_hints后，主备库数据如果不一致时仍然可以使用node rejoin操作成功，新主库上线至旧主库rejoin之间时间段内，旧主库的写数据会丢失。

[postgres@ha1 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf node rejoin -d’host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2’
error: this node cannot attach to rejoin target node 2
detail: rejoin target server’s timeline 8 forked off current database system timeline 7 before current recovery point 0/1a000028
hint: use --force-rewind to execute pg_rewind
–分析：主备库时间线不一致

[postgres@ha1 ~]$ /opt/pgsql/bin/repmgr -f /opt/repmgr.conf node rejoin -d’host=192.168.99.202 port=6000 user=repmgr dbname=repmgr connect_timeout=2’ --force-rewind
notice: pg_rewind execution required for this node to attach to rejoin target node 2
detail: rejoin target server’s timeline 8 forked off current database system timeline 7 before current recovery point 0/1a000028
error: --force-rewind specified but pg_rewind cannot be used
detail: “wal_log_hints” is set to “off” and data checksums are disabled
–分析：数据库未开启wal_log_hints,数据目录初始化时也未打开checksums

timeline不一致问题,如何同步
1.备库执行checkpoint
2.关闭备库，执行node rejoin

postgresql

「喜欢文章，快来给作者赞赏墨值吧」

文章被以下合辑收录

墨天轮最受欢迎的技术文档-高可用架构篇（共124篇）

数据库高可用如何实现？这份合辑你或许可以用到！本文为大家整理了墨天轮社区上一些受欢迎的数据库高可用架构实践主题相关文档，包含oracle、postgresql、mysql三类数据库，欢迎大家下载查看🌹

postgresql工作实践（共101篇）

关注

postgresql高可用之repmgr自动failover -m6米乐安卓版下载

文章被以下合辑收录

评论