GPDB · Feature Analysis · Segment Recovery Guide · Database Kernel Monthly Report

## Problem background

GPDB uses an architecture built around a central control node. A GreenPlum cluster has one Master node and multiple Segment nodes: the Master is the central control node and the Segments store the data. All Segment nodes are peers, and all of them are managed by the Master. The architecture is shown below:

![](https://box.kancloud.cn/2016-04-22_5719d9fde4d0e.jpg)

GreenPlum architecture diagram

When the GP Master fails, an external HA monitoring module can detect the failure and activate the standby. Once the Standby Master is running normally, the original Master is removed and a new standby is rebuilt. Segment recovery works differently. As the diagram shows, Segments also come in pairs, called Primary and Mirror, where the Mirror is the Primary's standby. Primary and Mirror replicate synchronously to guarantee data consistency and reliability, while monitoring and failover between them are handled by the Master's FTS module. When FTS finds that a Primary is down and its Mirror is healthy, it activates the Mirror and marks the Primary as "d"; the Mirror then enters the ChangeTracking state. (The details are out of scope here; interested readers can refer to [GPDB · Feature Analysis · GreenPlum Segment Transaction Consistency and Exception Handling](http://mysql.taobao.org/monthly/2016/04/02/) in this issue and [GPDB · Feature Analysis · GreenPlum FTS Mechanism](http://mysql.taobao.org/monthly/2016/03/08/) in the previous issue.)

Once a Segment is marked "d", the Master no longer handles it, and starting (or restarting) the GP instance ignores it as well. At this point the whole GP cluster is at risk:

1. The Mirror that took over is under more load (it has to perform change tracking);
2. The pair now has a single point of failure, which increases the reliability risk.

The Segment therefore needs to be repaired promptly.

## Segment recovery in GP

GP provides a set of control scripts for operating the cluster; the one used to repair Segments is gprecoverseg. It is fairly simple to use and has only a few main options:

* -i: the main option. It specifies a configuration file that describes the Segments to repair and their target locations after recovery.
* -F: optional. When given, gprecoverseg deletes the instances specified in "-i" (or those marked "d") and copies a complete copy of the data from the surviving peer to the target location.
* -r: when FTS detects a Primary failure and fails over, the Mirror that has taken on the Primary role does not switch back immediately after gprecoverseg repairs the failed instance. Some hosts then run too many active Segments, which creates a performance bottleneck, so the Segments need to be returned to their original roles. This is called re-balance.

Here is an example. The following is a healthy instance:

~~~
$ gpstate -s
/opt/python27/lib/python2.7/site-packages/Crypto/Util/number.py:57: PowmInsecureWarning: Not using mpz_powm_sec. You should rebuild using libgmp >= 5 to avoid timing attack vulnerability.
  _warn("Not using mpz_powm_sec. You should rebuild using libgmp >= 5 to avoid timing attack vulnerability.", PowmInsecureWarning)
20160418:21:39:29:016547 gpstate:host1:gpuser-[INFO]:-Starting gpstate with args: -s
20160418:21:39:29:016547 gpstate:host1:gpuser-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build dev'
20160418:21:39:29:016547 gpstate:host1:gpuser-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3 (Greenplum Database 4.3.99.00 build dev) compiled on Apr 11 2016 22:02:39'
20160418:21:39:29:016547 gpstate:host1:gpuser-[INFO]:-Obtaining Segment details from master...
20160418:21:39:29:016547 gpstate:host1:gpuser-[INFO]:-Gathering data from segments...
.
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:-----------------------------------------------------
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:--Master Configuration & Status
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:-----------------------------------------------------
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Master host = host1
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Master postgres process ID = 72447
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Master data directory = /workspace/gpuser/3007
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Master port = 3007
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Master current role = dispatch
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Greenplum initsystem version = 4.3.99.00 build dev
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Greenplum current version = PostgreSQL 8.3 (Greenplum Database 4.3.99.00 build dev) compiled on Apr 11 2016 22:02:39
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Postgres version = 8.3
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Master standby = host2
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Standby master state = Standby host passive
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:-----------------------------------------------------
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:-Segment Instance Status Report
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:-----------------------------------------------------
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Segment Info
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Hostname = host1
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Address = host1
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Datadir = /workspace/gpuser/3008
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Port = 3008
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Mirroring Info
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Current role = Primary
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Preferred role = Primary
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Mirror status = Synchronized
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Status
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- PID = 72388
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Configuration reports status as = Up
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Database status = Up
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:-----------------------------------------------------
......
~~~
~~~
[INFO]:-----------------------------------------------------
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Segment Info
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Hostname = host1
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Address = host1
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Datadir = /workspace/gpuser/3012
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Port = 3012
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Mirroring Info
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Current role = Mirror
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Preferred role = Mirror
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Mirror status = Synchronized
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Status
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- PID = 75247
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Configuration reports status as = Up
20160418:21:39:30:016547 gpstate:host1:gpuser-[INFO]:- Segment status = Up
~~~

After killing one instance (for example, the one on port 3012), run gprecoverseg. The output is as follows:

~~~
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host1 -p 3008
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host1 -p 3008
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host2 -p 3014
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host2 -p 3014
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host1 -p 3010
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host1 -p 3010
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host2 -p 3015
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host2 -p 3015
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host2 -p 3008
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host2 -p 3008
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host1 -p 3011
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host1 -p 3011
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host2 -p 3013
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host2 -p 3013
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host1 -p 3012
20160418:21:40:58:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host1 -p 3012
......
20160418:21:41:18:017989 gpstate:host1:gpuser-[DEBUG]:-[worker6] finished cmd: Get segment status cmdStr='sshpass -e ssh -o 'StrictHostKeyChecking no' host1 ". /workspace/gpdb/greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h host1 -p 3012"' had result: cmd had rc=15 completed=True halted=False stdout='' stderr='failed to connect: Connection refused (errno: 111) Retrying no 1 failed to connect: Connection refused (errno: 111) Retrying no 2
......
~~~
~~~
20160418:21:41:18:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Encountered error Not ready to connect to database mode: PrimarySegment segmentState: Fault dataState: InSync faultType: FaultMirror mode: PrimarySegment segmentState: Fault dataState: InSync faultType: FaultMirror
~~~

Connecting to this instance to fetch its status fails at this point; the cause is explained later. After the failure gprecoverseg retries 5 times, and the next attempt looks different:

~~~
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host1 -p 3008
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host1 -p 3008
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host2 -p 3014
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host2 -p 3014
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host1 -p 3010
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host1 -p 3010
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host2 -p 3015
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host2 -p 3015
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host2 -p 3008
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host2 -p 3008
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host1 -p 3011
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host1 -p 3011
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h host2 -p 3013
20160418:21:41:23:017989 gprecoverseg:host1:gpuser-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h host2 -p 3013
~~~

The command for one Segment is now missing, and that Segment is exactly the one that was killed. Further down in the output, gprecoverseg runs the following:

~~~
20160418:23:16:20:085203 gprecoverseg:host1:gpuser-[DEBUG]:-[worker7] finished cmd: Get segment status information cmdStr='sshpass -e ssh -o 'StrictHostKeyChecking no' host2 ". /workspace/gpdb/greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h host2 -p 3013"' had result: cmd had rc=1 completed=True halted=False stdout='' stderr='mode: PrimarySegment segmentState: Ready dataState: InChangeTracking faultType: NotInitialized mode: PrimarySegment segmentState: Ready dataState: InChangeTracking faultType: NotInitialized '
~~~

Why is this instance checked on its own? Moreover, if this check fails, gprecoverseg exits immediately and cannot continue. After a series of checks, it first updates the relevant records in the catalog:

~~~
UPDATE pg_catalog.gp_segment_configuration
~~~

It then calls the command that recovers the data:

~~~
/workspace/gpdb/bin/lib/gpconfigurenewsegment -c /workspace/gpuser/3012:3012:false:false:9 -v -B 16 --write-gpid-file-only
~~~

Finally it starts the Segment and updates the catalog:

~~~
$GPHOME/sbin/gpsegstart.py -C en_US.utf8:C:C -M quiescent -V 'postgres (Greenplum Database) 4.3.99.00 build dev' -n 4 --era df86ca11ca2fc214_160418165251 -t 600 -v -p KGRwMApTJ2Ric0J5UG9ydCcKcDEKKGRwMgpJMzAxMgooZHAzClMndGFyZ2V0TW9kZScKcDQKUydtaXJyb3InCnA1CnNTJ2RiaWQnCnA2Ckk5CnNTJ2hvc3ROYW1lJwpwNwpTJzEwLjk3LjI0OC43MycKcDgKc1MncGVlclBvcnQnCnA5CkkzNTEzCnNTJ3BlZXJQTVBvcnQnCnAxMApJMzAxMwpzUydwZWVyTmFtZScKcDExClMncnQxYjA3MDI0LnRiYycKcDEyCnNTJ2Z1bGxSZXN5bmNGbGFnJwpwMTMKSTAwCnNTJ21vZGUnCnAxNApTJ3InCnAxNQpzUydob3N0UG9ydCcKcDE2CkkzNTEyCnNzcy4= -D '9|3|m|m|r|d|host1|host1|3012|3512|/workspace/gpuser/3012||'
......
~~~
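~~~
20160419:01:21:05:042692 gprecoverseg:host1:gpuser-[DEBUG]:-UPDATE pg_catalog.gp_segment_configuration SET mode = 'r', status = 'u' WHERE dbid = 5
20160419:01:21:05:042692 gprecoverseg:host1:gpuser-[DEBUG]:-INSERT INTO gp_configuration_history (time, dbid, "desc") VALUES( now(), 5, 'gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status' )
20160419:01:21:05:042692 gprecoverseg:host1:gpuser-[DEBUG]:-UPDATE pg_catalog.gp_segment_configuration SET mode = 'r', status = 'u' WHERE dbid = 9
20160419:01:21:05:042692 gprecoverseg:host1:gpuser-[DEBUG]:-INSERT INTO gp_configuration_history (time, dbid, "desc") VALUES( now(), 9, 'gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status' )
20160419:01:21:05:042692 gprecoverseg:host1:gpuser-[DEBUG]:-UPDATE gp_fault_strategy
~~~

That is a complete gprecoverseg run. Afterwards, the corresponding Primary and Mirror enter mode "r", which means data synchronization is in progress. The detailed steps and the mechanism behind them are examined below.

## How it works

The example above leaves a few open questions:

* During gprecoverseg, the first attempt to fetch the Segment status fails;
* The second attempt to fetch Segment information issues one command fewer than the first;
* The instance "-h host2 -p 3013" is checked on its own.

These questions are easy to answer once the mechanism is understood, and the execution steps are a good place to start. Judging from the code, the steps are roughly as follows.

### Argument handling

GP's scripts rely on quite a few environment variables, and which ones are used varies slightly between scripts and code paths. gprecoverseg, for example, uses MASTER_DATA_DIRECTORY and reads Master information (such as the port) from the directory it points to.

The most important gprecoverseg option is "-i". It specifies the Segments to repair, and they can be recovered onto a different host, for example:

~~~
filespaceOrder=
host1:3012:/workspace/gpuser/3012 host2:3012:3512:/workspace/gpuser/3012
~~~

The details of the execution are not covered here.

### Determining the current Segment state

gprecoverseg calls gp_primarymirror to send messages to the live Segments and determine each Segment's current state. This is a very important step and also the one where problems occur most often; a frequent error is "Unable to connect to database". This failure can have many causes, the most common being:

* The corresponding Primary (or Mirror) is also down;
* The corresponding Primary is in the wrong state, for example because another gprecoverseg is already running (or a previous run failed and left the state broken).

This step relies on the data in gp_segment_configuration: gprecoverseg first fetches the relevant rows from the GP Master, essentially as described in the next step.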
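The selection logic of this step can be sketched in Python. This is a minimal illustration under stated assumptions, not gprecoverseg's actual code: `SegmentRow`, `EXAMPLE_ROWS` and the three helper functions are hypothetical names, the rows reproduce the gp_segment_configuration listing in the next section with the killed mirror (dbid 9) already marked "d", and treating a Ready primary that is InSync or InChangeTracking as a usable recovery source is an assumption drawn from the two status outputs quoted in this section.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SegmentRow:
    """Hypothetical subset of gp_segment_configuration columns."""
    dbid: int
    content: int   # -1 = master/standby master, >= 0 = segment
    role: str      # 'p' (primary) or 'm' (mirror)
    status: str    # 'u' (up) or 'd' (down)
    hostname: str
    port: int


# State on the second attempt in the example: dbid 9 (host1:3012) is marked 'd'.
EXAMPLE_ROWS = [
    SegmentRow(1, -1, "p", "u", "host1", 3007),
    SegmentRow(10, -1, "m", "u", "host2", 3007),
    SegmentRow(2, 0, "p", "u", "host1", 3008),
    SegmentRow(6, 0, "m", "u", "host2", 3014),
    SegmentRow(3, 1, "p", "u", "host1", 3010),
    SegmentRow(7, 1, "m", "u", "host2", 3015),
    SegmentRow(4, 2, "p", "u", "host2", 3008),
    SegmentRow(8, 2, "m", "u", "host1", 3011),
    SegmentRow(5, 3, "p", "u", "host2", 3013),
    SegmentRow(9, 3, "m", "d", "host1", 3012),
]

# "key: Value" pairs as they appear in the gp_primarymirror output above.
_PAIR = re.compile(r"(\w+): (\w+)")


def parse_status(text: str) -> Dict[str, str]:
    """Parse the first status record; the logs show each record printed twice."""
    status: Dict[str, str] = {}
    for key, value in _PAIR.findall(text):
        if key in status:  # the repeated copy starts here
            break
        status[key] = value
    return status


def primary_can_serve_recovery(status: Dict[str, str]) -> bool:
    """Assumption: a Ready primary that is InSync or InChangeTracking can feed the resync."""
    return (status.get("mode") == "PrimarySegment"
            and status.get("segmentState") == "Ready"
            and status.get("dataState") in ("InSync", "InChangeTracking"))


def plan_status_checks(
    rows: List[SegmentRow],
) -> Tuple[List[SegmentRow], List[Tuple[SegmentRow, SegmentRow]]]:
    """Return (segments to probe, [(down mirror, its primary)])."""
    segments = [r for r in rows if r.content >= 0]    # master rows are not probed
    probe = [r for r in segments if r.status == "u"]  # rows marked 'd' are never contacted
    pairs: List[Tuple[SegmentRow, SegmentRow]] = []
    for mirror in segments:
        if mirror.role == "m" and mirror.status == "d":
            primary: Optional[SegmentRow] = next(
                (r for r in segments if r.content == mirror.content and r.role == "p"),
                None)
            if primary is not None:
                pairs.append((mirror, primary))
    return probe, pairs


if __name__ == "__main__":
    probe, pairs = plan_status_checks(EXAMPLE_ROWS)
    print("probe:", [f"{r.hostname}:{r.port}" for r in probe])
    for mirror, primary in pairs:
        print(f"check {primary.hostname}:{primary.port} before rebuilding dbid {mirror.dbid}")
    st = parse_status("mode: PrimarySegment segmentState: Ready "
                      "dataState: InChangeTracking faultType: NotInitialized")
    print(primary_can_serve_recovery(st))
```

Run on `EXAMPLE_ROWS`, the probe list contains the same seven host/port pairs, in the same order, as the second attempt in the log (3012 is skipped), and the only pair is dbid 9 with its primary host2:3013, which corresponds to the separate check on "-h host2 -p 3013".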
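If a Segment is marked "d", no status request is sent to it. However, if both the Primary and the Mirror of a pair are down, they are not both marked "d" (both may still be "u", for example when they fail at the same time, because FTS does not update them). Connecting to fetch status from a Segment that is marked "u" but is actually down therefore fails. That is no longer a problem gprecoverseg can handle; the only option is to restart the whole instance.

Back to the earlier questions. The first run failed because the Segment's status had not been updated yet. The second run issued one command fewer because, once the status had been updated to "d", no connection is made to that Segment. After checking the connections to all Segments with status "u", gprecoverseg checks each failed Mirror: it verifies that the corresponding Primary is healthy and can be used to recover the data. That answers the third question. A healthy Primary looks like this:

~~~
stdout='' stderr='mode: PrimarySegment segmentState: Ready dataState: InSync faultType: NotInitialized mode: PrimarySegment segmentState: Ready dataState: InSync faultType: NotInitialized '
~~~

or like this:

~~~
stdout='' stderr='mode: PrimarySegment segmentState: Ready dataState: InChangeTracking faultType: NotInitialized mode: PrimarySegment segmentState: Ready dataState: InChangeTracking faultType: NotInitialized '
~~~

Normally, when a Mirror fails, the Primary notices and enters the ChangeTracking state. In this state the Primary records the changes made after the state switch, so that when the Mirror comes back only those changes have to be synchronized instead of a full copy every time.

### Fetching Segment information from the Master

The information includes IP, PORT, ROLE, Status, data directory, temporary space and so on:

~~~
 dbid | content | role | preferred_role | mode | status | hostname | address | port | replication_port | oid  | fselocation
------+---------+------+----------------+------+--------+----------+---------+------+------------------+------+------------------------
    1 |      -1 | p    | p              | s    | u      | host1    | host1   | 3007 |                  | 3052 | /workspace/gpuser/3007
   10 |      -1 | m    | m              | s    | u      | host2    | host2   | 3007 |                  | 3052 | /workspace/gpuser/3007
    2 |       0 | p    | p              | s    | u      | host1    | host1   | 3008 |             3508 | 3052 | /workspace/gpuser/3008
    6 |       0 | m    | m              | s    | u      | host2    | host2   | 3014 |             3514 | 3052 | /workspace/gpuser/3014
    3 |       1 | p    | p              | s    | u      | host1    | host1   | 3010 |             3510 | 3052 | /workspace/gpuser/3010
    7 |       1 | m    | m              | s    | u      | host2    | host2   | 3015 |             3515 | 3052 | /workspace/gpuser/3015
    4 |       2 | p    | p              | s    | u      | host2    | host2   | 3008 |             3508 | 3052 | /workspace/gpuser/3008
    8 |       2 | m    | m              | s    | u      | host1    | host1   | 3011 |             3511 | 3052 | /workspace/gpuser/3011
    5 |       3 | p    | p              | s    | u      | host2    | host2   | 3013 |             3513 | 3052 | /workspace/gpuser/3013
    9 |       3 | m    | m              | s    | u      | host1    | host1   | 3012 |             3512 | 3052 | /workspace/gpuser/3012
~~~

The later steps, including building the list of Mirrors to repair and determining the temporary space and the objects to operate on, all depend on this information (IP/PORT/ROLE/STATUS/directories/FILESPACE, etc.).

### Preparing for recovery

After all Segment information has been gathered, gprecoverseg resolves the configuration file, the options and related information, including:

* Recovery targets: determine which Segments to repair and their data source, i.e. the Primary; there may be more than one Segment to repair. It also collects the details of each Segment to be repaired, including the port, replication port, data directory, temporary space and filespaces, and whether the recovery is forced.
* Host environment: once the list of Segments to repair is known, it makes sure the environment on the target hosts is usable, checking for possible conflicts such as ports or directories that are already in use. If no host is specified, one of the existing hosts is chosen.

### Recovery

The recovery steps are:

* Stop the failed Mirror and clean up its shared memory;
* Make sure the Segments to be repaired are marked "d";
* Delete them if required, as in the "-F" case;
* Pack, compress and copy the data to the target location;
* Disable Ctrl-C by setting the SIGINT handler to SIG_IGN, update the catalog, then re-enable SIGINT.

After these steps, a Segment has been repaired either in place or onto another host.

### re-balance

After the Segments have been repaired, the Segments that failed over to their Mirrors because of a Primary failure do not switch back on their own. This can skew the load across hosts and hurt performance, so a "re-balance" is needed. Run:

~~~
gprecoverseg -r
~~~

This command switches each Segment's role back to its preferred_role, keeping the roles balanced across the cluster so that no host ends up running more Primaries than the others and becoming a performance bottleneck.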
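To see which Segments a re-balance would touch, one can compare role with preferred_role in gp_segment_configuration. The sketch below is a hypothetical helper, not part of gprecoverseg: `SegRole`, `out_of_place` and `primaries_per_host` are illustrative names, and `AFTER_RECOVERY` is made-up data that assumes the Primary of content 3 (dbid 5 on host2) failed, its Mirror (dbid 9 on host1) took over, and gprecoverseg has since repaired dbid 5. It lists Segments whose current role differs from the preferred one and counts acting Primaries per host to expose the skew described above.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class SegRole(NamedTuple):
    """Hypothetical subset of a gp_segment_configuration row."""
    dbid: int
    content: int
    role: str            # current role: 'p' or 'm'
    preferred_role: str  # role the segment was initialized with
    hostname: str


# Made-up state after a failover of content 3 and a subsequent gprecoverseg.
AFTER_RECOVERY = [
    SegRole(1, -1, "p", "p", "host1"),
    SegRole(10, -1, "m", "m", "host2"),
    SegRole(2, 0, "p", "p", "host1"),
    SegRole(6, 0, "m", "m", "host2"),
    SegRole(3, 1, "p", "p", "host1"),
    SegRole(7, 1, "m", "m", "host2"),
    SegRole(4, 2, "p", "p", "host2"),
    SegRole(8, 2, "m", "m", "host1"),
    SegRole(5, 3, "m", "p", "host2"),
    SegRole(9, 3, "p", "m", "host1"),
]


def out_of_place(rows: List[SegRole]) -> List[int]:
    """dbids of segments whose current role differs from preferred_role."""
    return sorted(r.dbid for r in rows if r.content >= 0 and r.role != r.preferred_role)


def primaries_per_host(rows: List[SegRole], preferred: bool = False) -> Dict[str, int]:
    """Count acting primaries per host (or intended ones, with preferred=True)."""
    counts: Counter = Counter()
    for r in rows:
        if r.content < 0:  # skip master and standby master
            continue
        if (r.preferred_role if preferred else r.role) == "p":
            counts[r.hostname] += 1
    return dict(counts)


if __name__ == "__main__":
    print(out_of_place(AFTER_RECOVERY))
    print(primaries_per_host(AFTER_RECOVERY))
    print(primaries_per_host(AFTER_RECOVERY, preferred=True))
```

On this made-up state the helper reports dbids 5 and 9 as out of place, with three acting Primaries on host1 against one on host2; once the roles are swapped back to preferred_role, each host runs two Primaries again.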