openGauss / openGauss-server

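[Test type: tool functionality] [Test version: 5.0.1] [Upgrade] After upgrading 3.x (with CM) to 5.0.1 (with CM), a CM cluster primary/standby switchover occurs some time after the upgrade is committed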

Accepted
Defect
Created on 2023-10-18 11:08

[Title description]: After upgrading 3.x (with CM) to 5.0.1 (with CM), a CM cluster primary/standby switchover occurs some time after the upgrade is committed

[OS and hardware information] (query commands: cat /etc/system-release, uname -a):
CentOS Linux release 7.6.1810 (Core)
Linux kwepwebenv07954 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
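[Test environment] (standalone / 1 primary, x standbys, x cascaded standbys):
One primary, two standbys
[Feature under test]:
Upgrade
[Test type]:
Functional test
[Database version] (query command: gaussdb -V):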
3.0.0:gsql (openGauss 3.0.0 build 02c14696) compiled at 2022-04-01 18:12:34 commit 0 last mr
3.0.2:gsql (openGauss 3.0.2 build 74914a8d) compiled at 2022-11-17 17:02:30 commit 0 last mr
3.0.3:gsql (openGauss 3.0.3 build 46134f73) compiled at 2023-01-10 22:42:07 commit 0 last mr
3.0.5:gsql (openGauss 3.0.5 build b54d05de) compiled at 2023-09-14 19:23:00 commit 0 last mr
3.1.0:gsql (openGauss 3.1.0 build 4e931f9a) compiled at 2022-09-29 14:19:24 commit 0 last mr
5.0.1:gsql (openGauss 5.0.1 build f766addf) compiled at 2023-10-07 18:07:51 commit 0 last mr

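[Preconditions]:

[Steps] (please provide detailed steps):

  1. Configure the direct upgrade path and versions in CI, then run it.
    The problem reproduces mainly with in-place upgrade; with gray upgrade it rarely reproduces.

Reproducible paths:
3.0.2cm -- 5.0.1cm: in-place upgrade, then rollback, then upgrade and commit
3.0.3cm -- 5.0.1cm: in-place upgrade; in-place upgrade, then rollback, then upgrade and commit; in-place upgrade, then forced rollback, then upgrade and commit
3.0.5ncm/cm -- 5.0.1cm: in-place upgrade
3.1.0cm -- 5.0.1cm: in-place upgrade
3.0.0cm -- 3.1.0cm -- 5.0.1cm: in-place upgrade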

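[Expected output]:
The upgrade succeeds and the continuous TPC-C run stays normal.
[Actual output]:
Some time after the upgrade is committed, a CM cluster primary/standby switchover occurs.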
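[Root cause analysis]:

  1. Root cause of the problem
  2. How the problem was inferred
  3. Other causes that could produce similar symptoms
  4. Whether a temporary workaround exists
  5. Solution
  6. Estimated time to fix

[Log information] (please attach log files, screenshots, coredump information):

[Test code]: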

Comments (4)

lixin created the defect

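lixin set the assignee to zhangxubo
lixin added collaborator 周斌
lixin set the linked project to openGauss 5.1.1 community
lixin set the priority to Minor
lixin set the linked branch to 5.0.0
lixin edited the description
zhangxubo set the planned start date to 2023-10-31
zhangxubo set the planned due date to 2023-11-30
zhangxubo changed the planned start date from 2023-10-31 to (empty)
zhangxubo changed the planned due date from 2023-11-30 to 2023-10-31
zhangxubo edited the remarks
周斌 changed the assignee from zhangxubo to 张翱
周斌 added collaborator zhangxubo
张翱 set the planned start date to 2023-11-10
张翱 changed the planned due date from 2023-10-31 to 2023-11-10
张翱 added collaborator 张翱
张翱 changed the assignee from 张翱 to 薛蒙恩
张翱 removed collaborator 张翱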

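The switchover can happen in two phases: during the upgrade itself, or during the commit. The scenario reported here is a switchover at commit time; a switchover has also been seen in the upgrade phase, for reasons not yet known.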

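薛蒙恩 changed the task status from To Do to Confirmed
薛蒙恩 changed the task status from Confirmed to Fixing
薛蒙恩 changed the task status from Fixing to Done via opengauss/openGauss-OM Pull Request !605
薛蒙恩 changed the task status from Done to Pending Regression
jiexiao1413 changed the task status from Pending Regression to Testing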

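Acceptance date: 2023/11/29
Acceptance versions: gsql (openGauss 3.0.2 build 74914a8d) compiled at 2022-11-17 17:02:30 commit 0 last mr
gsql (openGauss 5.0.1 build d855a18f) compiled at 2023-11-28 19:13:55 commit 0 last mr
Acceptance result: Passed
Acceptance scenario: 3.0.2 with CM upgraded in place to 5.0.1 with CM, then rolled back, then upgraded again and committed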
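
The acceptance scenario chains four gs_upgradectl invocations. As a rough sketch of how a test driver might assemble that sequence: the `-t` actions and the `-X` option are the ones that appear in the logs in this issue, while `build_scenario` and `SCENARIO` are hypothetical names introduced only for illustration. The helper builds command strings and does not execute anything.

```python
import shlex

# Actions in the order used by the acceptance run:
# upgrade, roll back, upgrade again, commit.
SCENARIO = ["auto-upgrade", "auto-rollback", "auto-upgrade", "commit-upgrade"]


def build_scenario(xml_path, actions=SCENARIO):
    """Return one gs_upgradectl command line per step, in order.

    Hypothetical helper: it only formats strings, so it is safe to call
    on a machine without openGauss installed.
    """
    return [
        "gs_upgradectl -t {} -X {}".format(action, shlex.quote(xml_path))
        for action in actions
    ]


if __name__ == "__main__":
    for cmd in build_scenario("/opt/upgrade_1129/1129/pkg/upgrade_1129.xml"):
        print(cmd)
```

Keeping the sequence as data makes it straightforward to express the other reproducible paths (for example, upgrade followed directly by commit) by passing a different `actions` list.
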
Before upgrade

[2023-11-29 16:40:03 INFO UpgradeScene 19 20512] Pre-upgrade check, round 1
[2023-11-29 16:40:03 INFO UpgradeScene 19 20512] Pre-upgrade preparation and checks
[2023-11-29 16:40:03 INFO UpgradeScene 19 20512] 1. Get the pre-upgrade version
[2023-11-29 16:40:03 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc
gs_ssh -c "gsql -V"
EOF
[2023-11-29 16:40:06 INFO UpgradeScene 19 20512] Success: Successfully execute command on all nodes.

Output:
[SUCCESS] kwepwebenv06293:
gsql (openGauss 3.0.2 build 74914a8d) compiled at 2022-11-17 17:02:30 commit 0 last mr
[SUCCESS] kwepwebenv07953:
gsql (openGauss 3.0.2 build 74914a8d) compiled at 2022-11-17 17:02:30 commit 0 last mr
[SUCCESS] kwepwebenv07954:
gsql (openGauss 3.0.2 build 74914a8d) compiled at 2022-11-17 17:02:30 commit 0 last mr
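
The version check above relies on every node reporting the same build. A minimal sketch of how that output could be checked automatically, assuming the `[SUCCESS] host:` header is always followed by a single version line as in this log; `parse_versions` and `all_on_build` are hypothetical helper names, not part of any openGauss tool.

```python
import re

# Matches the per-node header printed by the gs_ssh output in this log.
HEADER = re.compile(r"^\[SUCCESS\]\s+(\S+):$")


def parse_versions(output):
    """Map each host to the version line printed under its [SUCCESS] header."""
    versions = {}
    host = None
    for raw in output.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            host = match.group(1)
        elif host and line:
            # First non-empty line after a header is taken as the version.
            versions[host] = line
            host = None
    return versions


def all_on_build(versions, build):
    """True if at least one node was parsed and all report the given build id."""
    marker = "build {})".format(build)
    return bool(versions) and all(marker in v for v in versions.values())
```

With the output shown above, `all_on_build(parse_versions(text), "74914a8d")` would be the pre-upgrade check and `"d855a18f"` the post-upgrade one.
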

In-place upgrade

[2023-11-29 16:43:43 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc && gs_upgradectl -t auto-upgrade -X /opt/upgrade_1129/1129/pkg/upgrade_1129.xml
EOF
[2023-11-29 16:52:21 INFO UpgradeScene 19 20512] Success: Static configuration matched with old static configuration files.
Performing inplace rollback.
Rollback succeeded.
Checking upgrade environment.
Successfully checked upgrade environment.
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Start to do health check.
Successfully checked cluster status.
Backing up current application and configurations.
Successfully backed up current application and configurations.
Stop cluster with gs_om successfully.
Backing up cluster configuration.
Successfully backup hotpatch config file.
Successfully backed up cluster configuration.
Installing new binary.
Restoring cluster configuration.
Successfully restored cluster configuration.
Stop cluster with gs_om successfully.
Modifying the socket path.
copy certs from /data/upgrade_1129/cluster/app_74914a8d to /data/upgrade_1129/cluster/app_d855a18f.
Successfully copy certs from /data/upgrade_1129/cluster/app_74914a8d to /data/upgrade_1129/cluster/app_d855a18f.
Stop cluster with gs_om successfully.
Switch symbolic link to new binary directory.
Successfully switch symbolic link to new binary directory.
Stop cluster with gs_om successfully.
Waiting for the cluster status to become normal.
.
The cluster status is normal.
Start to do health check.
Successfully checked cluster status.
Upgrade main process has been finished, user can do some check now.
Once the check done, please execute following command to commit upgrade:

    gs_upgradectl -t commit-upgrade -X /opt/upgrade_1129/1129/pkg/upgrade_1129.xml

Last login: Wed Nov 29 16:43:42 CST 2023
[2023-11-29 16:52:21 INFO UpgradeScene 19 20512] Post-upgrade verification
[2023-11-29 16:52:21 INFO UpgradeScene 19 20512] 1. Version verification
[2023-11-29 16:52:21 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc
gs_ssh -c "gsql -V"
EOF
[2023-11-29 16:52:24 INFO UpgradeScene 19 20512] Success: Successfully execute command on all nodes.

Output:
[SUCCESS] kwepwebenv06293:
gsql (openGauss 5.0.1 build d855a18f) compiled at 2023-11-28 19:13:55 commit 0 last mr
[SUCCESS] kwepwebenv07953:
gsql (openGauss 5.0.1 build d855a18f) compiled at 2023-11-28 19:13:55 commit 0 last mr
[SUCCESS] kwepwebenv07954:
gsql (openGauss 5.0.1 build d855a18f) compiled at 2023-11-28 19:13:55 commit 0 last mr
[2023-11-29 16:52:28 INFO UpgradeScene 19 20512] 3. Check database status
[2023-11-29 16:52:28 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc
gs_om -t status --all
EOF
[2023-11-29 16:52:29 INFO UpgradeScene 19 20512] Success: -----------------------------------------------------------------------

cluster_state             : Normal
redistributing            : No
balanced                  : Yes

-----------------------------------------------------------------------

node                      : 1
node_name                 : kwepwebenv06293

node                      : 1
instance_id               : 1
node_ip                   : xx.xx.xx.1
data_path                 : /data/upgrade_1129/cluster/cm/cm_server
type                      : CMServer
instance_state            : Down

node                      : 1
instance_id               : 6001
node_ip                   : xx.xx.xx.1
data_path                 : /data/upgrade_1129/cluster/dn1
type                      : Datanode
instance_state            : Primary
static_connections        : 2
HA_state                  : Normal
reason                    : Normal
standby_node              :
standby_data_path         :
standby_node              :
standby_data_path         :
standby_state             : Standby
sender_sent_location      : 0/B1A77670
sender_write_location     : 0/B1A77670
sender_flush_location     : 0/B1A77670
sender_replay_location    : 0/B1A77670
receiver_received_location: 0/B1A77670
receiver_write_location   : 0/B1A77670
receiver_flush_location   : 0/B1A77670
receiver_replay_location  : 0/B1A77670
sync_state                : Quorum
secondary_state           : Unknown
sender_sent_location      : 0/0
sender_write_location     : 0/0
sender_flush_location     : 0/0
sender_replay_location    : 0/0
receiver_received_location: 0/0
receiver_write_location   : 0/0
receiver_flush_location   : 0/0
receiver_replay_location  : 0/0
sync_state                : Unknown

node                      : 1
node_name                 : kwepwebenv06293

node                      : 1
instance_id               : 1
node_ip                   : xx.xx.xx.1
data_path                 : /data/upgrade_1129/cluster/cm/cm_server
type                      : CMServer
instance_state            : Down

node                      : 1
node_ip                   : xx.xx.xx.1
type                      : Fenced UDF
state                     : Normal

-----------------------------------------------------------------------

node                      : 2
node_name                 : kwepwebenv07953

node                      : 2
instance_id               : 2
node_ip                   : xx.xx.xx.2
data_path                 : /data/upgrade_1129/cluster/cm/cm_server
type                      : CMServer
instance_state            : Standby

node                      : 2
instance_id               : 6002
node_ip                   : xx.xx.xx.2
data_path                 : /data/upgrade_1129/cluster/dn1
type                      : Datanode
instance_state            : Standby
dcf_role                  : FOLLOWER
static_connections        : 2
HA_state                  : Normal
reason                    : Normal
sender_sent_location      : 0/B1A77670
sender_write_location     : 0/B1A77670
sender_flush_location     : 0/B1A77670
sender_replay_location    : 0/B1A77670
receiver_received_location: 0/B1A77670
receiver_write_location   : 0/B1A77670
receiver_flush_location   : 0/B1A77670
receiver_replay_location  : 0/B1A77670
sync_state                : Async

node                      : 2
node_name                 : kwepwebenv07953

node                      : 2
instance_id               : 2
node_ip                   : xx.xx.xx.2
data_path                 : /data/upgrade_1129/cluster/cm/cm_server
type                      : CMServer
instance_state            : Standby

node                      : 2
node_ip                   : xx.xx.xx.2
type                      : Fenced UDF
state                     : Normal

-----------------------------------------------------------------------

node                      : 3
node_name                 : kwepwebenv07954

node                      : 3
instance_id               : 3
node_ip                   : xx.xx.xx.3
data_path                 : /data/upgrade_1129/cluster/cm/cm_server
type                      : CMServer
instance_state            : Primary

node                      : 3
instance_id               : 6003
node_ip                   : xx.xx.xx.3
data_path                 : /data/upgrade_1129/cluster/dn1
type                      : Datanode
instance_state            : Standby
dcf_role                  : FOLLOWER
static_connections        : 2
HA_state                  : Normal
reason                    : Normal
sender_sent_location      : 0/B1A77670
sender_write_location     : 0/B1A77670
sender_flush_location     : 0/B1A77670
sender_replay_location    : 0/B1A77670
receiver_received_location: 0/B1A77670
receiver_write_location   : 0/B1A77670
receiver_flush_location   : 0/B1A77670
receiver_replay_location  : 0/B1A77670
sync_state                : Async

node                      : 3
node_name                 : kwepwebenv07954

node                      : 3
instance_id               : 3
node_ip                   : xx.xx.xx.3
data_path                 : /data/upgrade_1129/cluster/cm/cm_server
type                      : CMServer
instance_state            : Primary

node                      : 3
node_ip                   : xx.xx.xx.3
type                      : Fenced UDF
state                     : Normal
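
Since the defect is about which CMServer and Datanode instances hold the Primary role, it helps to reduce the status dump above to a role map. The following is a sketch under the assumption that the output always consists of blank-line-separated blocks of `key : value` lines, as it does in this log; `parse_instance_states` and `primaries_of` are hypothetical names. Blocks without an `instance_id` (such as the Fenced UDF entries) are skipped, and the repeated CMServer blocks collapse onto the same key.

```python
def parse_instance_states(status_text):
    """Return {(type, instance_id): instance_state} from a status dump."""
    states = {}
    block = {}
    # The trailing empty string forces the final block to be flushed.
    for raw in status_text.splitlines() + [""]:
        line = raw.strip()
        if not line or set(line) == {"-"}:
            # Blank line or dashed separator: close the current block.
            if all(k in block for k in ("type", "instance_id", "instance_state")):
                key = (block["type"], block["instance_id"])
                states[key] = block["instance_state"]
            block = {}
            continue
        name, sep, value = line.partition(":")
        if sep:
            # Keep the first occurrence of a repeated key within a block.
            block.setdefault(name.strip(), value.strip())
    return states


def primaries_of(states, instance_type):
    """Sorted instance ids of the given type whose state is Primary."""
    return sorted(
        inst_id
        for (kind, inst_id), state in states.items()
        if kind == instance_type and state == "Primary"
    )
```

Applied to the dump above, this would report CMServer instance 3 and Datanode instance 6001 as the primaries.
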

Rollback

[2023-11-29 16:52:29 INFO UpgradeScene 19 20512] Starting rollback
[2023-11-29 16:52:29 INFO UpgradeScene 19 20512] Roll back the upgraded version
[2023-11-29 16:52:29 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc
gs_upgradectl -t auto-rollback -X /opt/upgrade_1129/1129/pkg/upgrade_1129.xml
EOF
[2023-11-29 16:56:10 INFO UpgradeScene 19 20512] Success: Static configuration matched with old static configuration files.
Performing inplace rollback.
Checking static configuration files.
Successfully checked static configuration files.
Restoring cluster configuration.
Successfully rollback hotpatch config file.
Successfully restored cluster configuration.
Start roll back CM instance.
Switch symbolic link to old binary directory.
Successfully switch symbolic link to old binary directory.
Stop cluster with gs_om successfully.
Restoring application and configurations.
Successfully restored application and configuration.
Restoring cluster configuration.
Successfully rollback hotpatch config file.
Successfully restored cluster configuration.
Clean up backup catalog files.
Start check CMS parameter.
Old cluster version number less than 92574.
Successfully cleaned new install path.
Rollback succeeded.
Last login: Wed Nov 29 16:52:28 CST 2023

[2023-11-29 16:56:10 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 -c "source /home/upgrade_1129/gaussdb.bashrc; gaussdb -V"
[2023-11-29 16:56:10 INFO UpgradeScene 19 20512] Success: gaussdb (openGauss 3.0.2 build 74914a8d) compiled at 2022-11-17 17:02:30 commit 0 last mr

Upgrade again

[2023-11-29 16:59:20 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc && gs_upgradectl -t auto-upgrade -X /opt/upgrade_1129/1129/pkg/upgrade_1129.xml
EOF
[2023-11-29 17:06:21 INFO UpgradeScene 19 20512] Success: Static configuration matched with old static configuration files.
Performing inplace rollback.
Rollback succeeded.
Checking upgrade environment.
Successfully checked upgrade environment.
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Start to do health check.
Successfully checked cluster status.
Backing up current application and configurations.
Successfully backed up current application and configurations.
Stop cluster with gs_om successfully.
Backing up cluster configuration.
Successfully backup hotpatch config file.
Successfully backed up cluster configuration.
Installing new binary.
Restoring cluster configuration.
Successfully restored cluster configuration.
Stop cluster with gs_om successfully.
Modifying the socket path.
Successfully modified socket path.
copy certs from /data/upgrade_1129/cluster/app_74914a8d to /data/upgrade_1129/cluster/app_d855a18f.
Successfully copy certs from /data/upgrade_1129/cluster/app_74914a8d to /data/upgrade_1129/cluster/app_d855a18f.
Stop cluster with gs_om successfully.
Switch symbolic link to new binary directory.
Successfully switch symbolic link to new binary directory.
Stop cluster with gs_om successfully.
Waiting for the cluster status to become normal.
.
The cluster status is normal.
Start to do health check.
Successfully checked cluster status.
Upgrade main process has been finished, user can do some check now.
Once the check done, please execute following command to commit upgrade:

    gs_upgradectl -t commit-upgrade -X /opt/upgrade_1129/1129/pkg/upgrade_1129.xml
[2023-11-29 17:06:21 INFO UpgradeScene 19 20512] Post-upgrade verification
[2023-11-29 17:06:21 INFO UpgradeScene 19 20512] 1. Version verification
[2023-11-29 17:06:21 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc
gs_ssh -c "gsql -V"
EOF
[2023-11-29 17:06:23 INFO UpgradeScene 19 20512] Success: Successfully execute command on all nodes.

Output:
[SUCCESS] kwepwebenv06293:
gsql (openGauss 5.0.1 build d855a18f) compiled at 2023-11-28 19:13:55 commit 0 last mr
[SUCCESS] kwepwebenv07953:
gsql (openGauss 5.0.1 build d855a18f) compiled at 2023-11-28 19:13:55 commit 0 last mr
[SUCCESS] kwepwebenv07954:
gsql (openGauss 5.0.1 build d855a18f) compiled at 2023-11-28 19:13:55 commit 0 last mr
[2023-11-29 17:06:28 INFO UpgradeScene 19 20512] 3. Check database status
[2023-11-29 17:06:28 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc
gs_om -t status --all
EOF
[2023-11-29 17:06:29 INFO UpgradeScene 19 20512] Success: -----------------------------------------------------------------------

cluster_state             : Normal
redistributing            : No
balanced                  : Yes

-----------------------------------------------------------------------

node                      : 1
node_name                 : kwepwebenv06293

node                      : 1
instance_id               : 1
node_ip                   : xx.xx.xx.1
data_path                 : /data/upgrade_1129/cluster/cm/cm_server
type                      : CMServer
instance_state            : Standby

node                      : 1
instance_id               : 6001
node_ip                   : xx.xx.xx.1
data_path                 : /data/upgrade_1129/cluster/dn1
type                      : Datanode
instance_state            : Primary

Commit

[2023-11-29 17:06:29 INFO UpgradeScene 19 20512] Commit the upgrade
[2023-11-29 17:06:29 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc
gs_upgradectl -t commit-upgrade -X /opt/upgrade_1129/1129/pkg/upgrade_1129.xml
EOF
[2023-11-29 17:07:46 INFO UpgradeScene 19 20512] Success: NOTICE: Start to commit binary upgrade.
Start to check whether can be committed.
Can be committed.
Start to set commit flag.
Set commit flag succeeded.
Start to do operations that cannot be rollback.
Wait for the cluster status normal or degrade.
Cancel the upgrade status succeeded.
Start to clean temp files for upgrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Clean up backup catalog files.
Successfully cleaned old install path.
Stop cluster with gs_om successfully.
Clean temp files for upgrade succeeded.
NOTICE: Commit binary upgrade succeeded.
Last login: Wed Nov 29 17:06:28 CST 2023

[2023-11-29 17:07:46 INFO UpgradeScene 19 20512] Version verification
[2023-11-29 17:07:46 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc
gs_ssh -c "gaussdb -V"
EOF
[2023-11-29 17:07:55 INFO UpgradeScene 19 20512] Success: Successfully execute command on all nodes.

Output:
[SUCCESS] kwepwebenv06293:
gaussdb (openGauss 5.0.1 build d855a18f) compiled at 2023-11-28 19:13:55 commit 0 last mr
[2023-11-29 17:08:02 INFO UpgradeScene 19 20512] 3. Check database status
[2023-11-29 17:08:02 INFO UpgradeScene 19 20512] Executing: su - upgrade_1129 <<EOF
source /home/upgrade_1129/gaussdb.bashrc
gs_om -t status --all
EOF
[2023-11-29 17:08:03 INFO UpgradeScene 19 20512] Success: -----------------------------------------------------------------------

cluster_state             : Normal
redistributing            : No
balanced                  : Yes

-----------------------------------------------------------------------

node                      : 1
node_name                 : kwepwebenv06293

node                      : 1
instance_id               : 1
node_ip                   : xx.xx.xx.1
data_path                 : /data/upgrade_1129/cluster/cm/cm_server
type                      : CMServer
instance_state            : Standby

node                      : 1
instance_id               : 6001
node_ip                   : xx.xx.xx.1
data_path                 : /data/upgrade_1129/cluster/dn1
type                      : Datanode
instance_state            : Primary
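
The acceptance check amounts to comparing instance roles before and after the commit: in the excerpts shown here, the CMServer on node 1 is Standby and Datanode 6001 is Primary both before and after `commit-upgrade`. A sketch of that comparison, assuming each snapshot has already been reduced to a mapping of `(type, instance_id)` to state; `role_changes` and `primary_switched` are hypothetical helper names, and the snapshot in the usage lines contains only the two instances visible in the truncated logs.

```python
def role_changes(before, after):
    """Return {instance: (old_state, new_state)} for instances whose state differs."""
    return {
        inst: (before[inst], after[inst])
        for inst in before.keys() & after.keys()
        if before[inst] != after[inst]
    }


def primary_switched(before, after, instance_type):
    """True if the set of Primary instances of the given type changed."""
    def primaries(snapshot):
        return {
            inst_id
            for (kind, inst_id), state in snapshot.items()
            if kind == instance_type and state == "Primary"
        }
    return primaries(before) != primaries(after)


if __name__ == "__main__":
    # Only the instances visible in the truncated pre- and post-commit logs.
    pre_commit = {("CMServer", "1"): "Standby", ("Datanode", "6001"): "Primary"}
    post_commit = {("CMServer", "1"): "Standby", ("Datanode", "6001"): "Primary"}
    print(role_changes(pre_commit, post_commit))
    print(primary_switched(pre_commit, post_commit, "Datanode"))
```

Because the original defect appeared only some time after the commit, a single snapshot right after `commit-upgrade` may not be enough; repeating the comparison at intervals would be closer to what the issue describes.
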
lixin changed the task status from Testing to Accepted
薛蒙恩 edited the remarks
