openGauss / openGauss-server


【Resource pooling】Database coredumps during on-demand replay after generating various kinds of logs

Accepted
Defect
Created on 2023-07-06 09:19

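【Title】: Database coredumps during on-demand replay after generating various kinds of logs
【Test type: SQL function / storage function / interface function / tool function / performance / concurrency / stress and long-term stability / fault injection / security / documentation / coding standards】【Test version: 2.0.0】 Problem description

【OS and hardware information】(query commands: cat /etc/system-release, uname -a):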
openEuler release 20.03 (LTS)
Linux sharedstore002 4.19.90-2003.4.0.0036.oe1.x86_64 #1 SMP Mon Mar 23 19:10:41 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

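【Test environment】(standalone / 1 primary, x standbys, x cascaded standbys):
Resource-pooling environment with one primary and two standbys

【Feature under test】:
On-demand replay during failover

【Test type】:
Functional test

【Database version】(query command: gaussdb -V):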
gaussdb (openGauss 5.1.0 build a74b7333) compiled at 2023-06-19 09:40:28 commit 0 last mr

【Preconditions】:
gs_guc set -N all -I all -c "replication_type=1"
gs_guc set -N all -I all -c "recovery_parse_workers=4"
gs_guc set -N all -I all -c "recovery_redo_workers=4"
gs_guc set -N all -I all -c "hot_standby=off"
gs_guc set -N all -I all -c "ss_enable_ondemand_recovery=on"
gs_guc set -N all -I all -c "ss_ondemand_recovery_mem_size=25GB"
gs_guc set -N all -I all -c "shared_buffers=100GB"
gs_guc set -N all -I all -c "pagewriter_thread_num = 1"
gs_guc set -N all -I all -c "pagewriter_sleep = 60min"
gs_guc set -N all -I all -c "enable_incremental_checkpoint = off"
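The preconditions above can be applied in one pass. The following is a minimal sketch, assuming `gs_guc` is on the PATH of the cluster user; the helper names `build_gs_guc_commands` and `apply_settings` are hypothetical and exist only for this illustration. It writes every parameter in the compact `name=value` form, which the commands above already use for most settings. By default it only prints the commands (dry run).

```python
import shlex
import subprocess

# Parameter values copied from the preconditions listed above.
ONDEMAND_REPLAY_SETTINGS = {
    "replication_type": "1",
    "recovery_parse_workers": "4",
    "recovery_redo_workers": "4",
    "hot_standby": "off",
    "ss_enable_ondemand_recovery": "on",
    "ss_ondemand_recovery_mem_size": "25GB",
    "shared_buffers": "100GB",
    "pagewriter_thread_num": "1",
    "pagewriter_sleep": "60min",
    "enable_incremental_checkpoint": "off",
}


def build_gs_guc_commands(settings):
    """Return one `gs_guc set` argument list per setting (all nodes, all instances)."""
    return [
        ["gs_guc", "set", "-N", "all", "-I", "all", "-c", f"{name}={value}"]
        for name, value in settings.items()
    ]


def apply_settings(settings, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for argv in build_gs_guc_commands(settings):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run only: nothing is changed on the cluster.
    apply_settings(ONDEMAND_REPLAY_SETTINGS, dry_run=True)
```

Calling `apply_settings(ONDEMAND_REPLAY_SETTINGS, dry_run=False)` would run the commands for real. Whether a restart is needed afterwards is not stated in this report.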

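【Steps to reproduce】(please provide detailed steps):

  1. Run a TPC-C read/write workload on the primary for 2 minutes; the redo gap is then about 2.5 GB
    [screenshot]
  2. Stop TPC-C and run the following SQL statements to generate various kinds of logs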
create table tb_01(id integer,name varchar(16))with(segment=on);
insert into tb_01 values (1, 'aaa');insert into tb_01 values (2, 'bbb');insert into tb_01 values (3, 'ccc');
alter table tb_01 add primary key (id);alter table tb_01 modify name varchar(32);
drop table tb_01;
create table tb_02(id integer,name varchar(16))with(segment=on);
insert into tb_02 values (1, 'aaa');insert into tb_02 values (2, 'bbb');insert into tb_02 values (3, 'ccc');
alter table tb_02 modify name varchar(32);
CREATE  or replace VIEW "v_tb_02" AS SELECT * FROM tb_02;
create user tpcc1 password "Huawei@123";grant all privileges to tpcc1;revoke all privileges from tpcc1;

begin;create table tb_03(id integer,name varchar(16))with(segment=on);
insert into tb_03 values (1, 'aaa');insert into tb_03 values (2, 'bbb');rollback;
begin;create table tb_04(id integer,name varchar(16))with(segment=on);
insert into tb_04 values (1, 'aaa');insert into tb_04 values (2, 'bbb');commit;
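  3. Run TPC-C on the primary, then fail the primary to trigger failover
    Redo gap before running TPC-C on the primary:
    [screenshot]
    Redo gap before the primary fault:
    [screenshot]
  4. Wait for failover recovery to complete

【Expected output】:
Failover recovery succeeds

【Actual output】:
Failover recovery fails; the standby being promoted to primary coredumps

【Root cause analysis】:

【Log information】(please attach log files, screenshots, and coredump information):
Status queried via cm:
[screenshot]

Core stack:
[screenshot]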

#0  BBOX_CreateCoredump (file_name=file_name@entry=0x0) at bbox_create.cpp:404
#1  0x0000555e9baacdd2 in bbox_handler (sig=<optimized out>, si=0x7f9550048c30, uc=0x7f9550048b00) at gs_bbox.cpp:112
#2  bbox_handler (sig=<optimized out>, si=0x7f9550048c30, uc=0x7f9550048b00) at gs_bbox.cpp:102
#3  <signal handler called>
#4  0x0000000000000000 in ?? ()
#5  0x0000555e9c29dfee in XLogDropRelation (rnode=..., forknum=forknum@entry=0) at xlogutils.cpp:1444
#6  0x0000555e9c29e3f5 in XlogDropRowReation (rnode=...) at xlogutils.cpp:1377
#7  XLogForgetDDLRedo (redoblockstate=redoblockstate@entry=0x7f996e4cf1f0) at xlogutils.cpp:1402
#8  0x0000555e9c20bfe7 in ondemand_extreme_rto::RedoPageWorkerMain () at page_redo.cpp:1641
#9  0x0000555e9c20f915 in ondemand_extreme_rto::RedoMainLoop () at page_redo.cpp:2552
#10 0x0000555e9c20fc12 in ondemand_extreme_rto::ParallelRedoThreadMain () at page_redo.cpp:2666
#11 0x0000555e9bdc9a66 in GaussDbAuxiliaryThreadMain<(knl_thread_role)22> (arg=0x7fa20b7f6af0) at postmaster.cpp:11510
#12 GaussDbThreadMain<(knl_thread_role)22> (arg=0x7fa20b7f6af0) at postmaster.cpp:13674
#13 0x0000555e9bd9e3b5 in InternalThreadFunc (args=<optimized out>) at postmaster.cpp:14256
#14 0x00007fbd502b5fed in ?? () from /lib64/libpthread.so.0
#15 0x00007fbd501e918f in clone () from /lib64/libc.so.6
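In the backtrace above, frame #4 has a program counter of 0x0 directly below the signal-handler frame. That pattern typically indicates a call through a null function pointer, which makes the next frame (#5, XLogDropRelation at xlogutils.cpp:1444) the likely call site. The following is a minimal sketch of extracting that pair from a gdb-style backtrace. The regular expressions and the function names `parse_frames` and `find_null_call` are assumptions made for this illustration and are not part of any openGauss tooling.

```python
import re

# "#N  [0xPC in ]function (args...) [at file:line]"
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+"
    r"(?:(?P<pc>0x[0-9a-fA-F]+)\s+in\s+)?"
    r"(?P<func>.+?)(?:\s+\(|$)"
)
LOCATION_RE = re.compile(r"\bat (\S+:\d+)$")

# Frames #3 to #6 copied from the core stack above.
SAMPLE = """\
#3  <signal handler called>
#4  0x0000000000000000 in ?? ()
#5  0x0000555e9c29dfee in XLogDropRelation (rnode=..., forknum=forknum@entry=0) at xlogutils.cpp:1444
#6  0x0000555e9c29e3f5 in XlogDropRowReation (rnode=...) at xlogutils.cpp:1377
"""


def parse_frames(backtrace):
    """Parse gdb-style backtrace text into a list of frame dicts."""
    frames = []
    for raw in backtrace.splitlines():
        line = raw.strip()
        match = FRAME_RE.match(line)
        if not match:
            continue  # skip anything that is not a frame line
        location = LOCATION_RE.search(line)
        frames.append({
            "num": int(match["num"]),
            "pc": int(match["pc"], 16) if match["pc"] else None,
            "func": match["func"],
            "location": location.group(1) if location else None,
        })
    return frames


def find_null_call(frames):
    """Return (null_frame, caller_frame) for the first frame whose PC is 0, else None."""
    for index, frame in enumerate(frames[:-1]):
        if frame["pc"] == 0:
            return frame, frames[index + 1]
    return None


if __name__ == "__main__":
    result = find_null_call(parse_frames(SAMPLE))
    if result:
        null_frame, caller = result
        print(f"frame #{null_frame['num']} has PC 0; caller is "
              f"{caller['func']} at {caller['location']}")
```

This only locates the suspicious call site. Which pointer was null at xlogutils.cpp:1444 has to be determined from the source or the core file itself.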

Database log:
[screenshot]
【Test code】:

Attachments
zhaobingyu 2023-07-06 09:38

Comments (4)

zhaobingyu created the defect

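zhaobingyu uploaded the attachment gstack.txt
zhaobingyu edited the description
zhaobingyu set the assignee to 陈栋
zhaobingyu set the associated project to openGauss 5.1.0 community
zhaobingyu set the priority to Major
zhaobingyu edited the description
陈栋 changed the task status from To Do to Fixing
陈栋 changed the task status from Fixing to Completed via opengauss/openGauss-server Pull Request !3684
zhaobingyu edited the title
zhaobingyu changed the task status from Completed to Testing
zhaobingyu changed the task status from Testing to Accepted

Acceptance version: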
gaussdb (openGauss 5.1.0 build 31025cde) compiled at 2023-08-02 00:09:00 commit 0 last mr
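Acceptance result:
[screenshot]
[screenshot]
After the various kinds of logs are generated, on-demand replay succeeds and the cluster returns to its initial state.

Verified on 5.0.1: passed. gaussdb (openGauss 5.0.1 build 1afdf2b0) compiled at 2023-10-12 17:49:15 commit 0 last mr
[screenshot]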
