
openGauss/openGauss-server


[Test type: SQL function] [Test version: 3.0.0] gs_ctl build of a node fails after restoring data with gs_probackup

Done
Bug
Created on 2022-05-20 11:57

[Title]:
[Test type: SQL function] [Test version: 3.0.0] gs_ctl build of a node fails after restoring data with gs_probackup

[OS and hardware] (query commands: cat /etc/system-release, uname -a): openEuler release 20.03 (LTS)

[Test environment] (single node / 1 primary + x standbys + x cascaded standbys): 1 primary, 1 standby

[Feature under test]: row-store table compression

[Test type]: functional test

[Database version] (query command: gaussdb -V): gaussdb (openGauss 3.0.0 build 02c14696) compiled at 2022-04-01 18:12:19 commit 0 last mr

[Preconditions]: the database is running normally.
Random data is loaded with sysbench:
1. Change the sysbench table definition to: CREATE TABLE sbtest%d(
id %s,
k INTEGER DEFAULT '0' NOT NULL,
c CHAR(120) DEFAULT '' NOT NULL,
pad CHAR(60) DEFAULT '' NOT NULL,
%s (id)
) with(compresstype=2)
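2. Load the data with sysbench.
3. Check the table files. The number of used chunks equals the number of allocated chunks, and both are at the maximum allocated-chunk count for the corresponding chunk_size.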
[Steps] (please give detailed steps):

  1. Back up the data: gs_probackup backup -B /home/omm/data_bak/ --instance=6001 -b full -h 127.0.0.1 -p 15400 -d test
  2. Stop the cluster.
  3. Delete the data in the primary DN directory.
  4. Restore the data: gs_probackup restore -B /home/omm/data_bak/ --instance=6001
  5. Start the cluster with gs_om -t start: startup fails; standby wal
  6. Build the standby: /data/xydb/cluster/app/bin/gs_ctl build -D /data/xydb/cluster/dn_1 -M standby -b full -r 300
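The six steps above can be collected into one script. The following is a minimal sketch. The backup path, instance ID, port, database, gs_ctl path and standby data directory come from the report. Two parts are assumptions: step 2 uses `gs_om -t stop`, which the report does not spell out, and `PRIMARY_DN_DIR` is a hypothetical placeholder, because the report does not give the primary DN path. By default the script only prints the commands (`DRY_RUN=1`), since step 3 is destructive. It also has no error handling. This is intentional: step 5 is expected to fail, and the script should still reach step 6.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-6 from the report. Values below come from the report,
# except PRIMARY_DN_DIR, which is a hypothetical placeholder.
BACKUP_DIR=/home/omm/data_bak/
INSTANCE=6001
GS_BIN=/data/xydb/cluster/app/bin
STANDBY_DN_DIR=/data/xydb/cluster/dn_1
PRIMARY_DN_DIR=${PRIMARY_DN_DIR:-/path/to/primary_dn}

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_restore_then_build() {
    # 1. Full backup of instance 6001
    run gs_probackup backup -B "$BACKUP_DIR" --instance="$INSTANCE" -b full -h 127.0.0.1 -p 15400 -d test
    # 2. Stop the cluster (assumed command; the report only says "stop")
    run gs_om -t stop
    # 3. Delete everything inside the primary DN directory (destructive!)
    run find "$PRIMARY_DN_DIR" -mindepth 1 -delete
    # 4. Restore from the backup catalog
    run gs_probackup restore -B "$BACKUP_DIR" --instance="$INSTANCE"
    # 5. Start the cluster (startup failed here in the report)
    run gs_om -t start
    # 6. Full build of the standby from the primary
    run "$GS_BIN/gs_ctl" build -D "$STANDBY_DN_DIR" -M standby -b full -r 300
}
```

Run `DRY_RUN=1 repro_restore_then_build` to review the commands. Run `DRY_RUN=0 PRIMARY_DN_DIR=<real path> repro_restore_then_build` only on a disposable test cluster.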

[Expected output]: the standby starts successfully.

[Actual output]: the standby fails to start.

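[Cause analysis]:

  1. Root cause of the problem
  2. How the cause was traced
  3. Other causes that could produce similar symptoms
  4. Whether a temporary workaround exists
  5. Fix
  6. Expected fix date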

[Logs] (attach log files, screenshots, coredump information):
[2022-05-20 11:34:56.815][2326409][][gs_ctl]: gs_ctl full build ,datadir is /data/xydb/cluster/dn_1
waiting for server to shut down.... done
server stopped
[2022-05-20 11:34:57.824][2326409][][gs_ctl]: current workdir is (/home/omm).
[2022-05-20 11:34:57.825][2326409][][gs_ctl]: fopen build pid file "/data/xydb/cluster/dn_1/gs_build.pid" success
[2022-05-20 11:34:57.825][2326409][][gs_ctl]: fprintf build pid file "/data/xydb/cluster/dn_1/gs_build.pid" success
[2022-05-20 11:34:57.825][2326409][][gs_ctl]: fsync build pid file "/data/xydb/cluster/dn_1/gs_build.pid" success
[2022-05-20 11:34:57.825][2326409][][gs_ctl]: set gaussdb state file when full build build:db state(BUILDING_STATE), server mode(STANDBY_MODE), build mode(FULL_BUILD).
[2022-05-20 11:34:57.829][2326409][dn_6001_6002][gs_ctl]: build try host(primary DN IP) port(xxxx) success
[2022-05-20 11:34:57.829][2326409][dn_6001_6002][gs_ctl]: connect to server success, build started.
[2022-05-20 11:34:57.829][2326409][dn_6001_6002][gs_ctl]: create build tag file success
[2022-05-20 11:34:59.204][2326409][dn_6001_6002][gs_ctl]: clear old target dir success
[2022-05-20 11:34:59.204][2326409][dn_6001_6002][gs_ctl]: create build tag file again success
[2022-05-20 11:34:59.204][2326409][dn_6001_6002][gs_ctl]: get system identifier success
[2022-05-20 11:34:59.204][2326409][dn_6001_6002][gs_ctl]: receiving and unpacking files...
[2022-05-20 11:34:59.204][2326409][dn_6001_6002][gs_ctl]: create backup label success
[2022-05-20 11:34:59.264][2326409][dn_6001_6002][gs_ctl]: xlog start point: 11/E1000178
[2022-05-20 11:34:59.264][2326409][dn_6001_6002][gs_ctl]: begin build tablespace list
[2022-05-20 11:34:59.264][2326409][dn_6001_6002][gs_ctl]: finish build tablespace list
[2022-05-20 11:34:59.264][2326409][dn_6001_6002][gs_ctl]: begin get xlog by xlogstream
[2022-05-20 11:34:59.264][2326409][dn_6001_6002][gs_ctl]: starting background WAL receiver
[2022-05-20 11:34:59.264][2326409][dn_6001_6002][gs_ctl]: starting walreceiver
[2022-05-20 11:34:59.264][2326409][dn_6001_6002][gs_ctl]: begin receive tar files
[2022-05-20 11:34:59.265][2326409][dn_6001_6002][gs_ctl]: receiving and unpacking files...
[2022-05-20 11:34:59.268][2326409][dn_6001_6002][gs_ctl]: build try host(primary DN IP) port(xxxx) success
[2022-05-20 11:34:59.269][2326409][dn_6001_6002][gs_ctl]: check identify system success
[2022-05-20 11:34:59.269][2326409][dn_6001_6002][gs_ctl]: send START_REPLICATION 11/E1000000 success
[2022-05-20 11:36:01.090][2326409][dn_6001_6002][gs_ctl]: finish receive tar files
[2022-05-20 11:36:01.090][2326409][dn_6001_6002][gs_ctl]: could not get WAL end position from server: FATAL: base backup cheksum or Decompressed blockno 3854 failed in file "./base/16388/18732_pcd", aborting backup. nchunks: 2, allocatedChunks: 2, segno: 0.
[2022-05-20 11:36:01.090][2326409][dn_6001_6002][gs_ctl]: full build failed(/data/xydb/cluster/dn_1).
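When triaging, it helps to pull the failing file, block number and chunk counters out of the FATAL line above. The function below is a small sketch for that purpose. `parse_backup_fatal` is a hypothetical helper, not part of openGauss, and it relies only on the message format shown in this log. It also assumes the usual PostgreSQL-style `base/<database OID>/<relfilenode>` path layout and strips the `_pcd` suffix to get the relfilenode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract fields from the gs_ctl build FATAL message
# "... Decompressed blockno N failed in file "<path>", ... nchunks: A, allocatedChunks: B, segno: C."
parse_backup_fatal() {
    local msg="$1" file dboid relfilenode blockno nchunks allocated segno
    file=$(printf '%s\n' "$msg" | sed -nE 's/.*failed in file "([^"]+)".*/\1/p')
    # Assumed layout: ./base/<database OID>/<relfilenode>_pcd
    dboid=$(printf '%s\n' "$file" | sed -nE 's#^\./base/([0-9]+)/.*#\1#p')
    relfilenode=$(printf '%s\n' "$file" | sed -nE 's#.*/([0-9]+)_pcd$#\1#p')
    blockno=$(printf '%s\n' "$msg" | sed -nE 's/.*blockno ([0-9]+) failed.*/\1/p')
    nchunks=$(printf '%s\n' "$msg" | sed -nE 's/.*nchunks: ([0-9]+),.*/\1/p')
    allocated=$(printf '%s\n' "$msg" | sed -nE 's/.*allocatedChunks: ([0-9]+),.*/\1/p')
    segno=$(printf '%s\n' "$msg" | sed -nE 's/.*segno: ([0-9]+)\..*/\1/p')
    printf 'file=%s dboid=%s relfilenode=%s blockno=%s nchunks=%s allocatedChunks=%s segno=%s\n' \
        "$file" "$dboid" "$relfilenode" "$blockno" "$nchunks" "$allocated" "$segno"
}
```

For the message in this log, the function reports `relfilenode=18732` in database OID 16388. Under the layout assumption above, running `SELECT relname FROM pg_class WHERE relfilenode = 18732;` in that database should identify the compressed table involved.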
[Test code]:

Comments (3)

Baker X created the bug


yansong_lee set the assignee to 周斌
Baker X edited the description
Baker X edited the description

@Baker X Does this problem still reproduce? I cannot reproduce it locally.

周斌 added collaborator 周斌
周斌 changed the assignee from 周斌 to li_jianqiu
周斌 removed collaborator 周斌
熊小军 set the priority to Major

The problem reproduces on 3.0.0. Steps:
1. Create a compressed table and a compressed index, then insert data:
-- compressed table
create table row_compress_1(a int, b int,c varchar(32)) with(compresstype=1,compress_chunk_size=4096);
-- add a compressed index
create index on row_compress_1(a) with (compresstype=1);
-- insert data
insert into row_compress_1 values(generate_series(1,100000),generate_series(1,100000),'a'||generate_series(1,100000));
2. Run a full build on the standby. It fails with:
[2022-12-05 09:54:12.078][4107926][dn_6001_6002][gs_ctl]: could not get WAL end position from server: FATAL: base backup cheksum or Decompressed blockno 1 failed in file "./base/15563/58348_pcd", aborting backup. nchunks: 2, allocatedChunks: 2, segno: 0.
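The simplified reproduction above can be scripted as a minimal sketch. The SQL is copied verbatim from the comment. `PORT`, `DB` and `STANDBY_DN_DIR` are hypothetical placeholders, because this comment does not give them, and the script assumes `gsql` and `gs_ctl` are on `PATH`. The SQL must run against the primary, and the build must run on the standby host. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the simplified 3.0.0 reproduction. PORT, DB and STANDBY_DN_DIR
# are hypothetical placeholders; set them for your own cluster.
PORT=${PORT:-15400}
DB=${DB:-postgres}
STANDBY_DN_DIR=${STANDBY_DN_DIR:-/path/to/standby_dn}

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_compressed_build() {
    # On the primary: compressed table, compressed index, 100000 rows
    run gsql -d "$DB" -p "$PORT" -c "create table row_compress_1(a int, b int,c varchar(32)) with(compresstype=1,compress_chunk_size=4096);"
    run gsql -d "$DB" -p "$PORT" -c "create index on row_compress_1(a) with (compresstype=1);"
    run gsql -d "$DB" -p "$PORT" -c "insert into row_compress_1 values(generate_series(1,100000),generate_series(1,100000),'a'||generate_series(1,100000));"
    # On the standby: full build (fails on 3.0.0 with the FATAL above)
    run gs_ctl build -D "$STANDBY_DN_DIR" -M standby -b full
}
```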
Conclusion: this problem is fixed in newer versions. PR:
!1847: [3.0.0 backport] 1750 During data backup, page verification now also checks non-compressed pages.

li_jianqiu changed the task status from To Do to Confirmed
li_jianqiu changed the task status from Confirmed to Fixing
li_jianqiu changed the task status from Fixing to Done
