openGauss / openGauss-server

Concurrent modification of the config file causes a core dump

Status: Done
Type: Bug (reported by a member)
Created: 2022-05-15 22:36

【Title】:
Concurrent modification of the config file causes a core dump

【Test type: concurrency】【Test version: 3.0.0】
【OS and hardware information】(query commands: cat /etc/system-release, uname -a):
CentOS Linux release 7.9.2009 (Core)
x86_64 GNU/Linux

【Test environment】(standalone / 1 primary, x standby, x cascaded standby):
Standalone

【Feature under test】:
Multiple threads concurrently using the ALTER SYSTEM SET command to modify the postgresql.conf file

【Database version】(query command: gaussdb -V):
openGauss 3.0.0 build 6c589864

【Preconditions】:
Start openGauss

【Steps】(please give detailed steps):

  1. Start openGauss
  2. Use yat to run a group of test cases concurrently; each test case uses the ALTER SYSTEM SET command to modify a different config parameter
    e.g.
    testcase1:
    -- @testpoint: set a GUC variable with alter system set, then check that the setting took effect
    ALTER SYSTEM SET archive_mode to on;
    SQL SUCCESS
    select pg_sleep(2);
    +----------+
    | pg_sleep |
    +----------+
    |          |
    +----------+
    show archive_mode;
    +--------------+
    | archive_mode |
    +--------------+
    | on           |
    +--------------+
    ALTER SYSTEM SET archive_mode to off;
    SQL SUCCESS
    select pg_sleep(2);
    +----------+
    | pg_sleep |
    +----------+
    |          |
    +----------+
    show archive_mode;
    +--------------+
    | archive_mode |
    +--------------+
    | off          |
    +--------------+
    testcase2:
    -- @testpoint: set a GUC variable with alter system set, then check that the setting took effect
    ALTER SYSTEM SET archive_timeout to 600;
    SQL SUCCESS
    select pg_sleep(2);
    +----------+
    | pg_sleep |
    +----------+
    |          |
    +----------+
    show archive_timeout;
    +-----------------+
    | archive_timeout |
    +-----------------+
    | 10min           |
    +-----------------+
    ALTER SYSTEM SET archive_timeout to 0;
    SQL SUCCESS
    select pg_sleep(2);
    +----------+
    | pg_sleep |
    +----------+
    |          |
    +----------+
    show archive_timeout;
    +-----------------+
    | archive_timeout |
    +-----------------+
    | 0               |
    +-----------------+

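【Expected output】:
After each test case modifies its parameter, every modified parameter in postgresql.conf holds the value that was set. For the two test cases above, archive_mode should be off and archive_timeout should be 0.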

【Actual output】:
The database fails at runtime with a core dump

【Root cause analysis】:

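  1. Root cause
    When multiple threads modify postgresql.conf at the same time, the lock in use is flock(), which is a process-level lock. Once one openGauss thread acquires it, every openGauss thread effectively holds it, so it gives no protection between threads.

There are two kinds of race condition when modifying the conf file. One is between threads: different openGauss threads modify the conf file at the same time. The other is between processes: the gs_guc program and the main openGauss program modify the conf file at the same time.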
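
The per-process ownership described above can be reproduced outside openGauss. The following is a minimal sketch, assuming the lock behaves like a POSIX fcntl() record lock, which the kernel attributes to the owning process rather than to a thread. The issue does not show the actual lock implementation, and the helper names try_write_lock and second_lock_granted_in_same_process are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

// Try to take an exclusive, non-blocking POSIX record lock on the whole file.
// Returns true if the kernel grants the lock.
static bool try_write_lock(int fd)
{
    struct flock lk = {};
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;  // 0 means "up to the end of the file"
    return fcntl(fd, F_SETLK, &lk) == 0;
}

// Opens the same file twice inside one process and write-locks it through
// both descriptors. Returns true if both lock requests are granted, i.e. the
// second, seemingly conflicting request is not refused.
bool second_lock_granted_in_same_process()
{
    char path[] = "/tmp/conf_lock_demo_XXXXXX";
    int fd1 = mkstemp(path);        // creates and opens a unique temp file
    assert(fd1 >= 0);
    int fd2 = open(path, O_RDWR);   // second, independent descriptor
    assert(fd2 >= 0);

    bool first = try_write_lock(fd1);
    bool second = try_write_lock(fd2);  // same process, so no conflict is reported

    close(fd2);
    close(fd1);
    unlink(path);
    return first && second;
}
```

Under this assumption, second_lock_granted_in_same_process() is expected to return true: the kernel sees one owner, the process, so a second thread asking for the same lock is not blocked. A separate process making the same request would be refused.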

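  2. Solution

Keep the original flock lock as the process lock, and add openGauss's global lock (an LWLock) on top of it as the thread lock. Every change to the conf file must hold both locks. On release, the process lock must be released first and the thread lock second; otherwise there is a race condition.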
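
The two-lock scheme can be sketched as follows. This is only an illustration under stated assumptions, not the actual patch: std::mutex stands in for the openGauss LWLock, a plain flock(2) call on a side lock file stands in for the existing file lock, and ConfFileGuard, bump_counter and read_counter are hypothetical names. The guard takes the thread lock first and the process lock second, then releases the process lock first and the thread lock last.

```cpp
#include <cassert>
#include <fstream>
#include <mutex>
#include <string>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical stand-in for the global LWLock: serializes threads in this process.
static std::mutex g_conf_thread_lock;

// RAII guard that holds both locks while a conf-like file is being changed.
class ConfFileGuard {
public:
    explicit ConfFileGuard(const std::string& lock_path)
        : thread_guard_(g_conf_thread_lock),                    // 1. thread lock
          fd_(open(lock_path.c_str(), O_RDWR | O_CREAT, 0600))
    {
        assert(fd_ >= 0);
        int rc = flock(fd_, LOCK_EX);                           // 2. process lock
        assert(rc == 0);
        (void)rc;
    }
    ~ConfFileGuard()
    {
        flock(fd_, LOCK_UN);  // release the process lock first
        close(fd_);
        // thread_guard_ is destroyed after this body runs, so the
        // thread lock is released last.
    }
    ConfFileGuard(const ConfFileGuard&) = delete;
    ConfFileGuard& operator=(const ConfFileGuard&) = delete;

private:
    std::unique_lock<std::mutex> thread_guard_;  // declared first, initialized first
    int fd_;
};

// Read-modify-write of a counter file, standing in for a conf file rewrite.
void bump_counter(const std::string& path)
{
    ConfFileGuard guard(path + ".lock");
    long value = 0;
    {
        std::ifstream in(path);
        in >> value;  // stays 0 if the file does not exist yet
    }
    std::ofstream out(path, std::ios::trunc);
    out << (value + 1) << '\n';
    // `out` is flushed and closed before `guard` releases the locks.
}

// Reads the current counter value under the same two locks.
long read_counter(const std::string& path)
{
    ConfFileGuard guard(path + ".lock");
    long value = 0;
    std::ifstream in(path);
    in >> value;
    return value;
}
```

Calling bump_counter(path) three times and then read_counter(path) should yield 3. The release order matters because, if the thread lock were dropped first, another thread could take it and start writing while the first thread is still about to drop the process lock, leaving that second writer unprotected against gs_guc.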

【Log information】(please attach log files, screenshots, coredump information):
#0 write_guc_file (
    path=path@entry=0x7f21f6f6c020 "/data/msun/openGauss-server/src/test/regress/tmp_check/datanode1/postgresql.conf.bak_bak", lines=lines@entry=0x0) at guc.cpp:12027
#1 0x000055cd7c38e4a9 in WriteAlterSystemSetGucFile (
    ConfFileName=ConfFileName@entry=0x7f21f6f6c020 "/data/msun/openGauss-server/src/test/regress/tmp_check/datanode1/postgresql.conf.bak_bak", opt_lines=0x0, filelock=filelock@entry=0x7f21f6f6b780)
    at guc.cpp:8507
#2 0x000055cd7c38ea14 in AlterSystemSetConfigFile (altersysstmt=altersysstmt@entry=0x7f21fb7f1890)
    at guc.cpp:8596
#3 0x000055cd7c8d04fd in standard_ProcessUtility (parse_tree=parse_tree@entry=0x7f21fb7f1890,
    query_string=query_string@entry=0x7f21f744d790 "alter system set autovacuum to off",
    params=params@entry=0x0, is_top_level=<optimized out>,
    dest=dest@entry=0x55cd8258bca0 , sent_to_remote=<optimized out>,
    completion_tag=0x7f21f6f6f120 "", isCTAS=false) at utility.cpp:5557
#4 0x00007f22d7f590a9 in gsaudit_ProcessUtility_hook (parsetree=0x7f21fb7f1890,
    queryString=0x7f21f744d790 "alter system set autovacuum to off", params=0x0,
    isTopLevel=<optimized out>, dest=0x55cd8258bca0 , sentToRemote=<optimized out>,
    completionTag=0x7f21f6f6f120 "", isCTAS=false) at gs_policy_plugin.cpp:1062
#5 0x000055cd7c8de63c in pgaudit_ProcessUtility (parsetree=0x7f21fb7f1890,
    queryString=0x7f21f744d790 "alter system set autovacuum to off", params=<optimized out>,
    isTopLevel=<optimized out>, dest=<optimized out>, sentToRemote=<optimized out>,
    completionTag=0x7f21f6f6f120 "", isCTAS=false) at auditfuncs.cpp:1198
#6 0x000055cd7c8c2265 in PortalRunUtility (portal=portal@entry=0x7f224e0bc040,
    utilityStmt=utilityStmt@entry=0x7f21fb7f1890, isTopLevel=isTopLevel@entry=true,
    dest=dest@entry=0x55cd8258bca0 ,
    completionTag=completionTag@entry=0x7f21f6f6f120 "") at pquery.cpp:1758
#7 0x000055cd7c8c33c4 in PortalRunMulti (portal=portal@entry=0x7f224e0bc040,
    isTopLevel=isTopLevel@entry=true, dest=0x55cd8258bca0 , dest@entry=0x7f223a1c64a0,
    altdest=0x55cd8258bca0 , altdest@entry=0x7f223a1c64a0,
    completionTag=completionTag@entry=0x7f21f6f6f120 "") at pquery.cpp:1937
#8 0x000055cd7c8c5f48 in PortalRun (portal=portal@entry=0x7f224e0bc040,
    count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
    dest=dest@entry=0x7f223a1c64a0, altdest=altdest@entry=0x7f223a1c64a0,
    completionTag=completionTag@entry=0x7f21f6f6f120 "") at pquery.cpp:1183
#9 0x000055cd7be74890 in exec_execute_message (portal_name=portal_name@entry=0x7f223a1c6040 "",
    max_rows=9223372036854775807, max_rows@entry=0) at postgres.cpp:4997
#10 0x000055cd7c8c045a in PostgresMain (argc=<optimized out>, argv=argv@entry=0x7f2244af1780,
    dbname=<optimized out>, username=<optimized out>) at postgres.cpp:8784
#11 0x000055cd7c832c6b in BackendRun (port=port@entry=0x7f21f6f6f7b0) at postmaster.cpp:7746
#12 0x000055cd7c8508e8 in GaussDbThreadMain<(knl_thread_role)2> (arg=0x7f226aadbf98)
    at postmaster.cpp:11629
#13 0x000055cd7c832cc5 in InternalThreadFunc (args=<optimized out>) at postmaster.cpp:12186
#14 0x00007f22e24a1ea5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f22e21cab0d in clone () from /lib64/libc.so.6

Comments (1)

wenkeyang_abab created the bug

Hey @wenkeyang_abab, Welcome to openGauss Community.
All of the projects in openGauss Community are maintained by @opengauss-bot.
That means the developers can comment below every pull request or issue to trigger Bot Commands.
Please follow instructions at Here to find the details.

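wenkeyang_abab edited the description
wenkeyang_abab edited the description
yansong_lee set the assignee to buter
buter added buter as a collaborator
buter changed the assignee from buter to 张翱
buter removed buter as a collaborator
张翱 added 张翱 as a collaborator
张翱 changed the assignee from 张翱 to 胡正超
胡正超 changed the task status from To Do to Confirmed
胡正超 changed the task status from Confirmed to Fixing
胡正超 changed the task status from Fixing to Done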
