36 Star 99 Fork 102

openGauss / openGauss-connector-jdbc

 / 详情

opengauss:fail to allocate a zone

已取消
缺陷
创建于  
2022-08-04 10:26

一主一备模式下,进行简单测试,一段时间后显示:fail to allocate a zone
通过提供的工具收集日志,结果是死锁导致的:

2022-08-03 14:14:59.865 62ea1252.5025 postgres 139780237289216 PostgreSQL JDBC Driver 1538516 dn_6001_6002 40P01  3377699720531497 [BACKEND] ERROR:  deadlock detected
2022-08-03 14:14:59.865 62ea1252.5025 postgres 139780237289216 PostgreSQL JDBC Driver 1538516 dn_6001_6002 40P01  3377699720531497 [BACKEND] DETAIL:  Process 139780237289216 waits for ShareLock on transaction 1538556; blocked by process 139780876728064.
	Process 139780876728064 waits for AccessExclusiveLock on tuple (19808,36) of (relation 16390, bucket 0) of database 15484; blocked by process 139779890865920.
	Process 139779890865920 waits for ShareLock on transaction 1538516; blocked by process 139780237289216.
	Process 139780237289216: delete from users where user_name = $1

线程代码:

public class DelectTest extends Thread {
    String sql = "delete from users where user_name = ?;" +
            "delete from users where user_name = ?;" +
            "delete from users where user_name = ?";
    JDBCUtil jdbc = new JDBCUtil();

    public void run(){
        Long startTime = System.currentTimeMillis();
        Long endTime = null;
        boolean flag = true;
        while(flag){
            try {
                DelectTest.sleep(800);
                PreparedStatement statement = jdbc.getStatement(sql);
                statement.setString(1,"zhangsan");
                statement.setString(2,"zhaoliu");
                statement.setString(3,"longqi");


                int i = statement.executeUpdate();
                System.out.println("成功地删除了"+ i +"条数据");
                //MyLogger.info("成功地删除了"+ i +"条数据");

                endTime = System.currentTimeMillis();


                if(endTime - startTime > 86400000*3){
                    flag = false;
                }
            } catch (SQLException throwables) {
                throwables.printStackTrace();
                //MyLogger.warning(throwables.getMessage());
            } catch (InterruptedException e) {
                e.printStackTrace();
            } finally {
                jdbc.closeSource();
            }
        }


    }
}

之后将测试关闭程序,只要对数据库进行写操作,就会显示fail to allocate a zone,需要重启数据库才能恢复。请问这是为什么呢?怎么解决

评论 (12)

黄佳 创建了咨询

Hey @黄佳, Welcome to openGauss Community.
All of the projects in openGauss Community are maintained by @opengauss-bot.
That means the developers can comment below every pull request or issue to trigger Bot Commands.
Please follow instructions at Here to find the details.

zhangxubo 任务类型咨询 修改为缺陷
yansong_lee 负责人设置为周斌
yansong_lee 优先级设置为主要

已将sql语句进行优化,避免了死锁,但是仍然会出现:failed to allocate a zone。收集到的日志如下:
clean statement thread start 2022-08-07 00:33:03.957 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [SLRU] LOG: remove old segments(<2048) under pg_csnlog 2022-08-07 00:33:03.957 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: truncate CSN log oldestXact 2595255, next xid 2595271 2022-08-07 00:33:04.165 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [DBL_WRT] LOG: [batch flush] DW truncate end: file_head[dwn 193, start 3654], total_pages 0 2022-08-07 00:33:04.173 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: slotname: dn_6002, dummy: 0, restartlsn: 1/7F6C3FD8 2022-08-07 00:33:04.174 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: CreateCheckPoint PrintCkpXctlControlFile: [checkPoint] oldCkpLoc:1/7F593818, oldRedo:1/7F5930A8, newCkpLoc:1/7F6C4070, newRedo:1/7F62C0B0, preCkpLoc:1/7F4FD190 2022-08-07 00:33:04.174 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: will update control file (create checkpoint), shutdown:0 2022-08-07 00:33:04.175 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: attempting to remove WAL segments older than log file 00000001000000010000006D 2022-08-07 00:33:31.729 62ee97db.5054 postgres 140242626868992 Clean Statement thread 0 dn_6001_6002 00000 0 [BACKEND] LOG: clean statement thread start 2022-08-07 00:34:03.439 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [DBL_WRT] LOG: [batch flush] DW truncate end: file_head[dwn 193, start 4139], total_pages 0 2022-08-07 00:34:03.447 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: slotname: dn_6002, dummy: 0, restartlsn: 1/7F759F90 2022-08-07 00:34:03.448 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: CreateCheckPoint PrintCkpXctlControlFile: [checkPoint] oldCkpLoc:1/7F6C4070, oldRedo:1/7F62C0B0, newCkpLoc:1/7F75A028, newRedo:1/7F6C37E8, preCkpLoc:1/7F593818 2022-08-07 00:34:03.448 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: will update control file (create checkpoint), shutdown:0 2022-08-07 00:34:03.449 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: attempting to remove WAL segments older than log file 00000001000000010000006D 2022-08-07 00:34:32.492 62ee9818.5053 postgres 140242626868992 Clean Statement thread 0 dn_6001_6002 00000 0 [BACKEND] LOG: clean statement thread start 2022-08-07 00:34:50.017 62ec74f9.6135 [unknown] 140243039483648 dn_6001 0 dn_6001_6002 01000 0 [BACKEND] WARNING: audit file total space(1074649551 B) exceed guc parameter(audit_space_limit: 1048576 KB) 2022-08-07 00:34:50.017 62ec74f9.6135 [unknown] 140243039483648 dn_6001 0 dn_6001_6002 01000 0 [BACKEND] WARNING: /var/log/omm/omm/pg_audit/dn_6001/997_adt 2022-08-07 00:35:03.674 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [DBL_WRT] LOG: [batch flush] DW truncate end: file_head[dwn 193, start 4693], total_pages 0 2022-08-07 00:35:03.681 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: slotname: dn_6002, dummy: 0, restartlsn: 1/7F7F27B0 2022-08-07 00:35:03.682 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: CreateCheckPoint PrintCkpXctlControlFile: [checkPoint] oldCkpLoc:1/7F75A028, oldRedo:1/7F6C37E8, newCkpLoc:1/7F7F2830, newRedo:1/7F759F90, preCkpLoc:1/7F6C4070 2022-08-07 00:35:03.682 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: will update control file (create checkpoint), shutdown:0 2022-08-07 00:35:03.683 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: attempting to remove WAL segments older than log file 00000001000000010000006D 2022-08-07 00:35:28.075 62ee9850.6123 [unknown] 140242626868992 dn_6001 0 dn_6001_6002 00000 0 [AUTOVAC] LOG: report autovac startup on database 15484 to stats collector 2022-08-07 00:35:28.082 62ee9850.6123 [unknown] 140242626868992 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: start autovacuum on database "postgres" 2022-08-07 00:35:33.243 62ee9855.5054 postgres 140242626868992 Clean Statement thread 0 dn_6001_6002 00000 0 [BACKEND] LOG: clean statement thread start 2022-08-07 00:36:03.896 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [DBL_WRT] LOG: [batch flush] DW truncate end: file_head[dwn 193, start 5051], total_pages 0 2022-08-07 00:36:03.904 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: slotname: dn_6002, dummy: 0, restartlsn: 1/7F88B758 2022-08-07 00:36:03.905 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: CreateCheckPoint PrintCkpXctlControlFile: [checkPoint] oldCkpLoc:1/7F7F2830, oldRedo:1/7F759F90, newCkpLoc:1/7F88B7D8, newRedo:1/7F88A868, preCkpLoc:1/7F75A028 2022-08-07 00:36:03.905 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: will update control file (create checkpoint), shutdown:0 2022-08-07 00:36:03.906 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: attempting to remove WAL segments older than log file 00000001000000010000006D 2022-08-07 00:36:34.043 62ee9892.5053 postgres 140242576525056 Clean Statement thread 0 dn_6001_6002 00000 0 [BACKEND] LOG: clean statement thread start 2022-08-07 00:37:34.842 62ee98ce.5057 postgres 140242610087680 Clean Statement thread 0 dn_6001_6002 00000 0 [BACKEND] LOG: clean statement thread start 2022-08-07 00:38:04.188 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [DBL_WRT] LOG: [batch flush] DW truncate end: file_head[dwn 193, start 6138], total_pages 0 2022-08-07 00:38:04.196 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: slotname: dn_6002, dummy: 0, restartlsn: 1/7F9BBB20 2022-08-07 00:38:04.197 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: CreateCheckPoint PrintCkpXctlControlFile: [checkPoint] oldCkpLoc:1/7F88B7D8, oldRedo:1/7F88A868, newCkpLoc:1/7F9BBBA0, newRedo:1/7F923228, preCkpLoc:1/7F7F2830 2022-08-07 00:38:04.197 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: will update control file (create checkpoint), shutdown:0 2022-08-07 00:38:04.198 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: attempting to remove WAL segments older than log file 00000001000000010000006D 2022-08-07 00:38:35.611 62ee990b.5055 postgres 140242576525056 Clean Statement thread 0 dn_6001_6002 00000 0 [BACKEND] LOG: clean statement thread start 2022-08-07 00:39:03.486 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [DBL_WRT] LOG: [batch flush] DW truncate end: file_head[dwn 193, start 6711], total_pages 0 2022-08-07 00:39:03.507 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: slotname: dn_6002, dummy: 0, restartlsn: 1/7FA506B0 2022-08-07 00:39:03.517 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: CreateCheckPoint PrintCkpXctlControlFile: [checkPoint] oldCkpLoc:1/7F9BBBA0, oldRedo:1/7F923228, newCkpLoc:1/7FA50730, newRedo:1/7F9BB930, preCkpLoc:1/7F88B7D8 2022-08-07 00:39:03.518 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: will update control file (create checkpoint), shutdown:0 2022-08-07 00:39:03.537 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: attempting to remove WAL segments older than log file 00000001000000010000006D 2022-08-07 00:39:36.489 62ee9948.5057 postgres 140242559743744 Clean Statement thread 0 dn_6001_6002 00000 0 [BACKEND] LOG: clean statement thread start 2022-08-07 00:40:03.798 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [DBL_WRT] LOG: [batch flush] DW truncate end: file_head[dwn 193, start 7292], total_pages 0 2022-08-07 00:40:03.806 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: slotname: dn_6002, dummy: 0, restartlsn: 1/7FAE98C0 2022-08-07 00:40:03.807 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: CreateCheckPoint PrintCkpXctlControlFile: [checkPoint] oldCkpLoc:1/7FA50730, oldRedo:1/7F9BB930, newCkpLoc:1/7FAE9940, newRedo:1/7FAE98C0, preCkpLoc:1/7F9BBBA0 2022-08-07 00:40:03.807 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: will update control file (create checkpoint), shutdown:0 2022-08-07 00:40:03.808 62ec74f9.6125 [unknown] 140243736393472 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: attempting to remove WAL segments older than log file 00000001000000010000006D 2022-08-07 00:40:15.377 62ec74ff.5063 [unknown] 140242984888064 dn_6002 0 dn_6001_6002 00000 0 [BACKEND] LOG: the config file has no change 2022-08-07 00:40:27.622 62ee997b.6123 [unknown] 140242576525056 dn_6001 0 dn_6001_6002 00000 0 [AUTOVAC] LOG: report autovac startup on database 1 to stats collector 2022-08-07 00:40:27.629 62ee997b.6123 [unknown] 140242576525056 dn_6001 0 dn_6001_6002 00000 0 [BACKEND] LOG: start autovacuum on database "template1" 2022-08-07 00:40:37.314 62ee9985.5053 postgres 140242576525056 Clean Statement thread 0 dn_6001_6002 00000 0 [BACKEND] LOG: clean statement thread start 2022-08-07 00:41:30.921 62ee99ba.5053 postgres 140242576525056 PostgreSQL JDBC Driver 0 dn_6001_6002 42809 3940649676307841 [BACKEND] ERROR: [AllocateZonesBeforXid:856]failed to allocate a zone 2022-08-07 00:41:30.921 62ee99ba.5053 postgres 140242576525056 PostgreSQL JDBC Driver 0 dn_6001_6002 42809 3940649676307841 [BACKEND] STATEMENT: insert into users(user_name,rowversion) values($1,now()),($2,now()),($3,now()),($4,now()),($5,now()) 2022-08-07 00:41:31.205 62ee99bb.5055 postgres 140242593306368 PostgreSQL JDBC Driver 0 dn_6001_6002 42809 3940649676307850 [BACKEND] ERROR: [AllocateZonesBeforXid:856]failed to allocate a zone
请问如何解决?

@周斌 好的,我试试

@黄佳 刚才经过长时间验证,包括创表为ustore/astore以及在测试过程中频繁插入数据,也无法复现此问题。可否方便提供主备的配置文件,以及测试的机器信息、内存信息以及数据库和驱动版本信息。

周斌 添加协作者周斌
周斌 负责人周斌 修改为陈栋
周斌 取消协作者周斌

实不相瞒,我在不同的插入的同时,也在不同的查询和删除数据。只是当时报错是插入线程之间进入死循环,我就只提了插入。

@周斌 测试的时候,是用服务器的两台虚拟机测试的,你需要的驱动信息指的是哪些驱动

配置文件:

<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
    <!-- openGauss整体信息 -->
    <CLUSTER>
        <!-- 数据库名称 -->
        <PARAM name="clusterName" value="Cluster_template" />
        <!-- 数据库节点名称(hostname) -->
        <PARAM name="nodeNames" value="gauss-db1,gauss-db2" />
        <!-- 数据库安装目录-->
        <PARAM name="gaussdbAppPath" value="/opt/huawei/install/app" />
        <!-- 日志目录-->
        <PARAM name="gaussdbLogPath" value="/var/log/omm" />
        <!-- 临时文件目录-->
        <PARAM name="tmpMppdbPath" value="/opt/huawei/tmp"/>
        <!-- 数据库工具目录-->
        <PARAM name="gaussdbToolPath" value="/opt/huawei/install/om" />
        <!-- 数据库core文件目录-->
        <PARAM name="corePath" value="/opt/huawei/corefile"/>
        <!-- 节点IP,与数据库节点名称列表一一对应 -->
        <PARAM name="backIp1s" value="****,****"/> 
    </CLUSTER>
    <!-- 每台服务器上的节点部署信息 -->
    <DEVICELIST>
        <!-- 节点1上的部署信息 -->
        <DEVICE sn="gauss-db1">
            <!-- 节点1的主机名称 -->
            <PARAM name="name" value="gauss-db1"/>
            <!-- 节点1所在的AZ及AZ优先级 -->
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- 节点1的IP,如果服务器只有一个网卡可用,将backIP1和sshIP1配置成同一个IP -->
            <PARAM name="backIp1" value="****"/>
            <PARAM name="sshIp1" value="****"/>
            
	    <!--dn-->
            <PARAM name="dataNum" value="1"/>
	    <PARAM name="dataPortBase" value="5432"/>
	    <PARAM name="dataNode1" value="/opt/huawei/install/data/dn,gauss-db2,/opt/huawei/install/data/dn"/>
            <PARAM name="dataNode1_syncNum" value="0"/>
        </DEVICE>

        <!-- 节点2上的节点部署信息,其中“name”的值配置为主机名称 -->
        <DEVICE sn="gauss-db2">
            <!-- 节点2的主机名称 -->
            <PARAM name="name" value="gauss-db2"/>
            <!-- 节点2所在的AZ及AZ优先级 -->
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- 节点2的IP,如果服务器只有一个网卡可用,将backIP1和sshIP1配置成同一个IP -->
            <PARAM name="backIp1" value="****"/>
            <PARAM name="sshIp1" value="****"/>
	</DEVICE>
    </DEVICELIST>
</ROOT>

===============================================================

主机内存
备机内存

===============================================================

数据库之前测试就一个很普通的表
数据库users表

陈栋 添加协作者陈栋
陈栋 负责人陈栋 修改为周斌
陈栋 取消协作者陈栋

@黄佳 问题是否还能复现? issue:#I5KOXV:长时间运行会因为“failed to allocate a zone”报错,无法进行任何写操作 已经合入。
如果还有问题请帮忙提供数据库下postgresql.conf配置文件
同时提供连接一下连接数据库的参数(如果含连接池,带连接池参数),使用时有多少并发?

换了存储引擎之后就好了,之前并发量其实不高,就是io频率优点高,加上后面稍微优化了一下sql就没问题了。

@黄佳 可否帮忙把此问题单关闭?

yansong_lee 任务状态待办的 修改为已取消

登录 后才可以发表评论

状态
负责人
项目
里程碑
Pull Requests
关联的 Pull Requests 被合并后可能会关闭此 issue
分支
开始日期   -   截止日期
-
置顶选项
优先级
预计工期 (小时)
参与者(4)
5622128 opengauss bot 1581905080
Java
1
https://gitee.com/opengauss/openGauss-connector-jdbc.git
git@gitee.com:opengauss/openGauss-connector-jdbc.git
opengauss
openGauss-connector-jdbc
openGauss-connector-jdbc

搜索帮助