openLooKeng / hetu-core

An error occurs when a worker is killed while a CTAS creates a bucketed table

Status: Todo
Type: Defect
Created: 2021-03-29 18:52

Software Environment:

  • openLooKeng version (source or binary):
    openLooKeng 1.2.0 RC6

  • OS platform & distribution (e.g., Linux Ubuntu 16.04):

  • Java version:

Describe the current behavior

If a worker node is killed while a CTAS statement is creating a bucketed table, the CTAS SQL statement fails to execute.

Describe the expected behavior

If a worker node is killed while a CTAS statement is creating a bucketed table, the statement should still complete successfully through task recovery.

Steps to reproduce the issue

  1. set session snapshot_enabled=true;
  2. Run a CTAS that creates a bucketed table:

     create table hive.task_recovery_test.task_035000 (salutation, first_name, last_name, birth_year)
     WITH (bucket_count = 32,
           bucketed_by = ARRAY['salutation'],
           format = 'ORC',
           partitioned_by = ARRAY['birth_year'])
     as select
         c_salutation,
         c_first_name,
         c_last_name,
         c_birth_year
     from hive.tpcds_bin_partitioned_orc_1000.customer_address t1
     inner join hive.tpcds_bin_partitioned_orc_1000.customer t2
         on t1.ca_address_sk = t2.c_current_addr_sk;

  3. Wait until the snapshot is generated, then kill a worker node (a verification sketch follows this list).
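
After recovery completes, the CTAS output can be checked against the source join. A minimal verification sketch (the count comparison is an illustration of how to validate the recovered result, not part of the original report):

    -- Row count produced by the recovered CTAS
    select count(*) from hive.task_recovery_test.task_035000;

    -- Expected row count from the same source join; the two counts should match
    select count(*)
    from hive.tpcds_bin_partitioned_orc_1000.customer_address t1
    inner join hive.tpcds_bin_partitioned_orc_1000.customer t2
        on t1.ca_address_sk = t2.c_current_addr_sk;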

Related log/screenshots

Special notes for this issue

Comments (3)

yumei created Defect
yumei set the associated repository to openLooKeng/hetu-core
i-robot added the needs-milestone label

@yumei You have not selected a milestone, please select a milestone. After setting the milestone, you can use the /check-milestone command to remove the needs-milestone label.

yumei added the bug label
yumei added the m/ReliableExecution label
yumei set the assignee to jessica-surya

Please provide more details on the error.

  • Test version
    openLooKeng (2021-12-21 build)
  • Test steps
    1. Create an RCBINARY format table:

       create table test_recovery_automate.customer_textfile
       with (format = 'RCBINARY')
       as select * from tpcds_bin_partitioned_orc_1000.customer;

    2. Run a bucketed CTAS (see the consolidated session script after these steps):

       create table test_recovery_automate.tb_23
       with (bucket_count = 16, bucketed_by = array['c_last_review_date_sk'])
       as select * from test_recovery_automate.customer_textfile;

    3. After snapshot 1 is generated, kill a worker.
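
For a self-contained run, the steps above can be executed as one session script. The original comment does not state the session properties used; enabling snapshot as in the first report is an assumption here:

    -- Assumed from the first report; not stated in this comment
    set session snapshot_enabled=true;

    -- Step 1: source table in RCBINARY format
    create table test_recovery_automate.customer_textfile
    with (format = 'RCBINARY')
    as select * from tpcds_bin_partitioned_orc_1000.customer;

    -- Step 2: bucketed CTAS; kill a worker after snapshot 1 is generated
    create table test_recovery_automate.tb_23
    with (bucket_count = 16, bucketed_by = array['c_last_review_date_sk'])
    as select * from test_recovery_automate.customer_textfile;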

  • Result
    The following error is occasionally triggered. The stack trace shows the ORC file writer failing to commit: when the final stripe is flushed, the HDFS staging file no longer exists and the client's lease holds no open files.
io.prestosql.spi.PrestoException: Error committing write to Hive
	at io.prestosql.plugin.hive.OrcFileWriter.commit(OrcFileWriter.java:404)
	at io.prestosql.plugin.hive.HiveWriter.commit(HiveWriter.java:105)
	at io.prestosql.plugin.hive.HivePageSink.doFinish(HivePageSink.java:291)
	at io.prestosql.plugin.hive.HivePageSink.mergeFiles(HivePageSink.java:344)
	at io.prestosql.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
	at io.prestosql.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:78)
	at io.prestosql.plugin.hive.HivePageSink.finish(HivePageSink.java:278)
	at io.prestosql.spi.connector.classloader.ClassLoaderSafeConnectorPageSink.finish(ClassLoaderSafeConnectorPageSink.java:90)
	at io.prestosql.operator.TableWriterOperator.finish(TableWriterOperator.java:226)
	at io.prestosql.operator.Driver.processInternal(Driver.java:462)
	at io.prestosql.operator.Driver.lambda$processFor$9(Driver.java:315)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:785)
	at io.prestosql.operator.Driver.processFor(Driver.java:308)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1261)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.prestosql.$gen.Presto_100_RC4_1544_gba66bcf____20211221_025421_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.UncheckedIOException: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/test_recovery_automate.db/.staging-20211221_090207_00338_zm7h7-ec004170-ef93-47bf-a5c6-0f3100ebd85b/000001_0_20211221_090207_00338_zm7h7_snapshot_20211221_090207_00338_zm7h7 (inode 841086752) Holder DFSClient_NONMAPREDUCE_-133843704_198 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2890)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:652)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:178)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2757)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:899)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:602)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2876)

	at io.airlift.slice.OutputStreamSliceOutput.writeToOutputStream(OutputStreamSliceOutput.java:362)
	at io.airlift.slice.OutputStreamSliceOutput.writeBytes(OutputStreamSliceOutput.java:185)
	at io.airlift.slice.OutputStreamSliceOutput.writeBytes(OutputStreamSliceOutput.java:176)
	at io.prestosql.orc.OrcOutputBuffer.writeDataTo(OrcOutputBuffer.java:124)
	at io.prestosql.orc.stream.StreamDataOutput.writeData(StreamDataOutput.java:67)
	at io.prestosql.orc.OutputStreamOrcDataSink.lambda$write$0(OutputStreamOrcDataSink.java:53)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at io.prestosql.orc.OutputStreamOrcDataSink.write(OutputStreamOrcDataSink.java:53)
	at io.prestosql.orc.OrcWriter.flushStripe(OrcWriter.java:356)
	at io.prestosql.orc.OrcWriter.close(OrcWriter.java:461)
	at io.prestosql.plugin.hive.OrcFileWriter.commit(OrcFileWriter.java:395)
	... 19 more
Caused by: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/test_recovery_automate.db/.staging-20211221_090207_00338_zm7h7-ec004170-ef93-47bf-a5c6-0f3100ebd85b/000001_0_20211221_090207_00338_zm7h7_snapshot_20211221_090207_00338_zm7h7 (inode 841086752) Holder DFSClient_NONMAPREDUCE_-133843704_198 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2890)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:652)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:178)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2757)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:899)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:602)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2876)

	at sun.reflect.GeneratedConstructorAccessor209.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
	at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1084)
	at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1865)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1668)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
Caused by: org.apache.hadoop.ipc.RemoteException: File does not exist: /user/hive/warehouse/test_recovery_automate.db/.staging-20211221_090207_00338_zm7h7-ec004170-ef93-47bf-a5c6-0f3100ebd85b/000001_0_20211221_090207_00338_zm7h7_snapshot_20211221_090207_00338_zm7h7 (inode 841086752) Holder DFSClient_NONMAPREDUCE_-133843704_198 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2890)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:652)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:178)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2757)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:899)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:602)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2876)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
	at org.apache.hadoop.ipc.Client.call(Client.java:1457)
	at org.apache.hadoop.ipc.Client.call(Client.java:1367)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy352.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:513)
	at sun.reflect.GeneratedMethodAccessor896.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy353.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
	... 3 more
