openEuler / kernel

[OLK-5.10] liburing test cases cause the kernel to hang

Status: Done
Type: Bug
Created: 2023-08-09 18:43

[Title] OLK-5.10: the liburing test case sigfd-deadlock causes the kernel to hang
[Environment]
Hardware:
1) Loongson-LS2C50C2 3C5000+7A2000 dual-socket, dual-bridge server
Software:
1) OS: 22.03-LTS
2) Kernel: OLK-5.10
b4105458dd97 2023-08-02 (origin/OLK-5.10) !1591: net/sched: cls_u32: Fix reference counter leak leading to overflow net/sched: cls_u32: Fix reference counter leak leading to overflow [openeuler-ci-bot]
3) Affected component: liburing package

[Steps to reproduce]
git clone https://github.com/axboe/liburing
cd liburing
./configure
make -j32
make runtests

Reproduces every time.
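
To narrow down which test triggers the problem without running the whole suite, a single test binary can be run under a time limit. The following is a minimal sketch and not part of the original report. It assumes the built test binaries sit in the `test/` directory under the names printed by `make runtests` (for example `poll.t`); the helper name `run_one_test` and the 60-second default are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: run one test binary under a time limit.
# Usage: run_one_test <test-binary> [timeout-seconds]
run_one_test() {
    test_bin=$1
    limit=${2:-60}          # assumed default, adjust as needed
    status=0
    # coreutils `timeout` exits with 124 when the time limit is reached
    timeout "$limit" "$test_bin" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "TIMEOUT: $test_bin did not finish within ${limit}s"
    elif [ "$status" -ne 0 ]; then
        echo "FAIL: $test_bin exited with status $status"
    else
        echo "OK: $test_bin"
    fi
    return "$status"
}
```

From the liburing checkout this could be called as `run_one_test ./test/poll.t 120`. If the task is stuck inside the kernel, the signal sent by `timeout` may not take effect, so it is worth watching `dmesg` at the same time.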
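[Expected result]
The tests pass.
[Actual result]
The machine hangs during the test run.
[Attachments]
For example: system message logs, component logs, dump information, screenshots, etc.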

Comments (8)

zhanghongchen created the bug

Hi Hongchen_Zhang, welcome to the openEuler Community.
I'm the bot serving you here. You can find the instructions on how to interact with me here.
If you have any questions, please contact the SIG: Kernel, and any of the maintainers.

openeuler-ci-bot added the sig/Kernel label
zhanghongchen edited the title
zhanghongchen edited the description
zhangjialin added the help-wanted label

Tested with the latest OLK-5.10 code:
794c1919e741 2023-08-10 (HEAD -> OLK-5.10) !1683:net: hns3: revert some bugfix and backport some patch net: hns3: revert some bugfix and backport some patch [openeuler-ci-bot]
Error output:

[  283.542448] Running test open-direct-pick.t:
[  283.561165] Running test personality.t:
[  283.580048] Running test pipe-bug.t:
[  284.248636] Running test pipe-eof.t:
[  284.268003] Running test pipe-reuse.t:
[  284.286991] Running test poll.t:
[  344.298295] rcu: INFO: rcu_sched self-detected stall on CPU
[  344.304533] rcu: 	15-....: (14999 ticks this GP) idle=5da/1/0x4000000000000002 softirq=2663/2663 fqs=7216 
[  344.314825] 	(t=15004 jiffies g=8681 q=1091)
[  344.314829] NMI backtrace for cpu 15
[  344.314834] CPU: 15 PID: 23066 Comm: poll.t Not tainted 5.10.0+ #298
[  344.314836] Hardware name: LOONGSON Dabieshan T22C08B0/Loongson-LS2C50C2, BIOS Loongson-UDK2018-V4.0.8-Dual 06/09/23 16:13:49
[  344.314838] Stack : 90000001000d7b08 9000000000f41d70 900010008715c000 90000001000d7a40
[  344.314849]         0000000000000000 90000001000d7a40 90000000012e0b20 90000001000d7920
[  344.314855]         0000000000000040 90000000017e93b0 0000000000000071 0000000000000020
[  344.314861]         0000000000000000 0000000000000023 0000000000000001 c0000000ffffdfff
[  344.314868]         383043323254206e 676e6f6f4c2f3042 00000000ffffdfff 0000000000000003
[  344.314874]         0000000000005a1a 0000000008d24000 900000000128d000 00000000000000b0
[  344.314880]         9000000001536488 0000000000000000 90000000012e1000 0000000000000001
[  344.314887]         0000000000000001 90000000010f0758 9000000001447000 9000000001447110
[  344.314893]         0000000000000000 9000000000222e68 000000ffec023010 00000000000000b0
[  344.314899]         0000000000000004 0000000000000000 0000000000071c1c 0000000000000000
[  344.314905]         ...
[  344.314909] Call Trace:
[  344.314917] [<9000000000222e68>] show_stack+0x24/0x124
[  344.314923] [<9000000000f41d70>] dump_stack+0xac/0xe0
[  344.314928] [<900000000092f0d8>] nmi_cpu_backtrace+0xc8/0x118
[  344.314931] [<900000000092f270>] nmi_trigger_cpumask_backtrace+0x148/0x170
[  344.314934] [<9000000000223e90>] arch_trigger_cpumask_backtrace+0x14/0x24
[  344.314939] [<9000000000f3af80>] rcu_dump_cpu_stacks+0x140/0x1b8
[  344.314943] [<90000000002bb778>] print_cpu_stall+0x1a4/0x27c
[  344.314946] [<90000000002bbcd8>] check_cpu_stall+0x118/0x24c
[  344.314948] [<90000000002bbe3c>] rcu_pending+0x30/0xe8
[  344.314951] [<90000000002beef0>] rcu_sched_clock_irq+0x58/0xc4
[  344.314956] [<90000000002c8660>] update_process_times+0x58/0x98
[  344.314961] [<90000000002d8d44>] tick_sched_handle+0x3c/0x5c
[  344.314965] [<90000000002d9080>] tick_sched_timer+0x40/0xa8
[  344.314968] [<90000000002c98d0>] __run_hrtimer.constprop.0+0x78/0x1a4
[  344.314970] [<90000000002c9a94>] __hrtimer_run_queues+0x98/0x10c
[  344.314973] [<90000000002ca064>] hrtimer_interrupt+0x138/0x388
[  344.314976] [<90000000002263c8>] constant_timer_interrupt+0x38/0x48
[  344.314981] [<90000000002a587c>] __handle_irq_event_percpu+0x84/0x184
[  344.314983] [<90000000002a599c>] handle_irq_event_percpu+0x20/0x78
[  344.314987] [<90000000002ac3d0>] handle_percpu_irq+0x54/0x88
[  344.314990] [<90000000002a5074>] __handle_domain_irq+0x64/0xb8
[  344.314993] [<900000000094075c>] handle_cpu_irq+0x78/0xb8
[  344.314997] [<9000000000f57890>] handle_loongarch_irq+0x30/0x48
[  344.315000] [<9000000000f57928>] do_vint+0x80/0xb4
[  344.315005] [<90000000002887e4>] __wake_up_common_lock+0xbc/0x10c
[  344.315010] [<90000000008967f0>] io_req_complete_post+0x14c/0x364
[  344.315013] [<9000000000898b2c>] tctx_task_work+0x100/0x1b4
[  344.315017] [<9000000000259778>] task_work_run+0xc0/0x108
[  344.315021] [<90000000002c3920>] exit_to_user_mode_loop.isra.0+0xd4/0x114
[  344.315024] [<9000000000f57ee8>] syscall_exit_to_user_mode+0x9c/0x134
[  344.315026] [<9000000000f579d8>] do_syscall+0x7c/0x94
[  344.315028] [<9000000000221440>] handle_syscall+0xa0/0x144

[  344.315056] Task dump for CPU 15:
[  344.315058] task:poll.t          state:R  running task     stack:    0 pid:23066 ppid: 23065 flags:0x0020000e
[  344.315063] Stack : 0000000000000000 9000000000f3af8c 900010008715c000 90000001000d7af0
[  344.315069]         9000100089fba900 90000001000d7af0 9000000001288f88 90000001000d7a00
[  344.315075]         0000000000000040 90000000017e9780 0000000000000002 0000000000000030
[  344.315081]         9000100089fba900 0000000000000023 0000000000000001 c0000000ffffdfff
[  344.315087]         617420676e696e6e 7320202020206b73 00000000ffffdfff 0000000000000003
[  344.315094]         0000000000005a19 0000000008d24000 900000000128d000 9000000000f7d200
[  344.315100]         000000000000000f 00000000000000b0 9000000001447238 900000000153f6a8
[  344.315106]         9000000000f7d000 90000000010f0758 9000000001447000 9000000001447110
[  344.315112]         0000000000005a1a 9000000000222e68 000000ffec023010 00000000000000b0
[  344.315119]         0000000000000004 0000000000000000 0000000000071c1c 0000000000000000
[  344.315125]         ...
[  344.315128] Call Trace:
[  344.315130] [<9000000000222e68>] show_stack+0x24/0x124
[  344.315133] [<9000000000f3af8c>] rcu_dump_cpu_stacks+0x14c/0x1b8
[  344.315136] [<90000000002bb778>] print_cpu_stall+0x1a4/0x27c
[  344.315138] [<90000000002bbcd8>] check_cpu_stall+0x118/0x24c
[  344.315140] [<90000000002bbe3c>] rcu_pending+0x30/0xe8
[  344.315143] [<90000000002beef0>] rcu_sched_clock_irq+0x58/0xc4
[  344.315145] [<90000000002c8660>] update_process_times+0x58/0x98
[  344.315148] [<90000000002d8d44>] tick_sched_handle+0x3c/0x5c
[  344.315151] [<90000000002d9080>] tick_sched_timer+0x40/0xa8
[  344.315153] [<90000000002c98d0>] __run_hrtimer.constprop.0+0x78/0x1a4
[  344.315156] [<90000000002c9a94>] __hrtimer_run_queues+0x98/0x10c
[  344.315159] [<90000000002ca064>] hrtimer_interrupt+0x138/0x388
[  344.315161] [<90000000002263c8>] constant_timer_interrupt+0x38/0x48
[  344.315164] [<90000000002a587c>] __handle_irq_event_percpu+0x84/0x184
[  344.315166] [<90000000002a599c>] handle_irq_event_percpu+0x20/0x78
[  344.315168] [<90000000002ac3d0>] handle_percpu_irq+0x54/0x88
[  344.315171] [<90000000002a5074>] __handle_domain_irq+0x64/0xb8
[  344.315174] [<900000000094075c>] handle_cpu_irq+0x78/0xb8
[  344.315176] [<9000000000f57890>] handle_loongarch_irq+0x30/0x48
[  344.315179] [<9000000000f57928>] do_vint+0x80/0xb4
[  344.315181] [<90000000002887e4>] __wake_up_common_lock+0xbc/0x10c
[  344.315184] [<90000000008967f0>] io_req_complete_post+0x14c/0x364
[  344.315186] [<9000000000898b2c>] tctx_task_work+0x100/0x1b4
[  344.315189] [<9000000000259778>] task_work_run+0xc0/0x108
[  344.315192] [<90000000002c3920>] exit_to_user_mode_loop.isra.0+0xd4/0x114
[  344.315194] [<9000000000f57ee8>] syscall_exit_to_user_mode+0x9c/0x134
[  344.315197] [<9000000000f579d8>] do_syscall+0x7c/0x94
[  344.315200] [<9000000000221440>] handle_syscall+0xa0/0x144

[  387.582462] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 15-... } 15033 jiffies s: 153 root: 0x1/.
[  387.593740] rcu: blocking rcu_node structures: l=1:0-15:0x8000/.
[  387.600413] Task dump for CPU 15:
[  387.600422] task:poll.t          state:R  running task     stack:    0 pid:23066 ppid: 23065 flags:0x0020000e
[  387.600430] Stack : 0000000000000004 0000000000000001 900000012cd73c10 00000000000000b4
[  387.600446]         900000011ab64700 90000000002887d4 0000000000000000 0000000000000000
[  387.600453]         900000011a41e008 0000000000000000 0000000000000000 0000000000000000
[  387.600460]         0000000000000100 0000000000000122 900000011ab67c90 0000000000000004
[  387.600472]         0000000000000004 900000011a41dff0 900000012cd73b80 900000011ab66580
[  387.600478]         900000011ab64780 900000011ab64700 900000012cd73800 90000000008967f0
[  387.600489]         900010008715fd5f 900000012cd73800 900000011a41e008 0000000000000001
[  387.600496]         900010008715fd5f 900000012cd73800 900000011ab64790 9000000000898b2c
[  387.600508]         90000001278de6c0 0100000000000001 0000000000004000 0000000010000000
[  387.600514]         000000012000be48 000000fffb9bb8b8 ffffffffffffffef 90000000017e5158
[  387.600521]         ...
[  387.600530] Call Trace:
[  387.600542] [<9000000000f5e7ec>] __schedule+0x374/0x6e0



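Judging from the error log above, the kernel did not actually hang. Instead, an RCU stall was reported on CPU 15 while it was running poll.t, for a reason that has not yet been identified.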
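
When reading a log like the one above, it helps to confirm which test was running when the stall was first reported. The sketch below is an illustration only and was not posted in the thread. It assumes the `Running test ...` lines appear in the kernel log as they do above; the helper name `last_test_before_stall` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the last liburing test announced in a kernel
# log before the first RCU stall report. Takes the log file as $1.
last_test_before_stall() {
    awk '
        /Running test / {
            test_name = $NF              # last field, e.g. "poll.t:"
            sub(/:$/, "", test_name)     # drop the trailing colon
        }
        /rcu: INFO: .*stall/ {
            print test_name              # test seen just before the stall
            exit
        }
    ' "$1"
}
```

With the log saved to a file, for example via `dmesg > dmesg.log`, the call is `last_test_before_stall dmesg.log`. For the log quoted above this prints `poll.t`, which matches the `Comm: poll.t` field in the NMI backtrace.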

Right, I suspect a patch is missing. With a 6.4 kernel the tests pass and dmesg shows no RCU errors:

[root@localhost liburing]# make -j32
make[1]: Entering directory '/root/liburing/src'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/src'
make[1]: Entering directory '/root/liburing/test'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/test'
make[1]: Entering directory '/root/liburing/examples'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/examples'
[root@localhost liburing]# make runtests
make[1]: Entering directory '/root/liburing/src'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/src'
make[1]: Entering directory '/root/liburing/test'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/test'
make[1]: Entering directory '/root/liburing/examples'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/examples'
make[1]: Entering directory '/root/liburing/test'
Running test 232c93d07b74.t                                         4 sec [4]
Running test 35fa71a030ca.t                                         5 sec [5]
Running test 500f9fbadef8.t                                         0 sec [0]
Running test 7ad0e4b2f83c.t                                         1 sec [1]
Running test 8a9973408177.t                                         0 sec [0]
Running test 917257daa0fe.t                                         0 sec [0]
Running test a0908ae19763.t                                         0 sec [0]
Running test a4c0b3decb33.t                                         7 sec [6]
Running test accept.t                                               1 sec
Running test accept-link.t                                          0 sec [1]
Running test accept-reuse.t                                         0 sec [0]
Running test accept-test.t                                          0 sec [0]
Running test across-fork.t                                          0 sec [0]
Running test b19062a56726.t                                         0 sec [0]
Running test b5837bd5311d.t                                         0 sec [0]
Running test buf-ring.t                                             0 sec [0]
Running test ce593a6c480a.t                                         1 sec [1]
Running test close-opath.t                                          0 sec [0]
Running test connect.t                                              0 sec [0]
Running test connect-rep.t                                          0 sec [0]
Running test coredump.t                                             1 sec [0]
Running test cq-full.t                                              0 sec [0]
Running test cq-overflow.t                                          10 sec [10]
Running test cq-peek-batch.t                                        0 sec [0]
Running test cq-ready.t                                             0 sec [0]
Running test cq-size.t                                              0 sec [0]
Running test d4ae271dfaae.t                                         0 sec [0]
Running test d77a67ed5f27.t                                         0 sec [1]
Running test defer.t                                                3 sec [3]
Running test defer-taskrun.t                                        0 sec
Running test double-poll-crash.t                                    Skipped
Running test drop-submit.t                                          0 sec [0]
Running test eeed8b54e0df.t                                         0 sec [0]
Running test empty-eownerdead.t                                     0 sec [0]
Running test eploop.t                                               0 sec [0]
Running test eventfd.t                                              0 sec [0]
Running test eventfd-disable.t                                      0 sec [0]
Running test eventfd-reg.t                                          0 sec [3]
Running test eventfd-ring.t                                         0 sec [0]
Running test evloop.t                                               0 sec [0]
Running test exec-target.t                                          0 sec [0]
Running test exit-no-cleanup.t                                      0 sec [0]
Running test fadvise.t                                              0 sec [0]
Running test fallocate.t                                            0 sec [0]
Running test fc2a85cb02ef.t                                         Test needs failslab/fail_futex/fail_page_alloc enabled, skipped
Skipped
Running test fd-pass.t                                              0 sec
Running test file-register.t                                        3 sec [4]
Running test files-exit-hang-poll.t                                 1 sec [1]
Running test files-exit-hang-timeout.t                              1 sec [1]
Running test file-update.t                                          0 sec [0]
Running test file-verify.t                                          3 sec [3]
Running test fixed-buf-iter.t                                       0 sec [0]
Running test fixed-link.t                                           0 sec [0]
Running test fixed-reuse.t                                          0 sec
Running test fpos.t                                                 1 sec
Running test fsnotify.t                                             0 sec
Running test fsync.t                                                0 sec [0]
Running test hardlink.t                                             0 sec
Running test io-cancel.t                                            2 sec [2]
Running test iopoll.t                                               0 sec [0]
Running test iopoll-leak.t                                          0 sec [0]
Running test iopoll-overflow.t                                      1 sec [1]
Running test io_uring_enter.t                                       0 sec [0]
Running test io_uring_passthrough.t                                 Skipped
Running test io_uring_register.t                                    Unable to map a huge page.  Try increasing /proc/sys/vm/nr_hugepages by at least 1.
Skipping the hugepage test
0 sec [0]
Running test io_uring_setup.t                                       0 sec [0]
Running test lfs-openat.t                                           0 sec [0]
Running test lfs-openat-write.t                                     0 sec [0]
Running test link.t                                                 0 sec [0]
Running test link_drain.t                                           1 sec [0]
Running test link-timeout.t                                         2 sec [2]
Running test madvise.t                                              0 sec [0]
Running test mkdir.t                                                0 sec
Running test msg-ring.t                                             0 sec
Running test msg-ring-flags.t                                       0 sec
Running test msg-ring-overflow.t                                    0 sec
Running test multicqes_drain.t                                      20 sec [10]
Running test nolibc.t                                               Skipped
Running test nop-all-sizes.t                                        0 sec [0]
Running test nop.t                                                  1 sec [0]
Running test openat2.t                                              0 sec [0]
Running test open-close.t                                           0 sec [0]
Running test open-direct-link.t                                     0 sec [0]
Running test open-direct-pick.t                                     0 sec [0]
Running test personality.t                                          0 sec [0]
Running test pipe-bug.t                                             0 sec [1]
Running test pipe-eof.t                                             0 sec [0]
Running test pipe-reuse.t                                           0 sec [0]
Running test poll.t                                                 0 sec
Running test poll-cancel.t                                          0 sec [0]
Running test poll-cancel-all.t                                      0 sec [0]
Running test poll-cancel-ton.t                                      0 sec [0]
Running test poll-link.t                                            0 sec [0]
Running test poll-many.t                                            3 sec [2]
Running test poll-mshot-overflow.t                                  0 sec
Running test poll-mshot-update.t                                    6 sec [2]
Running test poll-race.t                                            1 sec [0]
Running test poll-race-mshot.t                                      1 sec
Running test poll-ring.t                                            0 sec [0]
Running test poll-v-poll.t                                          0 sec [1]
Running test pollfree.t                                             0 sec [0]
Running test probe.t                                                0 sec [0]
Running test read-before-exit.t                                     1 sec [1]
Running test read-write.t                                           1 sec [6]
Running test recv-msgall.t                                          0 sec [0]
Running test recv-msgall-stream.t                                   0 sec
Running test recv-multishot.t                                       0 sec
Running test reg-fd-only.t                                          Skipped
Running test reg-hint.t                                             0 sec
Running test reg-reg-ring.t                                         0 sec
Running test regbuf-merge.t                                         0 sec [0]
Running test register-restrictions.t                                0 sec [1]
Running test rename.t                                               0 sec [0]
Running test ringbuf-read.t                                         0 sec [0]
Running test ring-leak2.t                                           1 sec [1]
Running test ring-leak.t                                            0 sec [0]
Running test rsrc_tags.t                                            0 sec [12]
Running test rw_merge_test.t                                        0 sec [0]
Running test self.t                                                 0 sec [0]
Running test sendmsg_fs_cve.t                                       0 sec [0]
Running test send_recv.t                                            0 sec [0]
Running test send_recvmsg.t                                         0 sec [1]
Running test send-zerocopy.t                                        24 sec
Running test shared-wq.t                                            0 sec [0]
Running test short-read.t                                           0 sec [0]
Running test shutdown.t                                             0 sec [0]
Running test sigfd-deadlock.t                                       0 sec [0]
Running test single-issuer.t                                        0 sec
Running test skip-cqe.t                                             0 sec
Running test socket.t                                               0 sec [0]
Running test socket-io-cmd.t                                        Skipping tests. -ENOTSUP returned
Skipped
Running test socket-rw.t                                            0 sec [0]
Running test socket-rw-eagain.t                                     0 sec [0]
Running test socket-rw-offset.t                                     0 sec [0]
Running test splice.t                                               0 sec [0]
Running test sq-full.t                                              0 sec [0]
Running test sq-full-cpp.t                                          0 sec [0]
Running test sqpoll-cancel-hang.t                                   Skipped
Running test sqpoll-disable-exit.t                                  0 sec [0]
Running test sq-poll-dup.t                                          1 sec [1]
Running test sqpoll-exit-hang.t                                     1 sec [1]
Running test sq-poll-kthread.t                                      2 sec [2]
Running test sq-poll-share.t                                        2 sec [5]
Running test sqpoll-sleep.t                                         0 sec [0]
Running test sq-space_left.t                                        0 sec [0]
Running test stdout.t                                               This is a pipe test
This is a fixed pipe test
0 sec [0]
Running test submit-and-wait.t                                      1 sec [1]
Running test submit-link-fail.t                                     0 sec [0]
Running test submit-reuse.t                                         2 sec [2]
Running test symlink.t                                              0 sec [0]
Running test sync-cancel.t                                          0 sec
Running test teardowns.t                                            0 sec [0]
Running test thread-exit.t                                          0 sec [0]
Running test timeout.t                                              8 sec [6]
Running test timeout-new.t                                          3 sec [2]
Running test tty-write-dpoll.t                                      0 sec [0]
Running test unlink.t                                               0 sec [0]
Running test version.t                                              0 sec [0]
Running test wakeup-hang.t                                          2 sec [2]
Running test xattr.t                                                0 sec [0]
Running test statx.t                                                0 sec [0]
Running test sq-full-cpp.t                                          0 sec [0]
All tests passed
make[1]: Leaving directory '/root/liburing/test'
[root@localhost liburing]# uname -a
Linux localhost.localdomain 6.4.0+ #30 SMP PREEMPT Thu Aug 10 19:22:54 CST 2023 loongarch64 loongarch64 loongarch64 GNU/Linux
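
The "no RCU errors in dmesg" check can be made repeatable when comparing kernels. The following is a rough sketch under the assumption that stall reports always contain the `rcu: INFO: ... stall` pattern seen in the 5.10 log earlier in this thread; the helper name `check_no_rcu_stall` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and report whether it
# contains RCU stall messages. Returns 0 when the log is clean, 1 otherwise.
check_no_rcu_stall() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the helper usable under "set -e"
    stalls=$(grep -c 'rcu: INFO: .*stall' || true)
    if [ "$stalls" -gt 0 ]; then
        echo "found $stalls RCU stall report(s)"
        return 1
    fi
    echo "no RCU stall reports"
}
```

After `make runtests` finishes, it can be run as `dmesg | check_no_rcu_stall`.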

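zhangjialin set the associated branch to OLK-5.10
Wei Li edited the title

io_uring currently has no maintaining committer, so this issue is being put on hold.

zhangjialin changed the task status from To Do to On Hold
zhangjialin changed the task status from On Hold to Done

This has been fixed in the latest OLK-5.10.

The hang did not occur in the latest test run.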
