【Title】OLK-5.10: liburing test case sigfd-deadlock causes the kernel to hang
【Environment】
Hardware:
1) Loongson-LS2C50C2 3C5000+7A2000 dual-socket, dual-bridge server
Software:
1) OS: 22.03-LTS
2) Kernel: OLK-5.10
b4105458dd97 2023-08-02 (origin/OLK-5.10) !1591: net/sched: cls_u32: Fix reference counter leak leading to overflow net/sched: cls_u32: Fix reference counter leak leading to overflow [openeuler-ci-bot]
3) Affected component: liburing package
【Steps to reproduce】
git clone https://github.com/axboe/liburing
cd liburing
./configure
make -j32
make runtests
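Reproduces every time.
【Expected result】
All tests pass.
【Actual result】
The machine hangs during the test run.
【Attachments】
For example: system message logs, component logs, dump information, screenshots, etc.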
Tested with the latest OLK-5.10 code:
794c1919e741 2023-08-10 (HEAD -> OLK-5.10) !1683:net: hns3: revert some bugfix and backport some patch net: hns3: revert some bugfix and backport some patch [openeuler-ci-bot]
Error output:
[ 283.542448] Running test open-direct-pick.t:
[ 283.561165] Running test personality.t:
[ 283.580048] Running test pipe-bug.t:
[ 284.248636] Running test pipe-eof.t:
[ 284.268003] Running test pipe-reuse.t:
[ 284.286991] Running test poll.t:
[ 344.298295] rcu: INFO: rcu_sched self-detected stall on CPU
[ 344.304533] rcu: 15-....: (14999 ticks this GP) idle=5da/1/0x4000000000000002 softirq=2663/2663 fqs=7216
[ 344.314825] (t=15004 jiffies g=8681 q=1091)
[ 344.314829] NMI backtrace for cpu 15
[ 344.314834] CPU: 15 PID: 23066 Comm: poll.t Not tainted 5.10.0+ #298
[ 344.314836] Hardware name: LOONGSON Dabieshan T22C08B0/Loongson-LS2C50C2, BIOS Loongson-UDK2018-V4.0.8-Dual 06/09/23 16:13:49
[ 344.314838] Stack : 90000001000d7b08 9000000000f41d70 900010008715c000 90000001000d7a40
[ 344.314849] 0000000000000000 90000001000d7a40 90000000012e0b20 90000001000d7920
[ 344.314855] 0000000000000040 90000000017e93b0 0000000000000071 0000000000000020
[ 344.314861] 0000000000000000 0000000000000023 0000000000000001 c0000000ffffdfff
[ 344.314868] 383043323254206e 676e6f6f4c2f3042 00000000ffffdfff 0000000000000003
[ 344.314874] 0000000000005a1a 0000000008d24000 900000000128d000 00000000000000b0
[ 344.314880] 9000000001536488 0000000000000000 90000000012e1000 0000000000000001
[ 344.314887] 0000000000000001 90000000010f0758 9000000001447000 9000000001447110
[ 344.314893] 0000000000000000 9000000000222e68 000000ffec023010 00000000000000b0
[ 344.314899] 0000000000000004 0000000000000000 0000000000071c1c 0000000000000000
[ 344.314905] ...
[ 344.314909] Call Trace:
[ 344.314917] [<9000000000222e68>] show_stack+0x24/0x124
[ 344.314923] [<9000000000f41d70>] dump_stack+0xac/0xe0
[ 344.314928] [<900000000092f0d8>] nmi_cpu_backtrace+0xc8/0x118
[ 344.314931] [<900000000092f270>] nmi_trigger_cpumask_backtrace+0x148/0x170
[ 344.314934] [<9000000000223e90>] arch_trigger_cpumask_backtrace+0x14/0x24
[ 344.314939] [<9000000000f3af80>] rcu_dump_cpu_stacks+0x140/0x1b8
[ 344.314943] [<90000000002bb778>] print_cpu_stall+0x1a4/0x27c
[ 344.314946] [<90000000002bbcd8>] check_cpu_stall+0x118/0x24c
[ 344.314948] [<90000000002bbe3c>] rcu_pending+0x30/0xe8
[ 344.314951] [<90000000002beef0>] rcu_sched_clock_irq+0x58/0xc4
[ 344.314956] [<90000000002c8660>] update_process_times+0x58/0x98
[ 344.314961] [<90000000002d8d44>] tick_sched_handle+0x3c/0x5c
[ 344.314965] [<90000000002d9080>] tick_sched_timer+0x40/0xa8
[ 344.314968] [<90000000002c98d0>] __run_hrtimer.constprop.0+0x78/0x1a4
[ 344.314970] [<90000000002c9a94>] __hrtimer_run_queues+0x98/0x10c
[ 344.314973] [<90000000002ca064>] hrtimer_interrupt+0x138/0x388
[ 344.314976] [<90000000002263c8>] constant_timer_interrupt+0x38/0x48
[ 344.314981] [<90000000002a587c>] __handle_irq_event_percpu+0x84/0x184
[ 344.314983] [<90000000002a599c>] handle_irq_event_percpu+0x20/0x78
[ 344.314987] [<90000000002ac3d0>] handle_percpu_irq+0x54/0x88
[ 344.314990] [<90000000002a5074>] __handle_domain_irq+0x64/0xb8
[ 344.314993] [<900000000094075c>] handle_cpu_irq+0x78/0xb8
[ 344.314997] [<9000000000f57890>] handle_loongarch_irq+0x30/0x48
[ 344.315000] [<9000000000f57928>] do_vint+0x80/0xb4
[ 344.315005] [<90000000002887e4>] __wake_up_common_lock+0xbc/0x10c
[ 344.315010] [<90000000008967f0>] io_req_complete_post+0x14c/0x364
[ 344.315013] [<9000000000898b2c>] tctx_task_work+0x100/0x1b4
[ 344.315017] [<9000000000259778>] task_work_run+0xc0/0x108
[ 344.315021] [<90000000002c3920>] exit_to_user_mode_loop.isra.0+0xd4/0x114
[ 344.315024] [<9000000000f57ee8>] syscall_exit_to_user_mode+0x9c/0x134
[ 344.315026] [<9000000000f579d8>] do_syscall+0x7c/0x94
[ 344.315028] [<9000000000221440>] handle_syscall+0xa0/0x144
[ 344.315056] Task dump for CPU 15:
[ 344.315058] task:poll.t state:R running task stack: 0 pid:23066 ppid: 23065 flags:0x0020000e
[ 344.315063] Stack : 0000000000000000 9000000000f3af8c 900010008715c000 90000001000d7af0
[ 344.315069] 9000100089fba900 90000001000d7af0 9000000001288f88 90000001000d7a00
[ 344.315075] 0000000000000040 90000000017e9780 0000000000000002 0000000000000030
[ 344.315081] 9000100089fba900 0000000000000023 0000000000000001 c0000000ffffdfff
[ 344.315087] 617420676e696e6e 7320202020206b73 00000000ffffdfff 0000000000000003
[ 344.315094] 0000000000005a19 0000000008d24000 900000000128d000 9000000000f7d200
[ 344.315100] 000000000000000f 00000000000000b0 9000000001447238 900000000153f6a8
[ 344.315106] 9000000000f7d000 90000000010f0758 9000000001447000 9000000001447110
[ 344.315112] 0000000000005a1a 9000000000222e68 000000ffec023010 00000000000000b0
[ 344.315119] 0000000000000004 0000000000000000 0000000000071c1c 0000000000000000
[ 344.315125] ...
[ 344.315128] Call Trace:
[ 344.315130] [<9000000000222e68>] show_stack+0x24/0x124
[ 344.315133] [<9000000000f3af8c>] rcu_dump_cpu_stacks+0x14c/0x1b8
[ 344.315136] [<90000000002bb778>] print_cpu_stall+0x1a4/0x27c
[ 344.315138] [<90000000002bbcd8>] check_cpu_stall+0x118/0x24c
[ 344.315140] [<90000000002bbe3c>] rcu_pending+0x30/0xe8
[ 344.315143] [<90000000002beef0>] rcu_sched_clock_irq+0x58/0xc4
[ 344.315145] [<90000000002c8660>] update_process_times+0x58/0x98
[ 344.315148] [<90000000002d8d44>] tick_sched_handle+0x3c/0x5c
[ 344.315151] [<90000000002d9080>] tick_sched_timer+0x40/0xa8
[ 344.315153] [<90000000002c98d0>] __run_hrtimer.constprop.0+0x78/0x1a4
[ 344.315156] [<90000000002c9a94>] __hrtimer_run_queues+0x98/0x10c
[ 344.315159] [<90000000002ca064>] hrtimer_interrupt+0x138/0x388
[ 344.315161] [<90000000002263c8>] constant_timer_interrupt+0x38/0x48
[ 344.315164] [<90000000002a587c>] __handle_irq_event_percpu+0x84/0x184
[ 344.315166] [<90000000002a599c>] handle_irq_event_percpu+0x20/0x78
[ 344.315168] [<90000000002ac3d0>] handle_percpu_irq+0x54/0x88
[ 344.315171] [<90000000002a5074>] __handle_domain_irq+0x64/0xb8
[ 344.315174] [<900000000094075c>] handle_cpu_irq+0x78/0xb8
[ 344.315176] [<9000000000f57890>] handle_loongarch_irq+0x30/0x48
[ 344.315179] [<9000000000f57928>] do_vint+0x80/0xb4
[ 344.315181] [<90000000002887e4>] __wake_up_common_lock+0xbc/0x10c
[ 344.315184] [<90000000008967f0>] io_req_complete_post+0x14c/0x364
[ 344.315186] [<9000000000898b2c>] tctx_task_work+0x100/0x1b4
[ 344.315189] [<9000000000259778>] task_work_run+0xc0/0x108
[ 344.315192] [<90000000002c3920>] exit_to_user_mode_loop.isra.0+0xd4/0x114
[ 344.315194] [<9000000000f57ee8>] syscall_exit_to_user_mode+0x9c/0x134
[ 344.315197] [<9000000000f579d8>] do_syscall+0x7c/0x94
[ 344.315200] [<9000000000221440>] handle_syscall+0xa0/0x144
[ 387.582462] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 15-... } 15033 jiffies s: 153 root: 0x1/.
[ 387.593740] rcu: blocking rcu_node structures: l=1:0-15:0x8000/.
[ 387.600413] Task dump for CPU 15:
[ 387.600422] task:poll.t state:R running task stack: 0 pid:23066 ppid: 23065 flags:0x0020000e
[ 387.600430] Stack : 0000000000000004 0000000000000001 900000012cd73c10 00000000000000b4
[ 387.600446] 900000011ab64700 90000000002887d4 0000000000000000 0000000000000000
[ 387.600453] 900000011a41e008 0000000000000000 0000000000000000 0000000000000000
[ 387.600460] 0000000000000100 0000000000000122 900000011ab67c90 0000000000000004
[ 387.600472] 0000000000000004 900000011a41dff0 900000012cd73b80 900000011ab66580
[ 387.600478] 900000011ab64780 900000011ab64700 900000012cd73800 90000000008967f0
[ 387.600489] 900010008715fd5f 900000012cd73800 900000011a41e008 0000000000000001
[ 387.600496] 900010008715fd5f 900000012cd73800 900000011ab64790 9000000000898b2c
[ 387.600508] 90000001278de6c0 0100000000000001 0000000000004000 0000000010000000
[ 387.600514] 000000012000be48 000000fffb9bb8b8 ffffffffffffffef 90000000017e5158
[ 387.600521] ...
[ 387.600530] Call Trace:
[ 387.600542] [<9000000000f5e7ec>] __schedule+0x374/0x6e0
Judging from the log above, the kernel did not actually hang; something caused an RCU stall instead.
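To check which test was running when a stall was reported, the log can be scanned mechanically. The following is only a minimal sketch, assuming the dmesg format shown above ("Running test <name>:" markers, followed by "rcu: INFO: rcu_sched ..." stall reports and a "CPU: ... PID: ... Comm: ..." line). The function name `find_rcu_stalls` and the regular expressions are illustrative and are not part of liburing or the kernel. In the log above, the last test announced before the stall is poll.t, which is also the Comm of the stalled task.

```python
import re

# Patterns for the dmesg lines seen in this report (assumed format).
TEST_RE = re.compile(r"Running test (\S+?):?\s*$")
STALL_RE = re.compile(
    r"rcu: INFO: rcu_sched (self-detected stall|detected expedited stalls)"
)
COMM_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")


def find_rcu_stalls(log_text):
    """Return one dict per RCU stall report found in log_text.

    Each dict holds the last test announced before the stall and, if a
    "CPU: .. PID: .. Comm: .." line follows the report, the CPU, PID and
    command name from the first such line.
    """
    stalls = []
    current_test = None
    for line in log_text.splitlines():
        m = TEST_RE.search(line)
        if m:
            current_test = m.group(1)
            continue
        if STALL_RE.search(line):
            stalls.append(
                {"test": current_test, "cpu": None, "pid": None, "comm": None}
            )
            continue
        m = COMM_RE.search(line)
        if m and stalls and stalls[-1]["comm"] is None:
            stalls[-1].update(
                cpu=int(m.group(1)), pid=int(m.group(2)), comm=m.group(3)
            )
    return stalls
```

One possible use is to save the output of `dmesg` to a file and pass the file contents to `find_rcu_stalls`. Stall reports that are not followed by a "CPU: ... Comm: ..." line (such as the expedited stall report above) keep `None` in those fields.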
I suspect a missing patch. With a 6.4 kernel the tests pass and dmesg shows no RCU errors:
[root@localhost liburing]# make -j32
make[1]: Entering directory '/root/liburing/src'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/src'
make[1]: Entering directory '/root/liburing/test'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/test'
make[1]: Entering directory '/root/liburing/examples'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/examples'
[root@localhost liburing]# make runtests
make[1]: Entering directory '/root/liburing/src'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/src'
make[1]: Entering directory '/root/liburing/test'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/test'
make[1]: Entering directory '/root/liburing/examples'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/root/liburing/examples'
make[1]: Entering directory '/root/liburing/test'
Running test 232c93d07b74.t 4 sec [4]
Running test 35fa71a030ca.t 5 sec [5]
Running test 500f9fbadef8.t 0 sec [0]
Running test 7ad0e4b2f83c.t 1 sec [1]
Running test 8a9973408177.t 0 sec [0]
Running test 917257daa0fe.t 0 sec [0]
Running test a0908ae19763.t 0 sec [0]
Running test a4c0b3decb33.t 7 sec [6]
Running test accept.t 1 sec
Running test accept-link.t 0 sec [1]
Running test accept-reuse.t 0 sec [0]
Running test accept-test.t 0 sec [0]
Running test across-fork.t 0 sec [0]
Running test b19062a56726.t 0 sec [0]
Running test b5837bd5311d.t 0 sec [0]
Running test buf-ring.t 0 sec [0]
Running test ce593a6c480a.t 1 sec [1]
Running test close-opath.t 0 sec [0]
Running test connect.t 0 sec [0]
Running test connect-rep.t 0 sec [0]
Running test coredump.t 1 sec [0]
Running test cq-full.t 0 sec [0]
Running test cq-overflow.t 10 sec [10]
Running test cq-peek-batch.t 0 sec [0]
Running test cq-ready.t 0 sec [0]
Running test cq-size.t 0 sec [0]
Running test d4ae271dfaae.t 0 sec [0]
Running test d77a67ed5f27.t 0 sec [1]
Running test defer.t 3 sec [3]
Running test defer-taskrun.t 0 sec
Running test double-poll-crash.t Skipped
Running test drop-submit.t 0 sec [0]
Running test eeed8b54e0df.t 0 sec [0]
Running test empty-eownerdead.t 0 sec [0]
Running test eploop.t 0 sec [0]
Running test eventfd.t 0 sec [0]
Running test eventfd-disable.t 0 sec [0]
Running test eventfd-reg.t 0 sec [3]
Running test eventfd-ring.t 0 sec [0]
Running test evloop.t 0 sec [0]
Running test exec-target.t 0 sec [0]
Running test exit-no-cleanup.t 0 sec [0]
Running test fadvise.t 0 sec [0]
Running test fallocate.t 0 sec [0]
Running test fc2a85cb02ef.t Test needs failslab/fail_futex/fail_page_alloc enabled, skipped
Skipped
Running test fd-pass.t 0 sec
Running test file-register.t 3 sec [4]
Running test files-exit-hang-poll.t 1 sec [1]
Running test files-exit-hang-timeout.t 1 sec [1]
Running test file-update.t 0 sec [0]
Running test file-verify.t 3 sec [3]
Running test fixed-buf-iter.t 0 sec [0]
Running test fixed-link.t 0 sec [0]
Running test fixed-reuse.t 0 sec
Running test fpos.t 1 sec
Running test fsnotify.t 0 sec
Running test fsync.t 0 sec [0]
Running test hardlink.t 0 sec
Running test io-cancel.t 2 sec [2]
Running test iopoll.t 0 sec [0]
Running test iopoll-leak.t 0 sec [0]
Running test iopoll-overflow.t 1 sec [1]
Running test io_uring_enter.t 0 sec [0]
Running test io_uring_passthrough.t Skipped
Running test io_uring_register.t Unable to map a huge page. Try increasing /proc/sys/vm/nr_hugepages by at least 1.
Skipping the hugepage test
0 sec [0]
Running test io_uring_setup.t 0 sec [0]
Running test lfs-openat.t 0 sec [0]
Running test lfs-openat-write.t 0 sec [0]
Running test link.t 0 sec [0]
Running test link_drain.t 1 sec [0]
Running test link-timeout.t 2 sec [2]
Running test madvise.t 0 sec [0]
Running test mkdir.t 0 sec
Running test msg-ring.t 0 sec
Running test msg-ring-flags.t 0 sec
Running test msg-ring-overflow.t 0 sec
Running test multicqes_drain.t 20 sec [10]
Running test nolibc.t Skipped
Running test nop-all-sizes.t 0 sec [0]
Running test nop.t 1 sec [0]
Running test openat2.t 0 sec [0]
Running test open-close.t 0 sec [0]
Running test open-direct-link.t 0 sec [0]
Running test open-direct-pick.t 0 sec [0]
Running test personality.t 0 sec [0]
Running test pipe-bug.t 0 sec [1]
Running test pipe-eof.t 0 sec [0]
Running test pipe-reuse.t 0 sec [0]
Running test poll.t 0 sec
Running test poll-cancel.t 0 sec [0]
Running test poll-cancel-all.t 0 sec [0]
Running test poll-cancel-ton.t 0 sec [0]
Running test poll-link.t 0 sec [0]
Running test poll-many.t 3 sec [2]
Running test poll-mshot-overflow.t 0 sec
Running test poll-mshot-update.t 6 sec [2]
Running test poll-race.t 1 sec [0]
Running test poll-race-mshot.t 1 sec
Running test poll-ring.t 0 sec [0]
Running test poll-v-poll.t 0 sec [1]
Running test pollfree.t 0 sec [0]
Running test probe.t 0 sec [0]
Running test read-before-exit.t 1 sec [1]
Running test read-write.t 1 sec [6]
Running test recv-msgall.t 0 sec [0]
Running test recv-msgall-stream.t 0 sec
Running test recv-multishot.t 0 sec
Running test reg-fd-only.t Skipped
Running test reg-hint.t 0 sec
Running test reg-reg-ring.t 0 sec
Running test regbuf-merge.t 0 sec [0]
Running test register-restrictions.t 0 sec [1]
Running test rename.t 0 sec [0]
Running test ringbuf-read.t 0 sec [0]
Running test ring-leak2.t 1 sec [1]
Running test ring-leak.t 0 sec [0]
Running test rsrc_tags.t 0 sec [12]
Running test rw_merge_test.t 0 sec [0]
Running test self.t 0 sec [0]
Running test sendmsg_fs_cve.t 0 sec [0]
Running test send_recv.t 0 sec [0]
Running test send_recvmsg.t 0 sec [1]
Running test send-zerocopy.t 24 sec
Running test shared-wq.t 0 sec [0]
Running test short-read.t 0 sec [0]
Running test shutdown.t 0 sec [0]
Running test sigfd-deadlock.t 0 sec [0]
Running test single-issuer.t 0 sec
Running test skip-cqe.t 0 sec
Running test socket.t 0 sec [0]
Running test socket-io-cmd.t Skipping tests. -ENOTSUP returned
Skipped
Running test socket-rw.t 0 sec [0]
Running test socket-rw-eagain.t 0 sec [0]
Running test socket-rw-offset.t 0 sec [0]
Running test splice.t 0 sec [0]
Running test sq-full.t 0 sec [0]
Running test sq-full-cpp.t 0 sec [0]
Running test sqpoll-cancel-hang.t Skipped
Running test sqpoll-disable-exit.t 0 sec [0]
Running test sq-poll-dup.t 1 sec [1]
Running test sqpoll-exit-hang.t 1 sec [1]
Running test sq-poll-kthread.t 2 sec [2]
Running test sq-poll-share.t 2 sec [5]
Running test sqpoll-sleep.t 0 sec [0]
Running test sq-space_left.t 0 sec [0]
Running test stdout.t This is a pipe test
This is a fixed pipe test
0 sec [0]
Running test submit-and-wait.t 1 sec [1]
Running test submit-link-fail.t 0 sec [0]
Running test submit-reuse.t 2 sec [2]
Running test symlink.t 0 sec [0]
Running test sync-cancel.t 0 sec
Running test teardowns.t 0 sec [0]
Running test thread-exit.t 0 sec [0]
Running test timeout.t 8 sec [6]
Running test timeout-new.t 3 sec [2]
Running test tty-write-dpoll.t 0 sec [0]
Running test unlink.t 0 sec [0]
Running test version.t 0 sec [0]
Running test wakeup-hang.t 2 sec [2]
Running test xattr.t 0 sec [0]
Running test statx.t 0 sec [0]
Running test sq-full-cpp.t 0 sec [0]
All tests passed
make[1]: Leaving directory '/root/liburing/test'
[root@localhost liburing]# uname -a
Linux localhost.localdomain 6.4.0+ #30 SMP PREEMPT Thu Aug 10 19:22:54 CST 2023 loongarch64 loongarch64 loongarch64 GNU/Linux
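io_uring currently has no maintaining committer, so this issue is put on hold.
This has been fixed in the latest OLK-5.10.
The hang no longer occurs in the latest test run.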