
src-openEuler / kernel


【arm】LTP test case tcp6-multi-diffip11 fails and triggers a soft lockup

Status: Done
Type: Bug
Created: 2022-08-01 20:22

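【Description】Running the LTP test case tcp6-multi-diffip11 on the 22.09 kernel triggers a soft lockup, and the system reboots.
【Environment】
5.10.0-104.0.0.54.oe2209.aarch64
【Steps to reproduce】
Deploy the LTP execution environment.
On the local host, run sh -x /opt/ltp/testcases/bin/tcp6-multi-diffip11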
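Because the failure is intermittent (the actual result puts it at about once every ten-odd runs), a single passing run proves little. The sketch below estimates how many runs give a 95% chance of seeing at least one failure. The per-run failure probabilities are hypothetical values bracketing that rate, and treating runs as independent is an assumption.

```python
def runs_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p_fail) ** n >= confidence.

    Assumes each run fails independently with probability p_fail.
    """
    import math

    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # Hypothetical per-run failure rates around "once every ten-odd runs".
    for p in (1 / 10, 1 / 15, 1 / 20):
        print(f"p = {p:.3f}: run about {runs_needed(p)} times")
```

With these assumptions the answer is roughly 29 to 59 runs, so a verification loop of a few dozen iterations is the minimum before concluding a fix works.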

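【Expected result】
The test case passes.
【Actual result】
The test case fails intermittently: roughly once every ten-odd runs, a soft lockup occurs.
The system hits a soft lockup and reboots. The call trace is below (see the attachment for the full vmcore-dmesg):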

[ 6008.989193] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [ns-tcpserver:81709]
[ 6008.989841] Modules linked in: esp6 ah4 authenc echainiv esp4 twofish_generic twofish_common camellia_generic serpent_generic blowfish_generic blowfish_common cast5_generic cast_common des_generic libdes rmd160 sha512_generic sha512_arm64 af_key rfkill sunrpc sg nls_cp437 vfat fat sch_fq_codel fuse ext4 mbcache jbd2 sd_mod t10_pi ghash_ce sha2_ce virtio_gpu virtio_dma_buf virtio_scsi virtio_net net_failover virtio_console failover sha256_arm64 sha1_ce virtio_mmio virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 6008.989903] CPU: 3 PID: 81709 Comm: ns-tcpserver Kdump: loaded Tainted: G OE 5.10.0-104.0.0.54.oe2209.aarch64 #1
[ 6008.989904] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 6008.989907] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[ 6008.989915] pc : des3_ede_encrypt+0x16c/0x460 [libdes]
[ 6008.989917] lr : 0x12
[ 6008.989918] sp : ffff80001477a760
[ 6008.989919] x29: ffff80001477a760 x28: ffff0000daa59a00
[ 6008.989921] x27: ffff0000e1bca9a8 x26: ffff0000e1bca980
[ 6008.989924] x25: ffff80001477a820 x24: 0000000000000039
[ 6008.989926] x23: 0000000000000001 x22: 000000000000002b
[ 6008.989928] x21: 0000000000000038 x20: 0000000080100020
[ 6008.989930] x19: 0000000000000018 x18: 0000000000000019
[ 6008.989932] x17: ffff0000c336c100 x16: ffff800009083460
[ 6008.989934] x15: ffff800009083260 x14: ffff0000c336c080
[ 6008.989936] x13: ffff800009083360 x12: ffff800009083560
[ 6008.989938] x11: ffff800009083860 x10: ffff800009083660
[ 6008.989940] x9 : ffff800009083760 x8 : ffff800009083960
[ 6008.989942] x7 : 00000000b0b86b24 x6 : ffff800009083060
[ 6008.989944] x5 : 0000000000000024 x4 : 00000000dbb85664
[ 6008.989946] x3 : 00000000bbd5412b x2 : ffff0000c336c0b0
[ 6008.989948] x1 : ffff0000c1efc168 x0 : 000000000000003c
[ 6008.989951] Call trace:
[ 6008.989955] des3_ede_encrypt+0x16c/0x460 [libdes]
[ 6008.989958] crypto_des3_ede_encrypt+0x1c/0x30 [des_generic]
[ 6008.989965] crypto_cbc_encrypt_inplace+0x78/0xc0
[ 6008.989967] crypto_cbc_encrypt+0x80/0xc0
[ 6008.989971] crypto_skcipher_encrypt+0x2c/0x40
[ 6008.989975] crypto_authenc_encrypt+0xc8/0xfc [authenc]
[ 6008.989977] crypto_aead_encrypt+0x2c/0x40
[ 6008.989979] echainiv_encrypt+0x144/0x1a0 [echainiv]
[ 6008.989981] crypto_aead_encrypt+0x2c/0x40
[ 6008.989984] esp6_output_tail+0x348/0x5cc [esp6]
[ 6008.989986] esp6_output+0x120/0x194 [esp6]
[ 6008.989990] xfrm_output_one+0x25c/0x4d4
[ 6008.989991] xfrm_output_resume+0x6c/0x1fc
[ 6008.989992] xfrm_output+0xac/0x3c0
[ 6008.989996] __xfrm6_output+0x118/0x270
[ 6008.989997] xfrm6_output+0x54/0xfc
[ 6008.990000] ip6_xmit+0x2dc/0x5a4
[ 6008.990002] inet6_csk_xmit+0x9c/0xfc
[ 6008.990005] __tcp_transmit_skb+0x47c/0x79c
[ 6008.990006] tcp_write_xmit+0x258/0x690
[ 6008.990008] __tcp_push_pending_frames+0x44/0x104
[ 6008.990010] tcp_rcv_established+0x604/0x694
[ 6008.990013] tcp_v6_do_rcv+0x2e4/0x450
[ 6008.990015] tcp_v6_rcv+0xb90/0x1040
[ 6008.990016] ip6_protocol_deliver_rcu+0x190/0x4c0
[ 6008.990018] ip6_input+0x54/0xf0
[ 6008.990019] ip6_rcv_finish+0x80/0xa0
[ 6008.990021] xfrm_trans_reinject+0xb8/0xf0
[ 6008.990026] tasklet_action_common.constprop.0+0x194/0x1b4
[ 6008.990028] tasklet_action+0x30/0x3c
[ 6008.990030] __do_softirq+0x130/0x358
[ 6008.990032] do_softirq.part.0+0x8c/0x90
[ 6008.990034] __local_bh_enable_ip+0xa4/0xb0
[ 6008.990035] ip6_finish_output2+0x26c/0x6d0
[ 6008.990037] __ip6_finish_output.part.0+0xb8/0x1b0
[ 6008.990038] ip6_finish_output+0xec/0x12c
[ 6008.990040] ip6_output+0x78/0x170
[ 6008.990041] xfrm_output_resume+0x1ec/0x1fc
[ 6008.990042] xfrm_output+0xac/0x3c0
[ 6008.990044] __xfrm6_output+0x118/0x270
[ 6008.990046] xfrm6_output+0x54/0xfc
[ 6008.990047] ip6_xmit+0x2dc/0x5a4
[ 6008.990049] inet6_csk_xmit+0x9c/0xfc
[ 6008.990050] __tcp_transmit_skb+0x47c/0x79c
[ 6008.990051] tcp_write_xmit+0x258/0x690
[ 6008.990053] __tcp_push_pending_frames+0x44/0x104
[ 6008.990056] tcp_push+0xe8/0x140
[ 6008.990057] tcp_sendmsg_locked+0xb98/0xca0
[ 6008.990059] tcp_sendmsg+0x40/0x70
[ 6008.990060] inet6_sendmsg+0x4c/0x80
[ 6008.990065] sock_sendmsg+0x48/0x70
[ 6008.990067] __sys_sendto+0x120/0x14c
[ 6008.990068] __arm64_sys_sendto+0x30/0x40
[ 6008.990071] el0_svc_common.constprop.0+0x7c/0x1bc
[ 6008.990073] do_el0_svc+0x2c/0x94
[ 6008.990076] el0_svc+0x20/0x30
[ 6008.990078] el0_sync_handler+0xb0/0xb4
[ 6008.990079] el0_sync+0x160/0x180
[ 6008.990082] Kernel panic - not syncing: softlockup: hung tasks
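The first line of the trace reports CPU#3 stuck for 23 s. In the kernel's watchdog code the soft-lockup threshold is twice kernel.watchdog_thresh (10 s by default), so a CPU that does not reschedule for more than about 20 s is reported. The final "Kernel panic - not syncing: softlockup: hung tasks" line means kernel.softlockup_panic was enabled, which is why the lockup ended in a panic rather than a warning. The sketch below restates that rule; it assumes the default watchdog_thresh (the report does not state the configured value) and ignores the watchdog timer's sampling granularity.

```python
DEFAULT_WATCHDOG_THRESH_S = 10  # default kernel.watchdog_thresh (assumed here)


def softlockup_thresh(watchdog_thresh_s: int = DEFAULT_WATCHDOG_THRESH_S) -> int:
    """Soft-lockup threshold in seconds: twice watchdog_thresh."""
    return 2 * watchdog_thresh_s


def reported_as_soft_lockup(stuck_s: float,
                            watchdog_thresh_s: int = DEFAULT_WATCHDOG_THRESH_S) -> bool:
    """True if a CPU stuck for stuck_s seconds exceeds the soft-lockup threshold."""
    return stuck_s > softlockup_thresh(watchdog_thresh_s)


if __name__ == "__main__":
    print(softlockup_thresh())          # 20
    print(reported_as_soft_lockup(23))  # True, consistent with "stuck for 23s"
```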

Attachments
robertxw 2022-08-01 20:24

Comments (4)

robertxw created the task


openeuler-ci-bot added the sig/Kernel label
robertxw uploaded the attachment vmcore-dmesg.txt
robertxw set the milestone to openEuler-22.09-Kernel
robertxw set the assignee to 岳海兵

The following test cases show the same problem. In all of them, network traffic is encrypted with IPsec under stress, and a soft lockup follows:
tcp4-multi-diffip11
tcp4-multi-diffport02
tcp6-multi-diffip04
tcp6-multi-diffip11
tcp6-multi-sameport11

robertxw edited the description
robertxw changed the task type from Task to Bug
陈亚强 changed the assignee from 岳海兵 to zhengzengkai
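  1. In tcp6-multi-diffip11, every send() call passes a buffer as large as the socket's send buffer (1638400 bytes in my reproduction environment), with IPsec ESP in transport mode. The skbs handled by the protocol stack are 1500 bytes each. (It is unclear why the skbs are not coalesced.) At 1500 bytes per skb, tcp_sendmsg alone loops about 1638400/1500 ≈ 1092 times.
  2. The test case stresses the system with 100 sockets, each sending data as described in item 1. When xfrm_trans_reinject is triggered, trans->queue typically holds 64-2000 entries.
  3. The handler called from xfrm_trans_reinject is ip6_rcv_finish. This function eventually reaches __tcp_push_pending_frames, which makes the other 99 sockets transmit the data in their send queues.
  4. In my test environment, __local_bh_enable_ip (the point where the tasklet runs) sometimes takes about 1 s.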
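To put items 1 and 2 in numbers, the sketch below computes how many skbs one full-sendbuf send() is cut into, and how many that amounts to if all 100 sockets each push one buffer, every skb needing ESP encryption (des3_ede_encrypt in the trace). It follows item 1 in assuming exactly 1500 bytes per skb; the real per-segment payload is the MSS, which is smaller once TCP/IPv6 and ESP headers are subtracted, so the true count is somewhat higher. The 100-socket total is an order-of-magnitude figure, not a measurement.

```python
# Figures from the analysis above; the flat 1500-bytes-per-skb model is a
# simplification (real TCP payload per segment is the MSS, which is smaller).
SNDBUF_BYTES = 1_638_400   # socket send buffer in the reproduction environment
SKB_BYTES = 1_500          # skb size observed in the stack
SOCKETS = 100              # concurrent sending sockets in tcp6-multi-diffip11


def skbs_per_send(send_bytes: int, skb_bytes: int) -> int:
    """Number of skbs one send() of send_bytes is split into (last one partial)."""
    if send_bytes < 0 or skb_bytes <= 0:
        raise ValueError("send_bytes must be >= 0 and skb_bytes > 0")
    return (send_bytes + skb_bytes - 1) // skb_bytes  # integer ceiling division


if __name__ == "__main__":
    per_send = skbs_per_send(SNDBUF_BYTES, SKB_BYTES)
    print(per_send)            # 1093 skbs per send() call
    print(per_send * SOCKETS)  # 109300 skbs if every socket flushes one buffer
```

The result (1093) is one more than the ≈ 1092 in item 1 only because the last, partially filled skb is counted.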

Softirq statistics captured with the bcc softirqs tool:
./softirqs -NT 10
Tracing soft irq event time... Hit Ctrl-C to end.

15:34:34
SOFTIRQ TOTAL_nsecs
block 158990
timer 20030920
sched 46577080
net_rx 676746820
tasklet 9906067650

15:34:45
SOFTIRQ TOTAL_nsecs
block 86100
sched 38849790
net_rx 676532470
timer 1163848790
tasklet 9409019620

15:34:55
SOFTIRQ TOTAL_nsecs
sched 58078450
net_rx 475156720
timer 533832410
tasklet 9431333300
Tasklet softirqs take far too long: roughly 9.4-9.9 s per 10 s interval, far more than any other softirq type.
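The three samples can be summarized by dividing each sample's tasklet time by the total softirq time and by the nominal 10 s interval, as in the sketch below. bcc's softirqs sums time across all CPUs by default, so the second ratio is CPU-seconds of tasklet work per wall-clock second rather than a per-CPU percentage; the VM's CPU count is not given in this report.

```python
# bcc "softirqs -NT 10" output from the three samples above (nanoseconds).
SAMPLES = {
    "15:34:34": {"block": 158_990, "timer": 20_030_920, "sched": 46_577_080,
                 "net_rx": 676_746_820, "tasklet": 9_906_067_650},
    "15:34:45": {"block": 86_100, "sched": 38_849_790, "net_rx": 676_532_470,
                 "timer": 1_163_848_790, "tasklet": 9_409_019_620},
    "15:34:55": {"sched": 58_078_450, "net_rx": 475_156_720,
                 "timer": 533_832_410, "tasklet": 9_431_333_300},
}
INTERVAL_NS = 10 * 10**9  # nominal interval passed to softirqs


def tasklet_share(sample: dict) -> float:
    """Fraction of all softirq time in the sample that was spent in tasklets."""
    return sample["tasklet"] / sum(sample.values())


def tasklet_per_interval(sample: dict, interval_ns: int = INTERVAL_NS) -> float:
    """Tasklet time divided by the wall-clock interval (summed over all CPUs)."""
    return sample["tasklet"] / interval_ns


if __name__ == "__main__":
    for ts, sample in SAMPLES.items():
        print(f"{ts}: tasklet = {tasklet_share(sample):.1%} of softirq time, "
              f"{tasklet_per_interval(sample):.2f} s per wall-clock second")
```

Tasklets account for roughly 83-93% of all softirq time in every sample, which is consistent with the xfrm_trans_reinject tasklet path in the call trace.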

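With the patch at https://patchwork.kernel.org/project/netdevbpf/patch/20220924080157.247678-1-liujian56@huawei.com/ applied, the soft lockup no longer occurs. This also indicates that the soft lockup in the send path arises because the whole send path runs too long (the tcp_sendmsg loop itself plus the tasklets interleaved with it), and that other factors, such as a virtual machine being slower than a physical machine, contribute to the soft lockup seen in tcp6-multi-diffip11.

The Linux mainline kernel shows the same behavior; this is being tracked at https://patchwork.kernel.org/project/netdevbpf/patch/20220924080157.247678-1-liujian56@huawei.com/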
