
openEuler / kernel


22.03 LTS SP1 ARM server kernel: hns3/hclge bug causes frequent server crashes and reboots

Status: Completed
Type: Bug
Created: 2023-08-02 17:02

【Title】22.03 LTS SP1 ARM server kernel crashes and reboots periodically
【Environment】
Hardware:
Huawei TaiShan 2280 V2
product: BC82AMDDA
NIC: HNS GE/10GE/25GE RDMA Network Controller
Software:
1) openEuler 22.03 (LTS-SP1)
2) 5.10.0-136.42.0.120.oe2203sp1.aarch64
3) hns3 hclge

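If there is a special network setup, please provide a network topology diagram.
【Steps to Reproduce】
After running for some time, the server reboots on its own, and a crash log is generated in /var/crash: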
[ 5888.946472] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
[ 5888.946475] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
[ 5888.946489] Mem abort info:
[ 5888.956902] Mem abort info:
[ 5888.967057] ESR = 0x96000007
[ 5888.970898] ESR = 0x96000007
[ 5888.974733] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5888.978833] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5888.982918] SET = 0, FnV = 0
[ 5888.989255] SET = 0, FnV = 0
[ 5888.995579] EA = 0, S1PTW = 0
[ 5888.999653] EA = 0, S1PTW = 0
[ 5889.003729] Data abort info:
[ 5889.007877] Data abort info:
[ 5889.012026] ISV = 0, ISS = 0x00000007
[ 5889.015915] ISV = 0, ISS = 0x00000007
[ 5889.019797] CM = 0, WnR = 0
[ 5889.024637] CM = 0, WnR = 0
[ 5889.029460] user pgtable: 4k pages, 48-bit VAs, pgdp=00002040e69c0000
[ 5889.033416] user pgtable: 4k pages, 48-bit VAs, pgdp=0000208045851000
[ 5889.037354] [0000000000000018] pgd=00002040e544a003
[ 5889.044754] [0000000000000018] pgd=000020803e145003
[ 5889.052145] , p4d=00002040e544a003
[ 5889.057976] , p4d=000020803e145003
[ 5889.063789] , pud=00002040e6850003
[ 5889.068126] , pud=000020803e0e0003
[ 5889.072449] , pmd=00002040e6bdc003
[ 5889.076760] , pmd=000020803e0e2003
[ 5889.081058] , pte=0000000000000000
[ 5889.085346] , pte=0000000000000000
[ 5889.089620]
[ 5889.093881]
[ 5889.098132] Internal error: Oops: 96000007 [#1] SMP
[ 5889.108497] Modules linked in: nbd xt_CHECKSUM nft_compat nft_counter nft_chain_nat nf_tables xt_set ipt_rpfilter xt_multiport iptable_raw iptable_mangle ip_set_hash_net ip_set_hash_ip ip_set ipip tunnel4 ip_tunnel veth geneve ip6_udp_tunnel udp_tunnel openvswitch nsh nf_conncount vhost_net vhost vhost_iotlb tap tun nf_conntrack_netlink nfnetlink xt_statistic xt_nat ipt_REJECT nf_reject_ipv4 xt_addrtype xt_conntrack ip6table_nat ip6_tables xt_MASQUERADE xt_comment iptable_filter xt_mark iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_tables rfkill overlay rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi ipmi_ssif sunrpc hns_roce_hw_v2 ib_uverbs acpi_ipmi hibmc_drm ses enclosure drm_vram_helper ib_core ipmi_si drm_ttm_helper sg ipmi_devintf ttm ipmi_msghandler hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu vfat hisi_uncore_pmu fat sch_fq_codel ext4 mbcache
[ 5889.108577] jbd2 sha512_generic sha512_arm64 aes_ce_ccm sd_mod t10_pi xts realtek ghash_ce hisi_sas_v3_hw sha2_ce hisi_sas_main sha256_arm64 hclge libsas hisi_sec2 ahci sha1_ce sbsa_gwdt hns3 libahci scsi_transport_sas hisi_qm libata hnae3 megaraid_sas uacce host_edma_drv authenc i2c_designware_platform i2c_designware_core br_netfilter bridge stp llc fuse aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 5889.237915] CPU: 70 PID: 0 Comm: swapper/70 Kdump: loaded Not tainted 5.10.0-136.42.0.120.oe2203sp1.aarch64 #1
[ 5889.249313] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.05 09/18/2019
[ 5889.259018] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[ 5889.266118] pc : hclge_ptp_get_rx_hwts+0x40/0x170 [hclge]
[ 5889.272612] lr : hclge_ptp_get_rx_hwts+0x34/0x170 [hclge]
[ 5889.279101] sp : ffff800012c3bc50
[ 5889.283516] x29: ffff800012c3bc50 x28: ffff2040002be040
[ 5889.289927] x27: ffff800009116484 x26: 0000000080007500
[ 5889.296333] x25: 0000000000000000 x24: ffff204001c6f000
[ 5889.302738] x23: ffff204144f53c00 x22: 0000000000000000
[ 5889.309134] x21: 0000000000000000 x20: ffff204004220080
[ 5889.315520] x19: ffff204144f53c00 x18: 0000000000000000
[ 5889.321897] x17: 0000000000000000 x16: 0000000000000000
[ 5889.328263] x15: 0000004000140ec8 x14: 0000000000000000
[ 5889.334617] x13: 0000000000000000 x12: 00000000010011df
[ 5889.340965] x11: bbfeff4d22000000 x10: 0000000000000000
[ 5889.347303] x9 : ffff800009402124 x8 : 0200f78811dfbb4d
[ 5889.353637] x7 : 2200000000191b01 x6 : ffff208002a7d480
[ 5889.359959] x5 : 0000000000000000 x4 : 0000000000000000
[ 5889.366271] x3 : 0000000000000000 x2 : 0000000000000000
[ 5889.372567] x1 : 0000000000000000 x0 : ffff20400095c080
[ 5889.378857] Call trace:
[ 5889.382285] hclge_ptp_get_rx_hwts+0x40/0x170 [hclge]
[ 5889.388304] hns3_handle_bdinfo+0x324/0x410 [hns3]
[ 5889.394055] hns3_handle_rx_bd+0x60/0x150 [hns3]
[ 5889.399624] hns3_clean_rx_ring+0x84/0x170 [hns3]
[ 5889.405270] hns3_nic_common_poll+0xa8/0x220 [hns3]
[ 5889.411084] napi_poll+0xcc/0x264
[ 5889.415329] net_rx_action+0xd4/0x21c
[ 5889.419911] __do_softirq+0x130/0x358
[ 5889.424484] irq_exit+0x134/0x154
[ 5889.428700] __handle_domain_irq+0x88/0xf0
[ 5889.433684] gic_handle_irq+0x78/0x2c0
[ 5889.438319] el1_irq+0xb8/0x140
[ 5889.442354] arch_cpu_idle+0x18/0x40
[ 5889.446816] default_idle_call+0x5c/0x1c0
[ 5889.451714] cpuidle_idle_call+0x174/0x1b0
[ 5889.456692] do_idle+0xc8/0x160
[ 5889.460717] cpu_startup_entry+0x30/0xfc
[ 5889.465523] secondary_start_kernel+0x158/0x1ec
[ 5889.470936] Code: 97ffab78 f9411c14 91408294 f9457284 (f9400c80)
[ 5889.477950] SMP: stopping secondary CPUs
[ 5890.514626] SMP: failed to stop secondary CPUs 0-69,71-95
[ 5890.522951] Starting crashdump kernel...

【Expected Result】
The server should not reboot.

【Actual Result】
The server reboots.

【Attachments】

Comments (4)

yunion created the bug report

Hi yunionio_admin, welcome to the openEuler Community.
I'm the Bot here serving you. You can find instructions on how to interact with me here.
If you have any questions, please contact the SIG: Kernel, and any of the maintainers.

openeuler-ci-bot added the sig/Kernel label

Could you provide access to the environment?

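In what scenario did this problem occur? Have you modified the source code of the hns3 NIC driver?
Also, does the device at the other end of the hns3 NIC link have 1588 v2 enabled?
Could you send us the reproduction steps and the related configuration?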

zhangjialin changed the task status from To Do to Completed
