【Title】Kernel periodically crashes and reboots on an ARM server running 22.03 LTS SP1
【Environment】
Hardware:
Huawei TaiShan 2280 V2
product: BC82AMDDA
NIC: HNS GE/10GE/25GE RDMA Network Controller
Software:
1) openEuler 22.03 (LTS-SP1)
2) 5.10.0-136.42.0.120.oe2203sp1.aarch64
3) hns3 hclge
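If a special network setup is involved, please provide a network topology diagram
【Steps to Reproduce】
After running for a while, the server reboots on its own and a crash log is generated under /var/crash: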
[ 5888.946472] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
[ 5888.946475] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
[ 5888.946489] Mem abort info:
[ 5888.956902] Mem abort info:
[ 5888.967057] ESR = 0x96000007
[ 5888.970898] ESR = 0x96000007
[ 5888.974733] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5888.978833] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5888.982918] SET = 0, FnV = 0
[ 5888.989255] SET = 0, FnV = 0
[ 5888.995579] EA = 0, S1PTW = 0
[ 5888.999653] EA = 0, S1PTW = 0
[ 5889.003729] Data abort info:
[ 5889.007877] Data abort info:
[ 5889.012026] ISV = 0, ISS = 0x00000007
[ 5889.015915] ISV = 0, ISS = 0x00000007
[ 5889.019797] CM = 0, WnR = 0
[ 5889.024637] CM = 0, WnR = 0
[ 5889.029460] user pgtable: 4k pages, 48-bit VAs, pgdp=00002040e69c0000
[ 5889.033416] user pgtable: 4k pages, 48-bit VAs, pgdp=0000208045851000
[ 5889.037354] [0000000000000018] pgd=00002040e544a003
[ 5889.044754] [0000000000000018] pgd=000020803e145003
[ 5889.052145] , p4d=00002040e544a003
[ 5889.057976] , p4d=000020803e145003
[ 5889.063789] , pud=00002040e6850003
[ 5889.068126] , pud=000020803e0e0003
[ 5889.072449] , pmd=00002040e6bdc003
[ 5889.076760] , pmd=000020803e0e2003
[ 5889.081058] , pte=0000000000000000
[ 5889.085346] , pte=0000000000000000
[ 5889.089620]
[ 5889.093881]
[ 5889.098132] Internal error: Oops: 96000007 [#1] SMP
[ 5889.108497] Modules linked in: nbd xt_CHECKSUM nft_compat nft_counter nft_chain_nat nf_tables xt_set ipt_rpfilter xt_multiport iptable_raw iptable_mangle ip_set_hash_net ip_set_hash_ip ip_set ipip tunnel4 ip_tunnel veth geneve ip6_udp_tunnel udp_tunnel openvswitch nsh nf_conncount vhost_net vhost vhost_iotlb tap tun nf_conntrack_netlink nfnetlink xt_statistic xt_nat ipt_REJECT nf_reject_ipv4 xt_addrtype xt_conntrack ip6table_nat ip6_tables xt_MASQUERADE xt_comment iptable_filter xt_mark iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_tables rfkill overlay rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi ipmi_ssif sunrpc hns_roce_hw_v2 ib_uverbs acpi_ipmi hibmc_drm ses enclosure drm_vram_helper ib_core ipmi_si drm_ttm_helper sg ipmi_devintf ttm ipmi_msghandler hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu vfat hisi_uncore_pmu fat sch_fq_codel ext4 mbcache
[ 5889.108577] jbd2 sha512_generic sha512_arm64 aes_ce_ccm sd_mod t10_pi xts realtek ghash_ce hisi_sas_v3_hw sha2_ce hisi_sas_main sha256_arm64 hclge libsas hisi_sec2 ahci sha1_ce sbsa_gwdt hns3 libahci scsi_transport_sas hisi_qm libata hnae3 megaraid_sas uacce host_edma_drv authenc i2c_designware_platform i2c_designware_core br_netfilter bridge stp llc fuse aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 5889.237915] CPU: 70 PID: 0 Comm: swapper/70 Kdump: loaded Not tainted 5.10.0-136.42.0.120.oe2203sp1.aarch64 #1
[ 5889.249313] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.05 09/18/2019
[ 5889.259018] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[ 5889.266118] pc : hclge_ptp_get_rx_hwts+0x40/0x170 [hclge]
[ 5889.272612] lr : hclge_ptp_get_rx_hwts+0x34/0x170 [hclge]
[ 5889.279101] sp : ffff800012c3bc50
[ 5889.283516] x29: ffff800012c3bc50 x28: ffff2040002be040
[ 5889.289927] x27: ffff800009116484 x26: 0000000080007500
[ 5889.296333] x25: 0000000000000000 x24: ffff204001c6f000
[ 5889.302738] x23: ffff204144f53c00 x22: 0000000000000000
[ 5889.309134] x21: 0000000000000000 x20: ffff204004220080
[ 5889.315520] x19: ffff204144f53c00 x18: 0000000000000000
[ 5889.321897] x17: 0000000000000000 x16: 0000000000000000
[ 5889.328263] x15: 0000004000140ec8 x14: 0000000000000000
[ 5889.334617] x13: 0000000000000000 x12: 00000000010011df
[ 5889.340965] x11: bbfeff4d22000000 x10: 0000000000000000
[ 5889.347303] x9 : ffff800009402124 x8 : 0200f78811dfbb4d
[ 5889.353637] x7 : 2200000000191b01 x6 : ffff208002a7d480
[ 5889.359959] x5 : 0000000000000000 x4 : 0000000000000000
[ 5889.366271] x3 : 0000000000000000 x2 : 0000000000000000
[ 5889.372567] x1 : 0000000000000000 x0 : ffff20400095c080
[ 5889.378857] Call trace:
[ 5889.382285] hclge_ptp_get_rx_hwts+0x40/0x170 [hclge]
[ 5889.388304] hns3_handle_bdinfo+0x324/0x410 [hns3]
[ 5889.394055] hns3_handle_rx_bd+0x60/0x150 [hns3]
[ 5889.399624] hns3_clean_rx_ring+0x84/0x170 [hns3]
[ 5889.405270] hns3_nic_common_poll+0xa8/0x220 [hns3]
[ 5889.411084] napi_poll+0xcc/0x264
[ 5889.415329] net_rx_action+0xd4/0x21c
[ 5889.419911] __do_softirq+0x130/0x358
[ 5889.424484] irq_exit+0x134/0x154
[ 5889.428700] __handle_domain_irq+0x88/0xf0
[ 5889.433684] gic_handle_irq+0x78/0x2c0
[ 5889.438319] el1_irq+0xb8/0x140
[ 5889.442354] arch_cpu_idle+0x18/0x40
[ 5889.446816] default_idle_call+0x5c/0x1c0
[ 5889.451714] cpuidle_idle_call+0x174/0x1b0
[ 5889.456692] do_idle+0xc8/0x160
[ 5889.460717] cpu_startup_entry+0x30/0xfc
[ 5889.465523] secondary_start_kernel+0x158/0x1ec
[ 5889.470936] Code: 97ffab78 f9411c14 91408294 f9457284 (f9400c80)
[ 5889.477950] SMP: stopping secondary CPUs
[ 5890.514626] SMP: failed to stop secondary CPUs 0-69,71-95
[ 5890.522951] Starting crashdump kernel...
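To make the oops above easier to read, the following minimal Python sketch decodes the `ESR` value and the faulting instruction word (the one in parentheses on the `Code:` line). The constants are copied from the log; the helper names `decode_esr` and `decode_ldr_x_uimm` are made up for this illustration, and only the one instruction form needed here (64-bit `LDR` with unsigned offset) is handled. Treat it as a reading aid under those assumptions, not as a root-cause analysis.

```python
# Values copied from the oops log above.
ESR = 0x96000007
FAULT_INSN = 0xF9400C80  # word in parentheses on the "Code:" line
X4 = 0x0                 # x4 from the register dump


def decode_esr(esr):
    """Split an AArch64 ESR_ELx value into the fields the kernel prints."""
    iss = esr & 0x1FFFFFF                 # instruction specific syndrome, bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,         # exception class, bits [31:26]
        "IL": 32 if (esr >> 25) & 1 else 16,
        "ISS": iss,
        "WnR": (iss >> 6) & 1,            # 0 = read, 1 = write
        "DFSC": iss & 0x3F,               # data fault status code
    }


def describe_dfsc(dfsc):
    """Name the fault for the translation-fault encodings 0b0001LL only."""
    if (dfsc >> 2) == 0b0001:
        return "translation fault, level %d" % (dfsc & 0x3)
    return "other (not decoded in this sketch)"


def decode_ldr_x_uimm(insn):
    """Decode LDR Xt, [Xn, #imm] (64-bit, unsigned offset).

    Returns (rt, rn, byte_offset), or None for any other instruction.
    """
    if (insn >> 22) != 0x3E5:             # fixed bits of this encoding
        return None
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 8              # offset is scaled by the 8-byte access size


if __name__ == "__main__":
    fields = decode_esr(ESR)
    print("EC=0x%x IL=%d ISS=0x%x WnR=%d" % (
        fields["EC"], fields["IL"], fields["ISS"], fields["WnR"]))
    print("DFSC:", describe_dfsc(fields["DFSC"]))

    rt, rn, offset = decode_ldr_x_uimm(FAULT_INSN)
    print("ldr x%d, [x%d, #%d]" % (rt, rn, offset))
    print("effective address: 0x%x" % (X4 + offset))
```

Under these assumptions the faulting instruction is a read through `x4` at offset 24, and with `x4` equal to 0 in the register dump the effective address is 0x18, which matches the reported fault address. This suggests that `hclge_ptp_get_rx_hwts` read a field at offset 0x18 through a NULL pointer; which structure that pointer belongs to cannot be determined from the log alone.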
【Expected Result】
The server should not reboot.
【Actual Result】
The server reboots.
【Attachments】
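Could you provide access to the environment?
In what scenario does the problem occur? Has the source code of the hns3 NIC driver been modified?
Also, does the peer device connected to the hns3 NIC have the 1588 v2 feature enabled?
Could you share the reproduction steps and the relevant configuration?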