Please merge the latest intel uncore patch series to fix the warning caused by broken UPI discovery tables on SPR MCC CPUs

Done
Bug
Created on 2023-06-29 15:59

[Title description] A server with an Intel SPR 6460C CPU running openEuler 22.03 SP1 reports the warning: WARNING: CPU: 65 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184 intel_uncore_has_discovery_tables+0x4c0/0x65

[Environment]
Hardware:
1) Server with an Intel SPR 6460C CPU
Software:
1) openEuler 22.03 SP1
2) Kernel 5.10.0-136.12.0.86.oe2203sp1.x86_64
3) intel_uncore module

[Steps to reproduce]
1. Install openEuler 22.03 SP1 on the server.
2. After installation completes, reboot and check the dmesg log.

[Expected result]
dmesg does not print WARNING: CPU: 65 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184

[Actual result]
dmesg prints WARNING: CPU: 65 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184

[Attachments]
[ 10.984466] ------------[ cut here ]------------
[ 10.984478] WARNING: CPU: 22 PID: 1942 at arch/x86/events/intel/uncore_discovery.c:184 uncore_insert_box_info+0x11d/0x200 [intel_uncore]
[ 10.984478] Modules linked in: intel_uncore(+) acpi_pad(-) isst_if_common idxd_bus sg joydev fjes(-) wmi acpi_cpufreq vfat fat drm fuse ksecurec(O) ext4 mbcache jbd2 sd_mod t10_pi crct10dif_pclmul igb ahci crc32_pclmul i2c_algo_bit libahci crc32c_intel ghash_clmulni_intel libata dca dm_mirror dm_region_hash dm_log dm_mod
[ 10.984494] CPU: 22 PID: 1942 Comm: systemd-udevd Tainted: G O 5.10.0-60.18.0.50.r509_2.hce2.x86_64 #1
[ 10.984495] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CM15MBCA, BIOS 2.00.13 04/24/2023
[ 10.984500] RIP: 0010:uncore_insert_box_info+0x11d/0x200 [intel_uncore]
[ 10.984502] Code: 83 c2 01 48 83 c0 04 39 d1 0f 8e d3 00 00 00 48 8b 4b 38 8b 0c 01 41 89 4c 05 00 48 8b 73 40 8b 34 06 41 89 34 06 39 f9 75 d2 <0f> 0b 4c 89 ef e8 59 1b f5 e2 5b 4c 89 f7 5d 41 5c 41 5d 41 5e e9
[ 10.984503] RSP: 0018:ff5ce658ca2e7cd8 EFLAGS: 00010246
[ 10.984505] RAX: 0000000000000008 RBX: ff2e3f930ef08de0 RCX: 0000000000000003
[ 10.984505] RDX: 0000000000000002 RSI: 0000000000018000 RDI: 0000000000000003
[ 10.984506] RBP: 0000000000000000 R08: 0000000000000010 R09: 0000000000000000
[ 10.984507] R10: ff5ce658c8bcc000 R11: ff5ce658ca2e7c54 R12: ff5ce658ca2e7d28
[ 10.984507] R13: ff2e3f930f14ae60 R14: ff2e3f930f14ae50 R15: 0000000000000000
[ 10.984508] FS: 00007f2afa3e0b40(0000) GS:ff2e3f9a5f580000(0000) knlGS:0000000000000000
[ 10.984509] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 10.984510] CR2: 0000559f4a882828 CR3: 00000008929d4001 CR4: 0000000000771ee0
[ 10.984511] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 10.984512] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 10.984512] PKRU: 55555554
[ 10.984513] Call Trace:
[ 10.984521] parse_discovery_table.isra.0+0x171/0x1b0 [intel_uncore]
[ 10.984525] intel_uncore_has_discovery_tables+0x16e/0x1d0 [intel_uncore]
[ 10.984529] ? type_pmu_register+0x38/0x38 [intel_uncore]
[ 10.984533] intel_uncore_init+0x83/0xccc [intel_uncore]
[ 10.984536] ? type_pmu_register+0x38/0x38 [intel_uncore]
[ 10.984539] do_one_initcall+0x41/0x1d0
[ 10.984543] ? kmem_cache_alloc_trace+0x34/0x410
[ 10.984545] do_init_module+0x4c/0x240
[ 10.984547] __se_sys_init_module+0x143/0x1c0
[ 10.984550] do_syscall_64+0x30/0x40
[ 10.984553] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 10.984554] RIP: 0033:0x7f2aface24de
[ 10.984555] Code: 48 8b 0d 3d 29 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0a 29 0e 00 f7 d8 64 89 01 48
[ 10.984556] RSP: 002b:00007ffedfc33208 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 10.984557] RAX: ffffffffffffffda RBX: 0000559f49ffd050 RCX: 00007f2aface24de
[ 10.984557] RDX: 00007f2afae3da9d RSI: 000000000005f268 RDI: 0000559f4a8235c0
[ 10.984557] RBP: 0000559f4a8235c0 R08: 0000559f49fda1e0 R09: 00007ffedfc32318
[ 10.984558] R10: 0000000000000005 R11: 0000000000000246 R12: 00007f2afae3da9d
[ 10.984558] R13: 000000000000000a R14: 0000559f49fc7260 R15: 0000559f49ffd050
[ 10.984559] ---[ end trace c3d8c21ba271e78f ]---
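When checking a saved dmesg log for this warning after a reboot, a small filter can help. The following is a minimal Python sketch, not part of the original report: the names `WARN_RE`, `KernelWarning`, and `find_warnings` are hypothetical, and the regular expression assumes the WARNING header format seen in the trace above.

```python
import re
from typing import List, NamedTuple

# Matches a kernel WARN header line such as:
# "WARNING: CPU: 22 PID: 1942 at <file>:<line> <func>+0x11d/0x200 [module]"
# (format assumed from the trace in this report)
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


class KernelWarning(NamedTuple):
    cpu: int
    pid: int
    file: str
    line: int
    func: str


def find_warnings(dmesg_text: str,
                  source_suffix: str = "uncore_discovery.c") -> List[KernelWarning]:
    """Return WARN headers whose source file path ends with source_suffix."""
    hits = []
    for raw in dmesg_text.splitlines():
        m = WARN_RE.search(raw)
        if m and m["file"].endswith(source_suffix):
            hits.append(KernelWarning(int(m["cpu"]), int(m["pid"]),
                                      m["file"], int(m["line"]), m["func"]))
    return hits
```

Calling `find_warnings` on the text of a saved dmesg log returns an empty list when no warning from `uncore_discovery.c` is present, which corresponds to the expected result described in this report.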

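[Suggestion] We have verified that merging the latest intel uncore patch series fixes this issue. The series has already been merged into the upstream Linux kernel. Patch link: https://lore.kernel.org/lkml/20221129191023.936738-1-kan.liang@linux.intel.com/#r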

Comments (5)

chenrongwen created the bug
openeuler-ci-bot added the sig/Kernel label

It's a known issue for SPR MCC. The warning is triggered by the broken UPI discovery table in the BIOS, and the mainline kernel has fixed it with the commits below, starting from kernel v6.3:
5d515ee40cb5 perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table
65248a9a9ee1 perf/x86/uncore: Add a quirk for UPI on SPR
bd9514a4d5ec perf/x86/uncore: Ignore broken units in discovery table
3af548f23610 perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name
dbf061b26221 perf/x86/uncore: Factor out uncore_device_to_die()

On the Intel side, we have also verified that the fix works on SPR MCC.

Since openEuler 22.03 SP1 and SP2 have already been released, we will do the backport and create a PR against OLK-5.10. Is that OK?
If the fix is needed for 22.03 SP1/SP2, you can cherry-pick it from OLK-5.10.
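To check whether a given branch already carries the five commits listed above, one option is to compare commit subjects rather than hashes, since a backport gets a new hash. The sketch below is illustrative only: `UNCORE_FIX_SUBJECTS`, `missing_subjects`, and `branch_subjects` are hypothetical names, and it assumes the backport keeps the upstream subject line unchanged, which this thread does not confirm.

```python
import subprocess
from typing import Iterable, List

# Subjects of the upstream commits listed in the comment above.
UNCORE_FIX_SUBJECTS = [
    "perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table",
    "perf/x86/uncore: Add a quirk for UPI on SPR",
    "perf/x86/uncore: Ignore broken units in discovery table",
    "perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name",
    "perf/x86/uncore: Factor out uncore_device_to_die()",
]


def missing_subjects(log_subjects: Iterable[str],
                     wanted: Iterable[str]) -> List[str]:
    """Return the wanted subjects that do not appear in log_subjects."""
    present = {s.strip() for s in log_subjects}
    return [w for w in wanted if w not in present]


def branch_subjects(repo: str, rev: str,
                    path: str = "arch/x86/events/intel") -> List[str]:
    """List subjects of commits on rev that touch path (runs `git log`)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%s", rev, "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()
```

For example, `missing_subjects(branch_subjects("/path/to/kernel", "OLK-5.10"), UNCORE_FIX_SUBJECTS)` would list the fixes not yet present on that branch; the repository path here is a placeholder. An exact subject match is deliberately strict, so a backport that rewords or prefixes the subject would be reported as missing and needs a manual look.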

Ok, please backport the relevant patches into the OLK-5.10 kernel, thank you.

The PR has been available for more than 12 days. Could you help review and merge it? Thank you. @chenrongwen

zhangjialin changed the task status from To Do to In Progress
yunyings changed the task status from In Progress to Done via openeuler/kernel Pull Request !1315
zhangjialin added the issue_resolved label
