openEuler / kernel

[OLK-5.10] HARDLOCKUP_DETECTOR_PERF is broken

Completed
Task
Created on 2021-11-25 19:03

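[Title description] Briefly describe the problem: in what scenario, after what operation, what problem occurred (use positive phrasing where possible)
The pmu based nmi_watchdog cannot be enabled properly.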
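[Environment]
Hardware:
1) For bare metal, provide details of the affected hardware;
2) For virtual machines, provide the VM XML file or configuration
Software:
1) OS version and branch
2) Kernel information
3) Version of the component in which the problem was found
If a special network setup is involved, provide a network topology diagram
[Steps to reproduce]
Detailed steps
Frequency (always reproducible, or intermittent)
[Expected result]
Describe the expected result; it can be obtained by comparing old and new versions
[Actual result]
Describe the faulty result
[Attachments]
For example system message logs, component logs, dump information, screenshots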

Comments (4)

Wei Li created the defect

Hi stkid, welcome to the openEuler Community.
I'm the Bot here serving you. You can find the instructions on how to interact with me at
https://gitee.com/openeuler/community/blob/master/en/sig-infrastructure/command.md.
If you have any questions, please contact the SIG: Kernel, and any of the maintainers: @Xie XiuQi, @YangYingliang, @成坚 (CHENG Jian).

openeuler-ci-bot added the sig/Kernel label

Commit 60565144df0a ("init: only move down lockup_detector_init() when sdei_watchdog is enabled")
was meant to fix a 'BUG'. However, on ARM64 armv8_pmu_driver_init() is called in do_basic_setup(),
so creating the perf event fails if lockup_detector_init() is moved back. So revert that patch first.
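The ordering dependency described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: every `order_*` name is hypothetical, the `-1` return value is a placeholder rather than the real error code, and the model assumes nothing more than "the probe needs the PMU driver to be registered already".

```c
#include <stdbool.h>

/* Userspace model only: the names echo kernel functions, but nothing here is kernel code. */
static bool order_pmu_registered;

/* Stand-in for do_basic_setup(): on ARM64 this is where the PMU driver gets initialised. */
static void order_do_basic_setup(void)
{
    order_pmu_registered = true;
}

/* Stand-in for lockup_detector_init(): the perf probe needs a registered PMU. */
static int order_lockup_detector_init(void)
{
    return order_pmu_registered ? 0 : -1; /* -1 is a placeholder error value */
}

/* Early ordering: the detector is initialised before do_basic_setup(). */
static int order_boot_detector_first(void)
{
    int ret;

    order_pmu_registered = false;
    ret = order_lockup_detector_init();
    order_do_basic_setup();
    return ret;
}

/* Late ordering: the detector is initialised after do_basic_setup(). */
static int order_boot_detector_last(void)
{
    order_pmu_registered = false;
    order_do_basic_setup();
    return order_lockup_detector_init();
}
```

In this model `order_boot_detector_first()` reports a failure and `order_boot_detector_last()` succeeds, which is the reason the detector has to stay after do_basic_setup() on ARM64.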

Then let's fix the original issue.
With CONFIG_DEBUG_PREEMPT and CONFIG_PREEMPT enabled, a 'BUG' is triggered
during the pmu based nmi_watchdog initialization:

[ 3.341853] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
[ 3.344392] caller is debug_smp_processor_id+0x17/0x20
[ 3.344395] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #398
[ 3.344397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[ 3.344399] Call Trace:
[ 3.344410] dump_stack+0x60/0x76
[ 3.344412] check_preemption_disabled+0xba/0xc0
[ 3.344415] debug_smp_processor_id+0x17/0x20
[ 3.344422] hardlockup_detector_event_create+0xf/0x60
[ 3.344427] hardlockup_detector_perf_init+0xf/0x41
[ 3.344430] watchdog_nmi_probe+0xe/0x10
[ 3.344432] lockup_detector_init+0x22/0x5b
[ 3.344437] kernel_init_freeable+0x20c/0x245
[ 3.344439] ? rest_init+0xd0/0xd0
[ 3.344441] kernel_init+0xe/0x110
[ 3.344446] ret_from_fork+0x22/0x30

This issue was introduced by commit a79050434b45, which moved
lockup_detector_init() down to after do_basic_setup(), and thus after sched_init_smp() as well.

  hardlockup_detector_event_create
    |- hardlockup_detector_perf_init	(unsafe)
      |- watchdog_nmi_probe
        |- lockup_detector_init
    |- hardlockup_detector_perf_enable
      |- watchdog_nmi_enable
        |- watchdog_enable
          |- lockup_detector_online_cpu
          |- softlockup_start_fn
            |- softlockup_start_all
              |- lockup_detector_reconfigure
                |- lockup_detector_setup
                  |- lockup_detector_init

Analysing the calling contexts shows that smp_processor_id() is unsafe
only in hardlockup_detector_perf_init(), because the 'kernel_init' thread
is preemptible after sched_init_smp().

That call is only a probe to test whether the pmu based nmi_watchdog can be
enabled. The real enabling happens later in softlockup_start_fn(), which
ensures that watchdog_enable() is called on all cores. So preemption can
simply be disabled to fix this 'BUG'.
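The idea of disabling preemption around the probe can be sketched in a userspace model. This is an illustration under stated assumptions, not the actual patch: all `model_*` names are hypothetical, the counter stands in for the kernel's preempt count, and the check only imitates what CONFIG_DEBUG_PREEMPT reports.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only: none of this is kernel code. */
static int model_preempt_count; /* stand-in for the kernel's preempt count */
static int model_bug_reports;   /* number of simulated 'BUG' reports */

static void model_preempt_disable(void)
{
    model_preempt_count++;
}

static void model_preempt_enable(void)
{
    assert(model_preempt_count > 0); /* enable must pair with a disable */
    model_preempt_count--;
}

/* Imitates the debug check: complain when the caller could be migrated. */
static int model_smp_processor_id(void)
{
    if (model_preempt_count == 0) {
        model_bug_reports++;
        fprintf(stderr, "BUG: using smp_processor_id() in preemptible code\n");
    }
    return 0; /* the model has a single CPU */
}

/* Stand-in for hardlockup_detector_event_create(): it reads the current CPU id. */
static int model_event_create(void)
{
    (void)model_smp_processor_id();
    return 0; /* pretend the perf event was created */
}

/* Probe called from a preemptible context: triggers the report. */
static int model_probe_unsafe(void)
{
    return model_event_create();
}

/* Probe with preemption disabled around the CPU-local work. */
static int model_probe_fixed(void)
{
    int ret;

    model_preempt_disable();
    ret = model_event_create();
    model_preempt_enable();
    return ret;
}
```

In the model, `model_probe_unsafe()` increments `model_bug_reports` while `model_probe_fixed()` does not. Since the probe result does not depend on which CPU it runs on, pinning the task for the duration of the probe is enough.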

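Wei Li edited the description
Wei Li changed the task status from To Do to Fixing
wupeng changed the task type from Defect to Task
wupeng changed the task status from Fixing to To Do
zhengzengkai changed the task status from To Do to Completed via src-openeuler/kernel Pull Request !418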
