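[Title] Briefly describe the problem: in what scenario, which operation was performed, and what went wrong (prefer positive wording)
I. Defect information
[20.03-LTS-SP1~SP4] On the 4.19 kernel, loading and then unloading the vkms module crashes and reboots the system
Affected kernel versions: every build tested from kernel-4.19.90-2402.4.0.0238.oe1.x86_64 to kernel-4.19.90-2403.1.0.0241.oe1.x86_64 shows the problem
Kernel information:
kernel-4.19.90-2402.4.0.0238.oe1.x86_64
Component the defect belongs to:
kernel
Version the defect belongs to:
kernel-4.19.90-2402.4.0.0238.oe1.x86_64 to kernel-4.19.90-2403.1.0.0241.oe1.x86_64
Defect summary:
After installing the system, load and then unload the vkms module (for example with modprobe vkms; modprobe -r vkms, the command given with the crash backtrace later in this thread).
The system then crashes and reboots.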
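[Environment]
Hardware
Boot any virtual machine and install 20.03-LTS-SP1
Software
The problem was found with builds released from February onward: 20.03-LTS-SP1
Kernel version: kernel-4.19.90-2402.4.0.0238.oe1.x86_64 to kernel-4.19.90-2403.1.0.0241.oe1.x86_64
Network
- None
[Steps to reproduce] Describe the exact steps
After installing the system, load and then unload the vkms module.
[Actual result]
The system crashes and reboots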
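[Other attachments]
For example system message logs/component logs, dump information, screenshots
Reference link for defect details:
Guidance link for defect analysis:
Preliminary analysis: the problem was introduced by the fix for CVE-2023-51043. The same CVE was also fixed on the 5.10 kernel (22.03 LTS SP1), where this problem has not been observed. The original upstream patch fixes an issue in v6.5~v6.8, so 4.19 may be missing a prerequisite patch, which would explain the problem.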
Summary of the problem analysis
1. Why does it crash?
Start with the call stack at the time of the crash:
vkms_exit
  drm_dev_put
    drm_dev_release
      dev->driver->release(dev) ===== vkms_release
        drm_atomic_helper_shutdown
          __drm_atomic_helper_disable_all
            alloc
              init
                get
            drm_atomic_state_put
              __drm_atomic_state_free
                drm_dev_put
                  drm_dev_release
                    dev->driver->release(dev) ===== vkms_release
                      platform_device_unregister
                        platform_device_del
                          device_del
                            dpm_sysfs_remove
                              sysfs_unmerge_group
                                kernfs_find_and_get_ns
                                  kernfs_find_ns ---> dev->kobj->sd is a NULL pointer here
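The direct cause is that dev->kobj->sd is a NULL pointer: kernfs_find_ns reads sd->flags and the system crashes.
The following traces where the NULL sd comes from:
1) vkms_exit, the exit function of the vkms module, calls drm_dev_put. This function is crucial; its logic is: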
void drm_dev_put(struct drm_device *dev)
{
	if (dev)
		kref_put(&dev->ref, drm_dev_release);
}
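kref_put is simple: it decrements dev->ref by 1 and, if the result is 0, calls drm_dev_release.
Because drm_dev_init set dev->ref to 1 when the vkms module was initialized, this drm_dev_put call drops the count to 0 and invokes drm_dev_release.
drm_dev_release then calls the vkms driver's vkms_release.
2) vkms_release first calls platform_device_unregister -> platform_device_del -> device_del -> kobject_del -> sysfs_put, which releases kobj->sd. (This is the key point.)
3) In the rest of vkms_release's call chain, __drm_atomic_helper_disable_all is the key function:
It first calls drm_atomic_state_alloc -> drm_atomic_state_init. drm_atomic_state_init increments dev->ref, so dev->ref goes from 0 back to 1, and drm_atomic_state_alloc returns to __drm_atomic_helper_disable_all.
__drm_atomic_helper_disable_all then calls drm_atomic_state_put, which eventually reaches drm_dev_put again. dev->ref is 1 at this point, so it drops to 0 and vkms_release is called a second time.
4) The second vkms_release produces the crash call stack shown at the start: it ends in kernfs_find_ns dereferencing the NULL dev->kobj->sd pointer.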
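2. Why was there no problem before, and why does it appear once the CVE-2023-51043 patch is merged?
1) The original flow
In the original flow, __drm_atomic_helper_disable_all -> drm_atomic_state_alloc -> drm_atomic_state_init does not increment dev->ref, so vkms_release is not called a second time and the already released dev->kobj->sd is never accessed again. Hence there was no problem.
2) With the CVE-2023-51043 patch merged, drm_atomic_state_init increments dev->ref, vkms_release is called a second time, and the path ends in kernfs_find_ns dereferencing the NULL dev->kobj->sd pointer.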
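The two flows can be compared in a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct model_dev, model_shutdown, model_release, model_dev_put and model_rmmod are hypothetical names, the counter is a plain int rather than a kref, and the point where the real kernel would dereference the NULL sd is replaced by setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct drm_device; not kernel code. */
struct model_dev {
    int ref;                    /* models dev->ref as a plain counter */
    bool sd_present;            /* models dev->kobj->sd being non-NULL */
    int release_calls;          /* how many times the release callback ran */
    bool null_sd_access;        /* set where the real kernel would oops */
    bool state_init_takes_ref;  /* true models the CVE-2023-51043 backport */
};

void model_dev_put(struct model_dev *dev);

/* Models drm_atomic_helper_shutdown: set up a state, then drop it. */
static void model_shutdown(struct model_dev *dev)
{
    if (!dev->state_init_takes_ref)
        return;                 /* old flow: the state never touches dev->ref */
    dev->ref++;                 /* state init takes a reference: 0 -> 1 */
    model_dev_put(dev);         /* state free drops it again: 1 -> 0 */
}

/* Models vkms_release as analysed above: unregister first, then shut down. */
static void model_release(struct model_dev *dev)
{
    dev->release_calls++;
    if (!dev->sd_present) {     /* second entry: sd is already gone */
        dev->null_sd_access = true;
        return;                 /* record the fault instead of crashing */
    }
    dev->sd_present = false;    /* effect of platform_device_unregister */
    model_shutdown(dev);
}

void model_dev_put(struct model_dev *dev)
{
    if (--dev->ref == 0)
        model_release(dev);
}

/* Runs the module-exit path once and returns the final model state. */
struct model_dev model_rmmod(bool state_init_takes_ref)
{
    struct model_dev dev = {
        .ref = 1,               /* set by init, as drm_dev_init does */
        .sd_present = true,
        .state_init_takes_ref = state_init_takes_ref,
    };

    model_dev_put(&dev);        /* vkms_exit -> drm_dev_put */
    assert(dev.ref == 0);       /* both flows end with the counter at 0 */
    return dev;
}
```

In this model, model_rmmod(false) runs the release callback once, while model_rmmod(true) re-enters it and sets null_sd_access, which mirrors steps 1) to 4) of the analysis.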
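Thank you very much for reporting this issue.
Could you post the original call stack log?
On qemu I ran insmod vkms.ko followed by rmmod vkms.ko and got the stack below, but the system did not crash. I would like to confirm whether it is the same stack.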
root@buildroot:~# insmod vkms.ko
[ 28.132658] vkms: module verification failed: signature and/or required key missing - tainting kernel
[ 28.177024] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 28.177508] [drm] Driver supports precise vblank timestamp query.
[ 28.198651] [drm] Initialized vkms 1.0.0 20180514 for virtual device on minor 0
root@buildroot:~# lsmod
Module Size Used by Tainted: G
vkms 28672 0
root@buildroot:~# rmmod vkms.ko
[ 34.573093] ------------[ cut here ]------------
[ 34.573747] refcount_t: increment on 0; use-after-free.
[ 34.576323] WARNING: CPU: 0 PID: 154 at ../lib/refcount.c:156 refcount_inc_checked+0x5c/0x80
[ 34.577169] Modules linked in: vkms(E-)
[ 34.578614] CPU: 0 PID: 154 Comm: rmmod Tainted: G E 4.19.90+ #70
[ 34.578981] Hardware name: linux,dummy-virt (DT)
[ 34.579475] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 34.580050] pc : refcount_inc_checked+0x5c/0x80
[ 34.580294] lr : refcount_inc_checked+0x5c/0x80
[ 34.580543] sp : ffff8000c818fb20
[ 34.580749] x29: ffff8000c818fb20 x28: ffff8000c6f2e000
[ 34.581098] x27: 0000000000000000 x26: 0000000000000000
[ 34.581379] x25: ffff2000028f5418 x24: ffff20000db37000
[ 34.581664] x23: 1fffe4000051ea18 x22: ffff20000ca2c000
[ 34.581945] x21: ffff8000c7d75d00 x20: ffff20000d8c7434
[ 34.582224] x19: 0000000000000000 x18: 0000000000000000
[ 34.582523] x17: 0000000000000000 x16: 0000000000000000
[ 34.582818] x15: 0000000000000000 x14: ffff20000974507c
[ 34.583114] x13: 000000000008c133 x12: ffff200009710a04
[ 34.583405] x11: 1fffe40001b4608f x10: ffff040001b4608f
[ 34.583944] x9 : dfff200000000000 x8 : 657466612d657375
[ 34.584262] x7 : 203b30206e6f2074 x6 : 0000000000000030
[ 34.584544] x5 : 1ffff0018801b5c6 x4 : 0000000000000000
[ 34.584831] x3 : 0000000000000000 x2 : ffffffffffffffff
[ 34.585082] x1 : 61bf317b17390d00 x0 : 0000000000000000
[ 34.585592] Call trace:
[ 34.585802] refcount_inc_checked+0x5c/0x80
[ 34.586033] drm_dev_get+0x24/0x30
[ 34.586217] drm_atomic_state_init+0x150/0x240
[ 34.586434] drm_atomic_state_alloc+0xb8/0xe8
[ 34.586647] __drm_atomic_helper_disable_all.isra.4+0x28/0x438
[ 34.586932] drm_atomic_helper_shutdown+0xac/0x118
[ 34.588049] vkms_release+0x40/0x68 [vkms]
[ 34.588272] drm_dev_put.part.0+0x7c/0xb0
[ 34.588489] drm_dev_put+0x24/0x30
[ 34.588669] vkms_exit+0x38/0x50 [vkms]
[ 34.588874] __arm64_sys_delete_module+0x334/0x538
[ 34.589126] el0_svc_common+0x10c/0x488
[ 34.589318] el0_svc_handler+0x170/0x240
[ 34.589521] el0_svc+0x10/0x640
[ 34.589787] ---[ end trace 546d5b7622744ad6 ]---
[ 34.604467] ------------[ cut here ]------------
[ 34.604767] refcount_t: underflow; use-after-free.
[ 34.605223] WARNING: CPU: 0 PID: 154 at ../lib/refcount.c:190 refcount_sub_and_test_checked+0xe8/0x108
[ 34.605659] Modules linked in: vkms(E-)
[ 34.605872] CPU: 0 PID: 154 Comm: rmmod Tainted: G W E 4.19.90+ #70
[ 34.606198] Hardware name: linux,dummy-virt (DT)
[ 34.606454] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 34.606707] pc : refcount_sub_and_test_checked+0xe8/0x108
[ 34.606968] lr : refcount_sub_and_test_checked+0xe8/0x108
[ 34.607238] sp : ffff8000c818fb10
[ 34.607411] x29: ffff8000c818fb10 x28: ffff8000c6f2e000
[ 34.607905] x27: 0000000000000000 x26: 0000000000000000
[ 34.608252] x25: ffff2000028f5418 x24: ffff20000db37000
[ 34.608505] x23: 1ffff000196d7121 x22: ffff800bc5480000
[ 34.608747] x21: ffff2000028f20c0 x20: ffff20000d8c7434
[ 34.608983] x19: 0000000000000000 x18: 0000000000000000
[ 34.609211] x17: 0000000000000000 x16: 0000000000000000
[ 34.609439] x15: 0000000000000000 x14: ffff20000974507c
[ 34.609667] x13: 0000000000093a5f x12: ffff200009710a04
[ 34.609922] x11: 1fffe40001b4608e x10: ffff040001b4608e
[ 34.610210] x9 : dfff200000000000 x8 : 72657466612d6573
[ 34.610508] x7 : 75203b776f6c6672 x6 : 0000000000000030
[ 34.610823] x5 : 1ffff0018801b5c6 x4 : 0000000000000000
[ 34.611108] x3 : 0000000000000000 x2 : ffffffffffffffff
[ 34.611359] x1 : 61bf317b17390d00 x0 : 0000000000000000
[ 34.611721] Call trace:
[ 34.611981] refcount_sub_and_test_checked+0xe8/0x108
[ 34.612455] refcount_dec_and_test_checked+0x14/0x20
[ 34.612726] drm_dev_put.part.0+0x20/0xb0
[ 34.612943] drm_dev_put+0x24/0x30
[ 34.613101] __drm_atomic_state_free+0xa4/0xe0
[ 34.613346] __drm_atomic_helper_disable_all.isra.4+0x340/0x438
[ 34.613647] drm_atomic_helper_shutdown+0xac/0x118
[ 34.613929] vkms_release+0x40/0x68 [vkms]
[ 34.614168] drm_dev_put.part.0+0x7c/0xb0
[ 34.614410] drm_dev_put+0x24/0x30
[ 34.614586] vkms_exit+0x38/0x50 [vkms]
[ 34.614812] __arm64_sys_delete_module+0x334/0x538
[ 34.615083] el0_svc_common+0x10c/0x488
[ 34.615269] el0_svc_handler+0x170/0x240
[ 34.615481] el0_svc+0x10/0x640
[ 34.615831] ---[ end trace 546d5b7622744ad7 ]---
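drm_atomic_helper_shutdown has to initialize a new drm_atomic_state. The CVE-2023-51043 patch unconditionally takes a reference on dev whenever a drm_atomic_state is initialized, so drm_atomic_helper_shutdown now modifies the dev reference count.
The vkms driver calls drm_atomic_helper_shutdown from its release callback, and the release callback only runs after the dev reference count has reached 0. That is what produces the "refcount_t: increment on 0;" warning in the call stack above.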
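The _checked suffix on refcount_inc_checked and refcount_sub_and_test_checked in the log suggests that this qemu kernel uses a checked refcount_t. One plausible reading, offered as an assumption rather than a confirmed diagnosis, is that the checked counter refuses to go from 0 back to 1, so the later put underflows instead of re-triggering release, which would explain why this run only warned and did not crash. The sketch below is a simplified, single-threaded userspace model of that behaviour; struct model_refcount and the model_* functions are hypothetical names, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, single-threaded model of a checked reference counter. */
struct model_refcount {
    unsigned int refs;
    int warnings;               /* number of warnings the model has printed */
};

/* Increment, but refuse to resurrect a counter that already reached 0. */
void model_refcount_inc(struct model_refcount *r)
{
    if (r->refs == 0) {
        fprintf(stderr, "model: increment on 0; use-after-free.\n");
        r->warnings++;
        return;                 /* the counter stays at 0 */
    }
    r->refs++;
}

/* Decrement; returns true only when the counter legitimately drops to 0. */
bool model_refcount_dec_and_test(struct model_refcount *r)
{
    if (r->refs == 0) {
        fprintf(stderr, "model: underflow; use-after-free.\n");
        r->warnings++;
        return false;           /* the caller must not run release again */
    }
    return --r->refs == 0;
}

/* Replays the get/put pair issued from inside the release callback. */
int model_release_entries(void)
{
    struct model_refcount r = { .refs = 1, .warnings = 0 };
    int release_calls = 0;

    if (model_refcount_dec_and_test(&r)) {      /* module exit: 1 -> 0 */
        release_calls++;                        /* first release */
        model_refcount_inc(&r);                 /* state init: refused */
        if (model_refcount_dec_and_test(&r))    /* state free: underflow */
            release_calls++;
    }
    assert(r.warnings == 2);    /* one warning per log entry above */
    return release_calls;
}
```

With this checked model the release callback is entered once and two warnings are emitted, matching the shape of the two traces above. A counter without these checks would go 0 -> 1 -> 0 and enter release a second time.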
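When the driver is unloaded, shouldn't the correct order be drm_atomic_helper_shutdown first, then drm_dev_put?
That guarantees a reference on dev is still held while drm_atomic_helper_shutdown runs.
In that case drm_atomic_helper_shutdown is best kept out of the release callback.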
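The suggested ordering can be illustrated with a minimal userspace model. It is a sketch under assumptions, not a patch for vkms: struct model_dev and the model_* functions are hypothetical, and the release callback here does nothing except count its invocations.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the device; not kernel code. */
struct model_dev {
    int ref;                    /* models dev->ref */
    int release_calls;          /* how many times release ran */
};

void model_dev_put(struct model_dev *dev);

/* Models drm_atomic_helper_shutdown with the CVE-2023-51043 change applied. */
void model_shutdown(struct model_dev *dev)
{
    assert(dev->ref > 0);       /* the caller must still hold a reference */
    dev->ref++;                 /* state init takes a reference */
    model_dev_put(dev);         /* state free drops it again */
}

/* Release callback that no longer calls shutdown. */
static void model_release(struct model_dev *dev)
{
    dev->release_calls++;
}

void model_dev_put(struct model_dev *dev)
{
    if (--dev->ref == 0)
        model_release(dev);
}

/* Module exit in the suggested order: shutdown first, then the final put. */
int model_exit_fixed(void)
{
    struct model_dev dev = { .ref = 1, .release_calls = 0 };

    model_shutdown(&dev);       /* ref: 1 -> 2 -> 1, release not triggered */
    model_dev_put(&dev);        /* ref: 1 -> 0, release runs exactly once */
    return dev.release_calls;
}
```

Because the shutdown step runs while the counter is still 1, its get/put pair never touches 0, and the release callback is entered only by the final put.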
By the same reasoning, other drivers may be affected as well:
drivers/gpu/drm/xen/xen_drm_front.c
OS: openEuler 20.03 LTS SP4
kernel: upgraded to kernel-4.19.90-2402.4.0.0264.oe2003sp4.x86_64
modprobe vkms; modprobe -r vkms
crash> bt
PID: 5317 TASK: ffff8ecb85e82f00 CPU: 25 COMMAND: "modprobe"
#0 [ffff9aeb049d3a58] machine_kexec at ffffffff9ca5466f
#1 [ffff9aeb049d3ab0] __crash_kexec at ffffffff9cb57791
#2 [ffff9aeb049d3b70] crash_kexec at ffffffff9cb5868d
#3 [ffff9aeb049d3b88] oops_end at ffffffff9ca231ef
#4 [ffff9aeb049d3ba8] no_context at ffffffff9ca63ec5
#5 [ffff9aeb049d3c00] __do_page_fault at ffffffff9ca64688
#6 [ffff9aeb049d3c70] do_page_fault at ffffffff9ca64ac1
#7 [ffff9aeb049d3ca0] async_page_fault at ffffffff9d4011fe
[exception RIP: kernfs_find_ns+17]
RIP: ffffffff9cd6de31 RSP: ffff9aeb049d3d58 RFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff9d8c5168 RDI: 0000000000000000
RBP: ffffffff9d8c5168 R8: 0000000000000000 R9: ffffffffc055c6f6
R10: ffffcd7884103600 R11: 0000000000000001 R12: 0000000000000000
R13: ffffffff9ddae9c0 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff9aeb049d3d78] kernfs_find_and_get_ns at ffffffff9cd6defc
#9 [ffff9aeb049d3d98] sysfs_unmerge_group at ffffffff9cd717e8
#10 [ffff9aeb049d3db0] dpm_sysfs_remove at ffffffff9cfe0e9d
#11 [ffff9aeb049d3dc8] device_del at ffffffff9cfd37e7
#12 [ffff9aeb049d3e18] platform_device_del at ffffffff9cfda81e
#13 [ffff9aeb049d3e30] platform_device_unregister at ffffffff9cfda8b3
#14 [ffff9aeb049d3e40] vkms_release at ffffffffc042a015 [vkms]
#15 [ffff9aeb049d3e50] __drm_atomic_helper_disable_all.constprop.29 at ffffffffc063b7d0 [drm_kms_helper]
#16 [ffff9aeb049d3e78] drm_atomic_helper_shutdown at ffffffffc063b840 [drm_kms_helper]
#17 [ffff9aeb049d3ec8] vkms_release at ffffffffc042a01d [vkms]
#18 [ffff9aeb049d3ed8] cleanup_module at ffffffffc042a890 [vkms]
#19 [ffff9aeb049d3ee0] __x64_sys_delete_module at ffffffff9cb51b49
#20 [ffff9aeb049d3f38] do_syscall_64 at ffffffff9ca0430f
#21 [ffff9aeb049d3f50] entry_SYSCALL_64_after_hwframe at ffffffff9d4000a0
RIP: 00007f8c532469b7 RSP: 00007ffe8f2f5898 RFLAGS: 00000206
RAX: ffffffffffffffda RBX: 0000563cd32d6d20 RCX: 00007f8c532469b7
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000563cd32d6d88
RBP: 0000563cd32d6d20 R8: 00007ffe8f2f4841 R9: 0000000000000000
R10: 00007f8c532b8aa0 R11: 0000000000000206 R12: 0000563cd32d6d88
R13: 0000000000000001 R14: 0000563cd32d6d88 R15: 0000563cd32d6d20
ORIG_RAX: 00000000000000b0 CS: 0033 SS: 002b
crash>