It looks like the following commit:
16257e430641 crypto: kabi: KABI reservation for crypto
causes the panic shown below. After removing the KABI_RESERVE() entries in "include/crypto/cryptd.h" and "include/crypto/hash.h", the panic disappears.
===============================================================
[ 15.642274][ T1469] BUG: kernel NULL pointer dereference, address: 000000000000002c
[ 15.659247][ T1469] #PF: supervisor read access in kernel mode
[ 15.659248][ T1469] #PF: error_code(0x0000) - not-present page
[ 15.659249][ T1469] PGD 12e952067 P4D 0
[ 15.659251][ T1469] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 15.659252][ T1469] CPU: 80 PID: 1469 Comm: cryptomgr_test Not tainted 6.6.0-iommufd66+ #1
[ 15.659254][ T1469] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[ 15.659255][ T1469] RIP: 0010:crypto_ahash_setkey+0x11/0x60
[ 15.659260][ T1469] Code: 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 47 70 53 48 89 fb <85> 70 2c 75 17 48 8b 47 38 ff d0 0f 1f 00 85 c0 75 13 83 63 5c fe
[ 15.659261][ T1469] RSP: 0018:ffa00000209dfbf8 EFLAGS: 00010286
[ 15.659262][ T1469] RAX: 0000000000000000 RBX: ff11000109abe3d0 RCX: 00000000fff000ff
[ 15.659263][ T1469] RDX: 0000000000000010 RSI: ffffffffb62aea35 RDI: ff11000109abe3d0
[ 15.659264][ T1469] RBP: 0000000000000000 R08: 00000000fff000ff R09: ffa00000209dfca0
[ 15.659265][ T1469] R10: 0000000000000000 R11: ffa00000209dfc68 R12: ff11000109abf858
[ 15.659265][ T1469] R13: ffa00000209dfca0 R14: 0000000000000000 R15: ffffffffb5e8c940
[ 15.659266][ T1469] FS: 0000000000000000(0000) GS:ff11001c8e100000(0000) knlGS:0000000000000000
[ 15.659267][ T1469] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 15.659268][ T1469] CR2: 000000000000002c CR3: 000000012e9cc002 CR4: 0000000000771ee0
[ 15.659269][ T1469] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 15.659270][ T1469] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 15.659270][ T1469] PKRU: 55555554
[ 15.659271][ T1469] Call Trace:
[ 15.659272][ T1469] <TASK>
[ 15.659273][ T1469] ? __die+0x23/0x70
[ 15.659277][ T1469] ? page_fault_oops+0x81/0x150
[ 15.659280][ T1469] ? exc_page_fault+0x5ea/0x7d0
[ 15.659283][ T1469] ? asm_exc_page_fault+0x26/0x30
[ 15.659287][ T1469] ? crypto_ahash_setkey+0x11/0x60
[ 15.659289][ T1469] crypto_ahash_setkey+0x1c/0x60
[ 15.659291][ T1469] test_ahash_vec_cfg+0x165/0x840
[ 15.659294][ T1469] ? vsnprintf+0x44d/0x630
[ 15.659296][ T1469] ? sprintf+0x5a/0x80
[ 15.659297][ T1469] __alg_test_hash.isra.0+0x1aa/0x3a0
[ 15.659299][ T1469] alg_test+0x199/0x610
[ 15.659300][ T1469] ? __schedule+0x611/0xc30
[ 15.659302][ T1469] ? __pfx_cryptomgr_test+0x10/0x10
[ 15.659305][ T1469] cryptomgr_test+0x24/0x40
[ 15.659307][ T1469] kthread+0xe5/0x120
[ 15.659310][ T1469] ? __pfx_kthread+0x10/0x10
[ 15.659312][ T1469] ret_from_fork+0x31/0x50
[ 15.659315][ T1469] ? __pfx_kthread+0x10/0x10
[ 15.659317][ T1469] ret_from_fork_asm+0x1b/0x30
[ 15.659321][ T1469] </TASK>
I could not reproduce the problem with openeuler_defconfig. Could you share the config you built with? Also, were all the .ko modules rebuilt?
My reproduction steps:
make distclean
cp arch/x86/configs/openeuler_defconfig .config
make olddefconfig
make -j256
make modules_install -j256
make install
I reproduced it on both the SPR and ICX platforms.
Could it be platform-related?
Are these two platforms arm64?
There are no extra .ko modules.
grub configuration
title Fedora Linux (6.6.0-iommufd66+) 38 (Server Edition)
version 6.6.0-iommufd66+
linux /vmlinuz-6.6.0-iommufd66+
initrd /initramfs-6.6.0-iommufd66+.img
options root=UUID=255d6584-02dc-49e2-848a-36d9f9af7eb6 ro console=tty0 console=ttyS0,115200n8 ignore_loglevel panic=5 kernel.softlockup_panic=1 crashkernel=2G intel_iommu=on
grub_users $grub_users
grub_arg --unrestricted
grub_class fedora
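I can't reproduce it on my side. Could you add a print to see exactly which algorithm triggers the problem?
Print the alg argument inside the alg_test function.
OK, I'll try.
Also, I captured a kdump: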
crash> dis -lr crypto_ahash_setkey
/home/zengz/linux-OLK-6.6/crypto/ahash.c: 167
0xffffffffb41c9de0 <crypto_ahash_setkey>: endbr64
crash> dis -l crypto_ahash_setkey
/home/zengz/linux-OLK-6.6/crypto/ahash.c: 167
0xffffffffb41c9de0 <crypto_ahash_setkey>: endbr64
0xffffffffb41c9de4 <crypto_ahash_setkey+4>: nopl 0x0(%rax,%rax,1)
/home/zengz/linux-OLK-6.6/crypto/ahash.c: 171
0xffffffffb41c9de9 <crypto_ahash_setkey+9>: mov 0x70(%rdi),%rax
/home/zengz/linux-OLK-6.6/crypto/ahash.c: 167
0xffffffffb41c9ded <crypto_ahash_setkey+13>: push %rbx
0xffffffffb41c9dee <crypto_ahash_setkey+14>: mov %rdi,%rbx
/home/zengz/linux-OLK-6.6/crypto/ahash.c: 171
0xffffffffb41c9df1 <crypto_ahash_setkey+17>: test %esi,0x2c(%rax) <=======================
0xffffffffb41c9df4 <crypto_ahash_setkey+20>: jne 0xffffffffb41c9e0d <crypto_ahash_setkey+45>
/home/zengz/linux-OLK-6.6/crypto/ahash.c: 174
0xffffffffb41c9df6 <crypto_ahash_setkey+22>: mov 0x38(%rdi),%rax
0xffffffffb41c9dfa <crypto_ahash_setkey+26>: call *%rax
0xffffffffb41c9dfc <crypto_ahash_setkey+28>: nopl (%rax)
/home/zengz/linux-OLK-6.6/crypto/ahash.c: 176
0xffffffffb41c9dff <crypto_ahash_setkey+31>: test %eax,%eax
0xffffffffb41c9e01 <crypto_ahash_setkey+33>: jne 0xffffffffb41c9e16 <crypto_ahash_setkey+54>
/home/zengz/linux-OLK-6.6/./include/linux/crypto.h: 492
0xffffffffb41c9e03 <crypto_ahash_setkey+35>: andl $0xfffffffe,0x5c(%rbx)
/home/zengz/linux-OLK-6.6/crypto/ahash.c: 182
0xffffffffb41c9e07 <crypto_ahash_setkey+39>: pop %rbx
It looks like tfm->__crt_alg is already NULL on entry to crypto_ahash_setkey(), so the access to tfm->__crt_alg->cra_alignmask inside crypto_tfm_alg_alignmask() panics.
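The 0x2c in the fault address lines up with the field being read: with __crt_alg == NULL, the load of cra_alignmask touches NULL plus that field's offset. Here is a minimal userspace sketch of that arithmetic. It assumes the leading fields of struct crypto_alg are laid out as in the mock below; the mock_* names and alignmask_fault_address() are hypothetical, and the layout is my reading of include/linux/crypto.h rather than something taken from this tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

/*
 * Hypothetical mock of the leading fields of struct crypto_alg.
 * The field order is an assumption; everything after cra_alignmask
 * is omitted because it does not affect the offset.
 */
struct mock_crypto_alg {
	struct mock_list_head cra_list;
	struct mock_list_head cra_users;
	uint32_t cra_flags;
	unsigned int cra_blocksize;
	unsigned int cra_ctxsize;
	unsigned int cra_alignmask;
};

/* Address the CPU touches when it reads alg->cra_alignmask. */
static uintptr_t alignmask_fault_address(uintptr_t alg_base)
{
	return alg_base + offsetof(struct mock_crypto_alg, cra_alignmask);
}
```

On an LP64 build, alignmask_fault_address(0) evaluates to 0x2c, which matches CR2 in the oops and the `test %esi,0x2c(%rax)` instruction with RAX = 0.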
crash> bt
PID: 1494 TASK: ff1100011cbd5f40 CPU: 98 COMMAND: "cryptomgr_test"
#0 [ffa000002006f950] machine_kexec at ffffffffb3c78617
#1 [ffa000002006f9a8] __crash_kexec at ffffffffb3e0143e
#2 [ffa000002006fa68] crash_kexec at ffffffffb3e0247c
#3 [ffa000002006fa70] oops_end at ffffffffb3c3346d
#4 [ffa000002006fa90] page_fault_oops at ffffffffb3c8ca78
#5 [ffa000002006fae8] exc_page_fault at ffffffffb4918d6a
#6 [ffa000002006fb40] asm_exc_page_fault at ffffffffb4a012c6
[exception RIP: crypto_ahash_setkey+17]
RIP: ffffffffb41c9df1 RSP: ffa000002006fbf8 RFLAGS: 00010286
RAX: 0000000000000000 RBX: ff11001d07ecc9d0 RCX: 00000000fff000ff
RDX: 0000000000000010 RSI: ffffffffb50ae93a RDI: ff11001d07ecc9d0
RBP: 0000000000000000 R8: 00000000fff000ff R9: ffa000002006fca0
R10: 0000000000000000 R11: ffa000002006fc68 R12: ff11001d07eaf0d8
R13: ffa000002006fca0 R14: 0000000000000000 R15: ffffffffb4c8c900
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffa000002006fc00] crypto_ahash_setkey at ffffffffb41c9dfc
#8 [ffa000002006fc10] test_ahash_vec_cfg at ffffffffb41d5215
#9 [ffa000002006fd90] __alg_test_hash at ffffffffb41d5aaa
#10 [ffa000002006fe08] alg_test at ffffffffb41d4c29
#11 [ffa000002006fee8] cryptomgr_test at ffffffffb41d0144
#12 [ffa000002006fef8] kthread at ffffffffb3d302c5
#13 [ffa000002006ff30] ret_from_fork at ffffffffb3c3eb41
#14 [ffa000002006ff50] ret_from_fork_asm at ffffffffb3c0265b
crash> struct crypto_ahash ff11001d07ecc9d0
struct crypto_ahash {
init = 0xffffffffb41e3f80 <cryptd_hash_final_enqueue>,
update = 0xffffffffb41e3020 <cryptd_hash_finup_enqueue>,
final = 0xffffffffb41e3e90 <cryptd_hash_digest_enqueue>,
finup = 0xffffffffb41e25d0 <cryptd_hash_export>,
digest = 0xffffffffb41e2610 <cryptd_hash_import>,
export = 0xffffffffb41e3440 <cryptd_hash_setkey>,
import = 0x3c00000014,
setkey = 0x0,
statesize = 0,
reqsize = 0,
kabi_reserved1 = 4294967297,
kabi_reserved2 = 4294967295,
base = {
refcnt = {
refs = {
counter = -1273194992
}
},
crt_flags = 4293918975,
node = 416071784,
exit = 0x0,
__crt_alg = 0x0,
kabi_reserved1 = 1,
kabi_reserved2 = 18379471679260441120,
__crt_ctx = 0xff11001d07ecca58
}
}
Oh, I've reproduced it! It is probably one of the acceleration .ko modules that is at fault.
The panic happens while testing ghash.
[ 15.473769][ T1468] alg_test: alg: ghash
[ 15.474946][ T1271] ACPI: bus type drm_connector registered
[ 15.479992][ T1468] ahash: alg_cra_name: ghash
[ 15.492485][ T1468] BUG: kernel NULL pointer dereference, address: 000000000000002c
[ 15.492486][ T1468] #PF: supervisor read access in kernel mode
[ 15.492487][ T1468] #PF: error_code(0x0000) - not-present page
[ 15.492488][ T1468] PGD 0
[ 15.492490][ T1468] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 15.492493][ T1468] CPU: 43 PID: 1468 Comm: cryptomgr_test Not tainted 6.6.0-iommufd66+ #4
[ 15.492550][ T1269] dca service started, version 1.12.1
[ 15.538232][ T1468] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[ 15.538233][ T1468] RIP: 0010:crypto_ahash_setkey+0x11/0x60
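Yes, exactly: it is the hardware-accelerated ghash algorithm.
Judging by the first few function pointers in the structure below, everything is shifted by exactly two function pointers, i.e. two 64-bit words.
Could it be that some data structure was compiled without those two KABI_RESERVE() fields?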
crash> struct crypto_ahash ff11001d07ecc9d0
struct crypto_ahash {
init = 0xffffffffb41e3f80 <cryptd_hash_final_enqueue>,
update = 0xffffffffb41e3020 <cryptd_hash_finup_enqueue>,
final = 0xffffffffb41e3e90 <cryptd_hash_digest_enqueue>,
finup = 0xffffffffb41e25d0 <cryptd_hash_export>,
digest = 0xffffffffb41e2610 <cryptd_hash_import>,
export = 0xffffffffb41e3440 <cryptd_hash_setkey>,
import = 0x3c00000014,
setkey = 0x0,
statesize = 0,
reqsize = 0,
kabi_reserved1 = 4294967297,
kabi_reserved2 = 4294967295,
base = {
refcnt = {
refs = {
counter = -1273194992
}
},
crt_flags = 4293918975,
node = 416071784,
exit = 0x0,
__crt_alg = 0x0,
kabi_reserved1 = 1,
kabi_reserved2 = 18379471679260441120,
__crt_ctx = 0xff11001d07ecca58
}
}
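It is caused by a type cast. See the explanation below.
Root cause found: cryptd_alloc_ahash() casts the crypto_ahash structure to a cryptd_ahash structure.
The cryptd_ahash structure simply uses a single base member that refers to the crypto_ahash structure.
So no member may be added before base in this family of cryptd structures. Otherwise cryptd_ahash and crypto_ahash end up misaligned with each other.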
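To make the misalignment concrete, here is a small userspace sketch of the same pattern. All mock_* types and helper names are hypothetical stand-ins, not the kernel definitions. The wrapper is only cast-compatible with the inner structure while base sits at offset 0; with two reserved 64-bit fields in front of it, base lands 16 bytes in, so the init slot reads what is really final. That is consistent with the dump above, where init shows cryptd_hash_final_enqueue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct crypto_ahash: four 64-bit slots. */
struct mock_ahash {
	uint64_t init;
	uint64_t update;
	uint64_t final;
	uint64_t finup;
};

/* base is the first member, so casting mock_ahash* to this type is harmless. */
struct mock_cryptd_ok {
	struct mock_ahash base;
};

/* Two reserved fields in front of base break the cast assumption. */
struct mock_cryptd_bad {
	uint64_t reserved1;
	uint64_t reserved2;
	struct mock_ahash base;
};

/* Compile-time guard for the layout that the cast relies on. */
_Static_assert(offsetof(struct mock_cryptd_ok, base) == 0,
	       "base must stay the first member");

static size_t base_offset_bad(void)
{
	return offsetof(struct mock_cryptd_bad, base);
}

/*
 * Value that "base.init" would yield if a mock_ahash pointer were
 * reinterpreted as a mock_cryptd_bad pointer. memcpy is used instead of
 * a real cast so that the sketch stays well-defined C.
 */
static uint64_t init_seen_through_bad_cast(const struct mock_ahash *tfm)
{
	uint64_t v;

	memcpy(&v, (const unsigned char *)tfm + base_offset_bad() +
		   offsetof(struct mock_ahash, init), sizeof(v));
	return v;
}
```

With a mock_ahash filled as { .init = 1, .update = 2, .final = 3, .finup = 4 }, base_offset_bad() is 16 and init_seen_through_bad_cast() returns 3, i.e. the value stored in final. A compile-time check like the _Static_assert above is one possible way to catch this kind of layout change at build time.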
Boot verification succeeded: