In the Linux kernel, the following vulnerability has been resolved:

KVM: x86/mmu: Don't advance iterator after restart due to yielding

After dropping mmu_lock in the TDP MMU, restart the iterator during tdp_iter_next() and do not advance the iterator. Advancing the iterator results in skipping the top-level SPTE and all its children, which is fatal if any of the skipped SPTEs were not visited before yielding.

When zapping all SPTEs, i.e. when min_level == root_level, restarting the iter and then invoking tdp_iter_next() is always fatal if the current gfn has a valid SPTE, as advancing the iterator results in try_step_side() skipping the current gfn, which wasn't visited before yielding.

Sprinkle WARNs on iter->yielded being true in various helpers that are often used in conjunction with yielding, and tag the helper with __must_check to reduce the probability of improper usage.

Failing to zap a top-level SPTE manifests in one of two ways. If a valid SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(), the shadow page will be leaked and KVM will WARN accordingly.

  WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm]
  RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm]
  Call Trace:
   <TASK>
   kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
   kvm_destroy_vm+0x162/0x2a0 [kvm]
   kvm_vcpu_release+0x34/0x60 [kvm]
   __fput+0x82/0x240
   task_work_run+0x5c/0x90
   do_exit+0x364/0xa10
   ? futex_unqueue+0x38/0x60
   do_group_exit+0x33/0xa0
   get_signal+0x155/0x850
   arch_do_signal_or_restart+0xed/0x750
   exit_to_user_mode_prepare+0xc5/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x48/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of marking a struct page as dirty/accessed after it has been put back on the free list. This directly triggers a WARN due to encountering a page with page_count() == 0, but it can also lead to data corruption and additional errors in the kernel.

  WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171
  RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm]
  Call Trace:
   <TASK>
   kvm_set_pfn_dirty+0x120/0x1d0 [kvm]
   __handle_changed_spte+0x92e/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   __handle_changed_spte+0x63c/0xca0 [kvm]
   zap_gfn_range+0x549/0x620 [kvm]
   kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm]
   mmu_free_root_page+0x219/0x2c0 [kvm]
   kvm_mmu_free_roots+0x1b4/0x4e0 [kvm]
   kvm_mmu_unload+0x1c/0xa0 [kvm]
   kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm]
   kvm_put_kvm+0x3b1/0x8b0 [kvm]
   kvm_vcpu_release+0x4e/0x70 [kvm]
   __fput+0x1f7/0x8c0
   task_work_run+0xf8/0x1a0
   do_exit+0x97b/0x2230
   do_group_exit+0xda/0x2a0
   get_signal+0x3be/0x1e50
   arch_do_signal_or_restart+0x244/0x17f0
   exit_to_user_mode_prepare+0xcb/0x120
   syscall_exit_to_user_mode+0x1d/0x40
   do_syscall_64+0x4d/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Note, the underlying bug existed even before commit 1af4a96025b3 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still incorrectly advance past a top-level entry when yielding on a lower-level entry. But with respect to leaking shadow pages, the bug was introduced by yielding before processing the current gfn.
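The fix is easiest to see as a pattern rather than a diff. The following is a minimal, self-contained C sketch of that pattern, not the kernel code: the struct layout, the tdp_iter_restart() helper, and the single-step advance are simplified assumptions based only on the description above. The point is that a pending yield makes tdp_iter_next() re-seat the walk at the current gfn instead of advancing past it.

  #include <stdbool.h>
  #include <stdio.h>

  /* Simplified stand-in for the kernel's TDP MMU iterator; the real struct
   * and step helpers are much richer than this mock. */
  struct tdp_iter {
          unsigned long gfn;   /* current guest frame number */
          bool yielded;        /* set when mmu_lock was dropped mid-walk */
  };

  /* Re-seat the walk at the current position after a yield (mock only;
   * the kernel also re-reads the paging structures). */
  static void tdp_iter_restart(struct tdp_iter *iter)
  {
          iter->yielded = false;
  }

  /*
   * The pattern described above: if the previous step yielded, do NOT
   * advance past the current entry; restart so it is revisited.
   */
  static void tdp_iter_next(struct tdp_iter *iter)
  {
          if (iter->yielded) {
                  tdp_iter_restart(iter);
                  return;
          }
          iter->gfn++;   /* stand-in for the real step, e.g. try_step_side() */
  }

  int main(void)
  {
          struct tdp_iter iter = { .gfn = 0, .yielded = false };

          iter.yielded = true;   /* pretend mmu_lock was dropped while at gfn 0 */
          tdp_iter_next(&iter);
          printf("after yield + next: gfn = %lu (revisited, not skipped)\n", iter.gfn);

          tdp_iter_next(&iter);
          printf("after a normal next: gfn = %lu\n", iter.gfn);
          return 0;
  }

Without the iter->yielded check, the first tdp_iter_next() would have stepped to gfn 1 and the entry at gfn 0, never processed before the yield, would be skipped, which is exactly the leak/use-after-free scenario in the traces above.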
Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or callers could jump to their retry label. The downside of that approach is that tdp_mmu_iter_cond_resched() _must_ be called before anything else in the loop, and there's no easy way to enforce that requirement. Ideally, KVM would handle the cond_resched() fully within the iterator macro (the code is actually quite clean) and avoid this entire class of bugs, but that is extremely difficult do wh---truncated---
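To make the hardening angle concrete, here is a second self-contained mock, again an illustration under assumed, simplified signatures rather than the kernel's actual tdp_mmu_iter_cond_resched() or zap loop: the yield helper is tagged __must_check so its result cannot be silently ignored, the SPTE-modifying helper refuses to run with a pending yield (assert() standing in for a WARN), and the caller revisits the current gfn after a yield instead of skipping it.

  #include <assert.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define __must_check __attribute__((warn_unused_result))

  struct tdp_iter {
          unsigned long gfn;
          bool yielded;
  };

  /* Mock "the lock is contended / a reschedule is needed": yield once, at gfn 2. */
  static bool resched_needed(const struct tdp_iter *iter)
  {
          static bool yielded_once;

          if (!yielded_once && iter->gfn == 2) {
                  yielded_once = true;
                  return true;
          }
          return false;
  }

  /* Yield helper: __must_check so callers cannot ignore that the walk yielded. */
  static bool __must_check tdp_mmu_iter_cond_resched(struct tdp_iter *iter)
  {
          if (resched_needed(iter))
                  iter->yielded = true;   /* real code drops mmu_lock, reschedules, reacquires */
          return iter->yielded;
  }

  /* SPTE-modifying helpers must never run on a yielded iterator. */
  static void zap_spte(struct tdp_iter *iter)
  {
          assert(!iter->yielded);   /* stands in for a WARN on iter->yielded */
          printf("zapping gfn %lu\n", iter->gfn);
  }

  int main(void)
  {
          struct tdp_iter iter = { .gfn = 0, .yielded = false };

          while (iter.gfn < 4) {
                  if (tdp_mmu_iter_cond_resched(&iter)) {
                          iter.yielded = false;   /* mock of the iterator restarting the walk */
                          continue;               /* revisit the current gfn; do not skip it */
                  }
                  zap_spte(&iter);
                  iter.gfn++;   /* advance only after the gfn was processed */
          }
          return 0;
  }

In this mock every gfn from 0 to 3 is zapped exactly once even though the walk yields at gfn 2; dropping the "revisit" behavior (or advancing inside the yield path) would skip gfn 2, mirroring the bug the commit fixes.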
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
IntheLinuxkernel,thefollowingvulnerabilityhasbeenresolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don tadvance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as avalid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn tvisited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap atop-level SPTE manifests in one of two ways. If avalidSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ?futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips agfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers ause-after-free in the form ofmarking astruct page as dirty/accessed after it has been put back on thefree list. This directly triggers aWARN due to encountering apage withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 (KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed )moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past atop-level entry when yielding on alower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there sno easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Dont advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas asa valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasnt visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zapa top-level SPTE manifests in one of two ways. Ifa validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU:1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skipsa gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggersa use-after-free in the form ofmarkinga struct page as dirty/accessed after it has been put back on thefree list. This directly triggersa WARN due to encounteringa page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU:7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance pasta top-level entry when yielding ona lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and theres no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don tadvance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as avalid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn tvisited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap atop-level SPTE manifests in one of two ways. If avalidSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ?futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips agfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers ause-after-free in the form ofmarking astruct page as dirty/accessed after it has been put back on thefree list. This directly triggers aWARN due to encountering apage withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 (KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed )moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past atop-level entry when yielding on alower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there sno easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Dont advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas asa valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasnt visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zapa top-level SPTE manifests in one of two ways. Ifa validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU:1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skipsa gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggersa use-after-free in the form ofmarkinga struct page as dirty/accessed after it has been put back on thefree list. This directly triggersa WARN due to encounteringa page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU:7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance pasta top-level entry when yielding ona lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and theres no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don tadvance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as avalid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn tvisited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap atop-level SPTE manifests in one of two ways. If avalidSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ?futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips agfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers ause-after-free in the form ofmarking astruct page as dirty/accessed after it has been put back on thefree list. This directly triggers aWARN due to encountering apage withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 (KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed )moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past atop-level entry when yielding on alower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there sno easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Don t advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas as a valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasn t visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zap a top-level SPTE manifests in one of two ways. If a validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form ofmarking a struct page as dirty/accessed after it has been put back on thefree list. This directly triggers a WARN due to encountering a page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3 ( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed ) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance past a top-level entry when yielding on a lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 
The downside of that approachis that tdp_mmu_iter_cond_resched() _must_ be called before anything elsein the loop, and there s no easy way to enfornce that requirement.Ideally, KVM would handling the cond_resched() fully within the iteratormacro (the code is actually quite clean) and avoid this entire class ofbugs, but that is extremely difficult do wh---truncated---
In the Linux kernel, the following vulnerability has been resolved:KVM: x86/mmu: Dont advance iterator after restart due to yieldingAfter dropping mmu_lock in the TDP MMU, restart the iterator duringtdp_iter_next() and do not advance the iterator. Advancing the iteratorresults in skipping the top-level SPTE and all its children, which isfatal if any of the skipped SPTEs were not visited before yielding.When zapping all SPTEs, i.e. when min_level == root_level, restarting theiter and then invoking tdp_iter_next() is always fatal if the current gfnhas asa valid SPTE, as advancing the iterator results in try_step_side()skipping the current gfn, which wasnt visited before yielding.Sprinkle WARNs on iter->yielded being true in various helpers that areoften used in conjunction with yielding, and tag the helper with__must_check to reduce the probabily of improper usage.Failing to zapa top-level SPTE manifests in one of two ways. Ifa validSPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU:1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xaeIf kvm_tdp_mmu_zap_all() skipsa gfn/SPTE but that SPTE is then zapped bykvm_tdp_mmu_put_root(), KVM triggersa use-after-free in the form ofmarkinga struct page as dirty/accessed after it has been put back on thefree list. This directly triggersa WARN due to encounteringa page withpage_count() == 0, but it can also lead to data corruption and additionalerrors in the kernel. WARNING: CPU:7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xaeNote, the underlying bug existed even before commit 1af4a96025b3( KVM:x86/mmu: Yield in TDU MMU iter even if no SPTES changed) moved calls totdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could stillincorrectly advance pasta top-level entry when yielding ona lower-levelentry. But with respect to leaking shadow pages, the bug was introducedby yielding before processing the current gfn.Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, orcallers could jump to their retry label. 