In the Linux kernel, the following vulnerability has been resolved:

powerpc/qspinlock: Fix deadlock in MCS queue

If an interrupt occurs in queued_spin_lock_slowpath() after we increment
qnodesp->count and before node->lock is initialized, another CPU might
see stale lock values in get_tail_qnode(). If the stale lock value happens
to match the lock on that CPU, then we write to the "next" pointer of
the wrong qnode. This causes a deadlock as the former CPU, once it becomes
the head of the MCS queue, will spin indefinitely until its "next" pointer
is set by its successor in the queue.

Running stress-ng on a 16 core (16EC/16VP) shared LPAR results in
occasional lockups similar to the following:

  $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m

  watchdog: CPU 15 Hard LOCKUP
  ......
  NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490
  LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
  Call Trace:
    0xc000002cfffa3bf0 (unreliable)
    _raw_spin_lock+0x6c/0x90
    raw_spin_rq_lock_nested.part.135+0x4c/0xd0
    sched_ttwu_pending+0x60/0x1f0
    __flush_smp_call_function_queue+0x1dc/0x670
    smp_ipi_demux_relaxed+0xa4/0x100
    xive_muxed_ipi_action+0x20/0x40
    __handle_irq_event_percpu+0x80/0x240
    handle_irq_event_percpu+0x2c/0x80
    handle_percpu_irq+0x84/0xd0
    generic_handle_irq+0x54/0x80
    __do_irq+0xac/0x210
    __do_IRQ+0x74/0xd0
    0x0
    do_IRQ+0x8c/0x170
    hardware_interrupt_common_virt+0x29c/0x2a0
  --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490
  ......
  NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490
  LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
  --- interrupt: 500
    0xc0000029c1a41d00 (unreliable)
    _raw_spin_lock+0x6c/0x90
    futex_wake+0x100/0x260
    do_futex+0x21c/0x2a0
    sys_futex+0x98/0x270
    system_call_exception+0x14c/0x2f0
    system_call_vectored_common+0x15c/0x2ec

The following code flow illustrates how the deadlock occurs.
For the sake of brevity, assume that both locks (A and B) are
contended and we call the queued_spin_lock_slowpath() function.

        CPU0                                   CPU1
        ----                                   ----
  spin_lock_irqsave(A)                          |
  spin_unlock_irqrestore(A)                     |
  spin_lock(B)                                  |
        |                                       |
        ▼                                       |
  id = qnodesp->count++;                        |
  (Note that nodes[0].lock == A)                |
        |                                       |
        ▼                                       |
  Interrupt                                     |
  (happens before "nodes[0].lock = B")          |
        |                                       |
        ▼                                       |
  spin_lock_irqsave(A)                          |
        |                                       |
        ▼                                       |
  id = qnodesp->count++                         |
  nodes[1].lock = A                             |
        |                                       |
        ▼                                       |
  Tail of MCS queue                             |
        |                            spin_lock_irqsave(A)
        ▼                                       |
  Head of MCS queue                             ▼
        |                            CPU0 is previous tail
        ▼                                       |
  Spin indefinitely                             ▼
  (until "nodes[1].next != NULL")    prev = get_tail_qnode(A, CPU0)
                                                |
                                                ▼
                                     prev == &qnodes[CPU0].nodes[0]
                                     (as qnodes[CPU0].nodes[0].lock == A)
                                                |
                                                ▼
                                     WRITE_ONCE(prev->next, node)
                                                |
                                                ▼
                                     Spin indefinitely
                                     (until nodes[0].locked == 1)

Thanks to Saket Kumar Bhaskar for help with recreating the issue.

The Linux kernel CVE team has assigned CVE-2024-46797 to this issue.
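To make the stale-pointer match concrete, the following is a minimal,
compilable userspace sketch of the lookup described above. The types
and get_tail_qnode() here are simplified stand-ins modeled on this
description (the real implementation lives in arch/powerpc/lib/qspinlock.c),
not the kernel code itself:

  /* Simplified stand-ins for the per-CPU MCS queue nodes. */
  #include <assert.h>
  #include <stdio.h>

  #define MAX_NODES 4          /* one slot per nesting level */
  #define NR_CPUS   2

  struct qspinlock { int val; };

  struct qnode {
      struct qnode *next;
      struct qspinlock *lock;  /* lock this slot is queued on */
      int locked;
  };

  struct qnodes {
      int count;
      struct qnode nodes[MAX_NODES];
  };

  static struct qnodes qnodes[NR_CPUS];

  /* Resolve a previous-tail CPU to its qnode by matching the stored
   * lock pointer. A stale ->lock left over from an earlier
   * acquisition satisfies the comparison just as well. */
  static struct qnode *get_tail_qnode(struct qspinlock *lock, int prev_cpu)
  {
      for (int idx = 0; idx < MAX_NODES; idx++) {
          struct qnode *qnode = &qnodes[prev_cpu].nodes[idx];
          if (qnode->lock == lock)
              return qnode;
      }
      assert(0);               /* BUG() in the kernel */
      return NULL;
  }

  int main(void)
  {
      struct qspinlock A, B;

      /* CPU0 previously held A via slot 0; ->lock is not cleared
       * on release, so the stale value persists. */
      qnodes[0].nodes[0].lock = &A;

      /* CPU0 now contends for B: the slot is claimed first... */
      int idx = qnodes[0].count++;         /* idx == 0 */
      /* <-- interrupt lands here, before nodes[idx].lock = &B */

      /* The IRQ handler contends for A using the next slot: */
      int irq_idx = qnodes[0].count++;     /* irq_idx == 1 */
      qnodes[0].nodes[irq_idx].lock = &A;  /* the live tail qnode */

      /* CPU1, queued behind CPU0 on A, resolves the previous tail: */
      struct qnode *prev = get_tail_qnode(&A, 0);
      assert(prev == &qnodes[0].nodes[0]); /* stale slot matched */
      printf("CPU1 matched stale nodes[0], not live nodes[%d]\n",
             irq_idx);

      (void)idx; (void)B;
      return 0;
  }

CPU1 then performs WRITE_ONCE(prev->next, node) on that stale qnode,
so the live tail in nodes[1] never sees its "next" pointer change and
both CPUs spin forever, as in the diagram above.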
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ? | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ? | Interrupt | (happens before nodes[0].lock = B ) | | | ? | spin_lock_irqsave(A) | | | ? | id = qnodesp->count++ | nodes[1].lock = A | | | ? | Tail of MCS queue | | spin_lock_irqsave(A) ? | Head of MCS queue ? | CPU0 is previous tail ? | Spin indefinitely ? (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ? prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:

powerpc/qspinlock: Fix deadlock in MCS queue

If an interrupt occurs in queued_spin_lock_slowpath() after we increment
qnodesp->count and before node->lock is initialized, another CPU might
see stale lock values in get_tail_qnode(). If the stale lock value happens
to match the lock on that CPU, then we write to the "next" pointer of
the wrong qnode. This causes a deadlock as the former CPU, once it becomes
the head of the MCS queue, will spin indefinitely until its "next" pointer
is set by its successor in the queue.

Running stress-ng on a 16 core (16EC/16VP) shared LPAR results in
occasional lockups similar to the following:

  $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m

  watchdog: CPU 15 Hard LOCKUP
  ......
  NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490
  LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
  Call Trace:
    0xc000002cfffa3bf0 (unreliable)
    _raw_spin_lock+0x6c/0x90
    raw_spin_rq_lock_nested.part.135+0x4c/0xd0
    sched_ttwu_pending+0x60/0x1f0
    __flush_smp_call_function_queue+0x1dc/0x670
    smp_ipi_demux_relaxed+0xa4/0x100
    xive_muxed_ipi_action+0x20/0x40
    __handle_irq_event_percpu+0x80/0x240
    handle_irq_event_percpu+0x2c/0x80
    handle_percpu_irq+0x84/0xd0
    generic_handle_irq+0x54/0x80
    __do_irq+0xac/0x210
    __do_IRQ+0x74/0xd0
    0x0
    do_IRQ+0x8c/0x170
    hardware_interrupt_common_virt+0x29c/0x2a0
  --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490
  ......
  NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490
  LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
  --- interrupt: 500
    0xc0000029c1a41d00 (unreliable)
    _raw_spin_lock+0x6c/0x90
    futex_wake+0x100/0x260
    do_futex+0x21c/0x2a0
    sys_futex+0x98/0x270
    system_call_exception+0x14c/0x2f0
    system_call_vectored_common+0x15c/0x2ec

The following code flow illustrates how the deadlock occurs. For the
sake of brevity, assume that both locks (A and B) are contended and we
call the queued_spin_lock_slowpath() function.

  CPU0                                    CPU1
  ----                                    ----
  spin_lock_irqsave(A)                     |
  spin_unlock_irqrestore(A)                |
  spin_lock(B)                             |
         |                                 |
         ▼                                 |
  id = qnodesp->count++;                   |
  (Note that nodes[0].lock == A)           |
         |                                 |
         ▼                                 |
  Interrupt                                |
  (happens before "nodes[0].lock = B")     |
         |                                 |
         ▼                                 |
  spin_lock_irqsave(A)                     |
         |                                 |
         ▼                                 |
  id = qnodesp->count++                    |
  nodes[1].lock = A                        |
         |                                 |
         ▼                                 |
  Tail of MCS queue                        |
         |                       spin_lock_irqsave(A)
         ▼                                 |
  Head of MCS queue                        ▼
         |                       CPU0 is previous tail
         ▼                                 |
  Spin indefinitely                        ▼
  (until "nodes[1].next != NULL")  prev = get_tail_qnode(A, CPU0)
                                           |
                                           ▼
                                 prev == &qnodes[CPU0].nodes[0]
                                 (as qnodes[CPU0].nodes[0].lock == A)
                                           |
                                           ▼
                                 WRITE_ONCE(prev->next, node)
                                           |
                                           ▼
                                 Spin indefinitely
                                 (until nodes[0].locked == 1)

Thanks to Saket Kumar Bhaskar for help with recreating the issue.

The Linux kernel CVE team has assigned CVE-2024-46797 to this issue.
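To make the window concrete, the following stand-alone C sketch models the
race described above. It is illustrative only, not the kernel's powerpc
qspinlock implementation: struct qnode and struct qnodes are stripped down
to the fields the report mentions (count, nodes[], ->lock, ->next,
->locked), the interrupt is simulated inline, and enqueue_self() and
get_tail_qnode_sketch() are hypothetical stand-ins for the queueing step of
queued_spin_lock_slowpath() and for get_tail_qnode().

  /* Stand-alone model of the race (illustrative only; names follow the
   * report, not the kernel source). */
  #include <stdio.h>

  #define MAX_NODES 4

  struct qnode {
          struct qnode *next;
          void         *lock;    /* which spinlock this slot queues on */
          int           locked;
  };

  struct qnodes {
          int          count;              /* nesting depth on this CPU */
          struct qnode nodes[MAX_NODES];   /* one slot per nesting level */
  };

  static struct qnodes cpu0;               /* per-CPU in the real code */

  /* Queueing step: the slot is reserved (count++) BEFORE ->lock is
   * written, which is exactly the window the report describes. */
  static struct qnode *enqueue_self(struct qnodes *q, void *lock)
  {
          int idx = q->count++;
          /* <-- an interrupt here leaves nodes[idx].lock stale */
          struct qnode *node = &q->nodes[idx];
          node->next = NULL;
          node->lock = lock;
          node->locked = 0;
          return node;
  }

  /* Tail lookup: scan the previous tail CPU's slots for a matching
   * ->lock. A stale value can match first. */
  static struct qnode *get_tail_qnode_sketch(struct qnodes *q, void *lock)
  {
          for (int idx = 0; idx < MAX_NODES; idx++)
                  if (q->nodes[idx].lock == lock)
                          return &q->nodes[idx];
          return NULL;
  }

  int main(void)
  {
          void *A = (void *)0xA;

          /* CPU0 once queued on A and released it; nodes[0].lock == A
           * is left behind as a stale value. */
          enqueue_self(&cpu0, A);
          cpu0.count = 0;

          /* CPU0 queues on B, but the "interrupt" lands right after
           * count++ and before nodes[0].lock = B is written: */
          cpu0.count++;                    /* nodes[0].lock still == A */

          /* The interrupt handler queues on A in the next slot: */
          struct qnode *real = enqueue_self(&cpu0, A);   /* nodes[1] */

          /* CPU1, seeing CPU0 as A's previous tail, scans for A and
           * matches the STALE nodes[0] instead of nodes[1]: */
          struct qnode *prev = get_tail_qnode_sketch(&cpu0, A);

          printf("matched %s qnode\n",
                 prev == real ? "the right" : "the WRONG");
          return 0;
  }

The sketch shows why CPU1's subsequent WRITE_ONCE(prev->next, node) lands
in nodes[0]: the slot is made discoverable before its ->lock field is made
valid, so any cross-CPU scan keyed on ->lock can match a leftover value
from an earlier acquisition.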
In the Linux kernel, the following vulnerability has been resolved:

powerpc/qspinlock: Fix deadlock in MCS queue

If an interrupt occurs in queued_spin_lock_slowpath() after we increment qnodesp->count and before node->lock is initialized, another CPU might see stale lock values in get_tail_qnode(). If the stale lock value happens to match the lock on that CPU, then we write to the "next" pointer of the wrong qnode. This causes a deadlock as the former CPU, once it becomes the head of the MCS queue, will spin indefinitely until its "next" pointer is set by its successor in the queue.

Running stress-ng on a 16 core (16EC/16VP) shared LPAR results in occasional lockups similar to the following:

  $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m

  watchdog: CPU 15 Hard LOCKUP
  ......
  NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490
  LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
  Call Trace:
    0xc000002cfffa3bf0 (unreliable)
    _raw_spin_lock+0x6c/0x90
    raw_spin_rq_lock_nested.part.135+0x4c/0xd0
    sched_ttwu_pending+0x60/0x1f0
    __flush_smp_call_function_queue+0x1dc/0x670
    smp_ipi_demux_relaxed+0xa4/0x100
    xive_muxed_ipi_action+0x20/0x40
    __handle_irq_event_percpu+0x80/0x240
    handle_irq_event_percpu+0x2c/0x80
    handle_percpu_irq+0x84/0xd0
    generic_handle_irq+0x54/0x80
    __do_irq+0xac/0x210
    __do_IRQ+0x74/0xd0
    0x0
    do_IRQ+0x8c/0x170
    hardware_interrupt_common_virt+0x29c/0x2a0
  --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490
  ......
  NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490
  LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
  --- interrupt: 500
    0xc0000029c1a41d00 (unreliable)
    _raw_spin_lock+0x6c/0x90
    futex_wake+0x100/0x260
    do_futex+0x21c/0x2a0
    sys_futex+0x98/0x270
    system_call_exception+0x14c/0x2f0
    system_call_vectored_common+0x15c/0x2ec

The following code flow illustrates how the deadlock occurs. For the sake of brevity, assume that both locks (A and B) are contended and we call the queued_spin_lock_slowpath() function.

  CPU0                                   CPU1
  ----                                   ----
  spin_lock_irqsave(A)                    |
  spin_unlock_irqrestore(A)               |
  spin_lock(B)                            |
         |                                |
         ▼                                |
  id = qnodesp->count++;                  |
  (Note that nodes[0].lock == A)          |
         |                                |
         ▼                                |
  Interrupt                               |
  (happens before "nodes[0].lock = B")    |
         |                                |
         ▼                                |
  spin_lock_irqsave(A)                    |
         |                                |
         ▼                                |
  id = qnodesp->count++                   |
  nodes[1].lock = A                       |
         |                                |
         ▼                                |
  Tail of MCS queue                       |
         |                      spin_lock_irqsave(A)
         ▼                                |
  Head of MCS queue                       |
         ▼                                |
  CPU0 is previous tail                   |
         ▼                                |
  Spin indefinitely                       ▼
  (until "nodes[1].next != NULL")   prev = get_tail_qnode(A, CPU0)
                                          |
                                          ▼
                                   prev == &qnodes[CPU0].nodes[0]
                                   (as qnodes[CPU0].nodes[0].lock == A)
                                          |
                                          ▼
                                   WRITE_ONCE(prev->next, node)
                                          |
                                          ▼
                                   Spin indefinitely
                                   (until nodes[0].locked == 1)

Thanks to Saket Kumar Bhaskar for help with recreating the issue.

The Linux kernel CVE team has assigned CVE-2024-46797 to this issue.
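To make the failure mode concrete, below is a minimal, hypothetical C model of the race window described above. It is not the kernel's implementation (the real code lives in the powerpc qspinlock code, arch/powerpc/lib/qspinlock.c); only the names qnodes, count, lock, and get_tail_qnode() are taken from the advisory text, and the interleaving is replayed sequentially instead of with real CPUs and interrupts.

  /*
   * Minimal, hypothetical model of the stale-qnode race described above.
   * This is NOT the kernel's code; the interleaving is replayed
   * sequentially instead of with real CPUs and interrupts.
   */
  #include <stdio.h>

  #define MAX_NODES 4  /* one slot per nesting level: task/irq/... */
  #define NR_CPUS   2

  struct qspinlock { int val; };

  struct qnode {
          struct qnode *next;
          struct qspinlock *lock;  /* which lock this slot is queued on */
          int locked;
  };

  struct qnodes {
          int count;
          struct qnode nodes[MAX_NODES];
  };

  static struct qnodes qnodes[NR_CPUS];
  static struct qspinlock A, B;

  /*
   * Mirrors the lookup named in the advisory: scan the previous tail
   * CPU's slots for one whose ->lock matches the lock being queued on.
   */
  static struct qnode *get_tail_qnode(struct qspinlock *lock, int prev_cpu)
  {
          for (int idx = 0; idx < MAX_NODES; idx++)
                  if (qnodes[prev_cpu].nodes[idx].lock == lock)
                          return &qnodes[prev_cpu].nodes[idx];
          return NULL;
  }

  int main(void)
  {
          /* CPU0 locked and unlocked A earlier: slot 0 holds a stale A. */
          qnodes[0].nodes[0].lock = &A;

          /* CPU0 starts queuing on B: the count is bumped first... */
          int id = qnodes[0].count++;

          /* ...and an interrupt fires HERE, before nodes[id].lock = &B.
           * The handler takes A and queues on it using the next slot. */
          qnodes[0].nodes[qnodes[0].count++].lock = &A;

          /* CPU1, queued on A behind CPU0, now resolves CPU0's tail node.
           * The stale slot 0 matches first, so CPU1 picks the wrong qnode
           * and would set the wrong ->next pointer. */
          struct qnode *prev = get_tail_qnode(&A, 0);

          qnodes[0].nodes[id].lock = &B; /* the store that arrives too late */

          printf("CPU1 resolved prev to slot %td (the real tail is slot 1)\n",
                 prev - qnodes[0].nodes);
          return 0;
  }

Running the model prints that CPU1 resolves prev to slot 0 while the genuine tail for lock A is slot 1: exactly the mis-linking that, in the real queue, leaves CPU0 spinning on a "next" pointer that its successor will never set.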
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before nodes[0].lock = B ) | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the "next" pointer ofthe wrong qnode. This causes adeadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it's "next" pointeris set by its successor in the queue.Running stress-ng on a16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) |spin_lock(B) |||▼ |id =qnodesp->count++; | (Note that nodes[0].lock == A) |||▼ |Interrupt | (happens before "nodes[0].lock =B") |||▼ | spin_lock_irqsave(A) |||▼ |id =qnodesp->count++ |nodes[1].lock =A|||▼ | Tail of MCS queue ||spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ |CPU0 is previous tail ▼ |Spin indefinitely ▼ (until "nodes[1].next != NULL") prev =get_tail_qnode(A, CPU0) |▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A) |
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before nodes[0].lock = B ) | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the "next" pointer ofthe wrong qnode. This causesa deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it's "next" pointeris set by its successor in the queue.Running stress-ng ona 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following:$ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A)| spin_unlock_irqrestore(A)| spin_lock(B)||| ▼| id= qnodesp->count++;| (Note that nodes[0].lock == A)||| ▼| Interrupt| (happens before "nodes[0].lock= B")||| ▼| spin_lock_irqsave(A)||| ▼| id= qnodesp->count++| nodes[1].lock=A||| ▼| Tail of MCS queue|| spin_lock_irqsave(A) ▼| Head of MCS queue ▼| CPU0 is previous tail ▼| Spin indefinitely ▼ (until "nodes[1].next != NULL") prev= get_tail_qnode(A, CPU0)| ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A)|
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before nodes[0].lock = B ) | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the "next" pointer ofthe wrong qnode. This causes adeadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it's "next" pointeris set by its successor in the queue.Running stress-ng on a16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) |spin_lock(B) |||▼ |id =qnodesp->count++; | (Note that nodes[0].lock == A) |||▼ |Interrupt | (happens before "nodes[0].lock =B") |||▼ | spin_lock_irqsave(A) |||▼ |id =qnodesp->count++ |nodes[1].lock =A|||▼ | Tail of MCS queue ||spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ |CPU0 is previous tail ▼ |Spin indefinitely ▼ (until "nodes[1].next != NULL") prev =get_tail_qnode(A, CPU0) |▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A) |
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before nodes[0].lock = B ) | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the "next" pointer ofthe wrong qnode. This causesa deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it's "next" pointeris set by its successor in the queue.Running stress-ng ona 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following:$ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A)| spin_unlock_irqrestore(A)| spin_lock(B)||| ▼| id= qnodesp->count++;| (Note that nodes[0].lock == A)||| ▼| Interrupt| (happens before "nodes[0].lock= B")||| ▼| spin_lock_irqsave(A)||| ▼| id= qnodesp->count++| nodes[1].lock=A||| ▼| Tail of MCS queue|| spin_lock_irqsave(A) ▼| Head of MCS queue ▼| CPU0 is previous tail ▼| Spin indefinitely ▼ (until "nodes[1].next != NULL") prev= get_tail_qnode(A, CPU0)| ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A)|
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before nodes[0].lock = B ) | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the "next" pointer ofthe wrong qnode. This causes adeadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it's "next" pointeris set by its successor in the queue.Running stress-ng on a16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) |spin_lock(B) |||▼ |id =qnodesp->count++; | (Note that nodes[0].lock == A) |||▼ |Interrupt | (happens before "nodes[0].lock =B") |||▼ | spin_lock_irqsave(A) |||▼ |id =qnodesp->count++ |nodes[1].lock =A|||▼ | Tail of MCS queue ||spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ |CPU0 is previous tail ▼ |Spin indefinitely ▼ (until "nodes[1].next != NULL") prev =get_tail_qnode(A, CPU0) |▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A) |
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before nodes[0].lock = B ) | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the "next" pointer ofthe wrong qnode. This causesa deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it's "next" pointeris set by its successor in the queue.Running stress-ng ona 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following:$ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A)| spin_unlock_irqrestore(A)| spin_lock(B)||| ▼| id= qnodesp->count++;| (Note that nodes[0].lock == A)||| ▼| Interrupt| (happens before "nodes[0].lock= B")||| ▼| spin_lock_irqsave(A)||| ▼| id= qnodesp->count++| nodes[1].lock=A||| ▼| Tail of MCS queue|| spin_lock_irqsave(A) ▼| Head of MCS queue ▼| CPU0 is previous tail ▼| Spin indefinitely ▼ (until "nodes[1].next != NULL") prev= get_tail_qnode(A, CPU0)| ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A)|
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before nodes[0].lock = B ) | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the "next" pointer ofthe wrong qnode. This causes adeadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it's "next" pointeris set by its successor in the queue.Running stress-ng on a16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) |spin_lock(B) |||▼ |id =qnodesp->count++; | (Note that nodes[0].lock == A) |||▼ |Interrupt | (happens before "nodes[0].lock =B") |||▼ | spin_lock_irqsave(A) |||▼ |id =qnodesp->count++ |nodes[1].lock =A|||▼ | Tail of MCS queue ||spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ |CPU0 is previous tail ▼ |Spin indefinitely ▼ (until "nodes[1].next != NULL") prev =get_tail_qnode(A, CPU0) |▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A) |
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before nodes[0].lock = B ) | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the "next" pointer ofthe wrong qnode. This causesa deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it's "next" pointeris set by its successor in the queue.Running stress-ng ona 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following:$ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A)| spin_unlock_irqrestore(A)| spin_lock(B)||| ▼| id= qnodesp->count++;| (Note that nodes[0].lock == A)||| ▼| Interrupt| (happens before "nodes[0].lock= B")||| ▼| spin_lock_irqsave(A)||| ▼| id= qnodesp->count++| nodes[1].lock=A||| ▼| Tail of MCS queue|| spin_lock_irqsave(A) ▼| Head of MCS queue ▼| CPU0 is previous tail ▼| Spin indefinitely ▼ (until "nodes[1].next != NULL") prev= get_tail_qnode(A, CPU0)| ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A)|
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before nodes[0].lock = B ) | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the "next" pointer ofthe wrong qnode. This causes adeadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it's "next" pointeris set by its successor in the queue.Running stress-ng on a16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) |spin_lock(B) |||▼ |id =qnodesp->count++; | (Note that nodes[0].lock == A) |||▼ |Interrupt | (happens before "nodes[0].lock =B") |||▼ | spin_lock_irqsave(A) |||▼ |id =qnodesp->count++ |nodes[1].lock =A|||▼ | Tail of MCS queue ||spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ |CPU0 is previous tail ▼ |Spin indefinitely ▼ (until "nodes[1].next != NULL") prev =get_tail_qnode(A, CPU0) |▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A) |
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the next pointer ofthe wrong qnode. This causes a deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it s next pointeris set by its successor in the queue.Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before nodes[0].lock = B ) | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until nodes[1].next != NULL ) prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes---truncated---
In the Linux kernel, the following vulnerability has been resolved:powerpc/qspinlock: Fix deadlock in MCS queueIf an interrupt occurs in queued_spin_lock_slowpath() after we incrementqnodesp->count and before node->lock is initialized, another CPU mightsee stale lock values in get_tail_qnode(). If the stale lock value happensto match the lock on that CPU, then we write to the "next" pointer ofthe wrong qnode. This causesa deadlock as the former CPU, once it becomesthe head of the MCS queue, will spin indefinitely until it's "next" pointeris set by its successor in the queue.Running stress-ng ona 16 core (16EC/16VP) shared LPAR, results inoccasional lockups similar to the following:$ stress-ng --all 128 --vm-bytes 80% --aggressive --maximize --oomable --verify --syslog --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ecThe following code flow illustrates how the deadlock occurs.For the sake of brevity, assume that both locks (A and B) arecontended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A)| spin_unlock_irqrestore(A)| spin_lock(B)||| ▼| id= qnodesp->count++;| (Note that nodes[0].lock == A)||| ▼| Interrupt| (happens before "nodes[0].lock= B")||| ▼| spin_lock_irqsave(A)||| ▼| id= qnodesp->count++| nodes[1].lock=A||| ▼| Tail of MCS queue|| spin_lock_irqsave(A) ▼| Head of MCS queue ▼| CPU0 is previous tail ▼| Spin indefinitely ▼ (until "nodes[1].next != NULL") prev= get_tail_qnode(A, CPU0)| ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A)|
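The other half of the race is the predecessor lookup on the contending
CPU: get_tail_qnode() identifies the tail CPU's qnode by scanning its
slots for a matching lock pointer, so a stale lock field can satisfy
the comparison. A simplified sketch, again following the naming of
arch/powerpc/lib/qspinlock.c rather than reproducing it verbatim:

	/*
	 * Sketch of get_tail_qnode(): find which of the tail CPU's
	 * qnode slots queued on this lock. If a slot was reserved but
	 * its lock field not yet rewritten (the window above), the
	 * comparison can match the wrong slot; the caller then stores
	 * its own node into that wrong qnode's "next" pointer, and the
	 * real tail never sees a successor.
	 */
	for (idx = 0; idx < MAX_NODES; idx++) {
		struct qnode *qnode = &qnodesp->nodes[idx];

		if (qnode->lock == lock)   /* may match a stale value */
			return qnode;
	}
	BUG();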