From ab4d84bf0846e31a99c53c83c79c90611fb51a0a Mon Sep 17 00:00:00 2001
From: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Date: Wed, 7 Feb 2024 18:18:54 +0000
Subject: [PATCH 1/2] PCI/DPC: Ignore Surprise Down error on hot removal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ANBZ: #26350

commit 2ae8fbbe1cd4 upstream

mainline inclusion
from mainline-v6.9-rc1
commit 2ae8fbbe1cd42a6624d345151859982cba89bbd3
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/IBWFCP
CVE: NA

Reference: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ae8fbbe1cd42a6624d345151859982cba89bbd3

----------------------------------------------------------------------

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

According to PCIe r6.0 sec 6.7.6 [1], async removal with DPC may result in
surprise down error. This error is expected and is just a side-effect of
async remove.

Ignore surprise down error generated as a side-effect of async remove.
Typically, this error is benign as the pciehp handler invoked by PDC
or/and DLLSC alongside DPC, de-enumerates and brings down the device
appropriately, but the error messages might confuse users. Get rid of
these irritating log messages with a 1s delay while pciehp waits for
DPC recovery.

The implementation is as follows: On an async remove a DPC is triggered
along with a Presence Detect State change and/or DLL State Change.
Determine it's an async remove by checking for DPC Trigger Status in DPC
Status Register and Surprise Down Error Status in AER Uncorrected Error
Status to be non-zero. If true, treat the DPC event as a side-effect of
async remove, clear the error status registers and continue with hot-plug
tear down routines. If not, follow the existing routine to handle AER and
DPC errors.

Masking Surprise Down Errors was explored as an alternative approach, but
discarded due to the odd behavior that masking only avoids the interrupt,
but still records an error per PCIe r6.0, sec 6.2.3.2.2. That stale error
would be reported the next time some error other than Surprise Down is
handled.

Dmesg before:

  pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000
  pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected
  pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
  pcieport 0000:00:01.4:   device [1022:14ab] error status/mask=00000020/04004000
  pcieport 0000:00:01.4:    [ 5] SDES (First)
  nvme nvme2: frozen state error detected, reset controller
  pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec
  pcieport 0000:00:01.4: AER: subordinate device reset failed
  pcieport 0000:00:01.4: AER: device recovery failed
  pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
  nvme2n1: detected capacity change from 1953525168 to 0
  pci 0000:04:00.0: Removing from iommu group 49

Dmesg after:

 pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
 nvme1n1: detected capacity change from 1953525168 to 0
 pci 0000:04:00.0: Removing from iommu group 37

[1] PCI Express Base Specification Revision 6.0, Dec 16 2021.
    https://members.pcisig.com/wg/PCI-SIG/document/16609

Fixes: aea47413e7ce ("PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR")
Link: https://lore.kernel.org/r/20240207181854.121335-1-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: lujunhua <lujunhua7@h-partners.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: xia-bing1 <2328702590@qq.com>
Signed-off-by: lujunhua <lujunhua7@h-partners.com>
---
 drivers/pci/pcie/dpc.c | 60 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 76a74bb21c2b..44ac62abeff0 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -293,10 +293,70 @@ void dpc_process_error(struct pci_dev *pdev)
 	}
 }
 
+static void pci_clear_surpdn_errors(struct pci_dev *pdev)
+{
+	if (pdev->dpc_rp_extensions)
+		pci_write_config_dword(pdev, pdev->dpc_cap +
+				       PCI_EXP_DPC_RP_PIO_STATUS, ~0);
+
+	/*
+	 * In practice, Surprise Down errors have been observed to also set
+	 * error bits in the Status Register as well as the Fatal Error
+	 * Detected bit in the Device Status Register.
+	 */
+	pci_write_config_word(pdev, PCI_STATUS, 0xffff);
+
+	pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, PCI_EXP_DEVSTA_FED);
+}
+
+static void dpc_handle_surprise_removal(struct pci_dev *pdev)
+{
+	if (!pcie_wait_for_link(pdev, false)) {
+		pci_info(pdev, "Data Link Layer Link Active not cleared in 1000 msec\n");
+		goto out;
+	}
+
+	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
+		goto out;
+
+	pci_aer_raw_clear_status(pdev);
+	pci_clear_surpdn_errors(pdev);
+
+	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
+			      PCI_EXP_DPC_STATUS_TRIGGER);
+
+out:
+	clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
+	wake_up_all(&dpc_completed_waitqueue);
+}
+
+static bool dpc_is_surprise_removal(struct pci_dev *pdev)
+{
+	u16 status;
+
+	if (!pdev->is_hotplug_bridge)
+		return false;
+
+	if (pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS,
+				 &status))
+		return false;
+
+	return status & PCI_ERR_UNC_SURPDN;
+}
+
 static irqreturn_t dpc_handler(int irq, void *context)
 {
 	struct pci_dev *pdev = context;
 
+	/*
+	 * According to PCIe r6.0 sec 6.7.6, errors are an expected side effect
+	 * of async removal and should be ignored by software.
+	 */
+	if (dpc_is_surprise_removal(pdev)) {
+		dpc_handle_surprise_removal(pdev);
+		return IRQ_HANDLED;
+	}
+
 	dpc_process_error(pdev);
 
 	/* We configure DPC so it only triggers on ERR_FATAL */
-- 
Gitee


From 20a4cd6e1481e6eb927a7c310e0a5be69247914f Mon Sep 17 00:00:00 2001
From: Jay Fang <f.fangjian@huawei.com>
Date: Tue, 30 Jul 2024 09:16:39 +0800
Subject: [PATCH 2/2] PCI/ASPM: Update ASPM sysfs on MFD function removal to
 avoid use-after-free

ANBZ: #26350

commit 951c5e26f86a upstream

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IB2LVJ
CVE: NA

--------------------------------

From 'commit 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal
to avoid use-after-free")' we know that PCIe spec r6.0, sec 7.5.3.7,
recommends that software program the same ASPM Control(pcie_link_state)
value in all functions of multi-function devices, and free the
pcie_link_state when any child function is removed.

However, ASPM Control sysfs is still visible to other children even if it
has been removed by any child function, and careless use it will
trigger use-after-free error, e.g.:

  # lspci -tv
    -[0000:16]---00.0-[17]--+-00.0  Device 19e5:0222
                            \-00.1  Device 19e5:0222
  # echo 1 > /sys/bus/pci/devices/0000:17:00.0/remove
  //pcie_link_state will be released
  # echo 1 > /sys/bus/pci/devices/0000:17:00.1/link/l1_aspm
  //will trigger error

Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000030 Call trace:
aspm_attr_store_common.constprop.0+0x10c/0x154
l1_aspm_store+0x24/0x30
dev_attr_store+0x20/0x34
sysfs_kf_write+0x4c/0x5c

We can solve this problem by updating the ASPM Control sysfs of all
children immediately after ASPM Control have been freed.

Extra note: The Linux community plans to optimize to the pcie_link_state
management code.For details,see the following link:
https://patchwork.kernel.org/project/linux-pci/patch/
20240730011639.2590846-1-f.fangjian@huawei.com/

Fixes: 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free")
Signed-off-by: Jay Fang <f.fangjian@huawei.com>
Signed-off-by: lujunhua <lujunhua7@h-partners.com>
---
 drivers/pci/pcie/aspm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 2d57a3c643b2..7b96275bdd39 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1029,6 +1029,8 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
 		pcie_config_aspm_path(parent_link);
 	}
 
+	pcie_aspm_update_sysfs_visibility(parent);
+
 	mutex_unlock(&aspm_lock);
 	up_read(&pci_bus_sem);
 }
-- 
Gitee