Thread: Re: [PATCH v2] PCI: Mark LSI FW643 to avoid bus reset
Brought to you by:
aeb,
bencollins
From: Takashi S. <o-t...@sa...> - 2024-03-25 01:21:50
|
Hi Bjorn Helgaas, (C.C.ed to lin...@li...) I have an objection to applying the change. I've been using the issued 1394 OHCI hardware in my development for recent years, while I have never faced the reported trouble. I think there are any misunderstanding or misjudge somwhow in the review process to apply it. Would I ask your precise advice to regenerate the reported issue in my local? This is my 1394 OHCI hardware. ``` $ sudo lspci -vvvnns 06:00.0 06:00.0 FireWire (IEEE 1394) [0c00]: LSI Corporation FW643 [TrueFire] PCIe 1394b Controller [11c1:5901] (rev 06) (prog-if 10 [OHCI]) Subsystem: LSI Corporation FW643 [TrueFire] PCIe 1394b Controller [11c1:5900] Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Interrupt: pin A routed to IRQ 255 IOMMU group: 17 Region 0: Memory at fc700000 (64-bit, non-prefetchable) [disabled] [size=4K] Capabilities: [44] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME- Capabilities: [4c] MSI: Enable- Count=1/1 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [60] Express (v1) Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x1 TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- Capabilities: [100 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [140 v1] Virtual Channel Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 Arb: Fixed- WRR32- WRR64- WRR128- Ctrl: ArbSelect=Fixed Status: InProgress- VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01 Status: NegoPending- InProgress- VC1: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable- ID=1 ArbSelect=Fixed TC/VC=00 Status: NegoPending- InProgress- Capabilities: [170 v1] Device Serial Number 12-34-56-10-12-30-00-86 Kernel driver in use: vfio-pci Kernel modules: firewire_ohci ``` I use it in the following environment at present: * Host system * AMD Ryzen 5 2400G * TUF GAMING X570-PLUS with BIOS 5003 (AGESA ComboV2PI 1.2.0.B) * SMT enabled * SVM enabled * IOMMU enabled * Secure boot disabled * Ubuntu 24.04 LTS amd64 * linux-image-6.8.0-11-generic (6.8.0-11.11) * default kernel cmdline * QEMU 8.2.1 (1:8.2.1+ds-1ubuntu1) * Libvert 10.0.0 (10.0.0-2ubuntu1) * Guest system * UEFI using OVMF * Seecure boot enabled * Ubuntu 24.04 LTS amd64 (the same as above) * default kernel cmdline > Using LSI / Agere FW643 with vfio-pci will exhaust all > pci_reset_fn_methods, the bus reset at the end causes a broken link > only recoverable by removing power > (power-off / suspend + rescan). > Prevent this bus reset. > With this change, the device can be assigned to VMs with VFIO. > Note that it will not be reset, resulting in leaking state between VMs > and host. > > Signed-off-by: Edmund Raile <edm...@pr...> > > I sincerely thank you for your patience and explaining > the background of pci resets which I lacked. > The commit message and comment now describe it correctly. > The comment on leaking states was added. > > Usefulness: > > The LSI FW643 PCIe->FireWire 800 interface may be EOL but it is > the only one that does not use a PCIe->PCI bridge. > It is reliable and enables FireWire audio interfaces to be used > on modern machines. > > Virtualization allows for flexible access to professional audio > software. > > It has been used in at least the following Apple machines: > MacBookPro10,1 > MacBookPro9,2 > MacBookPro6,2 > MacBookPro5,1 > Macmini6,1 > Macmini3,1 > iMac12,2 > iMac9,1 > iMac8,1 > > Implementation: > > PCI_VENDOR_ID_ATT was reused as they are identical. > > drivers/pci/quirks.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > index d797df6e5f3e..e0e4ad9e6d50 100644 > --- a/drivers/pci/quirks.c > +++ b/drivers/pci/quirks.c > @@ -3765,6 +3765,19 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset); > */ > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset); > > +/* > + * Using LSI / Agere FW643 with vfio-pci will exhaust all > + * pci_reset_fn_methods, the bus reset at the end causes a broken link > + * only recoverable by removing power > + * (power-off / suspend + rescan). > + * Prevent this bus reset. > + * With this change, the device can be assigned to VMs with VFIO. > + * Note that it will not be reset, resulting in leaking state between VMs > + * and host. > + */ > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5900, quirk_no_bus_reset); > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATT, 0x5901, quirk_no_bus_reset); > + > /* > * Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS > * automatically disables LTSSM when Secondary Bus Reset is received and Regards Takashi Sakamoto |
From: Takashi S. <o-t...@sa...> - 2024-03-26 13:19:16
|
Hi Bjorn, Thanks for your reply. On Mon, Mar 25, 2024 at 09:41:49AM -0500, Bjorn Helgaas wrote: > So even without this patch, you are able to pass the FW643 to a VM > with VFIO, and you don't see any issues caused by VFIO resetting the > device? Absolutely yes, at least in my VM, for recent years to maintain Linux FireWire subsystem and ALSA firewire stack. > Can you collect the output of: > > $ find /sys/devices -name reset_method | xargs grep . ``` $ sudo find /sys/devices -name reset_method | xargs grep . /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:09.0/0000:09:00.0/reset_method:bus /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:09.0/reset_method:pm /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:02.0/0000:06:00.0/reset_method:pm bus /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:02.0/reset_method:pm /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:05.0/0000:07:00.0/reset_method:bus /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:05.0/reset_method:pm /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:0a.0/0000:0a:00.0/reset_method:bus /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:0a.0/reset_method:pm /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:08.0/0000:08:00.0/reset_method:bus /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:08.0/0000:08:00.3/reset_method:pm /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:08.0/reset_method:pm /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:01.0/reset_method:pm bus /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:01.0/0000:05:00.0/reset_method:device_specific flr bus /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/reset_method:pm bus /sys/devices/pci0000:00/0000:00:01.2/reset_method:pm /sys/devices/pci0000:00/0000:00:08.1/0000:0c:00.0/reset_method:flr bus /sys/devices/pci0000:00/0000:00:08.1/0000:0c:00.3/reset_method:pm /sys/devices/pci0000:00/0000:00:08.1/0000:0c:00.1/reset_method:flr pm /sys/devices/pci0000:00/0000:00:08.1/reset_method:pm /sys/devices/pci0000:00/0000:00:08.1/0000:0c:00.4/reset_method:pm /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/reset_method:pm bus /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/reset_method:bus /sys/devices/pci0000:00/0000:00:01.1/reset_method:pm /sys/devices/pci0000:00/0000:00:01.6/0000:0b:00.0/reset_method:flr bus /sys/devices/pci0000:00/0000:00:01.6/reset_method:pm ``` If you need each PCI bus bridge information, I can provide it to you. but I can promise they are typical hardware in AMD CPU or chipset for Zen generation and nothing special. > You should be able to manually reset the device with something like > this (I don't know your topology, so you might have to replace "1d.6" > with the bridge leading to 06:00.0): > > # sudo echo 1 > # /sys/devices/pci0000:00/0000:00:1d.6/0000:06:00.0/reset ``` $ echo 1 > sudo tee -a /sys/devices/pci0000:00/0000:00:01.2/0000:03:00.0/0000:04:02.0/0000:06:00.0/reset (nothing happens) $ journalctl -k -n10 (nothing specific) ``` Would I ask you any point to check after the reset operation? > I don't *know* of a reason why a Secondary Bus Reset would work > correctly on your hardware but not on a Mac, but there could be > something weird going on. Note that the hardware provided by Apple for the past decade has no IEEE 1394 interface, thus the patch author seems to use any kind of bus extension to connect the issued 1394 OHCI hardware. I guess: * Apple Thunderbolt Display * https://lore.kernel.org/linux-pci/137...@li.../ * Apple Thunderbolt-OHCI1394 adapter * I know FW643 is used for the product. * Some kind of eGPU box > Does the patch cause a problem for you? (Other than the fact that the > device leaks state between VMs?) It takes a bit time for me to set up my system with self-compiled v6.9-rc1 kernel. However the leak between VMs is really inconvenient to me by itself. Thanks Takashi Sakamoto |
From: Takashi S. <o-t...@sa...> - 2024-03-29 08:12:37
|
Hi Lukas, On Fri, Mar 29, 2024 at 05:41:16AM +0100, Lukas Wunner wrote: > On Thu, Mar 28, 2024 at 02:42:01PM -0600, Alex Williamson wrote: > > On Wed, 27 Mar 2024 10:01:19 -0500 Bjorn Helgaas <he...@ke...> wrote: > > The original patch proposed for this gave me the impression that this > > was a device used on various old Mac systems, not likely applicable to > > a general purpose plug-in card. > > I'm still using one of those "old Mac systems" as my daily driver. > > Just checked the ACPI tables and there's an FPEN method below the > FRWR device which toggles GPIO 48 on the PCH. Checked the schematics > as well and GPIO 48 is marked FW_PWR_EN. The GPIO controls load > switches which cut power to the FW643 chip when nothing is connected. > > Also, FW_PWR_EN feeds into an SLG4AP016V chip where it seems to > internally gate FW_CLKREQ_L. > > I'm guessing the driver may need to call the FPEN ACPI method after > issuing a SBR to force the chip on (or perhaps first off, then on) > and thereby re-enable Clock Request. > > It's a pity the ohci.c driver doesn't seem to support runtime PM. > That would allow cutting power to the chip when nothing is connected > and thus increase battery life. The ACPI tables indicate that the > platform sends a notification when something is plugged in, so all > the necessary ingredients are there but we're not taking advantage > of them. > > Thanks, > > Lukas Yup. In both PCI drivers and unit drivers belonging to Linux FireWire subsystem, any type of runtime PM is not supported. If I integrate 1394 OHCI driver, I should implement all of the above in any callback of runtime PM, or the part of the above is already supported by any driver in parent PCI layer? Thanks Takashi Sakamoto |
From: Takashi S. <o-t...@sa...> - 2024-03-29 08:05:32
|
Hi, On Thu, Mar 28, 2024 at 06:35:29PM +0000, edmund.raile wrote: > So instead I ran this: > ``` > su -c 'echo 1 > /sys/devices/pci0000\:00/0000\:00\:1c.1/0000\:03\:00.0/reset' > ``` > Playback stopped immediately and could not be resumed. > > Then I received this trace: > > INFO: task alsa-sink-Firef:4110 blocked for more than 245 seconds. > Tainted: G W OE 6.6.10-1-MANJARO #1 > task:alsa-sink-Firef state:D stack:0 pid:4110 ppid:2657 flags:0x00000002 > Call Trace: > <TASK> > __schedule+0x3e7/0x1410 > ? tlb_batch_pages_flush+0x3d/0x70 > schedule+0x5e/0xd0 > schedule_timeout+0x151/0x160 > wait_for_completion+0x8a/0x160 > fw_run_transaction+0xe5/0x120 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7] > ? __pfx_split_transaction_timeout_callback+0x10/0x10 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7] > ? __pfx_transmit_complete_callback+0x10/0x10 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7] > ? __pfx_transaction_callback+0x10/0x10 [firewire_core d9ff4eaf1ffb23a203d413e851f405323b49fec7] > snd_fw_transaction+0x70/0x110 [snd_firewire_lib 30b43a591db389bbc6be51459cb243ba1fe1e662] > ff800_finish_session+0x43/0x70 [snd_fireface 5f7f3f556960f4838886792be8e9c18aa5089b0a] > snd_ff_stream_stop_duplex+0x39/0x70 [snd_fireface 5f7f3f556960f4838886792be8e9c18aa5089b0a] > pcm_hw_free+0x3c/0x50 [snd_fireface 5f7f3f556960f4838886792be8e9c18aa5089b0a] > snd_pcm_common_ioctl+0xe28/0x12b0 [snd_pcm 24933227879438b755ef98bc4844113025f38cdf] > ? __seccomp_filter+0x32c/0x510 > ? __vm_munmap+0xbb/0x150 > snd_pcm_ioctl+0x2e/0x50 [snd_pcm 24933227879438b755ef98bc4844113025f38cdf] > __x64_sys_ioctl+0x94/0xd0 > do_syscall_64+0x5d/0x90 > ? syscall_exit_to_user_mode+0x2b/0x40 > ? do_syscall_64+0x6c/0x90 > ? do_syscall_64+0x6c/0x90 > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Please mind that current software stack to operate your device does not support this kind of operation, as I've already sent to you several times. Users should cancel any type of communication on IEEE 1394 bus, then unplug devices from the bus (or power them off), finally operate suspending. By the way, it is apart from PCI subsystem. Your change is now going to be reverted for v6.9. Regards Takashi Sakamoto |