Re: [PATCH v2] firewire: core: send bus reset promptly on gap count error
Brought to you by:
aeb,
bencollins
From: Takashi S. <o-t...@sa...> - 2024-02-06 23:38:55
|
Hi, Thanks for the patch. I applied it to for-linus branch and will send it for v6.8-rc4 in this week. Thanks for your long patience in the review process. On Sun, Feb 04, 2024 at 09:21:14PM -0800, Adam Goldman wrote: > If we are bus manager and the bus has inconsistent gap counts, send a > bus reset immediately instead of trying to read the root node's config > ROM first. Otherwise, we could spend a lot of time trying to read the > config ROM but never succeeding. > > This eliminates a 50+ second delay before the FireWire bus is usable after > a newly connected device is powered on in certain circumstances. > > The delay occurs if a gap count inconsistency occurs, we are not the root > node, and we become bus manager. One scenario that causes this is with a TI > XIO2213B OHCI, the first time a Sony DSR-25 is powered on after being > connected to the FireWire cable. In this configuration, the Linux box will > not receive the initial PHY configuration packet sent by the DSR-25 as IRM, > resulting in the DSR-25 having a gap count of 44 while the Linux box has a > gap count of 63. > > FireWire devices have a gap count parameter, which is set to 63 on power-up > and can be changed with a PHY configuration packet. This determines the > duration of the subaction and arbitration gaps. For reliable communication, > all nodes on a FireWire bus must have the same gap count. > > A node may have zero or more of the following roles: root node, bus manager > (BM), isochronous resource manager (IRM), and cycle master. Unless a root > node was forced with a PHY configuration packet, any node might become root > node after a bus reset. Only the root node can become cycle master. If the > root node is not cycle master capable, the BM or IRM should force a change > of root node. > > After a bus reset, each node sends a self-ID packet, which contains its > current gap count. A single bus reset does not change the gap count, but > two bus resets in a row will set the gap count to 63. Because a consistent > gap count is required for reliable communication, IEEE 1394a-2000 requires > that the bus manager generate a bus reset if it detects that the gap count > is inconsistent. > > When the gap count is inconsistent, build_tree() will notice this after the > self identification process. It will set card->gap_count to the invalid > value 0. If we become bus master, this will force bm_work() to send a bus > reset when it performs gap count optimization. > > After a bus reset, there is no bus manager. We will almost always try to > become bus manager. Once we become bus manager, we will first determine > whether the root node is cycle master capable. Then, we will determine if > the gap count should be changed. If either the root node or the gap count > should be changed, we will generate a bus reset. > > To determine if the root node is cycle master capable, we read its > configuration ROM. bm_work() will wait until we have finished trying to > read the configuration ROM. > > However, an inconsistent gap count can make this take a long time. > read_config_rom() will read the first few quadlets from the config ROM. Due > to the gap count inconsistency, eventually one of the reads will time out. > When read_config_rom() fails, fw_device_init() calls it again until > MAX_RETRIES is reached. This takes 50+ seconds. > > Once we give up trying to read the configuration ROM, bm_work() will wake > up, assume that the root node is not cycle master capable, and do a bus > reset. Hopefully, this will resolve the gap count inconsistency. > > This change makes bm_work() check for an inconsistent gap count before > waiting for the root node's configuration ROM. If the gap count is > inconsistent, bm_work() will immediately do a bus reset. This eliminates > the 50+ second delay and rapidly brings the bus to a working state. > > I considered that if the gap count is inconsistent, a PHY configuration > packet might not be successful, so it could be desirable to skip the PHY > configuration packet before the bus reset in this case. However, IEEE > 1394a-2000 and IEEE 1394-2008 say that the bus manager may transmit a PHY > configuration packet before a bus reset when correcting a gap count error. > Since the standard endorses this, I decided it's safe to retain the PHY > configuration packet transmission. > > Normally, after a topology change, we will reset the bus a maximum of 5 > times to change the root node and perform gap count optimization. However, > if there is a gap count inconsistency, we must always generate a bus reset. > Otherwise the gap count inconsistency will persist and communication will > be unreliable. For that reason, if there is a gap count inconstency, we > generate a bus reset even if we already reached the 5 reset limit. > > Signed-off-by: Adam Goldman <ad...@po...> > Link: https://sourceforge.net/p/linux1394/mailman/message/58727806/ > --- Takashi Sakamoto |