From: marcello.carla <mar...@gm...> - 2024-07-28 21:08:10
Dear Michael and dear David,

1) Here is a proposed patch for the system hang that occurs with an
off-line or non-existent device. Thanks to Michael for spotting the
problem.

When bb_write() is called, NDAC must already be asserted low; otherwise
no device is listening. This was not checked, and it drove
bb_NRFD_interrupt() crazy.

In the patch I have also moved the (normally useless) warnings for
out-of-order or idle interrupts to debug level 1. But see also point 2.

--- gpib_bitbang.c-76c3dc	2024-07-27 23:35:01.412798005 +0200
+++ gpib_bitbang.c	2024-07-27 23:48:26.639818311 +0200
@@ -417,7 +417,7 @@
 		int send_eoi, size_t *bytes_written)
 {
 	unsigned long flags;
-	int retval = 0;
+	int retval = -1;
 	bb_private_t *priv = board->private_data;
@@ -438,6 +438,7 @@
 	dbg_printk(1,"Enabling interrupts - NRFD: %d NDAC: %d\n",
 		gpiod_get_value(NRFD), gpiod_get_value(NDAC));
+	if (gpiod_get_value(NDAC)) goto write_end;
 	spin_lock_irqsave (&priv->rw_lock, flags);
 	priv->w_busy = 1; /* make the interrupt routines active */
@@ -506,13 +507,13 @@
 	if (priv->phase == 99) ENABLE_IRQ (priv->irq_NRFD, IRQ_TYPE_EDGE_RISING);
 	if (priv->w_busy == 0) {
-		dbg_printk(0,"interrupt while idle after %zu/%zu at %d\n",
+		dbg_printk(1,"interrupt while idle after %zu/%zu at %d\n",
 			priv->w_cnt, priv->length, priv->phase);
 		priv->nrfd_idle++;
 		goto nrfd_exit; /* idle */
 	}
 	if (nrfd == 0) {
-		dbg_printk(0,"out of order interrupt after %zu/%zu at %d cmd %d " LINFMT ".\n",
+		dbg_printk(1,"out of order interrupt after %zu/%zu at %d cmd %d " LINFMT ".\n",
 			priv->w_cnt, priv->length, priv->phase, priv->cmd, LINVAL);
 		priv->phase = 3;
 		priv->nrfd_seq++;
@@ -565,12 +566,12 @@
 		irq, gpiod_get_value(NRFD), ndac, board->status, priv->direction, priv->w_busy, priv->r_busy);
 	if (priv->w_busy == 0) {
-		dbg_printk(0,"interrupt while idle.\n");
+		dbg_printk(1,"interrupt while idle.\n");
 		priv->ndac_idle++;
 		goto ndac_exit;
 	}
 	if (ndac == 0) {
-		dbg_printk(0,"out of order interrupt at %zu:%d.\n", priv->w_cnt, priv->phase);
+		dbg_printk(1,"out of order interrupt at %zu:%d.\n", priv->w_cnt, priv->phase);
 		priv->phase = 5;
 		priv->ndac_seq++;
 		goto ndac_exit;

2) I see the problem of a buffer overflow while writing, but to
reproduce repeated interrupts with slow edges I had to use rise and fall
times longer than 100 us, which is uncommon, and I could do so only on
the RPi3b. The RPi4 and RPi5 have Schmitt triggers on their inputs, which
makes the event impossible. Even so, a good interlock between the NDAC
and NRFD interrupts is advisable, and the current implementation (for
historical reasons) is a mess. Sorry for that, and thanks again to
Michael for reporting the problem. A revision of this part of the code
was already planned. Asap, as usual.

Bye

Marcello Carla'

On 7/28/24 21:17, Michael Schwingen wrote:
> On 26.07.24 17:48, marcello.carla via Linux-gpib-general wrote:
>> Dear Michael,
>>
>> bb_DAV_interrupt():
>>
>> the check for buffer overflow is at line 390 of current version
>> [76c3dc] (line 379 of last unpatched [b4cbd1]):
>>
>>     priv->end_flag = ((priv->count >= priv->request) || priv->end);
>>
>> 'count' is the number of characters read; 'request' is the buffer
>> length; when the buffer is full, the operation is terminated even
>> before an EOI or newline. Can you reproduce the conditions when
>> this mechanism does not work correctly?
>
> No, I currently can't reproduce it, but I am quite sure I had cases
> where the count incremented way beyond the expected transfer count.
>
> It might have been the write case:
>
> in bb_NRFD_interrupt, we have
>
>     set_data_lines(priv->w_buf[priv->w_cnt++]); // put the data on the lines
>
> with no check of the transfer size - it checks the size to assert EOI,
> but does not stop further interrupts from happening.
>
> The check to end the transfer is in bb_NDAC_interrupt.
>
> Now if you get lots of NRFD interrupts before NDAC interrupt is
> called, you will increment w_cnt without limit.
>
> Also, if you get multiple NRFD interrupts for one transfer (which can
> happen with sloppy rising/falling edges and reflections), you will get
> wrong data in the buffer. The NRFD interrupt should be locked out
> until the NDAC phase has happened (and vice-versa).
>
>> system hang:
>>
>> yes, there is a problem; when you address a non-existent device
>> with ibrd(), you correctly obtain a timeout error; when you try
>> an ibwrt() on a non-existent device, the system hangs. I shall
>> try to spot the error and propose a remedy asap.
>
> The interesting thing is this only happens some of the time. I had to
> power-cycle the DMM about 5 times before I could catch the hang.
>
> cu
>
> Michael
>
> _______________________________________________
> Linux-gpib-general mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linux-gpib-general
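
The interlock asked for in this thread (NRFD locked out until the NDAC
phase has happened, and vice versa), together with the NDAC pre-check
and a bound on the write counter, can be modelled in plain user-space C.
This is only a sketch under stated assumptions, not the driver code:
struct wr_state, the wr_phase values and the wr_* functions are
hypothetical names invented for illustration and do not exist in
gpib_bitbang.c, and the GPIO reads and interrupt handlers are replaced by
ordinary function calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handshake phases; illustrative only, not the driver's. */
enum wr_phase { WR_IDLE, WR_WAIT_NRFD, WR_WAIT_NDAC };

struct wr_state {
	const uint8_t *buf;   /* data to send */
	size_t length;        /* number of bytes in buf */
	size_t cnt;           /* bytes placed on the bus so far */
	enum wr_phase phase;  /* which event is expected next */
	unsigned ignored;     /* events rejected by the interlock */
	uint8_t lines;        /* stands in for the eight data lines */
};

/*
 * Start a write. ndac_level is the sampled NDAC line: non-zero means the
 * line is released (high), i.e. no listener is holding it low, so the
 * write is refused instead of waiting for interrupts that never come.
 */
static int wr_start(struct wr_state *w, const uint8_t *buf, size_t length,
		    int ndac_level)
{
	if (ndac_level || length == 0)
		return -1;
	w->buf = buf;
	w->length = length;
	w->cnt = 0;
	w->ignored = 0;
	w->lines = 0;
	w->phase = WR_WAIT_NRFD;
	return 0;
}

/* Model of the NRFD event: listeners are ready for the next byte. */
static void wr_on_nrfd(struct wr_state *w)
{
	/* Accept only in the expected phase and only while data remains. */
	if (w->phase != WR_WAIT_NRFD || w->cnt >= w->length) {
		w->ignored++;
		return;
	}
	w->lines = w->buf[w->cnt++];
	assert(w->cnt <= w->length);  /* the counter can never overrun */
	w->phase = WR_WAIT_NDAC;      /* lock out further NRFD events */
}

/* Model of the NDAC event: byte accepted. Returns 1 when the write is done. */
static int wr_on_ndac(struct wr_state *w)
{
	if (w->phase != WR_WAIT_NDAC) {
		w->ignored++;
		return 0;
	}
	if (w->cnt >= w->length) {
		w->phase = WR_IDLE;   /* transfer complete */
		return 1;
	}
	w->phase = WR_WAIT_NRFD;      /* re-arm NRFD for the next byte */
	return 0;
}
```

Calling wr_on_nrfd() twice in a row, as a slow or reflected edge might
do, puts only one byte on the lines; the second call is just counted in
the ignored field, and cnt never exceeds length.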