Thread: [libdc] EINTR and DC1394_CAPTURE_POLICY_WAIT (Again)
Capture and control API for IIDC compliant cameras
From: Irv E. <el...@wl...> - 2008-06-30 14:37:07
Hi,

I'm still having problems on a certain system with no images being delivered to LibDC1394 and higher. Sometimes it works, sometimes it doesn't. The hardware is at a remote, unmanned location, which is both good (no question about configuration changes) and bad. Software-only measures like reloading drivers and rebooting have restored normal operation. I want to automate this. (Sigh...)

In March 2008 there was a discussion on this list about what to do with interrupts to the ioctl in linux/platform_capture_dequeue in the case of DC1394_CAPTURE_POLICY_WAIT. Apparently DC1394_IOCTL_FAILURE had an overloaded meaning, and there was an issue with the video1394 driver returning inappropriate EINTRs. The ultimate solution was to ignore interrupts and endlessly retry the ioctl.

Unfortunately, this approach precludes an application using LibDC from getting out of a hopelessly blocked dc1394_capture_dequeue. IMHO there should always be a graceful way out (no SIGKILL). I'm sorry I wasn't paying attention at the time.

The most elegant option is a LibDC max retries parameter. Minus one is the default, and any negative value means infinite. This requires a new API function to set the parameter (and probably another one to get it...).

An alternative is that LibDC retries a small, fixed number of times (e.g., 4-10?), and that this number N is defined as an API constant. An application that really wants to cancel a waiting dequeue will have to send N+1 signals. Not elegant (what is the optimal value of N, how long to wait between kills?) but easy to implement and quick-and-dirty effective.

Why don't I just poll? In normal operation with 5+ cameras in a multi-threaded application, polling adds unnecessary overhead that may/will lead to frame loss. It's also less elegant. You don't want to be forced into polling just because some time in the future some camera could stop delivering images (e.g., due to a failure in external triggering).
Having dc1394_capture_dequeue be application-interruptible will not solve my primary problem. It's not clear that LibDC (or my application) is the problem. Unfortunately, diagnostic tools are not at hand. Or should I try getting deeper into /sys and /proc with custom code?

Cheers,
Irv.

--
Delft Hydraulics, GeoDelft, the Subsurface and Groundwater unit of TNO and parts of Rijkswaterstaat have joined forces in a new independent institute for delta technology, Deltares. Deltares combines knowledge and experience in the field of water, soil and the subsurface. We provide innovative solutions to make living in deltas, coastal areas and river basins safe, clean and sustainable.

DISCLAIMER: This message is intended exclusively for the addressee(s) and may contain confidential and privileged information. If you are not the intended recipient please notify the sender immediately and destroy this message. Unauthorized use, disclosure or copying of this message is strictly prohibited. The foundation 'Stichting Deltares', which has its seat at Delft, The Netherlands, Commercial Registration Number 41146461, is not liable in any way whatsoever for consequences and/or damages resulting from the improper, incomplete and untimely dispatch, receipt and/or content of this e-mail.
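The max-retries proposal in the message above (a negative value meaning "retry forever", otherwise give up after N retries) could be sketched roughly as follows. This is only an illustration under stated assumptions: `retry_on_eintr`, `blocking_call_fn`, `flaky_state` and `flaky_call` are hypothetical names invented here, not part of libdc1394, and `flaky_call` is a stand-in for the blocking ioctl so that the logic can be exercised without a camera.

```c
#include <errno.h>

/* Hypothetical signature for "one attempt at a blocking call".
 * Expected to return -1 and set errno on failure, like a system call. */
typedef int (*blocking_call_fn)(void *ctx);

/*
 * Call `call` until it returns something other than -1/EINTR.
 * max_retries < 0  : retry forever (the behaviour described for -1).
 * max_retries >= 0 : allow that many retries after the first attempt,
 *                    then return -1 with errno still set to EINTR.
 */
static int retry_on_eintr(blocking_call_fn call, void *ctx, int max_retries)
{
    int retries = 0;

    for (;;) {
        int rc = call(ctx);

        if (rc != -1 || errno != EINTR)
            return rc;              /* success, or a real error */

        if (max_retries >= 0) {
            if (retries >= max_retries)
                return -1;          /* give up; errno is EINTR */
            retries++;
        }
    }
}

/* Stand-in for the blocking ioctl, for demonstration only:
 * fails with EINTR a fixed number of times, then succeeds. */
struct flaky_state {
    int interruptions_left;
};

static int flaky_call(void *ctx)
{
    struct flaky_state *s = ctx;

    if (s->interruptions_left > 0) {
        s->interruptions_left--;
        errno = EINTR;
        return -1;
    }
    return 0;
}
```

With N retries allowed, the caller only sees EINTR after N+1 interrupted attempts, which matches the "send N+1 signals" cost mentioned for the fixed-constant alternative.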
From: David M. <dcm@MIT.EDU> - 2008-06-30 20:39:31
On Mon, 2008-06-30 at 16:36 +0200, Irv Elshoff wrote:
> In March 2008 there was a discussion on this list about what to do with
> interrupts to the ioctl in linux/platform_capture_dequeue in the case of
> DC1394_CAPTURE_POLICY_WAIT. Apparently DC1394_IOCTL_FAILURE had an
> overloaded meaning, and there was an issue with the video1394 driver
> returning inappropriate EINTRs. The ultimate solution was to ignore
> interrupts and endlessly retry the ioctl.
>

The issue was with signals, not interrupts. The convention for dealing with signals during system calls is to restart the system call automatically whenever there is a handled signal. Before March, that didn't work properly. Now it does.

> Unfortunately, this approach precludes an application using LibDC from
> getting out of a hopelessly blocked dc1394_capture_dequeue. IMHO
> there should always be a graceful way out (no SIGKILL). I'm sorry I
> wasn't paying attention at the time.

Well, a hopelessly blocked dc1394_capture_dequeue() may happen for a lot of reasons, and you may or may not be getting signals during that time. The EINTR/signal issue is not really relevant to your problem.

> Why don't I just poll? In normal operation with 5+ cameras in a
> multi-threaded application polling adds unnecessary overhead that
> may/will lead to frame loss. It's also less elegant. You don't want to
> be forced into polling just because some time in the future some camera
> could stop delivering images (e.g., due to a failure in external
> triggering).
>

Do you mean:

1) Converting your application to be single-threaded causes it to have unnecessary overhead.

or:

2) Adding a select/poll before each call to dequeue causes unnecessary overhead.

I think you're saying 1), because 2) should not really be a problem. Linux can handle _lots_ of system calls per second, and adding an extra system call per frame grabbed should be negligible.
I suggest simply adding a select/poll before each call to dequeue (keeping your multi-threaded design), and only doing the dequeue once you know a frame is available.

You can abort the select/poll for whatever reason you want -- either use a timeout, or set up a pipe which will determine when to break early from a dequeue.

> Having dc1394_capture_dequeue be application-interruptible will not
> solve my primary problem. It's not clear that LibDC (or my application)
> is the problem. Unfortunately, diagnostic tools are not at hand. Or
> should I try getting deeper into /sys and /proc with custom code?

You mention dequeues that hang indefinitely. I assume that's not on purpose? Is this the problem you are trying to solve? Can you give a more complete description of when it happens and any other information you might have about the problem, such as which card, how many cards, which cameras, any error messages, etc.?

Thanks,

David
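The select/poll suggestion in the message above (wait with a timeout, or use a pipe to break out early) can be sketched as follows. This is a minimal sketch under assumptions, not libdc1394 code: `wait_for_frame` and `enum wait_result` are hypothetical names, `camera_fd` is assumed to be the descriptor returned by `dc1394_capture_get_fileno()` (which comes up later in this thread), and `cancel_fd` is assumed to be the read end of a pipe that another thread writes to when it wants the wait to stop.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Possible outcomes of waiting for a frame (hypothetical names). */
enum wait_result {
    WAIT_ERROR       = -1,
    WAIT_FRAME_READY = 0,
    WAIT_CANCELLED   = 1,
    WAIT_TIMEOUT     = 2
};

/*
 * Block until camera_fd becomes readable, cancel_fd becomes readable,
 * or timeout_ms milliseconds pass (negative timeout_ms waits forever).
 *
 * camera_fd: assumed to come from dc1394_capture_get_fileno().
 * cancel_fd: read end of a pipe; writing one byte to the other end
 *            from any thread aborts the wait.
 */
static enum wait_result wait_for_frame(int camera_fd, int cancel_fd,
                                       int timeout_ms)
{
    struct pollfd fds[2];

    fds[0].fd = camera_fd;
    fds[0].events = POLLIN;
    fds[1].fd = cancel_fd;
    fds[1].events = POLLIN;

    for (;;) {
        int n;

        fds[0].revents = 0;
        fds[1].revents = 0;
        n = poll(fds, 2, timeout_ms);

        if (n < 0) {
            /* poll() can return EINTR when a signal handler runs.
             * Simplification: the wait restarts with the full timeout. */
            if (errno == EINTR)
                continue;
            return WAIT_ERROR;
        }
        if (n == 0)
            return WAIT_TIMEOUT;

        /* Cancellation wins over a pending frame. */
        if (fds[1].revents & (POLLIN | POLLHUP))
            return WAIT_CANCELLED;
        if (fds[0].revents & POLLIN)
            return WAIT_FRAME_READY;

        return WAIT_ERROR;  /* POLLERR, POLLHUP or POLLNVAL on camera_fd */
    }
}
```

A capture thread would call this before each dequeue and only call dc1394_capture_dequeue() on WAIT_FRAME_READY, so the dequeue itself never blocks for long. The cost is one extra system call per frame.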
From: Irv E. <el...@wl...> - 2008-07-01 20:30:04
David Moore wrote:
> On Mon, 2008-06-30 at 16:36 +0200, Irv Elshoff wrote:
>
>> In March 2008 there was a discussion on this list about what to do with
>> [signals] to the ioctl in linux/platform_capture_dequeue...
>>
>
> The issue was with signals, not interrupts.

Sorry for the confusion. I did mean signals in the POSIX sense of the term, and interrupts in the sense that system calls are interrupted, and return EINTR in errno.

> The convention for dealing
> with signals during system calls is to restart the system call
> automatically whenever there is a handled signal. Before March, that
> didn't work properly. Now it does.
>

I would like to argue that it's up to the application - not the library, LibDC in this case - to decide what to do. Or at least be able to control what happens from the application.

>> [The current] approach precludes an application using LibDC from
>> getting out of a hopelessly blocked dc1394_capture_dequeue....
>>
>
> Well, a hopelessly blocked dc1394_capture_dequeue() may happen for a lot
> of reasons, and you may or may not be getting signals during that time.
> The EINTR/signal issue is not really relevant to your problem.
>

My primary problem is indeed something else (see below). But what used to be an escape hatch - allowing a SIGALRM to wake up the blocked dequeue - is now nailed shut. But...

> Do you mean:
> 1) Converting your application to be single-threaded causes it to have
> unnecessary overhead.
>
> or:
> 2) Adding a select/poll before each call to dequeue causes unnecessary
> overhead.
>
> I think you're saying 1), because 2) should not really be a problem.
> Linux can handle _lots_ of system calls per second, and adding an extra
> system call per frame grabbed should be negligible. I suggest simply
> adding a select/poll before each call to dequeue (keeping your
> multi-threaded design), and only do the dequeue once you know a frame is
> available.
>
> You can abort the select/poll for whatever reason you want -- either use
> a timeout, or set up a pipe which will determine when to break early
> from a dequeue.
>

This is a very useful comment. Thanks!

Actually, I meant neither. My application is multi-threaded because lots of cooperating concurrent things are going on. It would be a nightmare to implement this in a single thread, and multiple processes (in the POSIX sense, with very frequent context switches and IPC calls) would ruin performance.

My initial reaction to "do your own poll/select" was "then I'm going to have to mess with YOUR file descriptor". As an ardent practitioner of object-oriented design, punching through the LibDC layer to underlying system calls wasn't appealing. But I do see there's a (new?) dc1394_capture_get_fileno API call that punches most of the hole for me. That is, LibDC explicitly allows access to the file descriptor. I'm going to assume it's for select/poll only, and not my own reads and writes and whatever, right?

In a perfect OO world, spanning software layers wouldn't be necessary. A LibDC poll/select function that hides the file descriptor would IMHO be better than a get_fileno function that exposes it. But others might argue that this is needless overhead. So's life...

> You mention dequeues that hang indefinitely. I assume that's not on
> purpose? Is this the problem you are trying to solve? Can you give a
> more complete description of ...

No, it's definitely not on purpose, and yes, it is THE problem I'm trying to solve. It's better if I start a new thread for this.

Just for background: I had software problems with multi-camera in transitioning from 2.0.0-rc7 to 2.0.2. The solution, prompted by David Trotz, entails using the all-in-one Format7 ROI function instead of individual calls to set size, position, etc. This is, AFAIK, not the problem anymore. That problem was reproducible on many systems.
The system in question was working fine (with new software) at a certain point until an automatic reboot; then no camera produced images. The system is at a remote location; I have no direct access to the hardware. But what could a reboot do at four in the morning? I'm not convinced it's a LibDC problem at all. Feels lower level. To be continued...

As for the signal/EINTR issue: I can manage now with polling (POSIX) before dequeue (LibDC). Now that the file descriptor genie is out of the bottle, it's probably not a good idea to suck it back in, at least until 3.0.

Thanks!

Cheers,
Irv.
From: David M. <dcm@MIT.EDU> - 2008-07-01 21:08:44
On Tue, 2008-07-01 at 22:29 +0200, Irv Elshoff wrote:
> I would like to argue that it's up to the application - not the library,
> LibDC in this case - to decide what to do. Or at least be able to
> control what happens from the application.
>

Yes, you're right of course. By default, system calls like "read", "write", and "poll" are automatically restarted under Linux upon receiving a signal. This default behavior is what we are currently consistent with. Normally, you can change this default using the sigaction() function, and choosing not to specify the SA_RESTART flag. See:

http://www.opengroup.org/onlinepubs/000095399/functions/sigaction.html

You are right that video1394 support under libdc1394 will not obey the case in which SA_RESTART is turned off because of our workaround. Note that this will no longer be a problem with the Juju stack, which will soon be the preferred stack. That's why we are reluctant to change this behavior in the old stack.

> My initial reaction to "do your own poll/select" was "then I'm going
> to have to mess with YOUR file descriptor". As an ardent practitioner
> of object-oriented design, punching through the LibDC layer to
> underlying system calls wasn't appealing. But I do see there's a
> (new?) dc1394_capture_get_fileno API call that punches most of the
> hole for me. That is, LibDC explicitly allows access to the file
> descriptor. I'm going to assume it's for select/poll only, and not my
> own reads and writes and whatever, right?

Yes, get_fileno is for exactly this purpose (using poll). You will get a different fd from each of your cameras, so you can poll them individually (or together if you wish).

> In a perfect OO world, spanning software layers wouldn't be necessary.
> A LibDC poll/select function that hides the file descriptor would IMHO
> be better than a get_fileno function that exposes it. But others
> might argue that this is needless overhead. So's life...
The great advantage of file descriptors is that they are a generic mechanism. If libdc provided its own poll/select function, how would you integrate it with other custom poll/select functions from different libraries? The Unix design philosophy of "everything is a file descriptor" solves this dilemma nicely and allows application writers to share the same event loop if they so choose.

-David