Thread: [libdc1394-devel] Hanging at the VIDEO1394_LISTEN_WAIT_BUFFER ioctl
Capture and control API for IIDC compliant cameras
From: Carlos Y. V. <ca...@br...> - 2005-04-20 00:06:45
Hey all,

I'm trying to chase down a problem with 1394 capture using libdc, and have run out of ideas. So I'm trolling for suggestions.

It is a custom hardware embedded system using Xilinx's VirtexIIPro, which will explain why we're using an old version of libdc and Linux.

The system is a VirtexIIPro, with on-board PCI/ethernet/DDR/serial/CF using Xilinx's opb_pci_ref core based on the ML300 reference design. We're using Linux kernel 2.4.22, libraw1394 0.9.0, and libdc1394 0.9.1 and 0.9.3. We can't switch to Linux 2.6 yet since MontaVista is supplying the Linux device drivers for Xilinx, and they're still at 2.4.x. There is a TI 4322 family OHCI 1394 chip hanging off the PCI bus, and we've even run experiments with a Synergy 1394 PMC card connected to the on-board PCI bus using PMC with the same results. The same results will happen to both our custom board, and the Xilinx ML300 evaluation board.

The symptoms are:

Linux boots, sees the 1394 chip (or both, if the Synergy board is connected) and configures it.

The ohci1394 and ieee1394 modules see the chip(s) and the devices appear in /dev (using devfs at the moment).

lspci lists the devices correctly.

/proc/bus/ieee1394/devices lists the host, and the 1394 cameras on the bus (Pt Grey Dragonflys at the moment. Hi PtGrey!)

Using a custom program based on the example included in libdc, but modified to capture stereo images using dma_multi_capture(), I can see the cameras, query them, set them up, set up the dma capture, and start isochronous transmission. This is all verified using a 1394 bus analyzer.

Capture will happen fine for anywhere from 20 seconds to 20 minutes, and then hang. Hanging will occur with the program waiting on a VIDEO1394_LISTEN_WAIT_BUFFER ioctl to return. I can control-C out of the program, and re-start it with the same behavior. The hanging seems to be triggered by a race condition. The more often I call dma_multi_capture(), the quicker it will hang.

On infrequent, but not insignificant, occasions, the hang will occur on the dma_setup_capture() call. If I control-C out of that, the entire OS will hang.

I'm hoping it's the last two items that will jog someone's memory. I'm also working on converting to the full EDK PCI, but having Linux problems at the moment.

I'm not ruling out anything, but I've also sped up the PCI clock, slowed it down, changed its phase internal vs external, etc. Same results. I don't expect this list to answer hardware problems, I'm asking elsewhere for that.

Any clues? I'm chasing two paths, 1) something's up with the Linux kernel/libdc, which is for this list, and 2) Something's up with OPB_PCI_REF, which is for my other source.

Thanks!

--Carlos V.
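One way to keep a capture loop from sitting forever in a wait like the one described above is to poll with a deadline, so that a stall shows up as an error the program can log and react to. The following is a minimal sketch in Python, not libdc1394 code: `poll_frame` is a hypothetical non-blocking callable that returns a frame or `None`, and `wait_for_frame` and `CaptureTimeout` are names made up for the example. It only detects the stall; it does nothing about the underlying cause.

```python
import time


class CaptureTimeout(Exception):
    """Raised when no frame arrives before the deadline."""


def wait_for_frame(poll_frame, timeout_s=1.0, interval_s=0.001,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll for a frame until one arrives or timeout_s elapses.

    poll_frame is a hypothetical non-blocking callable (an assumption of
    this sketch) returning a frame object, or None when no buffer is
    ready yet.
    """
    deadline = clock() + timeout_s
    while True:
        frame = poll_frame()
        if frame is not None:
            return frame
        if clock() >= deadline:
            raise CaptureTimeout(f"no frame within {timeout_s} s")
        sleep(interval_s)
```

A caller would wrap each capture in `try/except CaptureTimeout` and record how long the stream had been running, which makes the "20 seconds to 20 minutes" window easier to measure across runs.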
From: Damien D. <da...@do...> - 2005-04-20 01:33:48
Hi Carlos,

On Tue, 2005-04-19 at 17:06 -0700, Carlos Y. Villalpando wrote:
> Hey all,
>
> I'm trying to chase down a problem with 1394 capture using libdc, and
> have run out of ideas. So I'm trolling for suggestions.
>
> It is a custom hardware embedded system using Xilinx's VirtexIIPro,
> which will explain why we're using an old version of libdc and Linux.
>
> The system is a VirtexIIPro, with on-board PCI/ethernet/DDR/serial/CF
> using Xilinx's opb_pci_ref core based on the ML300 reference design.
> We're using Linux kernel 2.4.22, libraw1394 0.9.0, and libdc1394 0.9.1
> and 0.9.3. We can't switch to Linux 2.6 yet since MontaVista is
> supplying the Linux device drivers for Xilinx, and they're still at
> 2.4.x. There is a TI 4322 family OHCI 1394 chip hanging off the PCI
> bus, and we've even run experiments with a Synergy 1394 PMC card
> connected to the on-board PCI bus using PMC with the same results.
> The same results will happen to both our custom board, and the Xilinx
> ML300 evaluation board.

Have you tried the latest 2.4.30? I don't know how well the 2.4 branch is maintained but there must have been improvements since 2.4.22...

> The symptoms are:
>
> Linux boots, sees the 1394 chip (or both, if the Synergy board is
> connected) and configures it.
>
> The ohci1394 and ieee1394 modules see the chip(s) and the devices
> appear in /dev (using devfs at the moment).
>
> lspci lists the devices correctly.
>
> /proc/bus/ieee1394/devices lists the host, and the 1394 cameras on
> the bus (Pt Grey Dragonflys at the moment. Hi PtGrey!)

Hi Don! ;-)

> Using a custom program based on the example included in libdc, but
> modified to capture stereo images using dma_multi_capture(), I can
> see the cameras, query them, set them up, set up the dma capture, and
> start isochronous transmission. This is all verified using a 1394
> bus analyzer.
>
> Capture will happen fine for anywhere from 20 seconds to 20 minutes,
> and then hang. Hanging will occur with the program waiting on a
> VIDEO1394_LISTEN_WAIT_BUFFER ioctl to return. I can control-C out of
> the program, and re-start it with the same behavior. The hanging
> seems to be triggered by a race condition. The more often I call
> dma_multi_capture(), the quicker it will hang.

If it works for some time and then hangs, I would suspect a kernel problem, not something related to libdc. The only thing I can imagine with libdc would be that your image buffer gets full at some point and then something goes wrong. Very unlikely though...

> On infrequent, but not insignificant, occasions, the hang will occur on
> the dma_setup_capture() call. If I control-C out of that, the entire
> OS will hang.
>
> I'm hoping it's the last two items that will jog someone's memory. I'm
> also working on converting to the full EDK PCI, but having Linux
> problems at the moment.

I remember that a serious issue was corrected around 2.4.23. If possible, upgrade to 2.4.30 and hope it will work. If not, upgrade your kernel version step by step (2.4.23, 2.4.24,...) to see if something works. I know this can be time consuming but I'm not a kernel hacker so I don't have many ideas here...

Alternatively, you can check linux1394.org and get a recent SVN snapshot for your kernel (be sure to get the 2.4 branch, of course). That will avoid lengthy compilations but you could run into compilation troubles.

> I'm not ruling out anything, but I've also sped up the PCI clock,
> slowed it down, changed its phase internal vs external, etc. Same
> results. I don't expect this list to answer hardware problems, I'm
> asking elsewhere for that.
>
> Any clues? I'm chasing two paths, 1) something's up with the Linux
> kernel/libdc, which is for this list, and 2) Something's up with
> OPB_PCI_REF, which is for my other source.

I'm not aware of any problem in libdc that would trigger this hang. To me it's rather a kernel problem.

Damien

--
 _    Damien 'Takahara' Douxchamps, PhD
('-   Post-doctoral investigator
//\   Image Processing Group, Nara Institute of Science and Technology
V_/_  http://chihara.aist-nara.ac.jp/
From: Carlos Y. V. <ca...@br...> - 2005-04-21 17:22:19
Quoting Damien Douxchamps <da...@do...>:
>
> Have you tried the latest 2.4.30? I don't know how well the 2.4 branch
> is maintained but there must have been improvements since 2.4.22...
>

Ok, I found a 2.4.30 PPC kernel, and still get the same results. Thanks all for the notes you all responded with yesterday. I'm still chugging away at it.

--Carlos V.
From: Damien D. <da...@do...> - 2005-04-22 01:13:35
On Thu, 2005-04-21 at 10:22 -0700, Carlos Y. Villalpando wrote:
> Quoting Damien Douxchamps <da...@do...>:
> >
> > Have you tried the latest 2.4.30? I don't know how well the 2.4 branch
> > is maintained but there must have been improvements since 2.4.22...
> >
>
> Ok, I found a 2.4.30 PPC kernel, and still get the same results.
> Thanks all for the notes you all responded with yesterday. I'm still
> chugging away at it.

PPC might be the problem... Have you tried with a PC?

Damien

--
 _    Damien 'Takahara' Douxchamps, PhD
('-   Post-doctoral investigator
//\   Image Processing Group, Nara Institute of Science and Technology
V_/_  http://chihara.aist-nara.ac.jp/
From: Carlos Y. V. <ca...@br...> - 2005-04-26 18:37:25
Hey all,

Just as a final wrap-up, it wasn't a kernel/libdc problem, it was a hardware problem. Bad board layout that is fortunately fixable in the FPGA with a DCM and a phase shift. The comments I received about libdc/kernel were still useful in helping me understand what the entire process was.

Thanks all for the help!

Two last questions.

1) What's the latest version of libdc that works with a 2.4 kernel? Is it the 1.0 split that switches to 2.6? We're now using libraw 1.2 and libdc 0.9.4 on 2.4.30.

2) Do the 1.x versions of libdc have stereo synchronization code?

In stereo, we've found that the kernel/libdc can give us a stereo pair that's a frame off from each other, e.g. left is the current frame, and right is the previous frame. Since we're using Pt Grey cameras, we know the cameras are synchronized in both capture and transmission. We've got code to make sure the frames are synchronized even if we have to go back a frame from the newest frame. Synchronized frames are important for stereo if the scene isn't static. If not, once we package it up, would you like the patch? It'll probably be against libdc 0.9.4.

Thanks all!

--Carlos V.
From: Damien D. <da...@do...> - 2005-04-27 01:22:28
Hi Carlos,

On Tue, 2005-04-26 at 11:37 -0700, Carlos Y. Villalpando wrote:
> Hey all,
>
> Just as a final wrap-up, it wasn't a kernel/libdc problem, it was a
> hardware problem. Bad board layout that is fortunately fixable in the
> FPGA with a DCM and a phase shift. The comments I received about
> libdc/kernel were still useful in helping me understand what the
> entire process was.

Good to know, thanks.

> Thanks all for the help!
>
> Two last questions.
>
> 1) What's the latest version of libdc that works with a 2.4 kernel?
> Is it the 1.0 split that switches to 2.6? We're now using libraw
> 1.2 and libdc 0.9.4 on 2.4.30.

I did not know that libdc only worked with 2.6. AFAIK the latest version should work with 2.4 too. Oh, wait, there's been some incompatibility with IOCTLs back then. From the ChangeLog:

2003-12-24  Damien  <ddo...@us...>

        * updated the IOCTLs for kernels 2.4.21+ and 2.6.0+. This breaks
          compatibility with previous kernel versions.

So you'll need kernel 2.4.21 or later.

> 2) Do the 1.x versions of libdc have stereo synchronization code?

No, although you can use the broadcast address (63) to send a start_iso command to all cameras, thereby sync'ing them _once_. Due to clock drift and the absence of a PLL, desync will appear with time.

> In stereo, we've found that the kernel/libdc can give us a stereo
> pair that's a frame off from each other, e.g. left is the current
> frame, and right is the previous frame. Since we're using Pt Grey
> cameras, we know the cameras are synchronized in both capture and
> transmission. We've got code to make sure the frames are

(since you have ptgrey cams you don't need the 'broadcast hack' mentioned above)

> synchronized even if we have to go back a frame from the newest
> frame. Synchronized frames are important for stereo if the scene
> isn't static. If not, once we package it up, would you like the
> patch? It'll probably be against libdc 0.9.4.

That could be interesting indeed! :-)

Damien

--
 _    Damien 'Takahara' Douxchamps, PhD
('-   Post-doctoral investigator
//\   Image Processing Group, Nara Institute of Science and Technology
V_/_  http://chihara.aist-nara.ac.jp/
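The pairing idea discussed in this message (stepping back a frame when the newest left and right buffers do not belong together) could look roughly like the sketch below. This is an illustration in Python, not the patch offered in the thread. It assumes each buffered frame carries a capture timestamp in seconds, and that two frames come from the same exposure when their timestamps differ by less than half a frame period. The names `newest_synced_pair` and `frame_period_s` are made up for the example.

```python
def newest_synced_pair(left, right, frame_period_s):
    """Return the newest (left, right) pair taken at the same instant.

    left and right are lists of (timestamp_s, frame) tuples, oldest first.
    Two frames are treated as a pair when their timestamps differ by less
    than half a frame period (an assumption of this sketch). Returns None
    if no such pair is buffered.
    """
    tolerance = frame_period_s / 2.0
    i, j = len(left) - 1, len(right) - 1
    while i >= 0 and j >= 0:
        t_left, t_right = left[i][0], right[j][0]
        if abs(t_left - t_right) < tolerance:
            return left[i], right[j]
        # Step back on whichever side is newer: the other side has not
        # delivered the matching frame yet.
        if t_left > t_right:
            i -= 1
        else:
            j -= 1
    return None
```

For cameras running at 30 fps, `frame_period_s` would be `1 / 30`. Stepping back on the newer side is what yields "the previous frame" on one camera when the other camera's latest buffer has not arrived yet.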