Thread: Problems in 3.3.x with VIA VT6315
From: Lluís B. i R. <vi...@vi...> - 2012-05-08 11:17:37
Hello,

here we have two computers, both with a VT6315, both running x86_64; one is an Intel i7 and the other an AMD Phenom II.

The i7 runs 2.6.35, and dc1394 cameras work perfectly on it.
The Phenom II runs 3.3.4, and dc1394 cameras work for a few seconds, then hang. Then the process hangs too. Killing the process leaves it <defunct> (whether or not the parent reaps it).

Both computers use the same userland software. I can't easily test 3.3.4 on the i7, but I suspect something may have broken for the VT6315 since 2.6.35.

Can anyone confirm that the VT6315 works in 3.3.x kernels?

Regards,
Lluís.
From: Clemens L. <cl...@la...> - 2012-05-08 11:56:27
Lluís Batlle i Rossell wrote:
> here we have two computers, both with VT6315, running all x86_64, one is an
> intel i7 and the other an amd Phenom II.
>
> The I7 runs 2.6.35 and dc1394 cameras work perfectly in it.
> The Phenom II runs 3.3.4 and dc1394 cameras work for a short seconds, and hang.
> Then the process hangs too. Killing the process leaves it <defunct> (unrelated
> to the parent accepting it or not).

Apparently, the driver is waiting for the hardware to do something.

When this happens, are there any messages in the system log (see the output of dmesg)?

Regards,
Clemens
From: Lluís B. i R. <vi...@vi...> - 2012-05-08 13:42:11
On Tue, May 08, 2012 at 02:00:15PM +0200, Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> > here we have two computers, both with VT6315, running all x86_64, one is an
> > intel i7 and the other an amd Phenom II.
> >
> > The I7 runs 2.6.35 and dc1394 cameras work perfectly in it.
> > The Phenom II runs 3.3.4 and dc1394 cameras work for a short seconds, and hang.
> > Then the process hangs too. Killing the process leaves it <defunct> (unrelated
> > to the parent accepting it or not).
>
> Apparently, the driver is waiting for the hardware to do something.
>
> When this happens, are there any messages in the system log (see the output of
> dmesg)?

Another thing I could not find anywhere: on the 2.6.35 computer I get /dev/raw1394 and /dev/video1394-0. On the 3.3.x computer I don't get any such device nodes, and I don't have any modules named raw1394 or video1394. Did those device nodes disappear in recent kernels?

Regards,
Lluís.
From: Clemens L. <cl...@la...> - 2012-05-10 13:25:33
Lluís Batlle i Rossell wrote:
> On Wed, May 09, 2012 at 08:21:29AM +0200, Clemens Ladisch wrote:
>> Lluís Batlle i Rossell wrote:
>>> # lsfirewirephy
>>> timeout
>>> timeout
>>
>> (This should work directly after booting, before any hang.)
>
> Yes, they work fine.

And what _is_ the output? :)

>> What is the entire output of dmesg immediately after booting? What are the
>> outputs of "cat /proc/interrupts" and "dmesg | tail" when it hangs?
>
> I attach the dmesg, but it does not have the firewire camera connected now.
> [33793.225644] [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
> [33793.225644] perf: IBS APIC setup failed on cpu #1
> [33793.227208] [Firmware Bug]: cpu 2, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
> [33793.227208] perf: IBS APIC setup failed on cpu #2
> [33793.228639] [Firmware Bug]: cpu 3, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
> [33793.228639] perf: IBS APIC setup failed on cpu #3

These errors should not affect you; they appear to be normal with the BIOSes of certain CPU families. There's nothing else relevant in the log.

> 19: 7 0 1 533 IO-APIC-fasteoi ehci_hcd:usb2, firewire_ohci

Could you check whether this EHCI controller, which shares the same interrupt, works in this situation, i.e., whether high-speed USB devices work on these ports? (Check whether there are USB ports where these interrupt counts increase when you plug something in.)

Regards,
Clemens
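The check suggested above amounts to comparing two snapshots of /proc/interrupts and seeing which IRQ lines' counters went up. Here is a minimal Python sketch of that comparison. It assumes the usual x86 row layout: IRQ number, one counter per CPU, controller type, then comma-separated handler names. The function names are made up for this example.

```python
def parse_interrupts_row(line):
    """Split one /proc/interrupts row into (irq, per-CPU counts, chip, handlers)."""
    irq, rest = line.split(":", 1)
    fields = rest.split()
    counts = []
    while fields and fields[0].isdigit():
        counts.append(int(fields.pop(0)))
    chip = fields[0] if fields else ""
    handlers = [h.strip() for h in " ".join(fields[1:]).split(",") if h.strip()]
    return irq.strip(), counts, chip, handlers


def irq_totals(text):
    """Map each IRQ label to (total count over all CPUs, handler names)."""
    totals = {}
    for line in text.splitlines():
        if ":" not in line:          # skips the "CPU0 CPU1 ..." header row
            continue
        irq, counts, _chip, handlers = parse_interrupts_row(line)
        totals[irq] = (sum(counts), handlers)
    return totals


def irqs_that_fired(before, after):
    """Return {irq: delta} for every IRQ whose total grew between snapshots."""
    old, new = irq_totals(before), irq_totals(after)
    return {
        irq: new[irq][0] - old[irq][0]
        for irq in new
        if irq in old and new[irq][0] > old[irq][0]
    }
```

To use it, read /proc/interrupts once, plug a high-speed USB device into one of the suspect ports, and read the file again. Then pass both strings to `irqs_that_fired`. If "19" shows up, the shared line is still delivering interrupts.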
From: Lluís B. i R. <vi...@vi...> - 2012-05-10 13:29:12
On Thu, May 10, 2012 at 03:29:17PM +0200, Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> > On Wed, May 09, 2012 at 08:21:29AM +0200, Clemens Ladisch wrote:
> >> Lluís Batlle i Rossell wrote:
> >>> # lsfirewirephy
> >>> timeout
> >>> timeout
> >>
> >> (This should work directly after booting, before any hang.)
> >
> > Yes, they work fine.
>
> And what _is_ the output? :)

Sorry. Now, with the camera connected:

# lsfirewirephy
bus 0, node 0: 080028:424296 Texas Instruments TSB41AB1/2
bus 0, node 1: 001163:306001 VIA Technologies VT630x

> > 19: 7 0 1 533 IO-APIC-fasteoi ehci_hcd:usb2, firewire_ohci
>
> Could you check whether this EHCI controller that shares the same
> interrupt works in this situation, i.e., if high-speed USB devices
> work on these ports? (Check whether there are USB ports where these
> interrupt counts increase when you plug something in.)

Yes, the interrupt count increases; USB works fine.

Thanks a lot,
Lluís.
From: Lluís B. i R. <vi...@vi...> - 2012-05-08 13:21:13
On Tue, May 08, 2012 at 02:00:15PM +0200, Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> > here we have two computers, both with VT6315, running all x86_64, one is an
> > intel i7 and the other an amd Phenom II.
> >
> > The I7 runs 2.6.35 and dc1394 cameras work perfectly in it.
> > The Phenom II runs 3.3.4 and dc1394 cameras work for a short seconds, and hang.
> > Then the process hangs too. Killing the process leaves it <defunct> (unrelated
> > to the parent accepting it or not).
>
> Apparently, the driver is waiting for the hardware to do something.
>
> When this happens, are there any messages in the system log (see the output of
> dmesg)?

Nothing there. I only get some lines if I disconnect the camera, but the process remains <defunct>. I'll try switching kernels soon.
From: Clemens L. <cl...@la...> - 2012-05-08 14:23:27
Lluís Batlle i Rossell wrote:
> On Tue, May 08, 2012 at 02:00:15PM +0200, Clemens Ladisch wrote:
>> Lluís Batlle i Rossell wrote:
>>> here we have two computers, both with VT6315, running all x86_64, one is an
>>> intel i7 and the other an amd Phenom II.
>>>
>>> The I7 runs 2.6.35 and dc1394 cameras work perfectly in it.
>>> The Phenom II runs 3.3.4 and dc1394 cameras work for a short seconds, and hang.
>>> Then the process hangs too. Killing the process leaves it <defunct> (unrelated
>>> to the parent accepting it or not).
>>
>> Apparently, the driver is waiting for the hardware to do something.
>>
>> When this happens, are there any messages in the system log (see the output of
>> dmesg)?
>
> Nothing there. I only get some lines if I disconnect the camera, but the process
> remains <defunct>.

Try executing (as root)

  echo w > /proc/sysrq-trigger

when this happens; this should output a list of blocked tasks to the system log.

> Another thing I could not find anywhere is that in the 2.6.35 computer, I get
> /dev/raw1394 and /dev/video1394-0. In the 3.3.x computer, I don't get any such
> device node, and I don't have any modules named raw1394 or video1394.

The FireWire stack was completely rewritten. Check that your libraw1394 package is recent enough (current is 2.0.8, but 2.0.6 or newer should work). In any case, this library could not cause the kernel to lock up.

Regards,
Clemens
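The "2.0.6 or newer" check has to compare version components as numbers, not as strings: compared as text, "2.0.10" would sort before "2.0.6". Below is a small Python sketch of that comparison. How you obtain the installed version string is left open. Your package manager can report it, as can `pkg-config --modversion libraw1394` if the library's .pc file is installed. The helper names are hypothetical.

```python
def version_tuple(version, width=3):
    """Turn "2.0.7" into (2, 0, 7), padding short versions with zeros."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def at_least(installed, minimum):
    """True if `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)
```

For example, `at_least("2.0.7", "2.0.6")` is true for the libraw1394 version reported later in this thread.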
From: Stefan R. <st...@s5...> - 2012-05-08 18:13:56
On May 08 Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> > Another thing I could not find anywhere is that in the 2.6.35 computer, I get
> > /dev/raw1394 and /dev/video1394-0. In the 3.3.x computer, I don't get any such
> > device node, and I don't have any modules named raw1394 or video1394.
>
> The FireWire stack was completely rewritten. Check that your libraw1394 package
> is recent enough (current is 2.0.8, but 2.0.6 or newer should work). In any
> case, this library could not cause the kernel to lock up.

On the older kernel stack, libdc1394 by default uses direct video1394 accesses for video capture, but raw1394 accesses via libraw1394 for discovery, control, and status I/O. (It also had a mode in which it used libraw1394 + raw1394 for video capture as well, but support for that mode was removed in kernel 2.6.23 and libraw1394 2.0. I don't know whether it was removed from the libdc1394 v2 codebase too; it could be.)

On the newer kernel stack, libdc1394 uses /dev/fw* for everything, without assistance from libraw1394. libdc1394 v2.1.2 (06/2009) is recommended as the minimum version to use with the newer kernel stack; v2.2.0 (03/2012) is current.

I don't know whether the libdc1394 version could have an impact on the behavior seen here, nor am I familiar with how well the VT6315 works for IIDC applications. I only have a VT6306 and two low-end IIDC cameras, and it's been a while since I tested the VT6306 with them. However, the VT6306 is known to behave differently from (buggier than) the VT6307/08/15.

-- 
Stefan Richter
-=====-===-- -=-= -=---
http://arcgraph.de/sr/
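You can often tell which stack a machine is running from the device nodes described above. The old stack exposes /dev/raw1394 and /dev/video1394-*. The new one exposes a /dev/fw<N> node per node on the bus, such as the /dev/fw1 seen later in this thread. Here is a rough Python sketch of that check. It only looks at node names, so udev rules that rename nodes would fool it. The function name is made up.

```python
import glob
import os


def detect_firewire_stack(dev_dir="/dev"):
    """Guess which kernel FireWire stack is active from its device nodes.

    Returns (label, nodes) where label is "old", "new", "both" or "none".
    """
    new_nodes = sorted(glob.glob(os.path.join(dev_dir, "fw[0-9]*")))
    old_nodes = sorted(
        glob.glob(os.path.join(dev_dir, "raw1394"))
        + glob.glob(os.path.join(dev_dir, "video1394*"))
    )
    if new_nodes and old_nodes:
        label = "both"
    elif new_nodes:
        label = "new"      # libdc1394 talks to /dev/fw* directly
    elif old_nodes:
        label = "old"      # raw1394 for control, video1394 for capture
    else:
        label = "none"
    return label, new_nodes + old_nodes
```

The `dev_dir` parameter exists only so the check can be pointed at a scratch directory for testing. On a real system, call `detect_firewire_stack()` with no arguments.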
From: Clemens L. <cl...@la...> - 2012-05-10 13:46:31
Lluís Batlle i Rossell wrote:
> # lsfirewirephy
> bus 0, node 1: 001163:306001 VIA Technologies VT630x

Many thanks. As expected, no changes from the VT6308.

> On Thu, May 10, 2012 at 03:29:17PM +0200, Clemens Ladisch wrote:
>> Lluís Batlle i Rossell wrote:
>>> 19: 7 0 1 533 IO-APIC-fasteoi ehci_hcd:usb2, firewire_ohci
>>
>> Could you check whether this EHCI controller that shares the same
>> interrupt works in this situation, i.e., if high-speed USB devices
>> work on these ports? (Check whether there are USB ports where these
>> interrupt counts increase when you plug something in.)
>
> Yes, the interrupt count increases, the usb works fine.

So the problem is not with IRQ 19 itself but with the routing of the PCI interrupt line from the controller to IRQ 19.

Please show the output of "lspci -s 2:0 -vvv" after a cold boot and after a suspend/resume.

Regards,
Clemens
From: Lluís B. i R. <vi...@vi...> - 2012-05-10 20:32:26
Attachments:
after-suspend.txt
before-suspend.txt
On Thu, May 10, 2012 at 03:50:15PM +0200, Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> > Yes, the interrupt count increases, the usb works fine.
>
> So the problem is not with IRQ 19 itself but with the routing of the PCI
> interrupt line from the controller to IRQ19.

Ah, OK. I'm glad someone understands a bit of how that works. I don't; I only know up to the 8259, two cascaded at most. :)

> Please show the output of "lspci -s 2:0 -vvv" after a cold boot and
> after a suspend/resume.

Here they are. Comparing them in vimdiff, only 'RollOver' differs. Not that I know what it is about.

Just to avoid confusion: the FireWire hangs whether the machine has just rebooted or has come back from suspend. No difference.

Thank you,
Lluís.
From: Lluís B. i R. <vi...@vi...> - 2012-05-08 15:11:24
On Tue, May 08, 2012 at 04:27:20PM +0200, Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> > On Tue, May 08, 2012 at 02:00:15PM +0200, Clemens Ladisch wrote:
> >> Lluís Batlle i Rossell wrote:
> >>> here we have two computers, both with VT6315, running all x86_64, one is an
> >>> intel i7 and the other an amd Phenom II.
> >>>
> >>> The I7 runs 2.6.35 and dc1394 cameras work perfectly in it.
> >>> The Phenom II runs 3.3.4 and dc1394 cameras work for a short seconds, and hang.
> >>> Then the process hangs too. Killing the process leaves it <defunct> (unrelated
> >>> to the parent accepting it or not).
> >>
> >> Apparently, the driver is waiting for the hardware to do something.
> >>
> >> When this happens, are there any messages in the system log (see the output of
> >> dmesg)?
> >
> > Nothing there. I only get some lines if I disconnect the camera, but the process
> > remains <defunct>.
> The FireWire stack was completely rewritten. Check that your libraw1394 package
> is recent enough (current is 2.0.8, but 2.0.6 or newer should work). In any
> case, this library could not cause the kernel to lock up.

I'm using libraw1394 2.0.7. I can try 2.0.8 later.

The program just blocked (on 3.3.4), and I attached strace:

read(13,

Looking at fd 13 in /proc:

13 -> /dev/fw1

I use kill. SIGINT causes the read() call to restart:

read(13, 0x7fff19de7760, 40) = ? ERESTARTSYS (To be restarted)
--- {si_signo=SIGINT, si_code=SI_KERNEL, si_value={int=1174533440, ptr=0x7fbf4601f540}} (Interrupt) ---
rt_sigaction(SIGINT, {0x7fe92d954390, [INT], SA_RESTORER|SA_RESTART, 0x7fe92b4e8b10}, {0x7fe92d954390, [INT], SA_RESTORER|SA_RESTART, 0x7fe92b4e8b10}, 8) = 0
rt_sigreturn(0) = 0
read(13,

If I now use kill -9 with strace attached, strace does not notice it. Pressing Ctrl-C in strace hangs:

read(13, ^C <unfinished ...>

cat /proc/5918/status:
Name: cc1394
State: D (disk sleep)
....

If I hadn't had strace attached, the process would appear defunct, as it did the other times I tried.

Unplugging the FireWire camera does not produce any message in dmesg. Now, issuing SysRq w shows one task:

[17613.274738] SysRq : Show Blocked State
[17613.274749] task PC stack pid father
[17613.274790] cc1394 D ffffffff81407520 0 5918 5880 0x00000005
[17613.274801] ffff880032101b68 0000000000000046 ffff880032101b08 ffff880000000000
[17613.274810] ffff8801285ea940 ffff880032101fd8 ffff880032101fd8 ffff880032101fd8
[17613.274819] ffff880129ace040 ffff8801285ea940 ffff880032101b78 ffff8800364a2c00
[17613.274828] Call Trace:
[17613.274843] [<ffffffff813c8baf>] schedule+0x3f/0x60
[17613.274882] [<ffffffffa0106fcd>] fw_device_op_release+0x18d/0x250 [firewire_core]
[17613.274894] [<ffffffff8106b7c0>] ? add_wait_queue+0x60/0x60
[17613.274904] [<ffffffff8114c6ba>] fput+0xea/0x220
[17613.274912] [<ffffffff81149466>] filp_close+0x66/0x90
[17613.274921] [<ffffffff8104cc88>] put_files_struct+0x88/0xf0
[17613.274930] [<ffffffff8104cd9a>] exit_files+0x4a/0x60
[17613.274938] [<ffffffff8104d277>] do_exit+0x197/0x870
[17613.274946] [<ffffffff8104dcb4>] do_group_exit+0x44/0xa0
[17613.274955] [<ffffffff8105d7c5>] get_signal_to_deliver+0x215/0x5e0
[17613.274964] [<ffffffff810131b5>] do_signal+0x65/0x710
[17613.274972] [<ffffffff8104dfbb>] ? sys_wait4+0xab/0xf0
[17613.274979] [<ffffffff8106b7c0>] ? add_wait_queue+0x60/0x60
[17613.274987] [<ffffffff810138e5>] do_notify_resume+0x65/0x80
[17613.274996] [<ffffffff813ca7a2>] int_signal+0x12/0x17

Does this tell you enough? Any other tests?

Regards,
Lluís.
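The `State: D (disk sleep)` line means the task is in uninterruptible sleep inside the kernel. The call trace runs through do_exit and exit_files, so the task has already started exiting. It is stuck in fw_device_op_release while closing /dev/fw1, which is consistent with kill -9 having no visible effect. Here is a minimal Python sketch for checking the state from a script. The helper names are invented here; the field format is the one shown in the /proc/<pid>/status output above.

```python
def process_state(status_text):
    """Return (code, description) from the State: line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            value = line.split(":", 1)[1].strip()    # e.g. "D (disk sleep)"
            code, _, desc = value.partition(" ")
            return code, desc.strip("()")
    raise ValueError("no State: line found")


def read_state(pid):
    """Read the state of a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return process_state(f.read())


def is_uninterruptible(pid):
    """True if the process is in uninterruptible sleep ("D")."""
    return read_state(pid)[0] == "D"
```

The same parser separates "D" (stuck in the kernel, as here) from "Z" (a real zombie waiting to be reaped). That distinction matters when a process is merely reported as defunct.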
From: Clemens L. <cl...@la...> - 2012-05-08 15:26:41
Lluís Batlle i Rossell wrote:
> SysRq : Show Blocked State
> task PC stack pid father
> cc1394 D ffffffff81407520 0 5918 5880 0x00000005
> Call Trace:
> [<ffffffff813c8baf>] schedule+0x3f/0x60
> [<ffffffffa0106fcd>] fw_device_op_release+0x18d/0x250 [firewire_core]

Some command to the device did not get a response; the kernel is still waiting for one. In theory, there should be a two-second timeout; I don't know why it doesn't trigger.

Please check whether a bus reset helps: download and install the jujuutils package from <http://code.google.com/p/jujuutils/>, and run:

  firewire-request /dev/fw1 reset

(BTW: what is the output of lsfirewirephy?)

Regards,
Clemens
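The interesting frame in such a trace is usually the first one below `schedule`. Here that is `fw_device_op_release` in the `firewire_core` module, which is where the task is waiting. Each frame has the form `[<address>] function+offset/size [module]`, and a leading `?` marks an address found on the stack that may not belong to the real call chain. Below is a Python sketch that splits such lines. The regex is written for the format shown in this thread, not taken from any kernel tool, and the function names are made up.

```python
import re

FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace_text):
    """Yield one dict per call-trace frame found in the text."""
    for m in FRAME_RE.finditer(trace_text):
        yield {
            "func": m.group("func"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            "module": m.group("module"),            # None for built-in code
            "reliable": m.group("unreliable") is None,
        }


def waiting_in(trace_text):
    """First reliable frame after schedule(): where the task is blocked."""
    frames = [f for f in parse_frames(trace_text) if f["reliable"]]
    for prev, frame in zip(frames, frames[1:]):
        if prev["func"] == "schedule":
            return frame
    return None
```

Applied to the trace above, `waiting_in` returns the `fw_device_op_release` frame, offset 0x18d into a 0x250-byte function in `firewire_core`.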
From: Lluís B. i R. <vi...@vi...> - 2012-05-08 15:34:55
On Tue, May 08, 2012 at 05:30:34PM +0200, Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> Please check whether a bus reset helps: download and install the jujuutils
> package from <http://code.google.com/p/jujuutils/>, and run:
> firewire-request /dev/fw1 reset

It does not help... Nothing changes, on either fw1 or fw0.

> (BTW: what is the output of lsfirewirephy?)

I only have 'lsfirewire':

# lsfirewire -v
device fw0:
  vendor ID: 0xd00d1e
  model ID: 0x000001
  vendor: Linux Firewire
  model: Juju
  guid: 0x001e8c0001c08664
device fw1:
  vendor ID: 0x000a47
  guid: 0x000a47010f0a468c
  units: 0x00a02d:0x000102
  unit fw1.0:
    specifier ID: 0x00a02d
    version: 0x000102
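The 64-bit `guid` values in this output are EUI-64 identifiers. The top 24 bits are the IEEE-assigned company ID (OUI), and the low 40 bits are assigned by the vendor. For the camera (fw1), the company ID in its GUID matches the reported vendor ID 0x000a47. The local Linux node (fw0) instead reports vendor ID 0xd00d1e ("Linux Firewire"), which differs from the company ID 0x001e8c in its GUID. Here is a small Python sketch of the split; the function names are hypothetical.

```python
def split_guid(guid):
    """Split a 64-bit IEEE 1394 GUID (EUI-64) into (company_id, chip_id)."""
    company_id = guid >> 40               # top 24 bits: IEEE OUI
    chip_id = guid & ((1 << 40) - 1)      # low 40 bits: vendor-assigned
    return company_id, chip_id


def format_guid(guid):
    """Render as "OUI:chip" in hex, e.g. "000a47:010f0a468c"."""
    company_id, chip_id = split_guid(guid)
    return "%06x:%010x" % (company_id, chip_id)
```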
From: Clemens L. <cl...@la...> - 2012-05-11 14:09:19
Lluís Batlle i Rossell wrote:
> On Thu, May 10, 2012 at 03:50:15PM +0200, Clemens Ladisch wrote:
>> Please show the output of "lspci -s 2:0 -vvv" after a cold boot and
>> after a suspend/resume.
>
> Here they are. Only 'RollOver' has a difference, watching on vimdiff. Not that I
> know what is it about.

A "replay number rollover" happens when the device has failed four times to send a PCIe packet. This is handled automatically by retraining the link and retrying. It might be a consequence of the suspend.

What I actually wanted to ask: is there any difference in the lspci output before and during a hang?

Regards,
Clemens
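`lspci -vvv` shows this flag in the AER correctable-error status it prints for the device. That register is a bit field. The positions below follow the PCI Express specification, which the kernel's `PCI_ERR_COR_*` constants in pci_regs.h mirror. The bits are sticky and write-1-to-clear. A set bit therefore records that the event happened at some point since the bit was last cleared, not necessarily at the moment of reading. Here is a Python sketch for decoding a raw register value; the table covers only these six bits, and the names are made up.

```python
# Correctable Error Status register (PCIe AER capability, offset 0x10).
# Bit positions per the PCI Express specification.
COR_STATUS_BITS = {
    0: "RxErr",        # receiver error
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",     # REPLAY_NUM rollover: the 2-bit replay counter wrapped
                       # after four replays without progress -> link retrain
    12: "Timeout",     # replay timer timeout
    13: "NonFatalErr", # advisory non-fatal error
}


def decode_cor_status(value):
    """List the names of the correctable-error bits set in a register value."""
    return [name for bit, name in sorted(COR_STATUS_BITS.items()) if value & (1 << bit)]


def replays_before_retrain(counter_bits=2):
    """A REPLAY_NUM counter of `counter_bits` bits wraps after 2**bits replays."""
    return 1 << counter_bits
```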
From: Lluís B. i R. <vi...@vi...> - 2012-05-11 15:55:59
On Fri, May 11, 2012 at 04:13:03PM +0200, Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> > On Thu, May 10, 2012 at 03:50:15PM +0200, Clemens Ladisch wrote:
> >> Please show the output of "lspci -s 2:0 -vvv" after a cold boot and
> >> after a suspend/resume.
> >
> > Here they are. Only 'RollOver' has a difference, watching on vimdiff. Not that I
> > know what is it about.
>
> A "replay number rollover" happens when the device failed four times to
> send a PCIe packet. This is automatically handled by retraining the
> link and retrying again. This might be a consequence of the suspend.
>
> What I actually wanted to ask: Is there any difference in the lspci
> output before and during a hang?

Ah, OK. Yes, very different: lspci fails after the hang.

# lspci -s 2:0 -vvv
02:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6315 Series Firewire Controller (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: firewire_ohci

Does that tell you anything useful?

Thank you,
Lluís.
From: Lluís B. i R. <vi...@vi...> - 2012-05-08 16:29:00
On Tue, May 08, 2012 at 05:30:34PM +0200, Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> > SysRq : Show Blocked State
> > task PC stack pid father
> > cc1394 D ffffffff81407520 0 5918 5880 0x00000005
> > Call Trace:
> > [<ffffffff813c8baf>] schedule+0x3f/0x60
> > [<ffffffffa0106fcd>] fw_device_op_release+0x18d/0x250 [firewire_core]
> (BTW: what is the output of lsfirewirephy?)

I notice that my /usr/include/linux is from 2.6.35, and I've built everything with those headers. Has the kernel's public FireWire API changed in 3.3.x? Should I rebuild all the libdc/libraw code with newer headers?

Or would it be enough to get the newer firewire-cdev header files and build jujuutils with them?

I'll try this later.

Regards,
Lluís.
From: Clemens L. <cl...@la...> - 2012-05-09 06:17:37
Lluís Batlle i Rossell wrote:
> I notice that, let's say, my /usr/include/linux is from 2.6.35, and I've all
> built with those headers. Is the public API of the kernel changed at 3.3.x, for
> firewire?

There were some additions in 2.6.36, but none of them affect your applications.

> # lsfirewirephy
> timeout
> timeout

(This should work directly after booting, before any hang.)

All these symptoms indicate that the FireWire interrupt just isn't being reported anymore.

> I updated the i7 machine to 3.3.5 too, and there it does not hang. Same chipset,
> same kernel, but in the phenomII it hangs (any of the two cameras we have,
> colour and mono), and in the i7 not.

So this is probably not related to the VT6315 itself but to the system's interrupt routing. Sometimes this is caused by ACPI problems; try updating the BIOS, if possible.

What is the entire output of dmesg immediately after booting? What are the outputs of "cat /proc/interrupts" and "dmesg | tail" when it hangs?

Regards,
Clemens
From: Clemens L. <cl...@la...> - 2012-05-11 17:47:36
Lluís Batlle i Rossell wrote:
> lspci fails, after hang.
> # lspci -s 2:0 -vvv
> 02:00.0 ... (rev ff) (prog-if ff)

All reads fail (I guess -x instead of -vvv would show all ff's). This means the chip doesn't react at all, as if it were unplugged.

This looks like a plain hardware error.

Regards,
Clemens
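The same conclusion can be checked on the raw configuration header. A config read that no device answers typically returns all ones, so every field reads as 0xFF. That gives vendor ID 0xffff, revision 0xff, programming interface 0xff, and header type 0xff. lspci prints that header type as 7f after masking off the multi-function bit. Below is a Python sketch with hypothetical helper names. It assumes the dump comes from /sys/bus/pci/devices/0000:02:00.0/config; domain 0000 is an assumption. Unprivileged reads of that file return only the first 64 bytes, which is enough here.

```python
def header_fields(config):
    """Decode the fields lspci printed above from a raw config-space dump."""
    if len(config) < 16:
        raise ValueError("need at least the first 16 bytes of config space")
    return {
        "vendor_id": int.from_bytes(config[0x00:0x02], "little"),
        "device_id": int.from_bytes(config[0x02:0x04], "little"),
        "revision": config[0x08],
        "prog_if": config[0x09],
        "header_type": config[0x0E] & 0x7F,   # bit 7 = multi-function flag
    }


def device_not_responding(config):
    """True if every byte reads back as 0xFF, as if the chip were unplugged."""
    return len(config) > 0 and all(b == 0xFF for b in config)


def read_config(bdf="0000:02:00.0"):
    """Read a device's config space via sysfs (path layout assumed, Linux only)."""
    with open("/sys/bus/pci/devices/%s/config" % bdf, "rb") as f:
        return f.read()
```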
From: Lluís B. i R. <vi...@vi...> - 2012-05-11 17:48:52
On Fri, May 11, 2012 at 07:47:12PM +0200, Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> > lspci fails, after hang.
> > # lspci -s 2:0 -vvv
> > 02:00.0 ... (rev ff) (prog-if ff)
>
> All reads fail (I guess -x instead of -vvv will show all ff's).
> This means the chip doesn't react at all, as if it were unplugged.
>
> This looks like a plain hardware error.

Oh. Bad luck. This motherboard also hangs from time to time. Very annoying.

Thank you for all your time!
From: Lluís B. i R. <vi...@vi...> - 2012-05-08 16:31:51
On Tue, May 08, 2012 at 06:28:50PM +0200, Lluís Batlle i Rossell wrote:
> On Tue, May 08, 2012 at 05:30:34PM +0200, Clemens Ladisch wrote:
> > Lluís Batlle i Rossell wrote:
> > > SysRq : Show Blocked State
> > > task PC stack pid father
> > > cc1394 D ffffffff81407520 0 5918 5880 0x00000005
> > > Call Trace:
> > > [<ffffffff813c8baf>] schedule+0x3f/0x60
> > > [<ffffffffa0106fcd>] fw_device_op_release+0x18d/0x250 [firewire_core]
> > (BTW: what is the output of lsfirewirephy?)
>
> I notice that, let's say, my /usr/include/linux is from 2.6.35, and I've all
> built with those headers. Is the public API of the kernel changed at 3.3.x, for
> firewire? Should I rebuild all the libdc/libraw code with newer headers?
>
> Or simply getting the newer firewire-cdev header files would be enough, building
> jujuutils with them?
>
> I'll try this later.

Hm, that was fast. I have jujuutils built:

# lsfirewirephy
timeout
timeout

Does this mean much?

Regards,
Lluís.
From: Lluís B. i R. <vi...@vi...> - 2012-05-08 16:37:17
On Tue, May 08, 2012 at 06:31:36PM +0200, Lluís Batlle i Rossell wrote:
> On Tue, May 08, 2012 at 06:28:50PM +0200, Lluís Batlle i Rossell wrote:
> > On Tue, May 08, 2012 at 05:30:34PM +0200, Clemens Ladisch wrote:
> > > Lluís Batlle i Rossell wrote:
> > > > SysRq : Show Blocked State
> > > > task PC stack pid father
> > > > cc1394 D ffffffff81407520 0 5918 5880 0x00000005
> > > > Call Trace:
> > > > [<ffffffff813c8baf>] schedule+0x3f/0x60
> > > > [<ffffffffa0106fcd>] fw_device_op_release+0x18d/0x250 [firewire_core]

Sorry, earlier I told you the Phenom II was running 3.3.4, but it is actually running 3.3.5.

I updated the i7 machine to 3.3.5 too, and there it does not hang. Same chipset, same kernel, but the Phenom II hangs (with either of our two cameras, colour and mono), and the i7 does not.

Annoying... Any ideas, given this confusion? :)
From: Stefan R. <st...@s5...> - 2012-05-08 18:28:59
On May 08 Lluís Batlle i Rossell wrote:
> On Tue, May 08, 2012 at 06:31:36PM +0200, Lluís Batlle i Rossell wrote:
> > On Tue, May 08, 2012 at 06:28:50PM +0200, Lluís Batlle i Rossell wrote:
> > > On Tue, May 08, 2012 at 05:30:34PM +0200, Clemens Ladisch wrote:
> > > > Lluís Batlle i Rossell wrote:
> > > > > SysRq : Show Blocked State
> > > > > task PC stack pid father
> > > > > cc1394 D ffffffff81407520 0 5918 5880 0x00000005
> > > > > Call Trace:
> > > > > [<ffffffff813c8baf>] schedule+0x3f/0x60
> > > > > [<ffffffffa0106fcd>] fw_device_op_release+0x18d/0x250 [firewire_core]

One potential cause for never timing out would be if the controller never gives us an AT-req IRQ ('asynchronous request transmitted' interrupt). That would be a chip bug against which our drivers are currently not protected. The JMicron JMB38x is known to behave that way; I have observed it particularly with Coriander (an IIDC capture application). However, I have not yet heard of the VIA VT6315 behaving this way.

(The old driver stack would time out even after a missing AT-req IRQ, as would earlier versions of the newer driver stack, though at the risk of races with transaction completion.)

I'm not saying that this is what happens, just suggesting it as a possibility.

After

  # echo 2 > /sys/module/firewire_ohci/parameters/debug

the driver will log all AT and AR events that do happen. Alas, it won't reveal a case where upper layers inserted a request but the controller did not raise AT-req. To detect that, we would have to modify the driver to log request insertion too.

> Sorry, but before I told you the phenomII was running 3.3.4, but it is running
> 3.3.5.
>
> I updated the i7 machine to 3.3.5 too, and there it does not hang. Same chipset,
> same kernel, but in the phenomII it hangs (any of the two cameras we have,
> colour and mono), and in the i7 not.
>
> Annoying... Any idea from this confusion? :)

That's curious. Maybe these are two different chip revisions? lspci might still show them with the same revision number; a look at the actual chip may or may not tell more. Are these PCIe cards that you could swap between the two PCs?
From: Stefan R. <st...@s5...> - 2012-05-08 19:20:30
> One potential cause for never timing out would be if the controller never
> gives us an AT-req IRQ ('asynchronous request transmitted' interrupt).
> That would be a chip bug against which our drivers are currently not
> protected. JMicron JMB38x is known to behave that way; I have particularly
> observed it with Coriander (IIDC capture application). However, I have not
> yet heard of VIA VT6315 behaving this way.
>
> (The old driver stack would time out even after missing AT-req IRQ, as
> would earlier versions of the newer driver stack --- at the risk of races
> with transaction completion though.)
>
> I'm not saying that this is what happens, just suggesting it as a
> possibility.

PS, I forgot: a bus reset would flush such stuck requests out (hence Clemens' suggestion to test a bus reset). But that only works if the controller's self-ID-complete DMA and interrupts still work. Alas, if the controller has stopped issuing AT-req IRQs due to whatever fault, chances are slim that it would still issue self-ID-complete IRQs.
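The failure mode Stefan describes can be pictured with a toy model. This is not kernel code, and all names are invented here. In the model, a request's split-transaction timer is armed only once the controller reports the request as transmitted. If that AT-req interrupt never arrives, no timeout can fire, and only a bus reset whose completion the controller still signals will flush the request. The two-second default follows Clemens's description of the expected timeout.

```python
class ToyTransactionLayer:
    """Toy model of outstanding asynchronous requests (not the real driver)."""

    def __init__(self, split_timeout=2.0):
        self.split_timeout = split_timeout
        self.pending = {}            # tlabel -> deadline, or None if not armed

    def send_request(self, tlabel):
        # Request queued to the controller; no timer is running yet.
        self.pending[tlabel] = None

    def at_request_irq(self, tlabel, now):
        # Controller reports "request transmitted": arm the timeout now.
        if tlabel in self.pending:
            self.pending[tlabel] = now + self.split_timeout

    def response_received(self, tlabel):
        self.pending.pop(tlabel, None)

    def expire(self, now):
        # Only requests with an armed timer can ever time out.
        expired = [t for t, d in self.pending.items() if d is not None and now >= d]
        for t in expired:
            del self.pending[t]
        return expired

    def bus_reset(self, self_id_irq_works):
        # A bus reset cancels everything outstanding, but its completion is
        # only noticed if the self-ID-complete interrupt still arrives.
        if not self_id_irq_works:
            return []
        flushed = sorted(self.pending)
        self.pending.clear()
        return flushed
```

In this model, a request whose AT-req interrupt never comes stays in `pending` forever, matching a process stuck in fw_device_op_release. Calling `bus_reset(self_id_irq_works=False)` also leaves it there, which matches the observation that firewire-request's reset changed nothing.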
From: Lluís B. i R. <vi...@vi...> - 2012-05-09 21:28:38
Attachments:
comanegra.txt.gz
comanegra-interrupts.txt
On Wed, May 09, 2012 at 08:21:29AM +0200, Clemens Ladisch wrote:
> Lluís Batlle i Rossell wrote:
> > # lsfirewirephy
> > timeout
> > timeout
>
> (This should work directly after booting, before any hang.)

Yes, it works fine then.

> All these symptoms indicate that the FireWire interrupt just isn't reported
> anymore.

OK.

> > I updated the i7 machine to 3.3.5 too, and there it does not hang. Same chipset,
> > same kernel, but in the phenomII it hangs (any of the two cameras we have,
> > colour and mono), and in the i7 not.
>
> So this is probably not related with the VT6315 itself but with the system's
> interrupt routing. Sometimes, this is caused by ACPI problems; try updating
> the BIOS, if possible.

I updated to the latest available BIOS two weeks ago. I wanted to get powernow-k8 working, and hence updated the BIOS. But I really did not notice any change other than the 'nicer' BIOS configuration and boot screens. I also run the latest CPU microcode. :)

> What is the entire output of dmesg immediately after booting? What are the
> outputs of "cat /proc/interrupts" and "dmesg | tail" when it hangs?

I attach the dmesg, but the FireWire camera was not connected at the time. 'dmesg | tail' shows no additional lines when it hangs. I attach the full dmesg (after a pm-suspend and resume), now gzipped, and the /proc/interrupts output.

Regards,
Lluís.