Thread: diag
From: Carl K. <ca...@pe...> - 2012-10-22 15:40:47
A friend is helping me by coding up a diagnostic tool. It is meant to test firewire hardware. We don't have a baseline to know the quality of the hardware we are testing, so we are not sure if the results are due to faulty hardware or faulty code or we just don't know how to interpret the results.

Could someone help review this?

https://gitorious.org/~eviljoel/cfk_misc/fwdiag-cleanup/

1 box with 2 controllers, cable between the 2, here is what the results look like:

************************************
(veyepar)juser@pc9d:~/fwdiag-cleanup$ sudo ./async-loop-ck
0, node_id: 65473, local_node_id: 65473, devices[ 0 ]->receiveDescriptor: /dev/fw0
1, node_id: 65472, local_node_id: 65472, devices[ 1 ]->receiveDescriptor: /dev/fw1
2, node_id: 65472, local_node_id: 65473, devices[ 0 ]->sendDescriptor: /dev/fw2
3, node_id: 65473, local_node_id: 65472, devices[ 1 ]->sendDescriptor: /dev/fw3
Main process: starting receiver
Main: receiver started, pid: 11107; waiting 4 seconds...
Receiver child process 11107 starting...
Main process: starting sender
Main: sender started, pid: 11108
Main process, waiting for child processes to finish...
Sender child process 11108 starting...

Received 524289389 bytes = 500.00 MB in 511745 packets in 21 seconds
...receiver finished

Sent 524288119 bytes = 500.00 MB in 511753 packets in 21 seconds
...sender finished
Main: processes finished

Main process: starting receiver
Main: receiver started, pid: 11109; waiting 4 seconds...
Receiver child process 11109 starting...
unexpected packet: 0x7cf2c
unexpected packet: 0x7cf2d
unexpected packet: 0x7cf2e
unexpected packet: 0x7cf2f
unexpected packet: 0x7cf30
unexpected packet: 0x7cf31
unexpected packet: 0x7cf32
unexpected packet: 0x7cf33
unexpected packet: 0x7cf34
unexpected packet: 0x7cf35
unexpected packet: 0x7cf36
unexpected packet: 0x7cf37
unexpected packet: 0x7cf38
unexpected packet: 0x7cf39
Main process: starting sender
Main: sender started, pid: 11110
Main process, waiting for child processes to finish...
Sender child process 11110 starting...

Received 524289257 bytes = 500.00 MB in 511746 packets in 32 seconds
...receiver finished

Sent 524288119 bytes = 500.00 MB in 511753 packets in 28 seconds
...sender finished
Main: processes finished
************************************

And sometimes it gets stuck:

************************************
(veyepar)juser@pc9d:~/fwdiag-cleanup$ sudo ./async-loop-ck
0, node_id: 65472, local_node_id: 65472, devices[ 0 ]->receiveDescriptor: /dev/fw0
1, node_id: 65473, local_node_id: 65473, devices[ 1 ]->receiveDescriptor: /dev/fw1
2, node_id: 65472, local_node_id: 65473, devices[ 1 ]->sendDescriptor: /dev/fw2
3, node_id: 65473, local_node_id: 65472, devices[ 0 ]->sendDescriptor: /dev/fw3
Main process: starting receiver
Main: receiver started, pid: 11516; waiting 4 seconds...
Receiver child process 11516 starting...
Main process: starting sender
Main: sender started, pid: 11517
Main process, waiting for child processes to finish...
Sender child process 11517 starting...
send error: 0x10
************************************

--
Carl K
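For readers trying to interpret the two completed runs, here is a rough sketch (not part of fwdiag) that turns the reported totals into approximate receive rates and a sent-versus-received packet difference. The figures are copied from the output above; the `RUNS` layout and the `summarize` helper are made up for illustration, and because the tool reports whole seconds the rates are only approximate.

```python
# Figures copied from the two completed runs in Carl's output.
# RUNS and summarize() are hypothetical names, not part of fwdiag.
RUNS = [
    {"sent_packets": 511753, "recv_packets": 511745,
     "recv_bytes": 524289389, "recv_seconds": 21},
    {"sent_packets": 511753, "recv_packets": 511746,
     "recv_bytes": 524289257, "recv_seconds": 32},
]


def summarize(run):
    """Return (MiB/s, Mbit/s, packets sent but not counted by the receiver)."""
    bytes_per_second = run["recv_bytes"] / run["recv_seconds"]
    mib_per_s = bytes_per_second / (1024 * 1024)
    mbit_per_s = bytes_per_second * 8 / 1e6
    missing = run["sent_packets"] - run["recv_packets"]
    return round(mib_per_s, 1), round(mbit_per_s, 1), missing


for number, run in enumerate(RUNS, start=1):
    mib, mbit, missing = summarize(run)
    print(f"run {number}: {mib} MiB/s, {mbit} Mbit/s, {missing} packets short")
```

Comparing these numbers across several machines may be one way to build the baseline that is currently missing; the sketch itself says nothing about whether a given difference points at the hardware or at the code.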
From: eviljoel <evi...@li...> - 2012-10-23 04:19:09
Hey All,

I'm the guy currently working on this code. I can't really say I'm the author because it is based on some code that another guy wrote who based it off of a udev test (or something like that). But I'm also here to answer any questions you might have for me.

Thanks,
eviljoel

On Mon, Oct 22, 2012 at 10:40 AM, Carl Karsten <ca...@pe...> wrote:
> A friend is helping me by coding up a diagnostic tool. It is meant to
> test firewire hardware.
> [...]
> _______________________________________________
> mailing list Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linux1394-user
From: Clemens L. <cl...@la...> - 2012-10-23 11:45:56
Carl Karsten wrote:
> A friend is helping me by coding up a diagnostic tool. It is meant to
> test firewire hardware. We don't have a baseline to know the quality
> of the hardware we are testing, so we are not sure if the results are
> due to faulty hardware or faulty code or we just don't know how to
> interpret the results.
>
> Main process: starting receiver
> Main: receiver started, pid: 11109; waiting 4 seconds...
> Receiver child process 11109 starting...
> unexpected packet: 0x7cf2c
> unexpected packet: 0x7cf2d
> unexpected packet: 0x7cf2e
> unexpected packet: 0x7cf2f
> unexpected packet: 0x7cf30
> unexpected packet: 0x7cf31
> unexpected packet: 0x7cf32
> unexpected packet: 0x7cf33
> unexpected packet: 0x7cf34
> unexpected packet: 0x7cf35
> unexpected packet: 0x7cf36
> unexpected packet: 0x7cf37
> unexpected packet: 0x7cf38
> unexpected packet: 0x7cf39
> Main process: starting sender

This looks as if there are some leftover packets from the last run.
In theory, this should not be possible because received packets that
are not handled by anybody get thrown away immediately.

> And sometimes it gets stuck:
>
> Sender child process 11517 starting...
> send error: 0x10

0x10 is RCODE_SEND_ERROR, and usually means that the sending queue is
full. However, the test program takes care to queue fewer packets than
that, so in this case it might indicate that the controller thinks it
still has some left-over packets (which might be the same problem as
the previous one).

It's possible to get such errors when multiple senders are run
simultaneously, but assuming that this is not the case, this looks
rather like a hardware error.

Please name names (i.e., the chip).

Regards,
Clemens
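To make the rcode in a "send error" line easier to read, here is a minimal sketch that maps the number to its symbolic name. The table reflects the values in the kernel header linux/firewire-constants.h as far as I know them; the `describe_send_error` helper is hypothetical, and it assumes the tool prints the rcode in hex exactly as in the output quoted above.

```python
import re

# Response codes as defined in <linux/firewire-constants.h>.
# Codes from 0x10 upward are specific to the Linux firewire stack.
RCODES = {
    0x00: "RCODE_COMPLETE",
    0x04: "RCODE_CONFLICT_ERROR",
    0x05: "RCODE_DATA_ERROR",
    0x06: "RCODE_TYPE_ERROR",
    0x07: "RCODE_ADDRESS_ERROR",
    0x10: "RCODE_SEND_ERROR",
    0x11: "RCODE_CANCELLED",
    0x12: "RCODE_BUSY",
    0x13: "RCODE_GENERATION",
    0x14: "RCODE_NO_ACK",
}


def describe_send_error(line):
    """Hypothetical helper: name the rcode in a 'send error: 0x..' line.

    Returns None if the line is not a send-error line.
    """
    match = re.search(r"send error: (0x[0-9a-fA-F]+)", line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return RCODES.get(code, f"unknown rcode {code:#x}")


print(describe_send_error("send error: 0x10"))
```

A name only says what the stack reported, not why; telling a genuinely full queue apart from a controller that is stuck still needs the kind of reasoning given above.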
From: Carl K. <ca...@pe...> - 2012-10-23 17:24:16
On Tue, Oct 23, 2012 at 6:28 AM, Clemens Ladisch <cl...@la...> wrote:
> [...]
> Please name names (i.e., the chip).

Hewlett-Packard F.06 HP EliteBook 8530w
Linux pc9d 3.0.0-17-generic #30-Ubuntu SMP Thu Mar 8 20:45:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Description: Ubuntu 11.10

Onboard chip:
/sys/bus/firewire/devices/fw0 0x5566778811223344
(Yes, that really is the GUID, which I have found on quite a few HPs.)
86:09.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 06)

PCIe card in the slot:
/sys/bus/firewire/devices/fw1 0x01020304000003d8
05:00.0 FireWire (IEEE 1394): Texas Instruments XIO2200A IEEE-1394a-2000 Controller (PHY/Link) (rev 01)

--
Carl K
From: Carl K. <ca...@pe...> - 2012-10-24 02:01:01
model name : Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz
cpu MHz : 3200.000

/sys/bus/firewire/devices/fw0 0x001e8c00004166f5
/sys/bus/firewire/devices/fw1 0x0011066600000566
/sys/bus/firewire/devices/fw2 0x001e8c00004166f5
/sys/bus/firewire/devices/fw3 0x0011066600000566

Linux cnt2 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
System manufacturer System Version System Product Name
Description: Ubuntu 12.04 LTS
Codename: precise

juser@cnt2:~$ lspci |grep 1394
04:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6315 Series Firewire Controller (rev 01)
08:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev 46)

juser@cnt2:~/fwdiag-cleanup$ ./async-loop-ck
0 firewire device nodes found. Four device nodes are required to operate. (Each physical firewire device should have two device nodes. One for sending, one for receiving.)

juser@cnt2:~/fwdiag-cleanup$ sudo !!
sudo ./async-loop-ck
[sudo] password for juser:
0, node_id: 65473, local_node_id: 65473, devices[ 0 ]->receiveDescriptor: /dev/fw0
1, node_id: 65472, local_node_id: 65472, devices[ 1 ]->receiveDescriptor: /dev/fw1
2, node_id: 65473, local_node_id: 65472, devices[ 1 ]->sendDescriptor: /dev/fw2
3, node_id: 65472, local_node_id: 65473, devices[ 0 ]->sendDescriptor: /dev/fw3
Main process: starting receiver
Main: receiver started, pid: 2195; waiting 4 seconds...
Receiver child process 2195 starting...
Main process: starting sender
Main: sender started, pid: 2196
Main process, waiting for child processes to finish...
Sender child process 2196 starting...

after about a min..

stack dump on the console, but I still have an ssh session running.

top line of console:
unable to handle kernel NULL pointer dereference at 00000000000000690
(not sure how many 0's, and I hope I don't need to transcribe the whole screen..)

--
Carl K
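The node_id values in the listing are 16-bit IEEE 1394 node IDs: the upper 10 bits are the bus ID (1023 means "the local bus") and the lower 6 bits are the physical ID on that bus. The sketch below decodes the four entries from the sudo run above. The helper names are made up, and the local/remote labelling rests on an assumption: that an entry with node_id equal to local_node_id is the controller's own node, while an unequal pair is the other controller seen across the cable.

```python
# Hypothetical helpers, not part of fwdiag.
def split_node_id(node_id):
    """Split a 16-bit node ID into (bus_id, phy_id)."""
    return node_id >> 6, node_id & 0x3F


# (device file, node_id, local_node_id) copied from the listing above.
LISTING = [
    ("/dev/fw0", 65473, 65473),
    ("/dev/fw1", 65472, 65472),
    ("/dev/fw2", 65473, 65472),
    ("/dev/fw3", 65472, 65473),
]


def describe(dev, node_id, local_node_id):
    """Render one listing entry; the role label is an assumption (see text)."""
    bus_id, phy_id = split_node_id(node_id)
    role = "local controller" if node_id == local_node_id else "remote node"
    return f"{dev}: bus {bus_id}, phy {phy_id}, {role}"


for entry in LISTING:
    print(describe(*entry))
```

Read this way, the output matches the tool's message that each physical device shows up as two device nodes: once as the local node of its own controller and once as a remote node seen from the other one.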
From: Stefan R. <st...@s5...> - 2012-10-24 06:40:28
On Oct 23 Carl Karsten wrote:
> Linux cnt2 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC
> 2012 x86_64 x86_64 x86_64 GNU/Linux
[...]
> sudo ./async-loop-ck
[...]
> after about a min..
>
> stack dump on the console, but I still have an ssh session running.
>
> top line of console:
> unable to handle kernel NULL pointer dereference at 00000000000000690
> (not sure how many 0's, and I hope I don't need to transcribe the
> whole screen..)

If possible, take a digital photograph and upload it somewhere (e.g.
bugzilla.kernel.org), or send it to me and I will attach it at bugzilla.
--
Stefan Richter
-=====-===-- =-=- ==---
http://arcgraph.de/sr/
From: Stefan R. <st...@s5...> - 2012-10-24 22:02:18
> On Oct 23 Carl Karsten wrote:
> > Linux cnt2 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC
> > 2012 x86_64 x86_64 x86_64 GNU/Linux
> [...]
> > sudo ./async-loop-ck
> [...]
> > after about a min..
> >
> > stack dump on the console, but I still have an ssh session running.
> >
> > top line of console:
> > unable to handle kernel NULL pointer dereference at 00000000000000690

Carl sent screenshots. I uploaded them to:
https://bugzilla.kernel.org/show_bug.cgi?id=49491
https://bugzilla.kernel.org/attachment.cgi?id=84731
https://bugzilla.kernel.org/attachment.cgi?id=84741

The call trace is:
  in page_waitqueue+0x6e/0x90
  unlock_page+0x1d/0x40
  filemap_fault+0x3ba/0x3e0
  __do_fault+0x72/0x550
  ? rb_insert_color+0x110/0x150
  handle_pte_fault+0xfa/0x200
  ? cpumask_any_but+0x2d/0x40
  handle_mm_fault+0x1f8/0x350
  do_page_fault+0x150/0x520
  ? mprotect_fixup+0x17d/0x2b0
  ? sys_mprotect+0x1f0/0x250
  page_fault+0x25/0x30

Carl, is this rare or frequent?
--
Stefan Richter
-=====-===-- =-=- ==---
http://arcgraph.de/sr/
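Since the trace had to be typed in from screenshots, a quick format check can catch transcription slips. Kernel call-trace entries have the form symbol+0xoffset/0xsize, and a leading "?" marks an entry the unwinder is not sure about. The sketch below is a hypothetical checker along those lines, not a kernel tool; the sample entries are taken from the trace above.

```python
import re

# One call-trace entry: optional "?", symbol, "+0x<offset>/0x<size>".
ENTRY = re.compile(
    r"^(?P<unsure>\?\s*)?(?P<symbol>\w+)"
    r"\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)$"
)


def parse_entry(text):
    """Hypothetical helper: parse one entry, or return None if malformed."""
    match = ENTRY.match(text.strip())
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "reliable": match.group("unsure") is None,
    }


for line in ["unlock_page+0x1d/0x40", "? rb_insert_color+0x110/0x150"]:
    print(parse_entry(line))
```

An entry that comes back as None is worth re-reading against the screenshot, and an offset larger than the function size would be another hint that a digit went astray.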
From: Carl K. <ca...@pe...> - 2012-10-24 22:23:49
On Wed, Oct 24, 2012 at 5:02 PM, Stefan Richter <st...@s5...> wrote:
> [...]
> Carl, is this rare or frequent?

Can't say. It has happened once. But I have only run ./async-loop-ck maybe 10 times. sometimes it has 'hung' and I ^Ced it.

--
Carl K
From: eviljoel <evi...@li...> - 2012-10-25 02:37:20
I've run it a lot and never saw it.

- eviljoel

On Wed, Oct 24, 2012 at 5:23 PM, Carl Karsten <ca...@pe...> wrote:
> [...]
> Can't say. It has happened once. But I have only run
> ./async-loop-ck maybe 10 times. sometimes it has 'hung' and I ^Ced it.
From: Carl K. <ca...@pe...> - 2012-10-25 17:52:17
On a different box I just ran it in a loop for 12 hours. No crash.

On Wed, Oct 24, 2012 at 9:37 PM, eviljoel <evi...@li...> wrote:
> I've run it a lot and never saw it.
>
> - eviljoel
> [...]

--
Carl K