From: Ying X. <yin...@wi...> - 2014-10-23 06:20:48
On 10/22/2014 09:14 PM, Matthew Clark wrote:
> Erik,
>
> The Zynq is based on a dual-core ARM Cortex-A9 with some FPGA fabric.
> Both the ZC706 and Zedboard have one, and the Zedboard runs TIPC fine,
> albeit with an older kernel.
>
> Dmesg on Node4 (an ARM Cortex-A9) shows the following. These were the only
> TIPC-related messages it logged.
>
> [54305.337134] tipc: Activated (version 2.0.0)
> [54305.342407] NET: Registered protocol family 30
> [54305.348292] tipc: Started in single node mode
> [54305.374682] tipc: Started in network mode
> [54305.378620] tipc: Own node address <1.1.4>, network identity 1227
> [54305.385623] tipc: Enabled bearer <eth:eth0>, discovery domain <1.1.0>, priority 10
>
> ===
>
> Ying,
>
> I used tcpdump to capture the following packets. They kept repeating, so I
> only captured one set. Also, I added another board, 1.1.2, to show that
> Node3 can establish a TIPC link. From what I can tell, neither Node2 nor
> Node3 receives anything from Node4, which suggests that Node4 thinks it's
> sending packets but nothing arrives. But where would I even begin to
> diagnose that?
>
> Matt
>
> ===
>
> From Node2 (a Gumstix Overo: single-core Cortex-A8, running a 3.5 kernel)
>
> 13:12:38.396703 TIPC v2.0 1.1.2 > 1.1.3, headerlength 40 bytes, MessageSize
> 56 bytes, Link State Protocol internal, messageType CONN_MSG (0x00000000)
> Previous Node 1.1.2, Session No. 60777, Broadcast Ack 0, Sequence Gap 0,
> Broadcast Gap After 2, Broadcast Gap To 32769, Last Sent Packet No. 0,
> Next sent Packet No. 2, Transport Sequence 0, msg_count 0, Link Tolerance 0
> 0x0000: 4f40 0038 0000 0000 0002 8001 0100 1002 O@.8............
> 0x0010: 0000 0002 ed69 00a1 0100 1002 0100 1003 .....i..........
> 0x0020: 0000 0000 0000 0000 6574 6830 0000 0000 ........eth0....
> 0x0030: 0000 0000 0000 0000 ........
> 13:12:38.396855 TIPC v2.0 1.1.3 > 1.1.2, headerlength 40 bytes, MessageSize
> 56 bytes, Link State Protocol internal, messageType CONN_MSG (0x00000000)
> Previous Node 1.1.3, Session No. 62897, Broadcast Ack 0, Sequence Gap 0,
> Broadcast Gap After 1, Broadcast Gap To 32770, Last Sent Packet No. 0,
> Next sent Packet No. 3, Transport Sequence 0, msg_count 0, Link Tolerance 0
> 0x0000: 4f40 0038 0000 0000 0001 8002 0100 1003 O@.8............
> 0x0010: 0000 0003 f5b1 00a0 0100 1003 0100 1002 ................
> 0x0020: 0000 0000 0000 0000 6574 6830 0000 0000 ........eth0....
> 0x0030: 0000 0000 0000 0000 ........
>
> ----
>
> From Node3 (the zedboard/zynq that's working, running a 3.8 kernel)
>
> 09:05:41.697681 TIPC v2.0 1.1.2 > 1.1.3, headerlength 40 bytes, MessageSize
> 56 bytes, Link State Protocol internal, messageType CONN_MSG (0x00000000)
> Previous Node 1.1.2, Session No. 60777, Broadcast Ack 0, Sequence Gap 0,
> Broadcast Gap After 2, Broadcast Gap To 32769, Last Sent Packet No. 0,
> Next sent Packet No. 2, Transport Sequence 0, msg_count 0, Link Tolerance 0
> 0x0000: 4f40 0038 0000 0000 0002 8001 0100 1002 O@.8............
> 0x0010: 0000 0002 ed69 00a0 0100 1002 0100 1003 .....i..........
> 0x0020: 0000 0000 0000 0000 6574 6830 0000 0000 ........eth0....
> 0x0030: 0000 0000 0000 0000 ........
> 09:05:42.280580 TIPC v2.0 1.1.2 > 1.1.3, headerlength 40 bytes, MessageSize
> 56 bytes, Link State Protocol internal, messageType CONN_MSG (0x00000000)
> Previous Node 1.1.2, Session No. 60777, Broadcast Ack 0, Sequence Gap 0,
> Broadcast Gap After 2, Broadcast Gap To 32769, Last Sent Packet No. 0,
> Next sent Packet No. 2, Transport Sequence 0, msg_count 0, Link Tolerance 0
> 0x0000: 4f40 0038 0000 0000 0002 8001 0100 1002 O@.8............
> 0x0010: 0000 0002 ed69 00a1 0100 1002 0100 1003 .....i..........
> 0x0020: 0000 0000 0000 0000 6574 6830 0000 0000 ........eth0....
> 0x0030: 0000 0000 0000 0000 ........
>
> ----
>
> From Node4 (the zc706/zynq that's not working, running a 3.14 kernel)
>
> 09:05:35.132403 TIPC v2.0 1.1.4 > 1.1.0, headerlength 40 bytes, MessageSize
> 40 bytes, Neighbor Detection Protocol internal, messageType Link request
> NodeSignature 4944, network_id 1227, media_id 1
> 0x0000: 5b50 0028 0000 1350 0100 1000 0100 1004 [P.(...P........
> 0x0010: 0000 04cb 0000 0001 0000 0000 0000 0000 ................
> 0x0020: 0000 0000 0000 0000 ........
> 09:05:35.282382 TIPC v2.0 1.1.4 > 1.1.3, headerlength 40 bytes, MessageSize
> 56 bytes, Link State Protocol internal, messageType MCAST_MSG (0x20000000)
> Previous Node 1.1.4, Session No. 4944, Broadcast Ack 0, Sequence Gap 0,
> Broadcast Gap After 65535, Broadcast Gap To 32768, Last Sent Packet No. 0,
> Next sent Packet No. 1, Transport Sequence 0, msg_count 375, Link Tolerance
> 1500
> 0x0000: 4f40 0038 2000 0000 ffff 8000 0100 1004 O@.8............
> 0x0010: 0000 0001 1350 00a0 0100 1004 0100 1003 .....P..........
> 0x0020: 0000 0000 0177 05dc 6574 6830 0000 0000 .....w..eth0....
> 0x0030: 0000 0000 0000 0000 ........
> 09:05:35.452403 TIPC v2.0 1.1.4 > 1.1.2, headerlength 40 bytes, MessageSize
> 56 bytes, Link State Protocol internal, messageType MCAST_MSG (0x20000000)
> Previous Node 1.1.4, Session No. 4944, Broadcast Ack 0, Sequence Gap 0,
> Broadcast Gap After 65535, Broadcast Gap To 32768, Last Sent Packet No. 0,
> Next sent Packet No. 1, Transport Sequence 0, msg_count 375, Link Tolerance
> 1500
> 0x0000: 4f40 0038 2000 0000 ffff 8000 0100 1004 O@.8............
> 0x0010: 0000 0001 1350 00a0 0100 1004 0100 1002 .....P..........
> 0x0020: 0000 0000 0177 05dc 6574 6830 0000 0000 .....w..eth0....
> 0x0030: 0000 0000 0000 0000 ........
>

The messages you captured above may be incomplete. For example, node4 must
have received at least two discovery response messages, from 1.1.2 and
1.1.3; otherwise it would not know that nodes 1.1.2 and 1.1.3 exist in the
cluster.
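One way to read the raw dumps above without relying on tcpdump's labels is to decode the first two header words by hand. This is an editorial sketch, not something from the thread. It assumes the TIPC v2 bit layout of the Linux kernel's net/tipc/msg.h of this era (version in bits 29-31 of word 0, user in bits 25-28, header size in 32-bit words in bits 21-24, message size in bits 0-16, message type in bits 29-31 of word 1) and the type values STATE_MSG=0, RESET_MSG=1, ACTIVATE_MSG=2 and DSC_REQ_MSG=0, DSC_RESP_MSG=1. The helper names are hypothetical.

```python
"""Decode TIPC v2 header words 0 and 1 from a tcpdump -X hex line.

Assumption: the bit layout follows the Linux kernel's net/tipc/msg.h of the
3.x era. The helper names (first_words, bits, decode) are hypothetical.
"""
from __future__ import annotations

# "User" values (word 0, bits 25-28) that appear in this thread.
USERS = {7: "LINK_PROTOCOL", 13: "LINK_CONFIG"}

# Message types (word 1, bits 29-31); their meaning depends on the user.
TYPES = {
    "LINK_PROTOCOL": {0: "STATE_MSG", 1: "RESET_MSG", 2: "ACTIVATE_MSG"},
    "LINK_CONFIG": {0: "DSC_REQ_MSG", 1: "DSC_RESP_MSG"},
}


def first_words(hex_line: str, count: int = 2) -> list[int]:
    """Return the first `count` big-endian 32-bit words of a tcpdump -X line."""
    # Drop the "0x0000:" offset; each following group is 16 bits of hex.
    groups = hex_line.split(":", 1)[1].split()
    return [int(groups[2 * i] + groups[2 * i + 1], 16) for i in range(count)]


def bits(word: int, pos: int, mask: int) -> int:
    """Extract a bit field, in the style of the kernel's msg_bits()."""
    return (word >> pos) & mask


def decode(hex_line: str) -> dict:
    w0, w1 = first_words(hex_line)
    user_id = bits(w0, 25, 0xF)
    user = USERS.get(user_id, f"user {user_id}")
    mtype = bits(w1, 29, 0x7)
    return {
        "version": bits(w0, 29, 0x7),
        "user": user,
        "header_bytes": bits(w0, 21, 0xF) << 2,  # stored as a count of 32-bit words
        "size_bytes": bits(w0, 0, 0x1FFFF),
        "type": TYPES.get(user, {}).get(mtype, mtype),
    }


if __name__ == "__main__":
    # First hex line of three captures from this thread.
    captures = {
        "Node2 1.1.2 > 1.1.3": "0x0000: 4f40 0038 0000 0000 0002 8001 0100 1002",
        "Node4 1.1.4 > 1.1.0": "0x0000: 5b50 0028 0000 1350 0100 1000 0100 1004",
        "Node4 1.1.4 > 1.1.3": "0x0000: 4f40 0038 2000 0000 ffff 8000 0100 1004",
    }
    for label, line in captures.items():
        print(label, decode(line))
```

Under those assumptions the header and message sizes come out as 40 and 56 bytes (40 and 40 for the discovery request), matching tcpdump. Word 1 of Node4's link-protocol packets (0x20000000), however, decodes to RESET_MSG rather than MCAST_MSG, and the 1.1.2/1.1.3 exchange (0x00000000) decodes to STATE_MSG. If that reading is right, tcpdump is applying data-message type names to link-protocol packets, and Node4 keeps sending link resets that do not appear to be answered, which fits Erik's observation that the reset/activate exchange never happens.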
The only remaining problem is why node4 does not receive link state messages
from its neighbours even though it repeatedly sends link state messages to
them. With the current information alone, it's hard for us to tell where the
root cause is. So the first thing to do is to identify whether the issue is
caused by the hardware/Ethernet driver or by the TIPC software. You should
therefore first try to downgrade the zc706's kernel to 3.8.0-xilinx and check
whether the issue still exists. If downgrading the whole kernel is impossible,
you can instead replace the 3.14 tipc module code on the zc706 with the 3.8.0
version; this is considerably easier than downgrading the kernel.
Alternatively, you can backport the tipc module code from 3.14 to 3.8 and
verify whether the issue occurs there. Taken together, these experiments
should help us isolate where the problem is.

Regards,
Ying

>
> On Wed, Oct 22, 2014 at 3:45 AM, Erik Hugne <eri...@er...> wrote:
>
>> I've never worked with a Zynq board before, but I do have a small
>> BeagleBone Black cluster on my desk.
>> From your wireshark trace it seems that 1.1.4 is receiving the ndisc
>> request and is responding to this. 1.1.3 is then expected to reply with
>> a LINK_RESET, followed by a LINK_ACTIVATE from 1.1.4. This reset/activate
>> never happens.
>> Maybe it would help if you could provide us with a .pcap file for the
>> failed link setup?
>>
>> Also, do you have any suspicious logs from tipc in dmesg?
>>
>> //E
>>
>> On Tue, Oct 21, 2014 at 10:48:34AM -0400, Matthew Clark wrote:
>>> Hi All,
>>>
>>> Did my info reveal anything? I'm trying to get some middleware built on
>>> these zynq boards and TIPC is a dependency, so I really need it working.
>>> Thanks.
>>>
>>> Matt
>>>
>>>
>>> On Fri, Oct 17, 2014 at 12:54 PM, Matthew Clark <lin...@gm...> wrote:
>>>
>>>> Hi, Jon,
>>>>
>>>> Here's how I configured the zc706's.
>>>> And how I'm checking what's going on:
>>>>
>>>> # ifconfig eth0 192.168.100.193
>>>> # modprobe tipc
>>>> # tipc-config -netid=1227 -addr=1.1.3 -be=eth:eth0/1.1.0
>>>>
>>>> ---
>>>>
>>>> # ifconfig eth0 192.168.100.194
>>>> # modprobe tipc
>>>> # tipc-config -netid=1227 -addr=1.1.4 -be=eth:eth0/1.1.0
>>>>
>>>> # tipc-config -n
>>>> Neighbors:
>>>> <1.1.3>: down
>>>>
>>>> # tcpdump not port 22
>>>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>>>> listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
>>>> 16:52:45.644933 TIPC v2.0 1.1.4 > 1.1.3, headerlength 40 bytes,
>>>> MessageSize 56 bytes, Link State Protocol internal,
>>>> messageType MCAST_MSG (0x20000000)
>>>> 16:52:45.734227 TIPC v2.0 1.1.3 > 1.1.0, headerlength 40 bytes,
>>>> MessageSize 40 bytes, Neighbor Detection Protocol internal,
>>>> messageType Link request
>>>> 16:52:45.734286 TIPC v2.0 1.1.4 > 1.1.3, headerlength 40 bytes,
>>>> MessageSize 40 bytes, Neighbor Detection Protocol internal,
>>>> messageType Link response
>>>> 16:52:45.864952 TIPC v2.0 1.1.4 > 1.1.0, headerlength 40 bytes,
>>>> MessageSize 40 bytes, Neighbor Detection Protocol internal,
>>>> messageType Link request
>>>>
>>>> # tipc-config -ls
>>>> Link statistics:
>>>> Link <broadcast-link>
>>>> Window:20 packets
>>>> RX packets:0 fragments:0/0 bundles:0/0
>>>> TX packets:0 fragments:0/0 bundles:0/0
>>>> RX naks:0 defs:0 dups:0
>>>> TX naks:0 acks:0 dups:0
>>>> Congestion link:0 Send queue max:0 avg:0
>>>>
>>>> Link <1.1.4:eth0-1.1.3:unknown>
>>>> DEFUNCT MTU:1500 Priority:10 Tolerance:1500 ms Window:50 packets
>>>> RX packets:0 fragments:0/0 bundles:0/0
>>>> TX packets:0 fragments:0/0 bundles:0/0
>>>> TX profile sample:0 packets average:0 octets
>>>> 0-64:0% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
>>>> RX states:0 probes:0 naks:0 defs:0 dups:0
>>>> TX states:0 probes:0 naks:0 acks:0 dups:0
>>>> Congestion link:0 Send queue max:0 avg:0
>>>>
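The addresses passed to tipc-config above can be cross-checked against the header words in the packet dumps elsewhere in this thread. This editorial sketch assumes the zone/cluster/node packing used by the kernel's tipc_addr() helper of this era (zone in the top 8 bits, cluster in the next 12, node in the low 12); the function names are hypothetical.

```python
"""Pack and unpack TIPC <Z.C.N> network addresses as 32-bit words.

Assumption: zone in the top 8 bits, cluster in the next 12, node in the low
12, as in the kernel's tipc_addr() helper of this era. Names are hypothetical.
"""


def tipc_addr(zone: int, cluster: int, node: int) -> int:
    """Pack <zone.cluster.node> into one 32-bit address word."""
    return (zone << 24) | (cluster << 12) | node


def addr_str(word: int) -> str:
    """Render a 32-bit address word as <Z.C.N>."""
    return f"<{word >> 24}.{(word >> 12) & 0xFFF}.{word & 0xFFF}>"


if __name__ == "__main__":
    # Node4 was configured with: tipc-config -netid=1227 -addr=1.1.4 -be=eth:eth0/1.1.0
    print(f"{tipc_addr(1, 1, 4):08x}")  # compare with the "0100 1004" groups in Node4's dumps
    print(f"{1227:08x}")                # compare with the "0000 04cb" group (network_id 1227)
    print(addr_str(0x01001000))         # the "0100 1000" group: discovery domain
```

Under that assumption, the "0100 1004" and "0000 04cb" groups in Node4's discovery request correspond to <1.1.4> and network identity 1227, consistent with both the tipc-config arguments and Node4's dmesg output, so the address and network identity on the wire appear to match the configuration.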
>>>>
>>>> On Fri, Oct 17, 2014 at 11:16 AM, Jon Maloy <jon...@er...> wrote:
>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Matthew Clark [mailto:lin...@gm...]
>>>>>> Sent: October-17-14 10:31 AM
>>>>>> To: Ying Xue
>>>>>> Cc: tip...@li...
>>>>>> Subject: Re: [tipc-discussion] TIPC 2.0.0 packets not being transmitted
>>>>>>
>>>>>>>
>>>>>>> On Thu, Oct 16, 2014 at 9:41 PM, Ying Xue <yin...@wi...> wrote:
>>>>>>>> I'm trying to get a TIPC cluster running on a variety of ARM-based
>>>>>>>> processors, but I'm having issues with some zc706 Zynq boards
>>>>>>>> running a Yocto-built kernel 3.14.2. Some see their neighbors perfectly
>>>>>>>> well, but the zc706 boards can't be seen by anyone and think
>>>>>>>> everyone else is down. I ran wireshark and from what I can tell, the
>>>>>>>> ZC706 boards simply aren't broadcasting any packets. I see TIPC
>>>>>>>> packets flying around from the overos and zedboard, but nothing
>>>>>>>> from the zynqs.
>>>>>>>>
>>>>>>>> Can anyone help me debug this? I'm at a bit of a loss to explain the
>>>>>>>> behavior. Thanks!
>>>>>>>>
>>>>>>> It sounds like a new bug.
>>>>>>> I saw that two important changes were recently made to the tipc
>>>>>>> neighbour discovery protocol:
>>>>>>> c82910e2a8d6fc9dd321a1f30dd4e89fb779cfe1 (tipc: clean up neigbor
>>>>>>> discovery message reception)
>>>>>>> 38504c28a201a80d12a6a0f821fecb331cb1f223 (tipc: improve and extend
>>>>>>> media address conversion functions)
>>>>>>> Can you please confirm whether the above two patches are merged into
>>>>>>> the 3.14.2 tree? If yes, please try to revert them and verify again.
>>>>>>> Regards,
>>>>>>> Ying
>>>>>>>
>>>>>>
>>>>>> Hi, Ying,
>>>>>>
>>>>>> I'm building from the latest meta-xilinx layer for Yocto, which pulls
>>>>>> in commit
>>>>>>
>>>>>> https://github.com/Xilinx/linux-xlnx/commit/2b48a8aeea7367359f9eebe55c4a09a05227f32b
>>>>>>
>>>>>> This was committed back on April 25, whereas the commits you reference
>>>>>> above are dated mid-May. I checked my discover.c, and the changes those
>>>>>> commits make are not included. Should I try them?
>>>>>
>>>>> Those commits were not intended to fix any problems, just to make
>>>>> the code and algorithm more comprehensible. I doubt they will make any
>>>>> difference.
>>>>>
>>>>> When you have configured the zc706, can you confirm that it actually
>>>>> has the correct address, and that the correct interface(s) is enabled?
>>>>> If so, and everything is correct, you may have to instrument the code
>>>>> and look at what is happening in the discovery logic (discover.c).
>>>>> We can help you out with the latter, if you are positive that the
>>>>> configuration is ok and that the discovery broadcast doesn't actually
>>>>> go to the wrong interface.
>>>>>
>>>>> Regards
>>>>> ///jon
>>>>>
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Comprehensive Server Monitoring with Site24x7.
>>>>>> Monitor 10 servers for $9/Month.
>>>>>> Get alerted through email, SMS, voice calls or mobile push notifications.
>>>>>> Take corrective actions from your mobile device.
>>>>>> http://p.sf.net/sfu/Zoho
>>>>>> _______________________________________________
>>>>>> tipc-discussion mailing list
>>>>>> tip...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/tipc-discussion