I'm having a hard time figuring out which sites to use to follow tipc work (bug reporting and patches).
I'll try here to see if this sounds familiar. I don't see any fragmentation issues reported here, and many google searches also gave no useful hits.
We get message corruption when there are 3+ fragments. In the case of 3, we think it's reassembled as 1/3/2. I think we might have seen 4 fragments reassembled as 1/4/2/3, but I need to improve our test to confirm. Maybe the final fragment is being inserted right after fragment 1.
We believe the sender and client must be on different machines; in our case, two ATCA blades.
Underlying MTU is 1500. wireshark reports tipc fragment lengths of 1420 or less. The packets are OK in pcap on the receiver.
The first problem happens at 2817 payload bytes, i.e. 2 * 1408 + 1. In this case, fragment 3 had 1 payload byte.
I just did a quick test varying length from 1-32768. There were errors from 2817-3002, 4237-4422, 5657-5842, 7077-7262, 8497-8682... i.e. 2817 + 1420 * N, for 186 bytes. I haven't had time to get pcaps on this group and I want to improve the test utility.
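To make the pattern easier to check against future test runs, here is a minimal C sketch of the failing ranges as I observed them. The constants come straight from the numbers above; the helper names (`in_failing_range`, `range_start`, `range_end`) are made up for illustration, and extending the pattern beyond the ranges actually listed is an assumption, since I only tested lengths up to 32768.

```c
#include <stdbool.h>

/* Values taken from the test results above. */
#define FIRST_BAD_LEN 2817u /* first failing payload length */
#define PERIOD        1420u /* spacing between the starts of failing ranges */
#define BAD_SPAN      186u  /* number of consecutive failing lengths per range */

/* Hypothetical helper: true if len falls inside one of the observed
 * failing ranges (2817-3002, 4237-4422, ...). */
static bool in_failing_range(unsigned len)
{
    if (len < FIRST_BAD_LEN)
        return false;
    return (len - FIRST_BAD_LEN) % PERIOD < BAD_SPAN;
}

/* Inclusive start and end of the n-th failing range, n = 0, 1, 2, ... */
static unsigned range_start(unsigned n)
{
    return FIRST_BAD_LEN + PERIOD * n;
}

static unsigned range_end(unsigned n)
{
    return range_start(n) + BAD_SPAN - 1;
}
```

A test utility could call `in_failing_range(len)` for each payload length it sends and flag any mismatch between the prediction and the actual result.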
Earlier, the closest thing I found was http://t154544.network-tipc-general.networkbuzz.info/tipc-fix-message-corruption-bug-fordeferred-packets-t154544.html .
I don't understand how we could get out-of-order delivery with our hardware, but tipc-config -ls did show some non-zero RX "defs".
I'm not certain where I should be looking for discussion, patches, etc. During recent websearches, I have also seen:
http://12.network-tipc-general.networkbuzz.info/
http://www.spinics.net/lists/netdev/
We are currently based on an Ubuntu 12.04 system, using their recently backported kernel 3.13. I think it came from their 14.04 'trusty' development. Sorry, but I really don't understand all the kernel and driver versioning, but I can provide more details if you give instructions.
It sounds like Erik also met the same issue, so I am forwarding it to him and
cc'ing the tipc-discussion mailing list.
Regards,
Ying
On 08/15/2014 06:46 AM, Ned Kittlitz wrote:
Related
Tasks:
#118
Hi, we normally use the mailing list for all TIPC issues.
Your sf email bounced, so I'm pasting in the same update here.
In my case I got a similar problem to yours, with corrupt packets. I found out that this was caused by the missing range check when setting the port importance (which effectively modifies the msg_user field). The test program I used was actually faulty, and passed an uninitialized value to the setsockopt()/TIPC_IMPORTANCE call. The effect of this is that the port phdr was set to a random type, causing all kinds of errors on the receiving side. I sent in a patch just now to netdev stable for this, tipc-discussion cc'ed:
http://thread.gmane.org/gmane.network.tipc.general/7053
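Until a kernel with the range check is in place, a test program can guard the call itself. The following is only a sketch of that idea, not the kernel patch: `set_importance_checked` is a hypothetical wrapper name, and it assumes the `TIPC_IMPORTANCE` option and the `TIPC_LOW_IMPORTANCE` .. `TIPC_CRITICAL_IMPORTANCE` constants from `<linux/tipc.h>`.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/tipc.h>

#ifndef SOL_TIPC
#define SOL_TIPC 271 /* fallback in case the installed headers do not define it */
#endif

/* Hypothetical wrapper (not part of any TIPC API): reject importance
 * values outside TIPC_LOW_IMPORTANCE..TIPC_CRITICAL_IMPORTANCE before
 * they ever reach setsockopt(). */
static int set_importance_checked(int fd, unsigned int importance)
{
    if (importance > TIPC_CRITICAL_IMPORTANCE) {
        errno = EINVAL;
        return -1;
    }
    return setsockopt(fd, SOL_TIPC, TIPC_IMPORTANCE,
                      &importance, sizeof(importance));
}
```

Initializing the value explicitly before the call (for example to `TIPC_LOW_IMPORTANCE`) avoids the uninitialized-variable mistake my test program made.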
I'm updating the bug, because I'm not sure how to use the mailing list. I don't know why my sf email appeared to bounce for you. I've received email because of bug 117 and bug 118 updates.
I got some help from a contractor who had some time to investigate.
He said he extracted the change below from a larger patch that came from the tipc group. He also said that the patch didn't mention reassembly. It seems to have fixed our problem.
I'm sorry I can't give better information right now. I'm overwhelmed by not understanding so many things.
Thanks,
Ned
diff -uprN -X linux-3.13.orig/Documentation/dontdiff linux-3.13.orig/net/tipc/link.c linux-3.13-tipc-fixes/net/tipc/link.c
--- linux-3.13.orig/net/tipc/link.c 2014-08-15 05:46:35.000000000 +0530
+++ linux-3.13-tipc-fixes/net/tipc/link.c 2014-08-15 06:56:51.000000000 +0530
@@ -2386,6 +2386,8 @@ int tipc_link_recv_fragment(struct sk_bu
 			(*tail)->next = frag;
 		*tail = frag;
 		(*head)->truesize += frag->truesize;
+		(*head)->len += frag->len;
+		(*head)->data_len += frag->len;
 	}
 	if (fragid == LAST_FRAGMENT) {
 		*fbuf = *head;
Great to hear that it resolved the problem.
Should you have future questions, don't hesitate to drop a mail to
tipc-discussion@lists.sourceforge.net
2014-08-18 23:38 GMT+02:00 Ned Kittlitz nkittlitz@users.sf.net: