TIPC Cluster Domain Sockets / Tasks / #118 fragment reassembly error?

Ying Xue - 2014-08-15

It sounds like Erik also ran into the same issue, so I am forwarding it to him
and cc'ing the tipc-discussion mailing list.

Regards,
Ying

On 08/15/2014 06:46 AM, Ned Kittlitz wrote:

[bugs:#118] http://sourceforge.net/p/tipc/bugs/118 fragment
reassembly error?

Status: open
Group:
Created: Thu Aug 14, 2014 10:46 PM UTC by Ned Kittlitz
Last Updated: Thu Aug 14, 2014 10:46 PM UTC
Owner: nobody

I'm having a hard time figuring out which sites to use to follow tipc
work (bug reporting and patches).

I'll try here to see if this sounds familiar. I don't see any
fragmentation issues reported here, and many google searches also gave
no useful hits.

We get message corruption when there are 3+ fragments. In the case of 3,
we think it's reassembled as 1/3/2. I think we might have seen 4
fragments as 1/4/2/3, but I need to improve our test. Maybe the final
fragment is being inserted after fragment 1.

We believe the sender and client must be on different machines; in our
case, two ATCA blades.

The underlying MTU is 1500. Wireshark reports TIPC fragment lengths of 1420
or less. The packets are OK in the pcap on the receiver.

The first problem happens at 2817 payload bytes, i.e. 2 * 1408 + 1. In
this case, fragment 3 had 1 payload byte.

I just did a quick test varying length from 1-32768. There were errors
from 2817-3002, 4237-4422, 5657-5842, 7077-7262, 8497-8682... i.e. 2817
+ 1420 * N, for 186 bytes. I haven't had time to get pcaps on this group
and I want to improve the test utility.
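
The arithmetic in that list of ranges can be checked with a short C sketch. The constants come straight from the numbers above; the function names (`window_start`, `in_reported_window`, `check_reported_ranges`) are made up for illustration, and extending the pattern beyond the five ranges listed is an assumption rather than something the test run confirmed.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    FIRST_BAD_LEN = 2817, /* 2 * 1408 + 1, the first failing payload size */
    WINDOW_STRIDE = 1420, /* distance between the starts of failing ranges */
    WINDOW_WIDTH  = 186   /* failing sizes per range, e.g. 2817..3002 */
};

/* Start of the n-th failing range; n = 0 gives 2817. */
unsigned int window_start(unsigned int n)
{
    return FIRST_BAD_LEN + WINDOW_STRIDE * n;
}

/* True if len falls inside a failing range, assuming the pattern repeats. */
bool in_reported_window(unsigned int len)
{
    if (len < FIRST_BAD_LEN)
        return false;
    return (len - FIRST_BAD_LEN) % WINDOW_STRIDE < WINDOW_WIDTH;
}

/* Compare the formula against the five ranges quoted in the report. */
void check_reported_ranges(void)
{
    static const unsigned int reported[5][2] = {
        {2817, 3002}, {4237, 4422}, {5657, 5842}, {7077, 7262}, {8497, 8682}
    };

    for (unsigned int n = 0; n < 5; n++) {
        assert(window_start(n) == reported[n][0]);
        assert(window_start(n) + WINDOW_WIDTH - 1 == reported[n][1]);
    }
}
```

Calling `check_reported_ranges()` passes silently if the formula matches the quoted ranges. A test utility could use `in_reported_window(len)` to pick payload sizes that should and should not fail.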

Earlier, the closest thing I found was:
http://t154544.network-tipc-general.networkbuzz.info/tipc-fix-message-corruption-bug-fordeferred-packets-t154544.html
I don't understand how we could get out-of-order delivery with our
hardware, but tipc-config -ls did show some non-zero RX "defs".

I'm not certain where I should be looking for discussion, patches, etc.
During recent web searches, I have also seen:
http://12.network-tipc-general.networkbuzz.info/
http://www.spinics.net/lists/netdev/

We are currently based on an Ubuntu 12.04 system, using their recently
backported kernel 3.13. I think it came from their 14.04 'trusty'
development. Sorry, I really don't understand all the kernel and
driver versioning, but I can provide more details if you give instructions.

Erik Hugne - 2014-08-15

Hi, we normally use the mailing list for all TIPC issues.
Your sf email bounced, so I'm pasting the same update here.
In my case I hit a problem similar to yours, with corrupt packets. I found that it was caused by a missing range check when setting the port importance (which effectively modifies the msg_user field). The test program I used was itself faulty and passed an uninitialized value to the setsockopt()/TIPC_IMPORTANCE call. As a result, the port phdr was set to a random type, causing all kinds of errors on the receiving side. I sent in a patch just now to netdev stable for this, tipc-discussion cc'ed:

http://thread.gmane.org/gmane.network.tipc.general/7053
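
For readers hitting the same thing from user space, here is a minimal sketch of how a test program could guard the setsockopt()/TIPC_IMPORTANCE call. The helper name `set_tipc_importance` is hypothetical; `SOL_TIPC`, `TIPC_IMPORTANCE` and the `TIPC_*_IMPORTANCE` levels are assumed to come from `<linux/tipc.h>`. It only shows the idea of initializing and range-checking the value before it reaches the kernel, and is not the patch referenced above.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/*
 * Hypothetical helper: reject importance values outside the range
 * TIPC_LOW_IMPORTANCE..TIPC_CRITICAL_IMPORTANCE before calling
 * setsockopt(), so a stray or uninitialized value never reaches the kernel.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_tipc_importance(int sd, unsigned int importance)
{
    if (importance > TIPC_CRITICAL_IMPORTANCE) {
        errno = EINVAL;
        return -1;
    }
    return setsockopt(sd, SOL_TIPC, TIPC_IMPORTANCE,
                      &importance, sizeof(importance));
}
```

A caller would pass an explicitly initialized level, for example `set_tipc_importance(sd, TIPC_HIGH_IMPORTANCE)`, instead of handing setsockopt() a variable that was never assigned.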


Ned Kittlitz - 2014-08-18

I'm updating the bug, because I'm not sure how to use the mailing list. I don't know why my sf email appeared to bounce for you. I've received email because of bug 117 and bug 118 updates.

I got some help from a contractor who had some time to investigate.

He said he extracted the change below from a larger patch that came from the tipc group. He also said that the patch didn't mention reassembly. It seems to have fixed our problem.

I'm sorry I can't give better information right now. I'm overwhelmed by not understanding so many things.

Thanks,
Ned

diff -uprN -X linux-3.13.orig/Documentation/dontdiff linux-3.13.orig/net/tipc/link.c linux-3.13-tipc-fixes/net/tipc/link.c
--- linux-3.13.orig/net/tipc/link.c 2014-08-15 05:46:35.000000000 +0530
+++ linux-3.13-tipc-fixes/net/tipc/link.c 2014-08-15 06:56:51.000000000 +0530
@@ -2386,6 +2386,8 @@ int tipc_link_recv_fragment(struct sk_bu
 			(*tail)->next = frag;
 		*tail = frag;
 		(*head)->truesize += frag->truesize;
+		(*head)->len += frag->len;
+		(*head)->data_len += frag->len;
 	}
 	if (fragid == LAST_FRAGMENT) {
 		*fbuf = *head;
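
To make the effect of the two added lines easier to see, here is a small user-space model of the chaining step. `struct buf` is a simplified stand-in, not the kernel's `struct sk_buff`, and `chain_fragment` and `chained_bytes` are hypothetical names. The sketch assumes, as with an skb, that `len` on the head should cover the whole chain and `data_len` the part held outside the head's own buffer.

```c
#include <stddef.h>

/* Simplified stand-in for struct sk_buff; not the kernel definition. */
struct buf {
    struct buf *next;       /* next fragment in the chain */
    struct buf *frag_list;  /* head only: first chained fragment */
    unsigned int len;       /* total bytes, own data plus chained data */
    unsigned int data_len;  /* bytes held in chained fragments */
    unsigned int truesize;  /* memory accounting */
};

/* Append frag to head's chain and keep the head's accounting in step. */
void chain_fragment(struct buf *head, struct buf **tail, struct buf *frag)
{
    if (!head->frag_list)
        head->frag_list = frag;
    else
        (*tail)->next = frag;
    *tail = frag;
    head->truesize += frag->truesize;
    head->len += frag->len;       /* mirrors the first added line */
    head->data_len += frag->len;  /* mirrors the second added line */
}

/* Bytes actually reachable from head: its own data plus every fragment. */
unsigned int chained_bytes(const struct buf *head)
{
    unsigned int total = head->len - head->data_len;

    for (const struct buf *f = head->frag_list; f != NULL; f = f->next)
        total += f->len;
    return total;
}
```

With fragments of 1408, 1408 and 1 bytes (the 2817-byte case from the report), the head ends up with `len` of 2817 and `data_len` of 1409, which matches `chained_bytes()`. Without the two updates the head would still report 1408 bytes, so anything trusting `len` would see a message shorter than the data actually chained to it.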

Erik Hugne - 2014-08-19

Great to hear that it resolved the problem.
Should you have future questions, don't hesitate to send mail to
tipc-discussion@lists.sourceforge.net
  

Jon Paul Maloy - 2018-05-22

Owner: Anonymous --> Jon Paul Maloy

Status: open --> closed