From: Serge G. <s_g...@ma...> - 2006-04-06 12:35:12
Hi all!

I configured a virtual network using UML kernels and ran into some
problems.

My network consists of two virtual subnets and a virtual router
connecting them:

first net has 192.168.0.x hosts (net0)
second net has 192.168.1.x hosts (net1)
router has eth0 = 192.168.0.1 and eth1 = 192.168.1.1

The router talks to each net through one of 2 virtual UML switches
(the uml_switch tool from the uml_utilities tarball).

So my network looks like this (hope you understand this poor scheme):

   net0        uml_switch             router              uml_switch       net1
                                eth0          eth1
192.168.0.x === /tmp/sw0 === 192.168.0.1 | 192.168.1.1 === /tmp/sw1 === 192.168.1.x

The problem is that when I ping any net1 host from any net0 host I get
about 90% packet loss, and 'tcpdump -x' started on the router's eth1
device shows the following packets (which are obviously corrupted):

07:38:44.457214 truncated-ip - 22998 bytes missing!90.90.90.90 > 90.90.90.90: (frag 23130:23090@53968) [tos 0x5a,ECT]
        5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a
        5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a
        5a5a 0000 0000 0004 0000 0000 0002 0800
        4500 0054 0010 4000 3f01 b828 c0a8 0190
        c0a8 0090 0800 e8bf 9102 1000 c4fd 3444
        83f8
07:38:45.477980 truncated-ip - 22998 bytes missing!90.90.90.90 > 90.90.90.90: (frag 23130:23090@53968) [tos 0x5a,ECT]
        5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a
        5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a
        5a5a 0000 0000 0004 0000 0000 0002 0800
        4500 0054 0011 4000 3f01 b827 c0a8 0190
        c0a8 0090 0800 c36e 9102 1100 c5fd 3444
        a649

These packets seem to be normal ICMP ping requests/replies (at least
according to their length), but with a run of 5a5a words at the
beginning! I also noticed that such packets appear only after an ARP
request-reply (which, according to tcpdump, works OK!).

The same packets arrive at the host being pinged (i.e. the net1 host),
and there are no such packets on the router's eth0 device (there I see
just normal ICMP requests and replies). So I suppose the corruption
occurs somewhere between router eth1 and net1.

The situation turns out to be symmetric (i.e. I get the same problem
pinging net0 from net1).

I wonder whether this could be a bug in the uml_switch utility, the
daemon transport or something else, and whether I can fix it somehow?

I am using 2.6.13.4 UML kernels and the latest version of
uml_utilities.

Thanks in advance,
Serge
MIPT
Moscow, Russia
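Editorial sketch (not part of the original message): the hex words of the first dump can be decoded to check the guess that a normal ping is hiding behind the 5a5a words. The Python below skips the leading 0x5a bytes and parses what follows as an Ethernet II header plus a 20-byte IPv4 header. The helper names (strip_fill, checksum_ok, decode_frame) are made up for illustration. Note that 'tcpdump -x' does not print the link-level header, so the dump starts after the 14 bytes tcpdump took to be that header.

```python
import struct

# Hex words copied from the first corrupted packet in the tcpdump -x dump.
DUMP = """
5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a
5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a 5a5a
5a5a 0000 0000 0004 0000 0000 0002 0800
4500 0054 0010 4000 3f01 b828 c0a8 0190
c0a8 0090 0800 e8bf 9102 1000 c4fd 3444
83f8
"""
RAW = bytes.fromhex("".join(DUMP.split()))


def strip_fill(data, fill=0x5A):
    """Return (count, remainder) after the leading run of fill bytes."""
    n = 0
    while n < len(data) and data[n] == fill:
        n += 1
    return n, data[n:]


def checksum_ok(header):
    """True if the one's-complement sum of the 16-bit words is 0xffff."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def decode_frame(frame):
    """Parse an Ethernet II header followed by a 20-byte IPv4 header."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    ip = frame[14:34]
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _csum, saddr, daddr) = struct.unpack("!BBHHHBBH4s4s", ip)
    return {
        "dst_mac": ":".join("%02x" % b for b in dst),
        "src_mac": ":".join("%02x" % b for b in src),
        "ethertype": ethertype,
        "ip_version": ver_ihl >> 4,
        "ip_total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src_ip": ".".join(str(b) for b in saddr),
        "dst_ip": ".".join(str(b) for b in daddr),
        "ip_checksum_ok": checksum_ok(ip),
    }


if __name__ == "__main__":
    skipped, frame = strip_fill(RAW)
    print("fill bytes skipped:", skipped)
    for key, value in decode_frame(frame).items():
        print(key, "=", value)
```

For these bytes the sketch skips 34 fill bytes and then finds a complete frame from MAC 00:00:00:00:00:02 to 00:00:00:00:00:04 with ethertype 0x0800, carrying an 84-byte ICMP (protocol 1) IPv4 packet from 192.168.1.144 to 192.168.0.144 with TTL 63 and a valid header checksum. That agrees with the observation that the payload is an ordinary ping with extra bytes in front of it.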
From: Jeff D. <jd...@ad...> - 2006-04-06 14:59:02
On Thu, Apr 06, 2006 at 04:34:58PM +0400, Serge Goodenko wrote:
> the problem is when I ping any net1 host from any net0 host I get
> about 90% packet loss and 'tcpdump -x' started on router eth1 device
> shows the following packets (which are obviously corrupted):

And the same pinging from net1 to net0?

What version of UML?

Can you see where the corruption originates? I.e. tcpdump the
originating UML's eth0, if that's OK, check what's leaving the switch
(which you can do by attaching it to a host tap device, running it
-hub, and tcpdumping the tap device), etc.

The 0x5a5a pattern is kernel slab poisoning, BTW, so it looks like a
kernel is sending out a packet which had already been freed.

                                Jeff
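Editorial sketch (not part of the original message): the odd values tcpdump printed for the corrupted packets (90.90.90.90, tos 0x5a, frag 23130:23090@53968) are what an IPv4 header parser produces when every header byte is 0x5a. The Python below shows the arithmetic under that assumption. parse_ipv4_header is a made-up helper, not tcpdump's code.

```python
import struct

POISON = 0x5A  # fill byte seen at the start of the corrupted packets


def parse_ipv4_header(header):
    """Decode the fixed 20-byte part of an IPv4 header into a dict."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     _ttl, _proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s",
                                                    header[:20])
    header_len = (ver_ihl & 0x0F) * 4          # IHL is in 32-bit words
    return {
        "header_len": header_len,
        "tos": tos,
        "id": ident,
        "payload_len": total_len - header_len,
        "frag_offset": (flags_frag & 0x1FFF) * 8,  # offset is in 8-byte units
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }


# A header position filled entirely with the poison byte.
fields = parse_ipv4_header(bytes([POISON]) * 20)

if __name__ == "__main__":
    print("%s > %s: (frag %d:%d@%d) [tos 0x%x]" % (
        fields["src"], fields["dst"], fields["id"],
        fields["payload_len"], fields["frag_offset"], fields["tos"]))
```

With 0x5a in every byte, the ID is 0x5a5a = 23130, the header length is 10 words = 40 bytes, the payload length is 23130 - 40 = 23090, and the fragment offset is 0x1a5a * 8 = 53968. These match the numbers in the tcpdump lines quoted earlier in the thread.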
From: Serge G. <s_g...@ma...> - 2006-04-06 16:30:23
> On Thu, Apr 06, 2006 at 04:34:58PM +0400, Serge Goodenko wrote:
> > the problem is when I ping any net1 host from any net0 host I get
> > about 90% packet loss and 'tcpdump -x' started on router eth1 device
> > shows the following packets (which are obviously corrupted):
>
> And the same pinging from net1 to net0?

Yeah, it's absolutely symmetric: when pinging from net1 to net0, the
corruption occurs between router eth0 and net0.

>
> What version of UML?

UML kernel version 2.6.13-4 with built-in UML, running in SKAS3 mode.
I don't know whether the UML code itself has any versioning...
The host kernel is 2.6.13-15, if it matters.. )

>
> Can you see where the corruption originates? I.e. tcpdump the
> originating UML's eth0, if that's OK, check what's leaving the switch
> (which you can do by attaching it to a host tap device, running it
> -hub, and tcpdumping the tap device), etc.

Yes, I tried tcpdumping the tap device as you said, with the same
result: a lot of 5a5a.. packets arrive at it.
However, if I ping 192.168.1.1 from net0 it goes OK and 100% of the
packets are replied to.

I have also tried running tcpdump with the '-e' option to see the
source and destination MAC addresses, and it became clear that the
packets get corrupted when the router sends them to net1 (before
tcpdump shows them).
But the fact that the corruption seems to be random confuses me most...

I also tried sending TCP messages between net0 and net1 using a simple
TCP client and server, and the data always gets through OK (apparently
because TCP guarantees delivery), but these 5a5a packets still appear
and therefore TCP transmission is sometimes very slow.

>
> The 0x5a5a pattern is kernel slab poisoning, BTW, so it looks like a
> kernel is sending out a packet which had already been freed.
>

What makes the kernel behave this way is the biggest question for me
now....

Serge
MIPT
Moscow, Russia
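Editorial sketch (not part of the original message): when comparing capture points hop by hop as discussed here, it can help to save each capture with 'tcpdump -w' and count the affected frames instead of reading hex by eye. The Python below reads a classic pcap file (not pcapng) and counts frames containing a long run of 0x5a bytes. The function names and the 16-byte run threshold are assumptions chosen for illustration, and the run test is only a heuristic.

```python
import struct


def iter_pcap_frames(blob):
    """Yield the captured bytes of each record in a classic pcap file."""
    magic = struct.unpack("<I", blob[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"          # file written in little-endian byte order
    elif magic == 0xD4C3B2A1:
        endian = ">"          # file written in big-endian byte order
    else:
        raise ValueError("not a classic pcap file")
    offset = 24               # skip the 24-byte global header
    while offset + 16 <= len(blob):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", blob[offset:offset + 16])
        offset += 16
        yield blob[offset:offset + incl_len]
        offset += incl_len


def has_fill_run(frame, fill=0x5A, min_run=16):
    """Heuristic: True if the frame contains min_run consecutive fill bytes."""
    return bytes([fill]) * min_run in frame


def count_poisoned(blob):
    """Return (frames with a fill run, total frames) for a pcap blob."""
    frames = list(iter_pcap_frames(blob))
    return sum(1 for f in frames if has_fill_run(f)), len(frames)
```

Usage would be along the lines of count_poisoned(open("eth1.pcap", "rb").read()), where eth1.pcap is a hypothetical file name for a capture taken at one hop. Running it on captures from each hop shows where the count first becomes non-zero.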
From: Jeff D. <jd...@ad...> - 2006-04-06 16:44:06
On Thu, Apr 06, 2006 at 08:30:19PM +0400, Serge Goodenko wrote:
> UML kernel version 2.6.13-4 with built-in UML running in SKAS3
> mode. I don't know whether UML code itself has any versioning... host
> kernel is 2.6.13-15, if it matters.. )

Can you try something newer, to see if this is something that still
needs fixing?

> yes, I tried tcpdumping tap device as you said - the same result. a
> lot of 5a5a.. packets arrives at it.

This is saying that corrupted packets arrive at the switch from the
UML?

> I have also tried to run tcpdump with '-e' option to see the source
> and dest mac addresses and it became clear that packets currupt when
> router sends them to net1 (before tcpdump shows it)

And this is saying that the switch is corrupting packets, or that they
are corrupted before they are sent to the destination net?

                                Jeff
From: Serge G. <s_g...@ma...> - 2006-04-07 11:31:18
>
> On Thu, Apr 06, 2006 at 08:30:19PM +0400, Serge Goodenko wrote:
> > UML kernel version 2.6.13-4 with built-in UML running in SKAS3
> > mode. I don't know whether UML code itself has any versioning... host
> > kernel is 2.6.13-15, if it matters.. )
>
> Can you try something newer, to see if this is something that still
> needs fixing?
>

Yes, I tried kernel 2.6.16.1 and that helped, thanks...

Frankly, I'm a bit tired of porting my modifications to newer
kernels... )
Well, at least it keeps me feeling in the thick of things..

Serge
MIPT
Moscow, Russia
From: Jeff D. <jd...@ad...> - 2006-04-07 14:22:18
On Fri, Apr 07, 2006 at 03:31:06PM +0400, Serge Goodenko wrote:
> Yes, I tried kernel 2.6.16.1 and that helped, thanks...
>
> frankly a bit tired of porting my modifications to newer kernels... )
> well, at least I keep feeling myself in the thick of things..

2.6.13 was quite a while ago, and UML was a lot buggier then...

                                Jeff