From: Lars E. <Lar...@li...> - 2004-10-03 02:36:14
|
in the upcoming new heartbeat design of linux-ha, we use relatively large UDP packets (passing large XML blobs back and forth between the nodes). now, cluster development and testing is convenient on UML.

but: UML seems to silently and systematically lose all fragmented UDP packets, that is, packets larger than the MTU, and a 1480-byte max for XML blobs just does not work out too well...

I investigated a little bit (using two simple perl snippets to generate/[not] receive the UDP packets), and it turns out that, sending large (up to 64K; fragmented) UDP packets using tuntap:

  HOST -> UML   works.
  UML  -> HOST  nope :(
  UML <-> UML   nope :(  [neither with mcast nor other transports]

looking into /proc/net/snmp on the UMLs and the HOST shows, on the non-receiving side, an increase of InHdrErrors! (the packet is never reassembled into a proper UDP datagram)

this is easy to reproduce (because it just happens all the time). tested with 2.6.6 and 2.6.8 plus the respective UML patches.

it seems to me that UML corrupts the IP header of fragmented UDP packets somehow at sending time.

I wonder: if someone used NFS over UDP on UML, this should be a long-known issue and turn up loads of hits in a search. I did not find a single reference to that problem, though.

the way it is now, we probably need to reanimate some of our old boxes to form a real test cluster. and believe me, that is no fun :( if some kind soul would be able to fix that... it would make cluster testing as we do it so much more convenient :-)

thanks,

Lars Ellenberg

please CC me, I'm not subscribed to this list. |
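The reproduction described above — send one UDP datagram larger than the MTU, watch whether it ever arrives — can be sketched in C as a stand-in for the Perl snippets (the function name and the 4000-byte size are illustrative, not from the original scripts; loopback is used here, so real fragmentation only happens if the MTU is lowered accordingly):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Send one large UDP datagram to a local receiver and return how many
 * bytes arrive (-1 on error or timeout).  On a path that silently drops
 * fragments -- as the UML -> host path did -- the receive would time
 * out instead of delivering the reassembled datagram. */
int send_large_udp(int payload_len)
{
    static char buf[65507];            /* max UDP payload over IPv4 */
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { 2, 0 };      /* don't hang if fragments die */

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        return -1;
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;
    if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        return -1;

    memset(buf, 'x', sizeof(buf));
    if (sendto(tx, buf, payload_len, 0,
               (struct sockaddr *)&addr, sizeof(addr)) != payload_len)
        return -1;

    int n = recvfrom(rx, buf, sizeof(buf), 0, NULL, NULL);
    close(tx);
    close(rx);
    return n;
}
```

To reproduce the cross-host case, the receive half would run on one node and the send half on the other, while watching InHdrErrors in /proc/net/snmp on the receiving side.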
From: BlaisorBlade <bla...@ya...> - 2004-10-03 15:17:19
|
On Sunday 03 October 2004 04:36, Lars Ellenberg wrote:
> in the upcoming new heartbeat design of linux-ha, we use relatively
> large udp packets (passing large xml blobs back and forth between the
> nodes). now, cluster development and testing is convenient on UML.
>
> but: uml seems to silently and systematically lose all fragmented
> udp packets, that is packets larger than the mtu [...]
>
> sending large (up to 64K; fragmented) udp packets using tuntap:
> HOST -> UML works.
> UML -> HOST nope :(
> UML <-> UML nope :( [neither with mcast nor other]
>
> looking into /proc/net/snmp on the UMLs and the HOST
> shows on the non-receiving side an increase of InHdrErrors!
> (it never is reassembled into a proper udp packet)
>
> this is easy to reproduce (because it just happens all the time)
> tested with 2.6.6 and 2.6.8 plus respective UML patches.

Have you tried using different UML transports (ethertap/uml_switch), tcpdump'ing the traffic, or other things? Changing the host version? Increasing the MTU somewhere (but that seems not to work)?

What is strange is that the UML code does not even parse the IP header when using tuntap - it only works at the Ethernet layer. IIRC, the fragmentation happens at the IP layer...

Also, can you post the scripts you use?

> it seems to me that UML corrupts the ip header of
> fragmented udp packets somehow at sending time.
>
> I wonder, if someone uses NFS over UDP on uml, this should
> be a long known issue and turn up loads of hits in a search.
> I did not find a single reference to that problem, though.

Does NFS use large UDP packets?

> thanks,
> Lars Ellenberg
> please CC me, I'm not subscribed on this list.

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729 |
From: Lars E. <Lar...@li...> - 2004-10-03 19:29:01
Attachments:
UdpRecv.pl
UdpSend.pl
|
/ 2004-10-03 17:11:26 +0200 \ BlaisorBlade:
> On Sunday 03 October 2004 04:36, Lars Ellenberg wrote:
> [...]
> Have you tried to use different UML transports (ethertap/uml_switch)-

tuntap with uml_switch, uml_switch -hub, multicast

> tcpdump'ing the traffic - other things?

host tcpdump sees packets, and they appear to be correct.
tcpdump in the UML segfaults on the second fragment of a two-fragment UDP packet :-)

> Changing the host version?

host linux kernels: 2.4.21-suse something, 2.4.26, 2.6.6, 2.6.8

> Increasing the MTU somewhere (but seems not to work)?

right. does not work.

> What is strange is that the UML code does not even parse the IP header
> when using tuntap - it only works at the Ethernet layer.
> IIRC, the fragmentation happens at the IP layer...

hm... I only describe what I see.

> Also, can you post the scripts you use?

attached.

on one side:                                        perl UdpRecv.pl &
on the other UML, the host, or on the same system:  perl UdpSend.pl [<target ip>]

btw, UML to itself via the UML lo does work...

> > it seems to me that UML corrupts the ip header of
> > fragmented udp packets somehow at sending time.
> >
> > I wonder, if someone uses NFS over UDP on uml, this should
> > be a long known issue and turn up loads of hits in a search.
> > I did not find a single reference to that problem, though.
>
> Does NFS use large UDP packets?

sometimes. and yes, I just exported from UML, NFS-mounted on the host, did an ls in a directory with MANY files, and it never came back (the UDP never reached the host; the host increases InHdrErrors). the block size of the exported file system was 1024 and the MTU is 1500, so one would assume it should just work... but that is still another problem.

thanks, lge |
From: Gerd K. <kr...@by...> - 2004-10-04 11:20:18
|
BlaisorBlade <bla...@ya...> writes:
> > I wonder, if someone uses NFS over UDP on uml, this should
> > be a long known issue and turn up loads of hits in a search.
> > I did not find a single reference to that problem, though.
>
> Does NFS use large UDP packets?

Looks like it does. I can ack that issue for the NFS case. NFS over TCP does fine (which seems to be the default, so I didn't notice until now). NFS over UDP works, but is very slow, and I get plenty of "nfs server not responding" + "nfs server ok" messages in the syslog. Looks like it doesn't lose all packets, but enough to slow it down drastically and trigger timeouts on the client side.

That is a (kernel) NFS server on the host machine, with UML connected via tuntap networking and mounting /home using NFS.

Gerd

--
return -ENOSIG; |
From: Henrik N. <um...@hn...> - 2004-10-04 11:57:13
|
On Mon, 4 Oct 2004, Gerd Knorr wrote:
>> Does NFS use large UDP packets?
>
> Looks like it does.

The NFS message size is normally 4096 bytes + message headers + protocol layers. The rsize/wsize mount parameters have a part in this (they set the NFS data payload size within the NFS RPC message over UDP).

Regards
Henrik |
From: BlaisorBlade <bla...@ya...> - 2004-10-06 18:02:50
|
On Sunday 03 October 2004 04:36, Lars Ellenberg wrote:
> uml seems to silently and systematically lose all fragmented
> udp packets, that is packets larger than the mtu [...]
>
> this is easy to reproduce (because it just happens all the time)
> tested with 2.6.6 and 2.6.8 plus respective UML patches.
>
> it seems to me that UML corrupts the ip header of
> fragmented udp packets somehow at sending time.

I've traced this with Ethereal (v0.10.5) running on tap0, and it complains that the IP header checksum is always incorrect when the packet is fragmented. This does not happen when running both programs on the host; I set an MTU of 1500 on "lo" for this test.

However, it seems that Ethereal always shows the UDP checksum (which is a different thing) as incorrect for unfragmented packets when they are sent over the "lo" link (on my 2.6.7 host kernel); by comparison, when sending them over the local network it never complains. The Ethereal docs say that when capturing on an interface that supports TCP checksum offloading (i.e. hardware checksumming), this is normal for TCP checksums, so I guess this can happen for UDP checksums, too.

But why should the loopback driver mark itself as capable of doing "hardware checksumming"? It seems that this is actually the situation: in the source code, the loopback driver is marked as "needing no checksum at all because it's safe" (see NETIF_F_NO_CSUM in include/linux/skbuff.h).

Also, it seems that the UML code never specifies what checksum support it has. And this could help us. include/linux/skbuff.h describes the checksum flags, and UML does not use them: these two commands return no output.

  find arch/um/ -name '*.[ch]'|xargs grep NETIF
  find arch/um/ -name '*.[ch]'|xargs grep CHECKSUM

Actually, I've never done any work *at all* on the networking code, so this is just a wild guess.

> the way it is now we probably need to reanimate some of our old
> boxes to form a real test cluster. and believe me, that is no fun :(

I've tried UML 2.4, and it does not seem to have this bug: it does not increase the host error count in /proc/net/snmp, UdpRecv receives all packet sizes (I stopped the test at 49100 bytes), and even Ethereal shows correct data. The tests were run sending the packets from UML to the host, as you describe. So this could help you for now, while we try to find a clue about this.

Quite frankly, I must say that I'm not seeing any network kernel hacker here (correct me if I'm wrong), so it will take some time to debug it. Maybe Gerd Knorr is an exception, actually.

> if some kind soul would be able to fix that...
> would make cluster testing as we do it
> so much more convenient :-)

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729 |
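The check the receiver is failing on is the one's-complement sum over the IP header that the kernel's ip_fast_csum() computes in assembly; a portable C sketch (helper names here are illustrative, not kernel API) shows the property the InHdrErrors counter guards: summing a header together with its stored checksum must yield zero, and any corruption of the header bytes breaks that.

```c
#include <assert.h>
#include <stdint.h>

/* One's-complement 16-bit sum over an IPv4 header of `ihl` 32-bit
 * words; a portable stand-in for the kernel's ip_fast_csum(). */
uint16_t ip_header_csum(const uint8_t *hdr, int ihl)
{
    uint32_t sum = 0;
    for (int i = 0; i < ihl * 2; i++) {          /* ihl*2 16-bit words */
        sum += (uint32_t)(hdr[2 * i] << 8 | hdr[2 * i + 1]);
        sum = (sum & 0xffff) + (sum >> 16);      /* fold the carry */
    }
    return (uint16_t)~sum;
}

/* Fill in the checksum field (bytes 10-11) of a header whose checksum
 * field is currently zero; afterwards, re-checksumming the whole
 * header yields 0, which is exactly what the receiver verifies. */
void ip_header_set_csum(uint8_t *hdr, int ihl)
{
    uint16_t c = ip_header_csum(hdr, ihl);
    hdr[10] = c >> 8;
    hdr[11] = c & 0xff;
}
```

A sender-side bug that emits a header whose bytes no longer match the checksum it filled in — which is what the Ethereal trace suggests UML was doing for fragments — makes the receiver take the inhdr_error path and silently drop the fragment.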
From: Lars E. <Lar...@li...> - 2004-10-06 18:48:20
|
> So this could help you for now, while we try to find a clue about this. Quite
> frankly, I must say that I'm not seeing any network kernel hacker here
> (correct me if I'm wrong), so it will take some time to debug it. Maybe Gerd
> Knorr is an exception, actually.

Well, then I take Andi Kleen and Lars Marowsky-Brée into CC for now. Lars, because I expect him to be interested in having UML available as a full-featured cluster simulation tool, and Andi because I hope he knows the network code much better than I do...

FYI, the full thread can be found for example at
http://thread.gmane.org/gmane.linux.uml.devel/4607

Thanks,
Lars Ellenberg

/ 2004-10-06 19:50:46 +0200 \ BlaisorBlade:
> On Sunday 03 October 2004 04:36, Lars Ellenberg wrote:
> [...]
> I've traced this with Ethereal (v0.10.5) running on tap0 and it complains that
> the IP header checksum is always incorrect when the packet is fragmented.
> [...]
> Also, it seems that the UML code happily ignores specifying what checksum
> support. And this could help us.
>
> include/linux/skbuff.h describes the Checksum flags, and UML does not use
> them: these two commands return no output.
>
>   find arch/um/ -name '*.[ch]'|xargs grep NETIF
>   find arch/um/ -name '*.[ch]'|xargs grep CHECKSUM
>
> Actually I've never done any work *at all* on the networking code, so this is
> just a wild guess.
> [...]
> I've tried UML 2.4, and it does not seem to experience this bug [...]
>
> > if some kind soul would be able to fix that...
> > would make cluster testing as we do it
> > so much more convenient :-)
>
> --
> Paolo Giarrusso, aka Blaisorblade
> Linux registered user n. 292729 |
From: Andi K. <ak...@su...> - 2004-10-06 20:38:49
|
On Wed, Oct 06, 2004 at 08:48:23PM +0200, Lars Ellenberg wrote:
> > So this could help you for now, while we try to find a clue about this. Quite
> > frankly, I must say that I'm not seeing any network kernel hacker here
> > (correct me if I'm wrong), so it will take some time to debug it. Maybe Gerd
> > Knorr is an exception, actually.
>
> Well, then I take Andi Kleen and Lars Marowsky-Brée into CC for now.
> Lars, because I expect him to be interested in having UML as full
> featured cluster simulation tool available, and Andi because I hope he
> might know the network code much better than me...
>
> FYI, full thread can be found for example at
> http://thread.gmane.org/gmane.linux.uml.devel/4607

Paolo's analysis is basically correct. Loopback sets this flag for better performance. Actually, in 2.6 it probably doesn't help very much anymore, because TCP can do checksum-copy on RX now, and that would get the checksum basically for free. But it's still there and may still make things slightly faster.

If UML taps the packets from lo, it will see incorrect checksums.

Using a tun or ethertap device would avoid this. In the worst case you could also just delete the flag from the loopback interface; it's only an optimization.

-Andi |
From: Lars E. <Lar...@li...> - 2004-10-06 21:49:33
|
/ 2004-10-06 22:35:39 +0200 \ Andi Kleen:
> On Wed, Oct 06, 2004 at 08:48:23PM +0200, Lars Ellenberg wrote:
> [...]
> Paolo's analysis is basically correct. loopback sets this flag
> for better performance. Actually in 2.6 it probably doesn't help
> very much anymore because TCP can do checksum copy RX now, and that
> would get the checksum basically for free. But it's still there
> and may still make things slightly faster.
>
> If UML taps the packets from lo it will see incorrect checksums.
>
> Using a tun or ethertap device would avoid this. In the worst
> case you could also just delete the flag from the loopback
> interface, it's only an optimization.
>
> -Andi

unfortunately the ethertap transport does not work either, at least if UML is 2.6.6 and the host kernel is 2.4.21-suse-whatever... I did not try other combinations yet, but I doubt that changes a thing.

you suggest that we remove NETIF_F_NO_CSUM from lo on the host? ok, I'll try recompiling my host then, and follow up if that helps.

lge |
From: BlaisorBlade <bla...@ya...> - 2004-10-07 18:41:20
|
On Wednesday 06 October 2004 23:48, Lars Ellenberg wrote:
> / 2004-10-06 22:35:39 +0200 \ Andi Kleen:
> [...]
> > If UML taps the packets from lo it will see incorrect checksums.

It does not, so that solution is not the right one.

> > Using a tun or ethertap device would avoid this. In the worst
> > case you could also just delete the flag from the loopback
> > interface, it's only an optimization.
> >
> > -Andi
>
> unfortunately ethertap transport does not work either,
> at least if UML is 2.6.6 and host kernel is 2.4.21-suse-whatever...
> I did not try other combinations yet, but I doubt that changes a thing.
>
> you suggest that we remove NETIF_F_NO_CSUM from lo in the host?
> ok, I'll try recompile my host then, and followup if that helps.

No, I think he was speaking about the guest; also, he misunderstood the problem a bit, since the packets do not go through the "lo" interface inside UML.

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729 |
From: BlaisorBlade <bla...@ya...> - 2004-10-07 18:41:56
|
On Wednesday 06 October 2004 22:35, Andi Kleen wrote:
> On Wed, Oct 06, 2004 at 08:48:23PM +0200, Lars Ellenberg wrote:
> [...]
> Paolo's analysis is basically correct. loopback sets this flag
> for better performance. Actually in 2.6 it probably doesn't help
> very much anymore because TCP can do checksum copy RX now, and that
> would get the checksum basically for free. But it's still there
> and may still make things slightly faster.

First of all: thanks a lot for your quick answer.

My discussion about "lo" was slightly unrelated to the exact problem, and a bit confusing... I was at first surprised by Ethereal complaining about the host kernel, so I thought I could have a buggy Ethereal, and then went and checked that it is indeed a Linux optimization.

> If UML taps the packets from lo it will see incorrect checksums.
> Using a tun or ethertap device would avoid this.
> In the worst
> case you could also just delete the flag from the loopback
> interface, it's only an optimization.

No, inside the UML kernel the packets go through a virtual "ethN" interface, which uses special code. That driver, in turn, will use either ethertap, or TAP (it sends Ethernet frames), or even another mechanism. You can find it (in 2.6.9-rc2 at least) in arch/um/drivers/net_*.c and arch/um/os-Linux/drivers/*tap*.c. The code in the *_kern.c files links against the kernel API and includes; the *_user.c code links against the host userspace includes.

And the problem is, probably, that the UML network drivers never declare their checksumming status, as I said in the previous mail:

[quote]
include/linux/skbuff.h describes the checksum flags, and UML does not use them: these two commands return no (relevant) output.

  find arch/um/ -name '*.[ch]'|xargs grep NETIF
  find arch/um/ -name '*.[ch]'|xargs grep CHECKSUM
[/quote]

Also, it's possible that there are still other bugs...

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729 |
From: Lars E. <Lar...@li...> - 2004-10-07 20:25:47
|
/ 2004-10-07 20:41:51 +0200 \ BlaisorBlade:
> On Wednesday 06 October 2004 22:35, Andi Kleen wrote:
> [...]
> And the problem is, probably, that the UML network drivers never declare their
> checksumming status, as I said in the previous mail:
>
> [quote]
> include/linux/skbuff.h describes the checksum flags, and UML does not use
> them: these two commands return no (relevant) output.
>
>   find arch/um/ -name '*.[ch]'|xargs grep NETIF
>   find arch/um/ -name '*.[ch]'|xargs grep CHECKSUM
> [/quote]
>
> Also, it's possible that there are even other bugs...

now, what I found:

arch/um/drivers/net_kern.c:

	struct sk_buff *ether_adjust_skb(struct sk_buff *skb, int extra)
	{
		if((skb != NULL) && (skb_tailroom(skb) < extra)){
			struct sk_buff *skb2;

			skb2 = skb_copy_expand(skb, 0, extra, GFP_ATOMIC);
			dev_kfree_skb(skb);
			skb = skb2;
		}
		if(skb != NULL) skb_put(skb, extra);
		return(skb);
	}

net/core/skbuff.c:

	/* BUG ALERT: ip_summed is not copied. Why does this work? Is it used
	 * only by netfilter in the cases when checksum is recalculated? --ANK
	 */
	struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
					int newheadroom, int newtailroom, int gfp_mask)
	{

does that trigger something in someone's brain, maybe? someone "sees" it? otherwise I'll keep poking around...

lge |
From: Lars E. <Lar...@li...> - 2004-10-11 17:54:24
|
/ 2004-10-07 22:24:54 +0200 \ Lars Ellenberg:
> now, what I found:
> arch/um/drivers/net_kern.c:
> [...]
> does that trigger something in someone's brain, maybe?
> someone "sees" it?
>
> otherwise I'll keep poking around...

since UML => Host does not work, but Host => UML does, this suggests that the bug is somewhere on the sending side. I was not able to track it down.

but I just patched out the "verify checksum" part on the receiving side, and now get 16000 bytes through, sometimes more (this seems to be a buffer issue). so for now, I just don't care about the IP checksum on my UMLs, and I have to live with no fragmented UDP from UML => Host. but between my UMLs I now have up to 16k UDP, and that should be enough for the time being.

brute but effective: don't care about checksums. at first I had additional printks in the code wherever it said "goto inhdr_error", to find where exactly it breaks. only the now #if 0'ed one triggered.

Lars Ellenberg

--- linux-2.6.6/net/ipv4/ip_input.c.orig	2004-10-11 19:35:57.000000000 +0200
+++ linux-2.6.6/net/ipv4/ip_input.c	2004-10-11 19:53:58.000000000 +0200
@@ -403,8 +403,10 @@

 	iph = skb->nh.iph;

+#if 0
 	if (ip_fast_csum((u8 *)iph, iph->ihl) != 0)
 		goto inhdr_error;
+#endif

 	{
 		__u32 len = ntohs(iph->tot_len); |
From: Lars E. <Lar...@li...> - 2004-10-12 00:02:34
|
/ 2004-10-11 19:55:12 +0200 \ Lars Ellenberg:
> since UML => Host does not work, but
> Host => UML does, this suggests that
> the bug is somewhere on the sending side.
>
> I was not able to track it down.

just so you know, finally:

=======================
--- linux-2.6.6/arch/um/include/sysdep-i386/checksum.h.orig	2004-10-12 01:50:49.000000000 +0200
+++ linux-2.6.6/arch/um/include/sysdep-i386/checksum.h	2004-10-12 01:50:58.000000000 +0200
@@ -102,8 +102,7 @@
 	   are modified, we must also specify them as outputs, or gcc
 	   will assume they contain their original values. */
 	: "=r" (sum), "=r" (iph), "=r" (ihl)
-	: "1" (iph), "2" (ihl)
-	: "memory");
+	: "1" (iph), "2" (ihl));
 	return(sum);
 }
=======================

that's all, folks. only a missing memory clobber.

WTF :-/

the same patch applies to 2.6.8.1-uml and probably all other UMLs. since that is a one-to-one copy anyway, maybe UML should better use the original (in include/asm-i386/checksum.h) right away?? there may be similar bugs hiding in various areas of UML...

Thanks for now, keep it going...

btw, anyone want to give me a hint on how to tune it best? maybe how to raise the MTU of the UML "nics"?

Lars Ellenberg

"NLRge your UML-UDP" ... |
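The root cause here is a missing "memory" clobber on the checksum asm: without it, GCC assumes the asm touches only its listed operands, so it may keep header bytes cached in registers or sink stores past the asm, and the checksum gets computed over stale data. The effect of the clobber can be sketched with GCC's canonical compiler barrier (this is an illustration of the mechanism, not the kernel's actual code):

```c
#include <assert.h>
#include <stdint.h>

/* GCC's canonical compiler barrier: an asm that emits no instructions
 * but declares, via the "memory" clobber, that it may read or write
 * any memory.  The compiler must therefore flush pending stores before
 * it and reload memory after it -- the guarantee the UML checksum asm
 * lacked once newer gccs started optimizing more aggressively. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* A toy byte sum standing in for a checksum routine; deliberately free
 * of volatile so the compiler may cache values in registers. */
uint32_t sum_bytes(const uint8_t *p, int n)
{
    uint32_t s = 0;
    for (int i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Store into the header, then checksum it.  The barrier documents (and
 * enforces) that the store must be visible in memory before anything
 * that reads the header through a pointer the compiler cannot track. */
uint32_t checksum_after_update(uint8_t *hdr, int n)
{
    hdr[0] = 0x45;   /* e.g. version/IHL byte of an IPv4 header */
    barrier();
    return sum_bytes(hdr, n);
}
```

Note that the patch as posted above is reversed (it shows the clobber being removed); the actual fix adds `: "memory"` to the clobber list, as the follow-ups point out.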
From: Andi K. <ak...@su...> - 2004-10-12 00:18:40
|
> will assume they contain their original values. */
> 	: "=r" (sum), "=r" (iph), "=r" (ihl)
> -	: "1" (iph), "2" (ihl)
> -	: "memory");
> +	: "1" (iph), "2" (ihl));
> 	return(sum);
> }
>
> =======================

That's reverted, right?

> that's all, folks. only a missing memory barrier.
>
> WTF :-/

This was fixed in mainline some time ago (several months, probably more). The problem only started with newer gccs, which optimize more aggressively.

> maybe UML should better use the
> original (in include/asm-i386/checksum.h) right away??
> there may be similar bugs hiding in various areas of uml...

Sounds like a good idea.

> Thanks for now,
> keep it going...
>
> btw,
> anyone wants to give me a hint how to tune it best?
> maybe how to up the mtu of the UML "nics"?

Don't go over 4K, because the VM doesn't like >order-0 allocations very much. But in general, bigger is better.

-Andi |
From: BlaisorBlade <bla...@ya...> - 2004-10-12 01:10:55
|
On Tuesday 12 October 2004 02:11, Andi Kleen wrote:
> [...]
> That's reverted, right?
>
> This was fixed in mainline some time ago (several months probably more)
> The problem only started with newer gccs that optimize more aggressively.
>
> > original (in include/asm-i386/checksum.h) right away??
> > there may be similar bugs hiding in various areas of uml...
>
> Sounds like a good idea.

Agreed, but I have to check that the include does not have any conflicts. And I don't have the time until after 2.6.9, because I must address more urgent issues. Obviously the one-liner itself is being sent to Andrew Morton.

> > btw,
> > anyone wants to give me a hint how to tune it best?
> > maybe how to up the mtu of the UML "nics"?
>
> Don't go over 4K because the VM doesn't like >order 0 allocations
> very much. But in general bigger is better.

Sadly there is a problem: since we use TAP and emulate whole Ethernet frames, the code does not allow increasing the MTU beyond 1500 bytes. I think this cannot be fixed currently, but if you think this is wrong, please let us know.

Obviously we could add another interface emulation with a bigger MTU, but I have no ideas about which one to emulate.

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729 |
From: Lars E. <Lar...@li...> - 2004-10-15 15:25:32
|
/ 2004-10-12 03:10:59 +0200
\ BlaisorBlade:
> > > btw,
> > > anyone wants to give me a hint how to tune it best?
> > > maybe how to up the mtu of the UML "nics"?
> >
> > Don't go over 4K because the VM doesn't like >order 0 allocations
> > very much. But in general bigger is better.
>
> Sadly there is a problem: since we use TAP and emulate whole Ethernet frames,
> the code does not allow increasing the MTU beyond 1500 bytes. I think this
> cannot be fixed currently, but if you think this is wrong, please let us know.
>
> Obviously we could add another interface emulation with a bigger MTU, but
> I have no idea which one to emulate.

now, if I config eth0=daemon,FE:FD:00:00:00:01,,/tmp/uml0.ctl and then
ifconfig eth0 mtu 8192, uml does not complain (but does not work, either).

and if I then patch uml_switch from uml_utilities_20040114 (which is what
I have here, did not check whether there is anything newer out there)

===================
diff -ru tools.orig/uml_router/port.c tools/uml_router/port.c
--- tools.orig/uml_router/port.c  2003-03-12 16:19:03.000000000 +0100
+++ tools/uml_router/port.c       2004-10-15 16:59:07.000000000 +0200
@@ -14,7 +14,7 @@
     unsigned char src[ETH_ALEN];
     unsigned char proto[2];
   } header;
-  unsigned char data[1500];
+  unsigned char data[9000];
 };

 struct port {
===================

it even works! I guess with mcast or other non-daemon transports (not yet
tried), it will just work, too (maybe you need to adjust the mtu on the
host). for the first time I really get 65507 byte udp packets through
uml <-> uml. (not that it makes any sense to use udp with messages that
large ...) anything larger won't work anyway...

	Lars Ellenberg
|
From: BlaisorBlade <bla...@ya...> - 2004-10-12 00:26:46
|
On Tuesday 12 October 2004 02:03, Lars Ellenberg wrote:
> / 2004-10-11 19:55:12 +0200
> \ Lars Ellenberg:
> > since UML => Host does not work, but
> > Host => UML does, this suggests that
> > the bug is somewhere on the sending side.
> >
> > I was not able to track it down.
>
> just so you know, finally:
> =======================
> --- linux-2.6.6/arch/um/include/sysdep-i386/checksum.h.orig  2004-10-12 01:50:49.000000000 +0200
> +++ linux-2.6.6/arch/um/include/sysdep-i386/checksum.h       2004-10-12 01:50:58.000000000 +0200
> @@ -102,8 +102,7 @@
>     are modified, we must also specify them as outputs, or gcc
>     will assume they contain their original values. */
>    : "=r" (sum), "=r" (iph), "=r" (ihl)
> -  : "1" (iph), "2" (ihl)
> -  : "memory");
> +  : "1" (iph), "2" (ihl));
>    return(sum);
> }

The patch you posted REMOVES a memory barrier - you reversed it. I actually
checked that the barrier is missing in the source code; but the strange
thing is that you modified checksum.h.orig and not checksum.h! Are you sure
that you compiled the corrected header?

However, the patch is correct, and I assume you posted it the right way.
I'm going to send it to Andrew Morton for 2.6.9 (hoping he wants to accept
all these patches - they are rushing for 2.6.9).

> that's all, folks. only a missing memory barrier.

Thanks a lot, folks! Yes, the patch is right.

> WTF :-/
>
> same patch applies to 2.6.8.1 - uml and probably all other umls.
> since that is a one-to-one copy anyways, maybe UML should better use the
> original (in include/asm-i386/checksum.h) right away??

I could do this later - too many checks of the other header code for a
one-minute patch.

> there may be similar bugs hiding in various areas of uml...

I think there are even worse ones.

> btw,
> anyone wants to give me a hint how to tune it best?
> maybe how to up the mtu of the UML "nics"?

Impossible with the current code - it emulates an Ethernet card, so I don't
think you can do anything for this. However, I hope it does not cause too
many problems. Search the network howtos: someone explained how to set the
interface packet scheduler (maybe even Jeff Dike on the main UML site).

Bye
--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
|
From: Lars E. <Lar...@li...> - 2004-10-12 14:03:29
|
/ 2004-10-12 02:27:00 +0200
\ BlaisorBlade:
> On Tuesday 12 October 2004 02:03, Lars Ellenberg wrote:
> > / 2004-10-11 19:55:12 +0200
> > \ Lars Ellenberg:
> > > since UML => Host does not work, but
> > > Host => UML does, this suggests that
> > > the bug is somewhere on the sending side.
> > >
> > > I was not able to track it down.
> >
> > just so you know, finally:
> > =======================
> > --- linux-2.6.6/arch/um/include/sysdep-i386/checksum.h.orig  2004-10-12 01:50:49.000000000 +0200
> > +++ linux-2.6.6/arch/um/include/sysdep-i386/checksum.h       2004-10-12 01:50:58.000000000 +0200
> > @@ -102,8 +102,7 @@
> >     are modified, we must also specify them as outputs, or gcc
> >     will assume they contain their original values. */
> >    : "=r" (sum), "=r" (iph), "=r" (ihl)
> > -  : "1" (iph), "2" (ihl)
> > -  : "memory");
> > +  : "1" (iph), "2" (ihl));
> >    return(sum);
> > }
>
> The patch you posted REMOVES a memory barrier - you reversed it. I actually
> checked that the barrier is missing in the source code; but the strange
> thing is that you modified checksum.h.orig and not checksum.h!

nope. I just did
"diff -u linux-2.6.6/arch/um/include/sysdep-i386/checksum.h{,.orig}"
instead of
"diff -u linux-2.6.6/arch/um/include/sysdep-i386/checksum.h{.orig,}"
and did not notice. Hey, it was in the middle of the night ;-)

	thanks,
	lge
|