Re: [Accel-ppp-users] discarding message with invalid tid 0
From: Guillaume N. <g....@al...> - 2018-10-12 15:21:14
On Sat, Oct 06, 2018 at 10:33:18AM +0200, Alarig Le Lay wrote:
> On ven. 5 oct. 17:41:12 2018, Guillaume Nault wrote:
> > Which kernel version is your Gentoo? Could you send a pcap of a
> > successful tunnel establishment (one from the Gentoo and one from the
> > FreeBSD, as you did in your previous message)?
>
> My Gentoo runs 4.14.65.
> The offload on it is currently configured as following:
> lns02 ~ # ethtool -k eth2 | grep offload
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: off [fixed]
> tx-vlan-offload: off [fixed]
> l2-fwd-offload: off [fixed]
> hw-tc-offload: off [fixed]
> esp-hw-offload: off [fixed]
> esp-tx-csum-hw-offload: off [fixed]
> rx-udp_tunnel-port-offload: off [fixed]
>
> The pcap for the Gentoo (LNS) and the FreeBSD (BGP) are here:
> https://bulbizarre.swordarmor.fr/garbage/documents/l2tp-bgp-up.pcap
> https://bulbizarre.swordarmor.fr/garbage/documents/l2tp-lns-up.pcap
>
Ok, so I have taken a look at these network captures. Sorry for the
delay.

In the pcap taken on the LNS, the outer UDP checksum is partial. That
is, it is filled with the pseudo-header's sum only, instead of the full
checksum covering the whole data. That means you have checksum offload
activated on this host. The driver is supposed to complete the checksum
later, before it sends the packet on the wire.

Your first LNS network trace exhibited the same behaviour (which is
perfectly normal, as tcpdump captures packets before the driver computes
the checksum). So nothing changed on this side.

However, the pcap files taken on the router differ. On the new pcap,
the checksum seen on the router is correct. That is, the partial
checksum has been properly completed before the packet was sent. But on
the original pcap, the router saw the same partial checksum as in the
LNS capture. So the driver sent it as-is, without finalising the
computation.

The problem is probably not related to accel-ppp or l2tp at all. For
sending data on the control channel, a plain UDP socket is used.
Therefore I believe that any UDP (and probably TCP) packet sent by this
host in the original setup would have an invalid checksum.

There are a few places where the bug might come from. Either the host
doesn't tell the driver that the checksum isn't fully computed, or the
driver advertises checksum offload capabilities without actually
implementing them. If you use virtualisation, the virtual NIC might
advertise offload support and rely on the physical NIC driver to
actually perform the computation. If the physical NIC doesn't offer this
feature, the virtual NIC should provide a software fallback. I suppose
there is a bug somewhere in this chain.

So I imagine something changed in your setup wrt. checksum offload,
either on the LNS or the hypervisor.

> > I guess something is still wrong. It would be strange if it was the
> > router that was at fault.
> > My guess is that, by disabling offloads, the router stopped checking
> > the UDP checksum of transit packets. If the LAC does not verify UDP
> > checksums, then invalid checksums don't prevent the LAC and LNS from
> > communicating. If you capture traffic on the router, do you still see
> > invalid checksums for packets originated by the LNS? And correct
> > checksums for packets sent by the other LNS?
>
> On the other LNS, I see [no chksum] for the L2TP packets, but [udp sum
> ok] for the packets inside the L2TP.

From what I can see from the pcap, your provider disables checksums of
L2TP data packets (outer header). This is common behaviour. Original
packets are sent as-is, so the inner headers and checksums aren't
modified.

Guillaume
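
To illustrate the difference between a partial and a complete UDP checksum as described above, here is a minimal Python sketch. It assumes IPv4 and works on the raw UDP segment (header plus payload) taken from a capture. The function names, the sample addresses (documentation ranges) and the sample payload are all hypothetical and only serve the illustration; this is not code from accel-ppp or from the kernel.

```python
import socket
import struct


def ones_complement_sum(data: bytes, initial: int = 0) -> int:
    """16-bit one's-complement sum of data, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = initial
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def partial_checksum(src: str, dst: str, udp_len: int) -> int:
    """Sum of the IPv4 pseudo-header only: what a capture shows when the
    stack leaves the rest of the computation to the NIC."""
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_len))
    return ones_complement_sum(pseudo)


def full_checksum(src: str, dst: str, segment: bytes) -> int:
    """Complete UDP checksum over pseudo-header, UDP header and payload."""
    zeroed = segment[:6] + b"\x00\x00" + segment[8:]  # clear checksum field
    total = ones_complement_sum(
        zeroed, partial_checksum(src, dst, len(segment)))
    csum = ~total & 0xFFFF
    return csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def classify(src: str, dst: str, segment: bytes) -> str:
    """Tell whether the checksum field is absent, complete, partial or bad."""
    (found,) = struct.unpack("!H", segment[6:8])
    if found == 0:
        return "none"  # sender disabled the UDP checksum
    if found == full_checksum(src, dst, segment):
        return "ok"
    if found == partial_checksum(src, dst, len(segment)):
        return "partial"  # offload was requested but never completed
    return "bad"


def build_segment(sport: int, dport: int, payload: bytes, csum: int) -> bytes:
    """Hypothetical helper: assemble a UDP segment with a given checksum."""
    return struct.pack("!HHHH", sport, dport, 8 + len(payload), csum) + payload


if __name__ == "__main__":
    src, dst = "192.0.2.1", "198.51.100.2"  # documentation addresses
    payload = b"hello"
    bare = build_segment(1701, 1701, payload, 0)
    for csum in (0, partial_checksum(src, dst, len(bare)),
                 full_checksum(src, dst, bare)):
        seg = build_segment(1701, 1701, payload, csum)
        print(hex(csum), classify(src, dst, seg))
```

Seeing "partial" in a capture taken on the sending host is expected when checksum offload is enabled. Seeing it on another machine along the path would match the situation in the original router pcap, where the checksum was never finalised before transmission.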