Thread: [Linuxptp-users] Multicast group (re)join issue vs IGMP snooping
PTP IEEE 1588 stack for Linux
From: Janusz U. <j.u...@el...> - 2022-02-17 16:26:27
|
Hi,

Does the issue described here still concern PTP for Linux?
"Dear IEEE 1588 implementers: don't forget about IGMP if you do multicast! « The PTP guy (logdown.com)"
http://theptpguy.logdown.com/posts/2015/08/31/dear-ieee-1588-implementers-remember-about-igmp

From our observations for IPv4 (L4) E2E it still applies to ptp4l. On Cisco and Netgear switches (even an L3 switch set up as an L2 switch), IGMP snooping must be disabled for PTP multicast to work properly.

ptp4l joins the multicast group once at start (only once, i.e. it triggers the Linux kernel once to send an IGMP packet):

    udp.c: err = setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &req, sizeof(req));

What is the impact of such an implementation when the interface goes down and up again, or on network recrossing? No periodic rejoin packet is observed. For IPv6 there is likely no problem, but we have not tested it.

Maybe it is a Linux kernel issue. However, for comparison, ptpd2 implements a periodic multicast group (re)join (option master_igmp_refresh_interval):
src/dep/net.c:netInitMulticastIPv4()
https://github.com/ptpd/ptpd/blob/master/src/dep/net.c#L605
https://github.com/ptpd/ptpd/blob/master/src/dep/net.c#L2278
https://github.com/ptpd/ptpd/blob/master/src/protocol.c#L607
https://github.com/ptpd/ptpd/blob/master/src/protocol.c#L1120

Other proprietary PTP devices send IGMP packets periodically every 1-2 s and work with the switches despite IGMP snooping being enabled...

best regards
Janusz
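For readers following along, the single join described above can be sketched roughly as follows. This is a minimal illustration and not ptp4l's actual code: the helper names `is_ipv4_multicast` and `join_mcast_group` are hypothetical, and 224.0.1.129 is the PTP primary IPv4 multicast address. The application's part ends once `setsockopt()` returns; sending the IGMP reports is left to the kernel.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* PTP primary multicast group for UDP over IPv4. */
#define PTP_PRIMARY_MCAST_IPV4 "224.0.1.129"

/* Return 1 if `addr` parses as an IPv4 multicast address, else 0.
 * (Hypothetical helper, for illustration only.) */
int is_ipv4_multicast(const char *addr)
{
	struct in_addr a;

	if (inet_pton(AF_INET, addr, &a) != 1)
		return 0;
	/* IN_MULTICAST() expects the address in host byte order. */
	return IN_MULTICAST(ntohl(a.s_addr)) ? 1 : 0;
}

/* Join `group` on socket `fd` for the interface with index `ifindex`.
 * Returns 0 on success and -1 on error. The call only registers the
 * membership; the kernel decides when IGMP reports go on the wire.
 * (Hypothetical helper, for illustration only.) */
int join_mcast_group(int fd, const char *group, int ifindex)
{
	struct ip_mreqn req;

	if (!is_ipv4_multicast(group))
		return -1;
	memset(&req, 0, sizeof(req));
	inet_pton(AF_INET, group, &req.imr_multiaddr);
	req.imr_ifindex = ifindex;
	return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &req, sizeof(req));
}
```

A caller would typically look up the interface index with `if_nametoindex()` and call `join_mcast_group(fd, PTP_PRIMARY_MCAST_IPV4, ifindex)` once per socket.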
From: Keller, J. E <jac...@in...> - 2022-02-17 23:29:22
|
On 2/17/2022 8:26 AM, Janusz Użycki wrote:
> Hi,
>
> Does the issue still concern PTP for Linux:
> "Dear IEEE 1588 implementers: don't forget about IGMP if you do
> multicast! « The PTP guy (logdown.com) "
> http://theptpguy.logdown.com/posts/2015/08/31/dear-ieee-1588-implementers-remember-about-igmp
> ?
>
> From our observations for IPv4 (L4) E2E it still applies to ptp4l. On
> Cisco and Netgear switches (even L3 set as L2 switch) IGMP snooping must
> be disabled for proper PTP multicast work.
> ptp4l joins to multicast group once on start (only once, ie. it triggers
> Linux kernel once to send IGMP packet):
> udp.c: err = setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &req,
> sizeof(req));
> What is impact such implementation when interface goes down and up
> again, or on network recrossing? There is no periodic rejoin package
> observed.

I think we at least need to re-send these when the link changes, though we perhaps were thinking that the kernel does this for us? Maybe that's a bug, or maybe we have to set an option?

The "man 7 ip" manual page does not mention whether we would need to re-issue this socket option.

> For IPv6 likely there is no problem but we have not tested.
> Maybe it is Linux kernel issue. However for comparison ptpd2 implements
> periodic multicast group (re)join (optionmaster_igmp_refresh_interval):
> src/dep/net.c:netInitMulticastIPv4()
> https://github.com/ptpd/ptpd/blob/master/src/dep/net.c#L605
> https://github.com/ptpd/ptpd/blob/master/src/dep/net.c#L2278
> https://github.com/ptpd/ptpd/blob/master/src/protocol.c#L607
> https://github.com/ptpd/ptpd/blob/master/src/protocol.c#L1120 Other
> proprietary PTP devices send IGMP packets periodically every 1-2s and
> work with the switches despite IGMP snooping enabled...
>

This sounds like what we should be doing as well.

> best regards
> Janusz
>
> _______________________________________________
> Linuxptp-users mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linuxptp-users
From: Richard C. <ric...@gm...> - 2022-02-17 23:49:35
|
On Thu, Feb 17, 2022 at 11:29:03PM +0000, Keller, Jacob E wrote:

> I think we at least need to re-send these when the link changes,

We do, because the link down/up causes the transport to be closed and opened again.

> though
> we perhaps were thinking that the kernel does this for us? Maybe thats a
> bug or maybe we have to set an option?
>
> There is no mention about whether we would need to re-issue this socket
> option in the "man 7 ip" manual page.

From the application POV, there is only joining the group. After that, there is nothing more to do.

> > Maybe it is Linux kernel issue.

Then take it up on the netdev list please.

> > However for comparison ptpd2 implements
> > periodic multicast group (re)join (optionmaster_igmp_refresh_interval):
> > src/dep/net.c:netInitMulticastIPv4()
> > https://github.com/ptpd/ptpd/blob/master/src/dep/net.c#L605
> > https://github.com/ptpd/ptpd/blob/master/src/dep/net.c#L2278
> > https://github.com/ptpd/ptpd/blob/master/src/protocol.c#L607
> > https://github.com/ptpd/ptpd/blob/master/src/protocol.c#L1120 Other
> > proprietary PTP devices send IGMP packets periodically every 1-2s and

Crazy to spam the network like that.

> > work with the switches despite IGMP snooping enabled...
> >
>
> This sounds like what we should be doing as well.

No, I don't agree. This is no different than ARP. If the kernel needs to re-send IGMP, then let it.

Cisco's PTP switch has issues. Have you looked at the residence times? I don't feel like implementing workarounds for buggy hardware.

Thanks,
Richard
From: Keller, J. E <jac...@in...> - 2022-02-18 01:37:05
|
> -----Original Message-----
> From: Richard Cochran <ric...@gm...>
> Sent: Thursday, February 17, 2022 3:49 PM
> To: Keller, Jacob E <jac...@in...>
> Cc: lin...@li...
> Subject: Re: [Linuxptp-users] Multicast group (re)join issue vs IGMP snooping
>
> On Thu, Feb 17, 2022 at 11:29:03PM +0000, Keller, Jacob E wrote:
>
> > I think we at least need to re-send these when the link changes,
>
> We do, because the link down/up causes the transport to be closed and
> opened again.
>

Ok, but from the sound of the original email on this thread, this isn't happening today?

Thanks,
Jake
From: Jakub R. <j.r...@el...> - 2022-02-18 09:08:29
|
> 18.02.2022 00:51 Richard Cochran <ric...@gm...> wrote:
> On Thu, Feb 17, 2022 at 05:26:15PM +0100, Janusz Użycki wrote:
>
> > From our observations for IPv4 (L4) E2E it still applies to ptp4l. On Cisco
> > and Netgear switches (even L3 set as L2 switch) IGMP snooping must be
> > disabled for proper PTP multicast work.
>
> So their "IGMP snooping" feature is broken. Ask them to fix it.

I disagree. As far as I can tell, IGMP resending is done by the kernel from version 4.9 upon notification. So basically, ptp4l will not work with any switch using IGMP in any way on older kernels. We should probably assume that devices with older kernels are all EOL, but it is not uncommon for them to still be in use. So maybe add a note on which kernel versions ptp4l works with, or which kernel modules are required.

> Cisco's PTP switch has issues. Have you looked at the residence
> times? I don't feel like implementing workarounds for buggy
> hardware.

And is depending on kernel features a good way to make ptp4l work properly? I do not think it is a hardware issue here, but rather a dependence on the kernel version and/or modules.

Best regards
Jakub
From: Jakub R. <j.r...@el...> - 2022-02-23 11:49:56
|
> 23.02.2022 08:55 Miroslav Lichvar <mli...@re...> wrote:
>
> On Tue, Feb 22, 2022 at 03:59:00PM +0100, Jakub Raczyński wrote:
> > So first question - I checked linuxptp source and it seems that once PTP has ever synchronized, it will always synchronize ntpd, whether linuxptp is connected to Master device or is in holdover. Is it indented behavior? Is it targeted for custom PTP devices that have much better holdover than ntpd could ever have?
>
> No, ntpd should be getting samples only when the port is in the slave
> state. Did you start phc2sys with the -a option?
>
> --
> Miroslav Lichvar

Actually no, this device is set with the slaveOnly flag, so -a for phc2sys was deemed unnecessary. I run phc2sys with the following parameters: "phc2sys -s lan1 -E ntpshm -M 4 -w", and the '-w' flag is said to be incompatible with the '-a' flag.
Could this be the reason for that strange behavior?

Best regards
Jakub
From: Miroslav L. <mli...@re...> - 2022-02-23 11:57:12
|
On Wed, Feb 23, 2022 at 12:49:37PM +0100, Jakub Raczyński wrote:
> Actually no, this device is set with slaveOnly flag thus -a for phc2sys was deemed as not needed. I run phc2sys with following parameters "phc2sys -s lan1 -E ntpshm -M 4 -w" and '-w' flag is said to be incompatible with '-a' flag.
> Should this be reason for that strange behavior?

Yes, without -a phc2sys is not monitoring the port state. It doesn't matter if it's phc2sys or ntpd via SHM correcting the clock.

--
Miroslav Lichvar
From: Jakub R. <j.r...@el...> - 2022-02-23 12:01:10
|
> 23.02.2022 12:56 Miroslav Lichvar <mli...@re...> wrote:
> > Should this be reason for that strange behavior?
>
> Yes, without -a phc2sys is not monitoring the port state. It doesn't
> matter if it's phc2sys or ntpd via SHM correcting the clock.
>
> --
> Miroslav Lichvar

Ok, that makes sense. Thanks for your help.

Jakub
From: Keller, J. E <jac...@in...> - 2022-02-18 17:43:12
|
> -----Original Message-----
> From: Jakub Raczyński <j.r...@el...>
> Sent: Friday, February 18, 2022 1:08 AM
> To: lin...@li...
> Subject: Re: [Linuxptp-users] Multicast group (re)join issue vs IGMP snooping
>
> > 18.02.2022 00:51 Richard Cochran <ric...@gm...> wrote:
> > On Thu, Feb 17, 2022 at 05:26:15PM +0100, Janusz Użycki wrote:
> >
> > > From our observations for IPv4 (L4) E2E it still applies to ptp4l. On Cisco
> > > and Netgear switches (even L3 set as L2 switch) IGMP snooping must be
> > > disabled for proper PTP multicast work.
> >
> > So their "IGMP snooping" feature is broken. Ask them to fix it.
>
> I disagree. As far as I can tell, IGMP resending is done by kernel from version 4.9
> upon notification. So basically, ptp4l will not work with any switch using IGMP in
> any way on older kernels. We probably should assume that devices with older
> kernel are all EOL but it is not uncommon for them to be still used. So maybe
> some annotation on what kernel version will ptp4l work or what are required
> kernel modules.
>
> > Cisco's PTP switch has issues. Have you looked at the residence
> > times? I don't feel like implementing workarounds for buggy
> > hardware.
>
> And is depending on kernel features for ptp4l to work properly a good way? I do
> not think it is hardware issue here, but rather dependence on kernel version
> and/or modules.
>

ptp4l already has a lot of "it works better on newer kernel" bits, because it interfaces with the Linux kernel so directly. I think the earliest technically working version is something from the 3.x series (3.5?), but I don't see a problem with documenting a newer version as recommended, along with the features we rely on as justification.

I think I agree with Richard that this is the kernel's problem, and that we shouldn't try to forcibly re-send IGMP messages ourselves (in fact, there is no real interface to do this in the socket options.. There's simply a join and leave group option, but no mention of needing to re-join periodically or of what attempting to join an already joined group would do).

Thanks,
Jake

> Best regards
> Jakub
From: Richard C. <ric...@gm...> - 2022-02-18 22:10:08
|
On Fri, Feb 18, 2022 at 05:42:57PM +0000, Keller, Jacob E wrote:

> (in fact, there is no real interface to do this in the socket
> options.. There's simply a join and leave group option, but no
> mention of needing to re-join periodically or what attempting to
> join an already joined group would do).

Exactly. You can't expect user space to blindly retry joining a group with zero feedback as to whether the join was successful or not. Doing so is voodoo engineering.

If there is a bug, then it is a kernel bug.

Thanks,
Richard
From: Richard C. <ric...@gm...> - 2022-02-18 22:12:55
|
On Fri, Feb 18, 2022 at 10:08:20AM +0100, Jakub Raczyński wrote:

> I disagree. As far as I can tell, IGMP resending is done by kernel
> from version 4.9 upon notification. So basically, ptp4l will not
> work with any switch using IGMP in any way on older kernels.

What do you mean?

Are kernels 4.10+ working correctly, but 4.9 and earlier not working?

Thanks,
Richard
From: Keller, J. E <jac...@in...> - 2022-02-18 23:05:18
|
> -----Original Message-----
> From: Richard Cochran <ric...@gm...>
> Sent: Friday, February 18, 2022 2:13 PM
> To: Jakub Raczyński <j.r...@el...>
> Cc: lin...@li...
> Subject: Re: [Linuxptp-users] Multicast group (re)join issue vs IGMP snooping
>
> On Fri, Feb 18, 2022 at 10:08:20AM +0100, Jakub Raczyński wrote:
>
> > I disagree. As far as I can tell, IGMP resending is done by kernel
> > from version 4.9 upon notification. So basically, ptp4l will not
> > work with any switch using IGMP in any way on older kernels.
>
> What do you mean?
>
> Are kernels 4.10+ working correctly, but 4.9 and earlier not working?
>
> Thanks,
> Richard
>

FWIW, it looks like there are some sysctls for this, from the ip sysctl documentation:

igmpv2_unsolicited_report_interval - INTEGER
    The interval in milliseconds in which the next unsolicited IGMPv1 or IGMPv2 report retransmit will take place.
    Default: 10000 (10 seconds)

igmpv3_unsolicited_report_interval - INTEGER
    The interval in milliseconds in which the next unsolicited IGMPv3 report retransmit will take place.
    Default: 1000 (1 seconds)

It looks like the default for IGMPv3 was changed from 10 seconds to 1 second in cab70040dfd9 ("net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3") and made configurable in 2690048c01f3 ("net: igmp: Allow user-space configuration of igmp unsolicited report interval").

As far as I can tell, the default behavior for a long time has been to re-transmit an unsolicited join request at a random interval between 0 and 10 seconds, as of the 2.6 git history. I couldn't find any data from before that.

So I think Richard is quite right: if there is a bug, it is in the kernel, and as far as I can tell it is at least intended that the kernel will re-transmit these as necessary.

Thanks,
Jake
From: Richard C. <ric...@gm...> - 2022-02-19 18:22:13
|
On Fri, Feb 18, 2022 at 11:05:05PM +0000, Keller, Jacob E wrote:

> As far as I can tell, the default settings since a long time is to
> re-transmit an unsolicited join request with a random interval
> between 0 and 10 seconds as of the 2.6 git history. I couldn't find
> any data from before that.

IOW, there is no kernel bug and no linuxptp bug either.

Thanks,
Richard
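The two sysctls quoted earlier in the thread can be checked on a given machine with something like the sketch below. It assumes the knobs are exposed per interface under /proc/sys/net/ipv4/conf/<ifname>/, and the helper names `igmp_interval_path` and `igmp_interval_ms` are hypothetical. A return value of -1 from the reader simply means the value could not be read, which is what one would expect on a kernel that predates the sysctl.

```c
#include <stdio.h>
#include <string.h>

/* Build the procfs path of the unsolicited report interval sysctl for
 * `ifname`. `igmp_version` must be 2 (also covers IGMPv1) or 3.
 * Returns 0 on success, -1 on bad input or if `buf` is too small.
 * (Hypothetical helper; the path layout is an assumption.) */
int igmp_interval_path(char *buf, size_t len, const char *ifname,
		       int igmp_version)
{
	int n;

	if (igmp_version != 2 && igmp_version != 3)
		return -1;
	if (!ifname || !*ifname || strchr(ifname, '/'))
		return -1;
	n = snprintf(buf, len,
		     "/proc/sys/net/ipv4/conf/%s/igmpv%d_unsolicited_report_interval",
		     ifname, igmp_version);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the interval in milliseconds, or return -1 if it cannot be read. */
long igmp_interval_ms(const char *ifname, int igmp_version)
{
	char path[256];
	long ms = -1;
	FILE *f;

	if (igmp_interval_path(path, sizeof(path), ifname, igmp_version))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &ms) != 1)
		ms = -1;
	fclose(f);
	return ms;
}
```

Comparing the value read for the PTP interface against the defaults quoted above is a quick way to tell whether a device has been tuned away from them.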
From: Jakub R. <j.r...@el...> - 2022-02-18 23:05:45
|
> On Fri, Feb 18, 2022 at 10:08:20AM +0100, Jakub Raczyński wrote:
>
> > I disagree. As far as I can tell, IGMP resending is done by kernel
> > from version 4.9 upon notification. So basically, ptp4l will not
> > work with any switch using IGMP in any way on older kernels.
>
> What do you mean?
>
> Are kernels 4.10+ working correctly, but 4.9 and earlier not working?

I cannot say precisely which versions; that would need more study of the Linux kernel. It seems that newer releases of 4.9 have this implemented.
Basically, the problem originates from this:
https://unix.stackexchange.com/questions/523529/should-the-linux-kernel-perform-an-igmp-rejoin-on-link-up
ptp4l does not take care of that, and neither do some kernel versions.

So basically ptp4l will only work on a switch with IGMP snooping if it has been started after the switch, while any restart of the switch will cause all PTP traffic to be blocked permanently.

The question is whether ptp4l should worry about that or not. I must agree that it would be a bit of "voodoo engineering". But some products seem to have a locked kernel version that has been EOL for some time, yet no updates are available. For the user, it is hard to debug why this is the case (why PTP multicast, both L2 and L4, is blocked on the switch), so at least some big warning about this would be appreciated.

Best regards
Jakub
From: Richard C. <ric...@gm...> - 2022-02-19 18:25:11
|
On Sat, Feb 19, 2022 at 12:05:29AM +0100, Jakub Raczyński wrote:

> I cannot say on which precisely, would need more studying of Linux kernel. Seems that newer version of 4.9 has this implemented.
> Basically, problem originates from this
> https://unix.stackexchange.com/questions/523529/should-the-linux-kernel-perform-an-igmp-rejoin-on-link-up
> And ptp4l does not take care of that and neither are some kernel versions.

You are mistaken. ptp4l closes its sockets on link down and then opens new sockets (joining once again) on link up.

> The question is whether ptp4l should worry or that or not. I must
> agree that would be a bit of "voodoo engineering". But some products
> seem to have locked kernel version, that is EOL for some time, yet
> no updated are available. For user, it is hard to debug why is it a
> case (why PTP multicast, both L2 and L4, is blocked on switch), so
> at least some big warning about this would be appreciated.

It isn't my job to debug your switches. Please take the issue up with the switch vendors.

Thanks,
Richard
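The close-on-down, reopen-on-up behavior described above can be pictured with a small decision function. This is a simplified sketch with hypothetical names (`transport_action`, `on_link_change`), not linuxptp's actual code. It only shows which link transitions would lead to a fresh join.

```c
/* What to do with the network transport after a link state report.
 * (Hypothetical names, for illustration only.) */
enum transport_action {
	TRANSPORT_KEEP,   /* no link transition: leave the sockets alone  */
	TRANSPORT_CLOSE,  /* link went down: close the sockets            */
	TRANSPORT_REOPEN  /* link came up: open new sockets and join anew */
};

/* Map the previous and current link state (non-zero means up) to an
 * action. Only a down-to-up transition leads to a new group join. */
enum transport_action on_link_change(int was_up, int is_up)
{
	if (was_up && !is_up)
		return TRANSPORT_CLOSE;
	if (!was_up && is_up)
		return TRANSPORT_REOPEN;
	return TRANSPORT_KEEP;
}
```

Under this model, an event that the local interface never sees as a link change, such as a restart of a switch that sits behind another switch, maps to `TRANSPORT_KEEP` and therefore triggers no new join from the application.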
From: Jakub R. <j.r...@el...> - 2022-02-20 16:52:31
|
> 19.02.2022 19:24 Richard Cochran <ric...@gm...> wrote:
> You are mistaken. ptp4l closes its sockets on link down and then
> opens new sockets (joining once again) on link up.

You misunderstand. I am gonna repeat myself:
"So basically ptp4l will only work on switch with IGMP snooping if it has been started after the switch, while any restart of switch will cause all PTP traffic to be blocked permanently."

So whatever you blame for that issue is not my concern; maybe the kernel or some module should react to that but does not. That said, ptp4l is UNUSABLE on switches with IGMP snooping (many vendors and a few kernel versions tried) on devices with older kernels. More tests are required for newer kernels.

Best regards
Jakub Raczynski
From: Miroslav L. <mli...@re...> - 2022-02-21 08:44:05
|
On Sun, Feb 20, 2022 at 05:52:17PM +0100, Jakub Raczyński wrote:
> So whatever you are blaming for that issue is not my concern, maybe kernel or some module should react to that but does not. Saying that, ptp4l is UNUSABLE on switches with IGMP snooping (many vendors tried and few kernel versions) on devices with older kernels. More tests are required for newer kernels.

It might help if you could post some logs that show the problem with corresponding packet capture. If this was a linuxptp or kernel issue impacting a large number of switches, I think we would know about it much sooner.

--
Miroslav Lichvar
From: Richard C. <ric...@gm...> - 2022-02-21 14:53:17
|
On Sun, Feb 20, 2022 at 05:52:17PM +0100, Jakub Raczyński wrote:

> You misunderstand. I am gonna repeat myself:
> "So basically ptp4l will only work on switch with IGMP snooping if it has been started after the switch, while any restart of switch will cause all PTP traffic to be blocked permanently."

You have described a bug in the switch.

You have three choices:

1. Ask the vendor to fix their switch FW.

2. Don't re-start the switch when snooping is enabled.

3. Don't make snooping a persistent setting, but rather enable it via
   scripting after restarting the switch.

I really don't see any linuxptp or kernel issue at all. I definitely will not accept hacks in my project for this kind of switch misbehavior.

Sorry,
Richard
From: Richard C. <ric...@gm...> - 2022-02-21 15:14:23
|
On Mon, Feb 21, 2022 at 06:53:09AM -0800, Richard Cochran wrote:

> 1. Ask the vendor to fix their switch FW.
>
> 2. Don't re-start the switch when snooping is enabled.
>
> 3. Don't make snooping a persistent setting, but rather enable it via
>    scripting after restarting the switch.
>

4. Disable snooping on your switch.

Thanks,
Richard
From: Richard C. <ric...@gm...> - 2022-02-17 23:51:20
|
On Thu, Feb 17, 2022 at 05:26:15PM +0100, Janusz Użycki wrote:

> From our observations for IPv4 (L4) E2E it still applies to ptp4l. On Cisco
> and Netgear switches (even L3 set as L2 switch) IGMP snooping must be
> disabled for proper PTP multicast work.

So their "IGMP snooping" feature is broken. Ask them to fix it.

Thanks,
Richard
From: Jakub R. <j.r...@el...> - 2022-02-22 14:59:12
Attachments:
slave_log.bin
|
Greetings,

I would like to ask a question/report an issue.

Our setup: PTP Master & Slave based on an imx6ul module, connected via a simple switch. We set ptp4l to synchronize ntpd.

Issue: When we synchronize the PTP device to the Master and then disconnect the ethernet cable from the PTP Slave (from the device, not from the switch), the PTP Slave device changes portState to 'Faulty' and time gets stuck, but it still synchronizes ntpd.

So first question - I checked the linuxptp source and it seems that once PTP has ever synchronized, it will always synchronize ntpd, whether linuxptp is connected to a Master device or is in holdover. Is it intended behavior? Is it targeted at custom PTP devices that have much better holdover than ntpd could ever have?

Second question - is this bug caused by the ethernet driver or is it a linuxptp issue? I am not asking you to check the imx6ul driver, but rather to answer whether it should have been prevented by linuxptp. It is somewhat related to the first question, as it would not happen if linuxptp marked a 'Faulty' Master as an invalid ntpd source.

Attaching a log and explanation - I used the 'watch' command to log PTP and ntpd state before/during/after the ethernet cable was disconnected. When the cable is disconnected, portState switches to 'Faulty' and time gets stuck; it gets unstuck when the cable is reconnected (all included in the log file).

Thanks for the help.

Best regards
Jakub Raczynski
From: Keller, J. E <jac...@in...> - 2022-02-22 22:02:02
|
> -----Original Message-----
> From: Jakub Raczyński <j.r...@el...>
> Sent: Tuesday, February 22, 2022 6:59 AM
> To: lin...@li...
> Subject: [Linuxptp-users] ntpd synchronization issue to ptp source
>
> So first question - I checked linuxptp source and it seems that once PTP has ever
> synchronized, it will always synchronize ntpd, whether linuxptp is connected to
> Master device or is in holdover. Is it indented behavior? Is it targeted for custom
> PTP devices that have much better holdover than ntpd could ever have?
>

I would expect most PTP devices to have a better holdover time than ntpd.
From: Miroslav L. <mli...@re...> - 2022-02-23 07:55:21
|
On Tue, Feb 22, 2022 at 03:59:00PM +0100, Jakub Raczyński wrote:
> So first question - I checked linuxptp source and it seems that once PTP has ever synchronized, it will always synchronize ntpd, whether linuxptp is connected to Master device or is in holdover. Is it indented behavior? Is it targeted for custom PTP devices that have much better holdover than ntpd could ever have?

No, ntpd should be getting samples only when the port is in the slave state. Did you start phc2sys with the -a option?

--
Miroslav Lichvar
From: Jakub R. <j.r...@el...> - 2022-02-22 16:10:19
|
> 21.02.2022 09:43 Miroslav Lichvar <mli...@re...> wrote:
> It might help if you could post some logs that show the problem with
> corresponding packet capture.

Sure thing. Here is the setup: device 10.10.2.236 is the PTP Master, 10.10.2.237 is the PTP Slave, and 10.10.4.1 is the switch with IGMP snooping. A simple switch is also in use, connected to the 10.10.4.1 switch and to a PC running Wireshark.

Case 1: Both PTP Master and Slave are connected to the simple switch. The PTP Slave is started first, the PTP Master later. As intended, the PTP Slave keeps sending IGMP join requests until it connects to any PTP Master.

Case 2: PTP Master and PTP Slave are running. The PTP Master is connected to the simple switch, and the PTP Slave is behind the IGMP switch, which is turned off. PTP Master packets are visible in Wireshark. After that, the IGMP switch is started. UNINTENDED - neither the ptp4l Master nor the Slave ever sends an IGMP join, so multicast is blocked.

Case 3: Reversed connection between PTP Master & Slave - monitoring the PTP Slave, the PTP Master was turned off and turned on later. UNINTENDED - as in case 2.

Case 4: Both PTP Slave & Master are running, monitoring the PTP Master, restarting the IGMP switch. No PTP packets make it through the switch.

Case 5: Continuation of case 4, but we unplug and then plug back in both PTP Master and Slave. Both devices send IGMP and can synchronize through the IGMP switch.

I believe this whole behavior is a combination of the ethernet driver and the kernel. But I want to ask: should ptp4l monitor this, rather than only trigger IGMP join packets at start? I can see devices blatantly ignoring 'Membership queries' under specific circumstances (shown in the logs). So should ptp4l really depend on the kernel or other drivers to work in such cases?

Best regards
Jakub Raczynski
From: Denny P. <den...@me...> - 2022-02-22 18:43:11
|
The concept of Multicast membership is owned by the kernel. ptp4l does not, and can not, send IGMP packets.

Denny

> On Feb 22, 2022, at 08:10, Jakub Raczyński <j.r...@el...> wrote:
>
> But I want to ask, should ptp4l monitor this rather than send only IGMP join packets at the start? Because I can see devices blatantly ignoring 'Membership queries' under specific circumstances (showed in the logs). So should ptp4l really depend on kernel or other drivers to work in such cases?
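When reading a packet capture like the ones discussed in this thread, it helps to tell queries from reports at a glance. The sketch below classifies an IGMP message by its first byte, using the standard type codes (0x11 membership query, 0x12/0x16/0x22 membership reports for IGMPv1/v2/v3, 0x17 leave group). The names `igmp_kind` and `igmp_classify` are hypothetical, and `msg` is assumed to point at the IGMP header, that is, past the IP header.

```c
#include <stddef.h>
#include <stdint.h>

/* Coarse classification of an IGMP message. (Hypothetical names.) */
enum igmp_kind {
	IGMP_KIND_UNKNOWN,
	IGMP_KIND_QUERY,   /* sent by the querier (router or switch) */
	IGMP_KIND_REPORT,  /* sent by a host's kernel to (re)join    */
	IGMP_KIND_LEAVE    /* sent by a host's kernel on leave       */
};

/* Classify the IGMP message starting at `msg`. Every IGMP message is
 * at least 8 bytes long, so anything shorter is treated as unknown. */
enum igmp_kind igmp_classify(const uint8_t *msg, size_t len)
{
	if (!msg || len < 8)
		return IGMP_KIND_UNKNOWN;
	switch (msg[0]) {
	case 0x11:
		return IGMP_KIND_QUERY;
	case 0x12: /* IGMPv1 membership report */
	case 0x16: /* IGMPv2 membership report */
	case 0x22: /* IGMPv3 membership report */
		return IGMP_KIND_REPORT;
	case 0x17:
		return IGMP_KIND_LEAVE;
	default:
		return IGMP_KIND_UNKNOWN;
	}
}
```

In a capture, a query that is not followed by a report from a host that should be a group member points at that host's network stack or driver rather than at the application, since the application never builds these packets itself.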