Re: [Linuxptp-users] Hardware PTP clock synchronization
From: Keller, J. E <jac...@in...> - 2013-08-02 17:54:09
Hi Alexander,

On Fri, 2013-08-02 at 09:33 +0300, Гаврилов Александр wrote:
> Thanks for your interest in my problem!
>

Would it be possible for you to attempt using LinuxPTP 1.3 (released
today!)? or the git tree? I know you said you were using version 1.2,
but it appears based on your error log that you are not using the
version which fixes the tx timestamp timeout issue by using poll().
Please just for clarity re-test using LinuxPTP 1.3? :)

> lspci output:
> Intel 82576 Gigabit ethernet controller (rev 01)
> Intel 82576 Gigabit ethernet controller (rev 01)
>
> ethtool output:
> ethtool -T p16p1
> Time stamping parameters for p16p1:
> Capabilities:
>     hardware-transmit (SOF_TIMESTAMPING_TX_HARDWARE)
>     software-transmit (SOF_TIMESTAMPING_TX_SOFTWARE)
>     hardware-receive (SOF_TIMESTAMPING_RX_HARDWARE)
>     software-receive (SOF_TIMESTAMPING_RX_SOFTWARE)
>     software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
>     hardware-raw-clock (SOF_TIMESTAMPING_RAW_HARDWARE)
> PTP Hardware Clock: 0
> Hardware Transmit Timestamp Modes:
>     off (HWTSTAMP_TX_OFF)
>     on (HWTSTAMP_TX_ON)
> Hardware Receive Filter Modes:
>     none (HWTSTAMP_FILTER_NONE)
>     ptpv1-l4-sync (HWTSTAMP_FILTER_PTP_V1_L4_SYNC)
>     ptpv1-l4-delay-req (HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ)
>     ptpv2-l4-sync (HWTSTAMP_FILTER_PTP_V2_L4_SYNC)
>     ptpv2-l4-delay-req (HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ)
>     ptpv2-l2-sync (HWTSTAMP_FILTER_PTP_V2_L2_SYNC)
>     ptpv2-l2-delay-req (HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ)
>     ptpv2-event (HWTSTAMP_FILTER_PTP_V2_EVENT)
>
> > You are on a modern kernel. Please do not use -p /dev/ptpX option.
> > Let the program automatically determine it.
>
> Ok, it works.
>

Was hoping this would solve the problem... :( Ok.

> > How about just increasing the tx_timestamp_timeout configuration
> > option to 100 or 1000?
>
> Set a tx_timestamp_timeout value to 1000 in configuration file.
>
> > Also please re-try with -P, for peer to peer delay mechanism.
>

I realize now that you are using a switch which probably only does one
mode..
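
For reference, a minimal sketch of the tx_timestamp_timeout setting
discussed above, as it might appear in a ptp4l configuration file. The
file name ptp4l.conf is only an assumption for illustration; the option
name and the value 1000 come from this thread, and current ptp4l
documentation describes the value as a timeout in milliseconds.

```
# ptp4l.conf (hypothetical file name)
[global]
# How long ptp4l waits for the kernel to deliver a transmit timestamp.
tx_timestamp_timeout 1000
```

The file would then be handed to ptp4l with its -f option alongside the
other flags already in use.
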
> ptp4l -2 -i p16p1 -P -m -H -s
> ptp4l[728.271]: selected /dev/ptp0 as PTP clock
> ptp4l[728.288]: port 1: INITIALIZING to LISTENING on INITIALIZE
> ptp4l[728.288]: port 0: INITIALIZING to LISTENING on INITIALIZE
> ptp4l[729.462]: port 1: new foreign master ece555.fffe.2de639-2
> ptp4l[733.452]: selected best master clock ece555.fffe.2de639
> ptp4l[733.452]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE

I just noticed this. This is weird..

> ptp4l[734.458]: recvmsg tx timestamp failed: Resource temporarily unavailable

You get the tx timestamp failed error instead of the polling failed
error. That means that the LinuxPTP attempts to poll and then still
fails to receive the message. I am very unsure why this happens. I
believe this is the right place to look regarding the issue.

> ptp4l[734.458]: port 1: send peer delay request failed
> ptp4l[734.458]: port 1: UNCALIBRATED to FAULTY on FAULT_DETECTED
>
> > I wrote the original igb ptp code, but I never tested the 82576
> > because I don't have one. I am not sure if Intel tested this either.
>

Richard wrote the original patch which added PTP support into the
driver, and then a developer here on my team updated it to support
newer devices. I know the validation team for the driver did some
testing, but the 82576 is not very good for PTP. (details on why below)

> > Basically, the 82576 is not a very reliable part for timestamping...
> What kind of adapter would you recommend instead of 82576? (gigabit).
>

Richard suggested the i210, and I also would recommend this part.
Obviously I am somewhat biased as I am an Intel engineer. However this
part definitely is better than the 82576 because it supports
timestamping all packets (vastly reducing issues)

On Fri, 2013-08-02 at 09:48 +0300, Гаврилов Александр wrote:
> Basically, the 82576 is not a very reliable part for timestamping.
>
> Why?
>
> Sincerely, Alexander.
>

The 82576 was one of the first parts Intel ND did with 1588 support.
The hardware was done years before we had stable PTP kernel support,
and before a lot of things were understood well, and therefore the
design of the MAC chip does not really work well for 1588 support.
Mainly, when timestamping it can only store one timestamp each for RX
and TX, and the driver has to process packets fast enough, otherwise
dropped timestamps can occur. This is the issue you are seeing. Newer
parts have solved this issue by enabling the device to insert a
timestamp for each packet directly into the packet buffer.

> In any case, thank you very much!
>

I asked Matthew Vick, the developer who worked on the igb driver, and
did the refreshed PTP work. He has since moved onto other work so he
does not have much time to work on this. However, he did some touch
testing of the 82576, and was able to get it to work.

I believe the likely issue you are seeing is related to the switch you
are working against (he tried back to back). Do you know the Sync
transmit rate? It is normally once per second. However some switches
set this to a much higher rate. This causes issues because the 82576
part cannot really handle timestamping packets if they arrive too
quickly together, which would result in dropped timestamps.

Some questions, and other information that may help:

1) if you are comfortable instrumenting LinuxPTP code, could you please
   add some pr_err() calls into sk.c's sk_receive call around line 221?
   This is really bothering me that the poll isn't timing out and is
   indeed waking up with something on the error queue when there is
   nothing there.

2) could you get a dmesg log for your device, and also an ethtool -i on
   your device so I know the igb version number you are using...

3) what is the sync rate of the switch? once per second? or faster?

4) could you send me a packet dump of the packets transmitted? via
   tcpdump..

I am very concerned because you are in an incredibly weird state..
It looks like poll() succeeds but then recvmsg fails, which is just not
good. I don't know what could be causing this...

Hope we can debug this and resolve the issue..

Thanks,
Jake
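
The poll()-then-recvmsg sequence discussed in this message can be
sketched roughly as below. This is only an illustration written in
Python, not the C code in linuxptp's sk.c; the function names
read_errqueue and wait_for_tx_timestamp are hypothetical, and the sketch
assumes a Linux socket whose transmit timestamps arrive on the error
queue. It separates the three outcomes the thread distinguishes: poll
timing out, poll waking up with an empty error queue (the "Resource
temporarily unavailable" case), and a successful read.

```python
import select
import socket

# Linux value of MSG_ERRQUEUE, used only if this Python build lacks the constant.
MSG_ERRQUEUE = getattr(socket, "MSG_ERRQUEUE", 0x2000)


def read_errqueue(sock, bufsize=2048, ancbufsize=512):
    """Try one non-blocking read from the socket's error queue.

    Returns (data, ancdata) on success, or None when the queue is empty
    (EAGAIN, which prints as "Resource temporarily unavailable").
    """
    try:
        data, ancdata, _flags, _addr = sock.recvmsg(
            bufsize, ancbufsize, MSG_ERRQUEUE | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return None
    return data, ancdata


def wait_for_tx_timestamp(sock, timeout_ms):
    """Poll for error-queue activity, then attempt to read it.

    Returns one of:
      ("timeout", None)   poll() expired with nothing pending
      ("empty", None)     poll() woke up but recvmsg() found nothing
      ("ok", result)      a message was read from the error queue
    """
    poller = select.poll()
    # POLLERR is always reported by poll(), whatever events are requested,
    # so a pending error-queue entry wakes this up.
    poller.register(sock.fileno(), select.POLLPRI)
    if not poller.poll(timeout_ms):
        return "timeout", None
    result = read_errqueue(sock)
    if result is None:
        return "empty", None
    return "ok", result
```

On a plain UDP socket with no timestamping enabled, the call simply
times out; the "empty" result corresponds to the state described above,
where the poll wakes up and the following recvmsg still has nothing to
return. Parsing the timestamp out of the ancillary data is left out of
this sketch.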