From: Christopher W. <us...@wi...> - 2021-11-08 21:28:25
On 11/8/2021 12:38 PM, Vladimir Oltean wrote:
> On Mon, Nov 08, 2021 at 12:11:11PM -0800, Christopher Wingert wrote:
>> Hi,
>>
>> I am working with an Aquantia AQC 107 ethernet interface.  After the announce
>> message is sent on FD_GENERAL, a poll() of the FD_GENERAL descriptor
>> generates a POLLERR.  I see 3 delay messages go out the interface on
>> FD_EVENT (previous to the announce message) without issue (no socket error
>> on read on the FD_EVENT descriptor).
>>
>> The only difference I see between the two sockets is how the sock_filter is
>> set up.
>>
>> I am thinking this is an issue with the Aquantia driver, as the same command
>> on a Mellanox Connect X5 works fine.
>>
>> Has anyone seen this issue or have a clue as to where I should start?
>>
>> Thanks!
>> Chris
>>
>>
>> ptp4l command line : ptp4l -i els1 -H -P -2 -m
>> Kernel is 4.18
>> I downloaded the latest Atlantic driver from the Marvell website 2.4.14.0
>> I have upgraded the AQC 107 firmware to 3.1.121
>
> I've no experience with this driver whatsoever, but generally, what
> ptp4l receives on the error queue of a socket is a TX timestamp. What is
> surprising is that there's a TX timestamp for a general (not event)
> message, because ptp4l does not ask these to be timestamped.
>
> Apart from the error messages, does the system otherwise behave ok?
>
> You can try to read from the general message socket into a packet buffer
> and hexdump it, put it in tcpdump and see what it is. Then the next step
> might be to process its control messages (cmsg), although my first guess
> would be that TX timestamping is what's going on.
>
> There are plenty of things that could go wrong in a driver (especially
> in one you downloaded from the vendor's website and not from kernel.org).
> If you're handy with the source code, you can check what is the
> condition based on which this driver offers hardware TX timestamps to
> the stack. It should be if skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP
> is set for that packet, AND hardware TX timestamping was requested
> through HWTSTAMP_TX_ON.

Thank you for the quick response!

This is what the current version from git looks like on the 107 without
any code changes (3 delay requests, 1 announce). It loops indefinitely
and MASTER never gets enabled.

ptp4l[506134.862]: selected /dev/ptp11 as PTP clock
ptp4l[506134.889]: port 1 (els1): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[506134.889]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[506134.889]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[506141.948]: port 1 (els1): LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[506141.948]: selected local clock ac1f6b.fffe.dce92d as best master
ptp4l[506141.948]: port 1 (els1): assuming the grand master role
ptp4l[506141.950]: port 1 (els1): unexpected socket error
ptp4l[506141.950]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)

I changed raw_send() in raw.c to the code below so that it reads the
timestamp from the error queue on both sockets.

	/*
	 * Get the time stamp right away.
	 */
	// return event == TRANS_EVENT ? sk_receive(fd, pkt, len, NULL, hwts, MSG_ERRQUEUE) : cnt;
	if ( event == TRANS_EVENT )
		return sk_receive(fd, pkt, len, NULL, hwts, MSG_ERRQUEUE);
	if ( event == TRANS_GENERAL )
		return sk_receive(fd, pkt, len, NULL, hwts, MSG_ERRQUEUE);
	return cnt;

This is the result.

ptp4l[506201.215]: selected /dev/ptp11 as PTP clock
ptp4l[506201.245]: port 1 (els1): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[506201.245]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[506201.245]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[506208.757]: port 1 (els1): LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[506208.757]: selected local clock ac1f6b.fffe.dce92d as best master
ptp4l[506208.757]: port 1 (els1): assuming the grand master role
ptp4l[506208.759]: poll for tx timestamp woke up on non ERR event
ptp4l[506208.759]: port 1 (els1): send announce failed
ptp4l[506208.759]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)

Unless there is something wrong in my code change, it doesn't seem to be
a timestamp. Are you saying that every POLLERR should be accompanied by
a message in the error queue?
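
For reference, here is a minimal sketch (not ptp4l code) of one way to tell the two cases apart on Linux. It assumes POLLERR can mean either a message waiting in the socket error queue or a pending socket error readable through SO_ERROR. The names diagnose_pollerr() and enum pollerr_cause are made up for illustration; only poll(), recvmsg() and getsockopt() are real calls.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical classification of why poll() reported POLLERR. */
enum pollerr_cause {
	CAUSE_NONE,      /* POLLERR is not set right now */
	CAUSE_ERRQUEUE,  /* a message (e.g. a TX timestamp) was in the error queue */
	CAUSE_SO_ERROR,  /* a pending socket error, returned in *so_error */
	CAUSE_UNKNOWN    /* POLLERR set, but neither source explained it */
};

/*
 * Hypothetical helper: check a socket for POLLERR without blocking and
 * try to find out where the error condition comes from.
 * Note that it consumes one error queue message or the pending SO_ERROR.
 */
static enum pollerr_cause diagnose_pollerr(int fd, int *so_error)
{
	unsigned char pkt[1600];
	unsigned char control[512];
	struct iovec iov = { .iov_base = pkt, .iov_len = sizeof(pkt) };
	struct pollfd pfd = { .fd = fd, .events = 0 };
	struct msghdr msg;
	socklen_t len = sizeof(*so_error);

	assert(so_error != NULL);
	*so_error = 0;

	/* POLLERR is reported in revents even when events is 0. */
	if (poll(&pfd, 1, 0) < 1 || !(pfd.revents & POLLERR))
		return CAUSE_NONE;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control;
	msg.msg_controllen = sizeof(control);

	/* Case 1: something is queued on the error queue. */
	if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) >= 0)
		return CAUSE_ERRQUEUE;

	/* Case 2: error queue empty, look for a pending socket error. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, so_error, &len) == 0 &&
	    *so_error != 0)
		return CAUSE_SO_ERROR;

	return CAUSE_UNKNOWN;
}
```

Calling this on the FD_GENERAL descriptor right after the failing poll() and logging strerror() of the returned so_error might show whether the driver is reporting a socket error (for example ENETDOWN) rather than queuing a timestamp.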