From: Xin L. <luc...@gm...> - 2021-06-29 21:41:47
|
On Tue, Jun 29, 2021 at 3:57 PM Jon Maloy <jm...@re...> wrote: > > > On 28/06/2021 15:16, Xin Long wrote: > > On Mon, Jun 28, 2021 at 3:03 PM Xin Long <luc...@gm...> wrote: > >> On Sun, Jun 27, 2021 at 3:44 PM Erin Shepherd <eri...@e4...> wrote: > >>> Xin Long <luc...@gm...> writes: > >>> > >>>> Currently, when userspace reads a datagram with a buffer that is > >>>> smaller than this datagram, the data will be truncated and only > >>>> part of it can be received by users. It doesn't seem right that > >>>> users don't know the datagram size and have to use a huge buffer > >>>> to read it to avoid the truncation. > >>>> > >>>> This patch to fix it by keeping the skb in rcv queue until the > >>>> whole data is read by users. Only the last msg of the datagram > >>>> will be marked with MSG_EOR, just as TCP/SCTP does. > Makes sense to me. > >>> I agree that the current behavior is suboptimal, but: > >>> > >>> * Isn't this the same behavior that other datagram socket types > >>> exhibit? It seems like this would make TIPC behave inconsistently > >>> compared to other transports > >> Yes, SCTP. > >> Do you see any reliable datagram transports not doing this? > >> > >>> * Isn't this a compatibility break with existing software? Particularly > >>> existing software will not expect to receive trailers of overlong > >>> datagrams > >> I talked to Jon about this, he seems okay with this. > >> > >>> It feels like this behavior should be activated either with a > >>> setsockopt(2) call or a new MSG_* flag passed to recv > >> Anyway, It may not be worth a new sockopt. > >> I'm thinking to pass MSG_EOR into sendmsg: > >> sendmsg(MSG_EOR). > > sorry, I meant recvmsg(); > > Still not sure I understand what you are suggesting here. Do you mean > that if we add MSG_EOR as a flag to recvmsg() that means we *don't* > want the remainder of the message, i.e., it is ok to truncate it? > > Or do you mean the opposite? Yes, Jon, I mean the opposite. when MSG_EOR is set, we will go with what this patch does, but to delete MSG_EOR if this is not the last part of the data, and keep MSG_EOR if this is the last part of the data. when MSG_EOR is not set, the msg will be truncated as before. > > In the first case, we don't solve any compatibility issue, if that is > the purpose. The programmer still has to add code to get the current > behavior. > > In the latter case we would be on the 100% safe side, although I have a > real hard time to see that this could be a real issue. Why would anybody > deliberately design an application for having messages truncated. > > ///jon > > > >> to indicate we don't want the truncating msg. > >> > >> When the msg flag returns with no MSG_EOR, it means there's more data to read. > >> > >> Thanks. > >>> - Erin > >>> > >>>> Signed-off-by: Xin Long <luc...@gm...> > >>>> --- > >>>> net/tipc/socket.c | 30 +++++++++++++++++++++--------- > >>>> 1 file changed, 21 insertions(+), 9 deletions(-) > >>>> > >>>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c > >>>> index 34a97ea36cc8..504e59838b8b 100644 > >>>> --- a/net/tipc/socket.c > >>>> +++ b/net/tipc/socket.c > >>>> @@ -1880,6 +1880,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>> bool connected = !tipc_sk_type_connectionless(sk); > >>>> struct tipc_sock *tsk = tipc_sk(sk); > >>>> int rc, err, hlen, dlen, copy; > >>>> + struct tipc_skb_cb *skb_cb; > >>>> struct sk_buff_head xmitq; > >>>> struct tipc_msg *hdr; > >>>> struct sk_buff *skb; > >>>> @@ -1903,6 +1904,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>> if (unlikely(rc)) > >>>> goto exit; > >>>> skb = skb_peek(&sk->sk_receive_queue); > >>>> + skb_cb = TIPC_SKB_CB(skb); > >>>> hdr = buf_msg(skb); > >>>> dlen = msg_data_sz(hdr); > >>>> hlen = msg_hdr_sz(hdr); > >>>> @@ -1922,18 +1924,27 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>> > >>>> /* Capture data if non-error msg, otherwise just set return value */ > >>>> if (likely(!err)) { > >>>> - copy = min_t(int, dlen, buflen); > >>>> - if (unlikely(copy != dlen)) > >>>> - m->msg_flags |= MSG_TRUNC; > >>>> - rc = skb_copy_datagram_msg(skb, hlen, m, copy); > >>>> + int offset = skb_cb->bytes_read; > >>>> + > >>>> + copy = min_t(int, dlen - offset, buflen); > >>>> + rc = skb_copy_datagram_msg(skb, hlen + offset, m, copy); > >>>> + if (unlikely(rc)) > >>>> + goto exit; > >>>> + if (unlikely(offset + copy < dlen)) { > >>>> + if (!(flags & MSG_PEEK)) > >>>> + skb_cb->bytes_read = offset + copy; > >>>> + } else { > >>>> + m->msg_flags |= MSG_EOR; > >>>> + skb_cb->bytes_read = 0; > >>>> + } > >>>> } else { > >>>> copy = 0; > >>>> rc = 0; > >>>> - if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) > >>>> + if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) { > >>>> rc = -ECONNRESET; > >>>> + goto exit; > >>>> + } > >>>> } > >>>> - if (unlikely(rc)) > >>>> - goto exit; > >>>> > >>>> /* Mark message as group event if applicable */ > >>>> if (unlikely(grp_evt)) { > >>>> @@ -1956,9 +1967,10 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>> tipc_node_distr_xmit(sock_net(sk), &xmitq); > >>>> } > >>>> > >>>> - tsk_advance_rx_queue(sk); > >>>> + if (!skb_cb->bytes_read) > >>>> + tsk_advance_rx_queue(sk); > >>>> > >>>> - if (likely(!connected)) > >>>> + if (likely(!connected) || skb_cb->bytes_read) > >>>> goto exit; > >>>> > >>>> /* Send connection flow control advertisement when applicable */ > >>>> -- > >>>> 2.27.0 > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> tipc-discussion mailing list > >>>> tip...@li... > >>>> https://lists.sourceforge.net/lists/listinfo/tipc-discussion > > > > _______________________________________________ > > tipc-discussion mailing list > > tip...@li... > > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > > > |