From: Xin L. <luc...@gm...> - 2021-06-26 03:40:34
|
Currently, when userspace reads a datagram with a buffer that is smaller than this datagram, the data will be truncated and only part of it can be received by users. It doesn't seem right that users don't know the datagram size and have to use a huge buffer to read it to avoid the truncation. This patch to fix it by keeping the skb in rcv queue until the whole data is read by users. Only the last msg of the datagram will be marked with MSG_EOR, just as TCP/SCTP does. Signed-off-by: Xin Long <luc...@gm...> --- net/tipc/socket.c | 30 +++++++++++++++++++++--------- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 34a97ea36cc8..504e59838b8b 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -1880,6 +1880,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, bool connected = !tipc_sk_type_connectionless(sk); struct tipc_sock *tsk = tipc_sk(sk); int rc, err, hlen, dlen, copy; + struct tipc_skb_cb *skb_cb; struct sk_buff_head xmitq; struct tipc_msg *hdr; struct sk_buff *skb; @@ -1903,6 +1904,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, if (unlikely(rc)) goto exit; skb = skb_peek(&sk->sk_receive_queue); + skb_cb = TIPC_SKB_CB(skb); hdr = buf_msg(skb); dlen = msg_data_sz(hdr); hlen = msg_hdr_sz(hdr); @@ -1922,18 +1924,27 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, /* Capture data if non-error msg, otherwise just set return value */ if (likely(!err)) { - copy = min_t(int, dlen, buflen); - if (unlikely(copy != dlen)) - m->msg_flags |= MSG_TRUNC; - rc = skb_copy_datagram_msg(skb, hlen, m, copy); + int offset = skb_cb->bytes_read; + + copy = min_t(int, dlen - offset, buflen); + rc = skb_copy_datagram_msg(skb, hlen + offset, m, copy); + if (unlikely(rc)) + goto exit; + if (unlikely(offset + copy < dlen)) { + if (!(flags & MSG_PEEK)) + skb_cb->bytes_read = offset + copy; + } else { + m->msg_flags |= MSG_EOR; + skb_cb->bytes_read = 0; + } } else { copy = 0; rc = 0; - if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) + if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) { rc = -ECONNRESET; + goto exit; + } } - if (unlikely(rc)) - goto exit; /* Mark message as group event if applicable */ if (unlikely(grp_evt)) { @@ -1956,9 +1967,10 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, tipc_node_distr_xmit(sock_net(sk), &xmitq); } - tsk_advance_rx_queue(sk); + if (!skb_cb->bytes_read) + tsk_advance_rx_queue(sk); - if (likely(!connected)) + if (likely(!connected) || skb_cb->bytes_read) goto exit; /* Send connection flow control advertisement when applicable */ -- 2.27.0 |
From: Xin L. <luc...@gm...> - 2021-07-16 21:44:18
|
Currently, when userspace reads a datagram with a buffer that is smaller than this datagram, the data will be truncated and only part of it can be received by users. It doesn't seem right that users don't know the datagram size and have to use a huge buffer to read it to avoid the truncation. This patch to fix it by keeping the skb in rcv queue until the whole data is read by users. Only the last msg of the datagram will be marked with MSG_EOR, just as TCP/SCTP does. Note that this will work as above only when MSG_EOR is set in the flags parameter of recvmsg(), so that it won't break any old user applications. Signed-off-by: Xin Long <luc...@gm...> Acked-by: Jon Maloy <jm...@re...> --- net/tipc/socket.c | 36 +++++++++++++++++++++++++++--------- 1 file changed, 27 insertions(+), 9 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 34a97ea36cc8..9b0b311c7ec1 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -1880,6 +1880,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, bool connected = !tipc_sk_type_connectionless(sk); struct tipc_sock *tsk = tipc_sk(sk); int rc, err, hlen, dlen, copy; + struct tipc_skb_cb *skb_cb; struct sk_buff_head xmitq; struct tipc_msg *hdr; struct sk_buff *skb; @@ -1903,6 +1904,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, if (unlikely(rc)) goto exit; skb = skb_peek(&sk->sk_receive_queue); + skb_cb = TIPC_SKB_CB(skb); hdr = buf_msg(skb); dlen = msg_data_sz(hdr); hlen = msg_hdr_sz(hdr); @@ -1922,18 +1924,33 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, /* Capture data if non-error msg, otherwise just set return value */ if (likely(!err)) { - copy = min_t(int, dlen, buflen); - if (unlikely(copy != dlen)) - m->msg_flags |= MSG_TRUNC; - rc = skb_copy_datagram_msg(skb, hlen, m, copy); + int offset = skb_cb->bytes_read; + + copy = min_t(int, dlen - offset, buflen); + rc = skb_copy_datagram_msg(skb, hlen + offset, m, copy); + if (unlikely(rc)) + goto exit; + if (unlikely(offset + copy < dlen)) { + if (flags & MSG_EOR) { + if (!(flags & MSG_PEEK)) + skb_cb->bytes_read = offset + copy; + } else { + m->msg_flags |= MSG_TRUNC; + skb_cb->bytes_read = 0; + } + } else { + if (flags & MSG_EOR) + m->msg_flags |= MSG_EOR; + skb_cb->bytes_read = 0; + } } else { copy = 0; rc = 0; - if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) + if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) { rc = -ECONNRESET; + goto exit; + } } - if (unlikely(rc)) - goto exit; /* Mark message as group event if applicable */ if (unlikely(grp_evt)) { @@ -1956,9 +1973,10 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, tipc_node_distr_xmit(sock_net(sk), &xmitq); } - tsk_advance_rx_queue(sk); + if (!skb_cb->bytes_read) + tsk_advance_rx_queue(sk); - if (likely(!connected)) + if (likely(!connected) || skb_cb->bytes_read) goto exit; /* Send connection flow control advertisement when applicable */ -- 2.27.0 |
From: Xin L. <luc...@gm...> - 2021-06-28 19:03:20
|
On Sun, Jun 27, 2021 at 3:44 PM Erin Shepherd <eri...@e4...> wrote: > > Xin Long <luc...@gm...> writes: > > > Currently, when userspace reads a datagram with a buffer that is > > smaller than this datagram, the data will be truncated and only > > part of it can be received by users. It doesn't seem right that > > users don't know the datagram size and have to use a huge buffer > > to read it to avoid the truncation. > > > > This patch to fix it by keeping the skb in rcv queue until the > > whole data is read by users. Only the last msg of the datagram > > will be marked with MSG_EOR, just as TCP/SCTP does. > > I agree that the current behavior is suboptimal, but: > > * Isn't this the same behavior that other datagram socket types > exhibit? It seems like this would make TIPC behave inconsistently > compared to other transports Yes, SCTP. Do you see any reliable datagram transports not doing this? > * Isn't this a compatibility break with existing software? Particularly > existing software will not expect to receive trailers of overlong > datagrams I talked to Jon about this, he seems okay with this. > > It feels like this behavior should be activated either with a > setsockopt(2) call or a new MSG_* flag passed to recv Anyway, It may not be worth a new sockopt. I'm thinking to pass MSG_EOR into sendmsg: sendmsg(MSG_EOR). to indicate we don't want the truncating msg. When the msg flag returns with no MSG_EOR, it means there's more data to read. Thanks. > > - Erin > > > Signed-off-by: Xin Long <luc...@gm...> > > --- > > net/tipc/socket.c | 30 +++++++++++++++++++++--------- > > 1 file changed, 21 insertions(+), 9 deletions(-) > > > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c > > index 34a97ea36cc8..504e59838b8b 100644 > > --- a/net/tipc/socket.c > > +++ b/net/tipc/socket.c > > @@ -1880,6 +1880,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > > bool connected = !tipc_sk_type_connectionless(sk); > > struct tipc_sock *tsk = tipc_sk(sk); > > int rc, err, hlen, dlen, copy; > > + struct tipc_skb_cb *skb_cb; > > struct sk_buff_head xmitq; > > struct tipc_msg *hdr; > > struct sk_buff *skb; > > @@ -1903,6 +1904,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > > if (unlikely(rc)) > > goto exit; > > skb = skb_peek(&sk->sk_receive_queue); > > + skb_cb = TIPC_SKB_CB(skb); > > hdr = buf_msg(skb); > > dlen = msg_data_sz(hdr); > > hlen = msg_hdr_sz(hdr); > > @@ -1922,18 +1924,27 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > > > > /* Capture data if non-error msg, otherwise just set return value */ > > if (likely(!err)) { > > - copy = min_t(int, dlen, buflen); > > - if (unlikely(copy != dlen)) > > - m->msg_flags |= MSG_TRUNC; > > - rc = skb_copy_datagram_msg(skb, hlen, m, copy); > > + int offset = skb_cb->bytes_read; > > + > > + copy = min_t(int, dlen - offset, buflen); > > + rc = skb_copy_datagram_msg(skb, hlen + offset, m, copy); > > + if (unlikely(rc)) > > + goto exit; > > + if (unlikely(offset + copy < dlen)) { > > + if (!(flags & MSG_PEEK)) > > + skb_cb->bytes_read = offset + copy; > > + } else { > > + m->msg_flags |= MSG_EOR; > > + skb_cb->bytes_read = 0; > > + } > > } else { > > copy = 0; > > rc = 0; > > - if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) > > + if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) { > > rc = -ECONNRESET; > > + goto exit; > > + } > > } > > - if (unlikely(rc)) > > - goto exit; > > > > /* Mark message as group event if applicable */ > > if (unlikely(grp_evt)) { > > @@ -1956,9 +1967,10 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > > tipc_node_distr_xmit(sock_net(sk), &xmitq); > > } > > > > - tsk_advance_rx_queue(sk); > > + if (!skb_cb->bytes_read) > > + tsk_advance_rx_queue(sk); > > > > - if (likely(!connected)) > > + if (likely(!connected) || skb_cb->bytes_read) > > goto exit; > > > > /* Send connection flow control advertisement when applicable */ > > -- > > 2.27.0 > > > > > > > > _______________________________________________ > > tipc-discussion mailing list > > tip...@li... > > https://lists.sourceforge.net/lists/listinfo/tipc-discussion |
From: Xin L. <luc...@gm...> - 2021-06-28 19:16:37
|
On Mon, Jun 28, 2021 at 3:03 PM Xin Long <luc...@gm...> wrote: > > On Sun, Jun 27, 2021 at 3:44 PM Erin Shepherd <eri...@e4...> wrote: > > > > Xin Long <luc...@gm...> writes: > > > > > Currently, when userspace reads a datagram with a buffer that is > > > smaller than this datagram, the data will be truncated and only > > > part of it can be received by users. It doesn't seem right that > > > users don't know the datagram size and have to use a huge buffer > > > to read it to avoid the truncation. > > > > > > This patch to fix it by keeping the skb in rcv queue until the > > > whole data is read by users. Only the last msg of the datagram > > > will be marked with MSG_EOR, just as TCP/SCTP does. > > > > I agree that the current behavior is suboptimal, but: > > > > * Isn't this the same behavior that other datagram socket types > > exhibit? It seems like this would make TIPC behave inconsistently > > compared to other transports > Yes, SCTP. > Do you see any reliable datagram transports not doing this? > > > * Isn't this a compatibility break with existing software? Particularly > > existing software will not expect to receive trailers of overlong > > datagrams > I talked to Jon about this, he seems okay with this. > > > > > It feels like this behavior should be activated either with a > > setsockopt(2) call or a new MSG_* flag passed to recv > Anyway, It may not be worth a new sockopt. > I'm thinking to pass MSG_EOR into sendmsg: > sendmsg(MSG_EOR). sorry, I meant recvmsg(); > to indicate we don't want the truncating msg. > > When the msg flag returns with no MSG_EOR, it means there's more data to read. > > Thanks. > > > > - Erin > > > > > Signed-off-by: Xin Long <luc...@gm...> > > > --- > > > net/tipc/socket.c | 30 +++++++++++++++++++++--------- > > > 1 file changed, 21 insertions(+), 9 deletions(-) > > > > > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c > > > index 34a97ea36cc8..504e59838b8b 100644 > > > --- a/net/tipc/socket.c > > > +++ b/net/tipc/socket.c > > > @@ -1880,6 +1880,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > > > bool connected = !tipc_sk_type_connectionless(sk); > > > struct tipc_sock *tsk = tipc_sk(sk); > > > int rc, err, hlen, dlen, copy; > > > + struct tipc_skb_cb *skb_cb; > > > struct sk_buff_head xmitq; > > > struct tipc_msg *hdr; > > > struct sk_buff *skb; > > > @@ -1903,6 +1904,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > > > if (unlikely(rc)) > > > goto exit; > > > skb = skb_peek(&sk->sk_receive_queue); > > > + skb_cb = TIPC_SKB_CB(skb); > > > hdr = buf_msg(skb); > > > dlen = msg_data_sz(hdr); > > > hlen = msg_hdr_sz(hdr); > > > @@ -1922,18 +1924,27 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > > > > > > /* Capture data if non-error msg, otherwise just set return value */ > > > if (likely(!err)) { > > > - copy = min_t(int, dlen, buflen); > > > - if (unlikely(copy != dlen)) > > > - m->msg_flags |= MSG_TRUNC; > > > - rc = skb_copy_datagram_msg(skb, hlen, m, copy); > > > + int offset = skb_cb->bytes_read; > > > + > > > + copy = min_t(int, dlen - offset, buflen); > > > + rc = skb_copy_datagram_msg(skb, hlen + offset, m, copy); > > > + if (unlikely(rc)) > > > + goto exit; > > > + if (unlikely(offset + copy < dlen)) { > > > + if (!(flags & MSG_PEEK)) > > > + skb_cb->bytes_read = offset + copy; > > > + } else { > > > + m->msg_flags |= MSG_EOR; > > > + skb_cb->bytes_read = 0; > > > + } > > > } else { > > > copy = 0; > > > rc = 0; > > > - if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) > > > + if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) { > > > rc = -ECONNRESET; > > > + goto exit; > > > + } > > > } > > > - if (unlikely(rc)) > > > - goto exit; > > > > > > /* Mark message as group event if applicable */ > > > if (unlikely(grp_evt)) { > > > @@ -1956,9 +1967,10 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > > > tipc_node_distr_xmit(sock_net(sk), &xmitq); > > > } > > > > > > - tsk_advance_rx_queue(sk); > > > + if (!skb_cb->bytes_read) > > > + tsk_advance_rx_queue(sk); > > > > > > - if (likely(!connected)) > > > + if (likely(!connected) || skb_cb->bytes_read) > > > goto exit; > > > > > > /* Send connection flow control advertisement when applicable */ > > > -- > > > 2.27.0 > > > > > > > > > > > > _______________________________________________ > > > tipc-discussion mailing list > > > tip...@li... > > > https://lists.sourceforge.net/lists/listinfo/tipc-discussion |
From: Jon M. <jm...@re...> - 2021-06-29 19:57:37
|
On 28/06/2021 15:16, Xin Long wrote: > On Mon, Jun 28, 2021 at 3:03 PM Xin Long <luc...@gm...> wrote: >> On Sun, Jun 27, 2021 at 3:44 PM Erin Shepherd <eri...@e4...> wrote: >>> Xin Long <luc...@gm...> writes: >>> >>>> Currently, when userspace reads a datagram with a buffer that is >>>> smaller than this datagram, the data will be truncated and only >>>> part of it can be received by users. It doesn't seem right that >>>> users don't know the datagram size and have to use a huge buffer >>>> to read it to avoid the truncation. >>>> >>>> This patch to fix it by keeping the skb in rcv queue until the >>>> whole data is read by users. Only the last msg of the datagram >>>> will be marked with MSG_EOR, just as TCP/SCTP does. Makes sense to me. >>> I agree that the current behavior is suboptimal, but: >>> >>> * Isn't this the same behavior that other datagram socket types >>> exhibit? It seems like this would make TIPC behave inconsistently >>> compared to other transports >> Yes, SCTP. >> Do you see any reliable datagram transports not doing this? >> >>> * Isn't this a compatibility break with existing software? Particularly >>> existing software will not expect to receive trailers of overlong >>> datagrams >> I talked to Jon about this, he seems okay with this. >> >>> It feels like this behavior should be activated either with a >>> setsockopt(2) call or a new MSG_* flag passed to recv >> Anyway, It may not be worth a new sockopt. >> I'm thinking to pass MSG_EOR into sendmsg: >> sendmsg(MSG_EOR). > sorry, I meant recvmsg(); Still not sure I understand what you are suggesting here. Do you mean that if we add MSG_EOR as a flag to recvmsg() that means we *don't* want the remainder of the message, i.e., it is ok to truncate it? Or do you mean the opposite? In the first case, we don't solve any compatibility issue, if that is the purpose. The programmer still has to add code to get the current behavior. In the latter case we would be on the 100% safe side, although I have a real hard time to see that this could be a real issue. Why would anybody deliberately design an application for having messages truncated. ///jon >> to indicate we don't want the truncating msg. >> >> When the msg flag returns with no MSG_EOR, it means there's more data to read. >> >> Thanks. >>> - Erin >>> >>>> Signed-off-by: Xin Long <luc...@gm...> >>>> --- >>>> net/tipc/socket.c | 30 +++++++++++++++++++++--------- >>>> 1 file changed, 21 insertions(+), 9 deletions(-) >>>> >>>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c >>>> index 34a97ea36cc8..504e59838b8b 100644 >>>> --- a/net/tipc/socket.c >>>> +++ b/net/tipc/socket.c >>>> @@ -1880,6 +1880,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, >>>> bool connected = !tipc_sk_type_connectionless(sk); >>>> struct tipc_sock *tsk = tipc_sk(sk); >>>> int rc, err, hlen, dlen, copy; >>>> + struct tipc_skb_cb *skb_cb; >>>> struct sk_buff_head xmitq; >>>> struct tipc_msg *hdr; >>>> struct sk_buff *skb; >>>> @@ -1903,6 +1904,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, >>>> if (unlikely(rc)) >>>> goto exit; >>>> skb = skb_peek(&sk->sk_receive_queue); >>>> + skb_cb = TIPC_SKB_CB(skb); >>>> hdr = buf_msg(skb); >>>> dlen = msg_data_sz(hdr); >>>> hlen = msg_hdr_sz(hdr); >>>> @@ -1922,18 +1924,27 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, >>>> >>>> /* Capture data if non-error msg, otherwise just set return value */ >>>> if (likely(!err)) { >>>> - copy = min_t(int, dlen, buflen); >>>> - if (unlikely(copy != dlen)) >>>> - m->msg_flags |= MSG_TRUNC; >>>> - rc = skb_copy_datagram_msg(skb, hlen, m, copy); >>>> + int offset = skb_cb->bytes_read; >>>> + >>>> + copy = min_t(int, dlen - offset, buflen); >>>> + rc = skb_copy_datagram_msg(skb, hlen + offset, m, copy); >>>> + if (unlikely(rc)) >>>> + goto exit; >>>> + if (unlikely(offset + copy < dlen)) { >>>> + if (!(flags & MSG_PEEK)) >>>> + skb_cb->bytes_read = offset + copy; >>>> + } else { >>>> + m->msg_flags |= MSG_EOR; >>>> + skb_cb->bytes_read = 0; >>>> + } >>>> } else { >>>> copy = 0; >>>> rc = 0; >>>> - if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) >>>> + if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) { >>>> rc = -ECONNRESET; >>>> + goto exit; >>>> + } >>>> } >>>> - if (unlikely(rc)) >>>> - goto exit; >>>> >>>> /* Mark message as group event if applicable */ >>>> if (unlikely(grp_evt)) { >>>> @@ -1956,9 +1967,10 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, >>>> tipc_node_distr_xmit(sock_net(sk), &xmitq); >>>> } >>>> >>>> - tsk_advance_rx_queue(sk); >>>> + if (!skb_cb->bytes_read) >>>> + tsk_advance_rx_queue(sk); >>>> >>>> - if (likely(!connected)) >>>> + if (likely(!connected) || skb_cb->bytes_read) >>>> goto exit; >>>> >>>> /* Send connection flow control advertisement when applicable */ >>>> -- >>>> 2.27.0 >>>> >>>> >>>> >>>> _______________________________________________ >>>> tipc-discussion mailing list >>>> tip...@li... >>>> https://lists.sourceforge.net/lists/listinfo/tipc-discussion > > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > |
From: Xin L. <luc...@gm...> - 2021-06-29 21:41:47
|
On Tue, Jun 29, 2021 at 3:57 PM Jon Maloy <jm...@re...> wrote: > > > On 28/06/2021 15:16, Xin Long wrote: > > On Mon, Jun 28, 2021 at 3:03 PM Xin Long <luc...@gm...> wrote: > >> On Sun, Jun 27, 2021 at 3:44 PM Erin Shepherd <eri...@e4...> wrote: > >>> Xin Long <luc...@gm...> writes: > >>> > >>>> Currently, when userspace reads a datagram with a buffer that is > >>>> smaller than this datagram, the data will be truncated and only > >>>> part of it can be received by users. It doesn't seem right that > >>>> users don't know the datagram size and have to use a huge buffer > >>>> to read it to avoid the truncation. > >>>> > >>>> This patch to fix it by keeping the skb in rcv queue until the > >>>> whole data is read by users. Only the last msg of the datagram > >>>> will be marked with MSG_EOR, just as TCP/SCTP does. > Makes sense to me. > >>> I agree that the current behavior is suboptimal, but: > >>> > >>> * Isn't this the same behavior that other datagram socket types > >>> exhibit? It seems like this would make TIPC behave inconsistently > >>> compared to other transports > >> Yes, SCTP. > >> Do you see any reliable datagram transports not doing this? > >> > >>> * Isn't this a compatibility break with existing software? Particularly > >>> existing software will not expect to receive trailers of overlong > >>> datagrams > >> I talked to Jon about this, he seems okay with this. > >> > >>> It feels like this behavior should be activated either with a > >>> setsockopt(2) call or a new MSG_* flag passed to recv > >> Anyway, It may not be worth a new sockopt. > >> I'm thinking to pass MSG_EOR into sendmsg: > >> sendmsg(MSG_EOR). > > sorry, I meant recvmsg(); > > Still not sure I understand what you are suggesting here. Do you mean > that if we add MSG_EOR as a flag to recvmsg() that means we *don't* > want the remainder of the message, i.e., it is ok to truncate it? > > Or do you mean the opposite? Yes, Jon, I mean the opposite. when MSG_EOR is set, we will go with what this patch does, but to delete MSG_EOR if this is not the last part of the data, and keep MSG_EOR if this is the last part of the data. when MSG_EOR is not set, the msg will be truncated as before. > > In the first case, we don't solve any compatibility issue, if that is > the purpose. The programmer still has to add code to get the current > behavior. > > In the latter case we would be on the 100% safe side, although I have a > real hard time to see that this could be a real issue. Why would anybody > deliberately design an application for having messages truncated. > > ///jon > > > >> to indicate we don't want the truncating msg. > >> > >> When the msg flag returns with no MSG_EOR, it means there's more data to read. > >> > >> Thanks. > >>> - Erin > >>> > >>>> Signed-off-by: Xin Long <luc...@gm...> > >>>> --- > >>>> net/tipc/socket.c | 30 +++++++++++++++++++++--------- > >>>> 1 file changed, 21 insertions(+), 9 deletions(-) > >>>> > >>>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c > >>>> index 34a97ea36cc8..504e59838b8b 100644 > >>>> --- a/net/tipc/socket.c > >>>> +++ b/net/tipc/socket.c > >>>> @@ -1880,6 +1880,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>> bool connected = !tipc_sk_type_connectionless(sk); > >>>> struct tipc_sock *tsk = tipc_sk(sk); > >>>> int rc, err, hlen, dlen, copy; > >>>> + struct tipc_skb_cb *skb_cb; > >>>> struct sk_buff_head xmitq; > >>>> struct tipc_msg *hdr; > >>>> struct sk_buff *skb; > >>>> @@ -1903,6 +1904,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>> if (unlikely(rc)) > >>>> goto exit; > >>>> skb = skb_peek(&sk->sk_receive_queue); > >>>> + skb_cb = TIPC_SKB_CB(skb); > >>>> hdr = buf_msg(skb); > >>>> dlen = msg_data_sz(hdr); > >>>> hlen = msg_hdr_sz(hdr); > >>>> @@ -1922,18 +1924,27 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>> > >>>> /* Capture data if non-error msg, otherwise just set return value */ > >>>> if (likely(!err)) { > >>>> - copy = min_t(int, dlen, buflen); > >>>> - if (unlikely(copy != dlen)) > >>>> - m->msg_flags |= MSG_TRUNC; > >>>> - rc = skb_copy_datagram_msg(skb, hlen, m, copy); > >>>> + int offset = skb_cb->bytes_read; > >>>> + > >>>> + copy = min_t(int, dlen - offset, buflen); > >>>> + rc = skb_copy_datagram_msg(skb, hlen + offset, m, copy); > >>>> + if (unlikely(rc)) > >>>> + goto exit; > >>>> + if (unlikely(offset + copy < dlen)) { > >>>> + if (!(flags & MSG_PEEK)) > >>>> + skb_cb->bytes_read = offset + copy; > >>>> + } else { > >>>> + m->msg_flags |= MSG_EOR; > >>>> + skb_cb->bytes_read = 0; > >>>> + } > >>>> } else { > >>>> copy = 0; > >>>> rc = 0; > >>>> - if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) > >>>> + if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) { > >>>> rc = -ECONNRESET; > >>>> + goto exit; > >>>> + } > >>>> } > >>>> - if (unlikely(rc)) > >>>> - goto exit; > >>>> > >>>> /* Mark message as group event if applicable */ > >>>> if (unlikely(grp_evt)) { > >>>> @@ -1956,9 +1967,10 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>> tipc_node_distr_xmit(sock_net(sk), &xmitq); > >>>> } > >>>> > >>>> - tsk_advance_rx_queue(sk); > >>>> + if (!skb_cb->bytes_read) > >>>> + tsk_advance_rx_queue(sk); > >>>> > >>>> - if (likely(!connected)) > >>>> + if (likely(!connected) || skb_cb->bytes_read) > >>>> goto exit; > >>>> > >>>> /* Send connection flow control advertisement when applicable */ > >>>> -- > >>>> 2.27.0 > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> tipc-discussion mailing list > >>>> tip...@li... > >>>> https://lists.sourceforge.net/lists/listinfo/tipc-discussion > > > > _______________________________________________ > > tipc-discussion mailing list > > tip...@li... > > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > > > |
From: Jon M. <jm...@re...> - 2021-06-30 14:33:28
|
On 29/06/2021 17:41, Xin Long wrote: > On Tue, Jun 29, 2021 at 3:57 PM Jon Maloy <jm...@re...> wrote: >> [...] > Yes, Jon, I mean the opposite. > > when MSG_EOR is set, we will go with what this patch does, > but to delete MSG_EOR if this is not the last part of the data, > and keep MSG_EOR if this is the last part of the data. > > when MSG_EOR is not set, the msg will be truncated as before. Yes, that would be a safe behavior. Is SCTP doing this? ///jon > >> In the first case, we don't solve any compatibility issue, if that is >> the purpose. The programmer still has to add code to get the current >> behavior. >> >> In the latter case we would be on the 100% safe side, although I have a >> real hard time to see that this could be a real issue. Why would anybody >> deliberately design an application for having messages truncated. >> >> ///jon >> >> >>>> to indicate we don't want the truncating msg. >>>> >>>> When the msg flag returns with no MSG_EOR, it means there's more data to read. >>>> >>>> Thanks. >>>>> - Erin >>>>> >>>>>> Signed-off-by: Xin Long <luc...@gm...> >>>>>> --- >>>>>> net/tipc/socket.c | 30 +++++++++++++++++++++--------- >>>>>> 1 file changed, 21 insertions(+), 9 deletions(-) >>>>>> >>>>>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c >>>>>> index 34a97ea36cc8..504e59838b8b 100644 >>>>>> --- a/net/tipc/socket.c >>>>>> +++ b/net/tipc/socket.c >>>>>> @@ -1880,6 +1880,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, >>>>>> bool connected = !tipc_sk_type_connectionless(sk); >>>>>> struct tipc_sock *tsk = tipc_sk(sk); >>>>>> int rc, err, hlen, dlen, copy; >>>>>> + struct tipc_skb_cb *skb_cb; >>>>>> struct sk_buff_head xmitq; >>>>>> struct tipc_msg *hdr; >>>>>> struct sk_buff *skb; >>>>>> @@ -1903,6 +1904,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, >>>>>> if (unlikely(rc)) >>>>>> goto exit; >>>>>> skb = skb_peek(&sk->sk_receive_queue); >>>>>> + skb_cb = TIPC_SKB_CB(skb); >>>>>> hdr = buf_msg(skb); >>>>>> dlen = msg_data_sz(hdr); >>>>>> hlen = msg_hdr_sz(hdr); >>>>>> @@ -1922,18 +1924,27 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, >>>>>> >>>>>> /* Capture data if non-error msg, otherwise just set return value */ >>>>>> if (likely(!err)) { >>>>>> - copy = min_t(int, dlen, buflen); >>>>>> - if (unlikely(copy != dlen)) >>>>>> - m->msg_flags |= MSG_TRUNC; >>>>>> - rc = skb_copy_datagram_msg(skb, hlen, m, copy); >>>>>> + int offset = skb_cb->bytes_read; >>>>>> + >>>>>> + copy = min_t(int, dlen - offset, buflen); >>>>>> + rc = skb_copy_datagram_msg(skb, hlen + offset, m, copy); >>>>>> + if (unlikely(rc)) >>>>>> + goto exit; >>>>>> + if (unlikely(offset + copy < dlen)) { >>>>>> + if (!(flags & MSG_PEEK)) >>>>>> + skb_cb->bytes_read = offset + copy; >>>>>> + } else { >>>>>> + m->msg_flags |= MSG_EOR; >>>>>> + skb_cb->bytes_read = 0; >>>>>> + } >>>>>> } else { >>>>>> copy = 0; >>>>>> rc = 0; >>>>>> - if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) >>>>>> + if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) { >>>>>> rc = -ECONNRESET; >>>>>> + goto exit; >>>>>> + } >>>>>> } >>>>>> - if (unlikely(rc)) >>>>>> - goto exit; >>>>>> >>>>>> /* Mark message as group event if applicable */ >>>>>> if (unlikely(grp_evt)) { >>>>>> @@ -1956,9 +1967,10 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, >>>>>> tipc_node_distr_xmit(sock_net(sk), &xmitq); >>>>>> } >>>>>> >>>>>> - tsk_advance_rx_queue(sk); >>>>>> + if (!skb_cb->bytes_read) >>>>>> + tsk_advance_rx_queue(sk); >>>>>> >>>>>> - if (likely(!connected)) >>>>>> + if (likely(!connected) || skb_cb->bytes_read) >>>>>> goto exit; >>>>>> >>>>>> /* Send connection flow control advertisement when applicable */ >>>>>> -- >>>>>> 2.27.0 >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> tipc-discussion mailing list >>>>>> tip...@li... >>>>>> https://lists.sourceforge.net/lists/listinfo/tipc-discussion >>> _______________________________________________ >>> tipc-discussion mailing list >>> tip...@li... >>> https://lists.sourceforge.net/lists/listinfo/tipc-discussion >>> |
From: Xin L. <luc...@gm...> - 2021-06-30 15:45:16
|
On Wed, Jun 30, 2021 at 10:33 AM Jon Maloy <jm...@re...> wrote: > > > On 29/06/2021 17:41, Xin Long wrote: > > On Tue, Jun 29, 2021 at 3:57 PM Jon Maloy <jm...@re...> wrote: > >> > [...] > > Yes, Jon, I mean the opposite. > > > > when MSG_EOR is set, we will go with what this patch does, > > but to delete MSG_EOR if this is not the last part of the data, > > and keep MSG_EOR if this is the last part of the data. > > > > when MSG_EOR is not set, the msg will be truncated as before. > > Yes, that would be a safe behavior. Is SCTP doing this? No, SCTP doesn't need to, as it doesn't truncate msg since the beginning. That's why no compatibility issue was caused. > > ///jon > > > > >> In the first case, we don't solve any compatibility issue, if that is > >> the purpose. The programmer still has to add code to get the current > >> behavior. > >> > >> In the latter case we would be on the 100% safe side, although I have a > >> real hard time to see that this could be a real issue. Why would anybody > >> deliberately design an application for having messages truncated. > >> > >> ///jon > >> > >> > >>>> to indicate we don't want the truncating msg. > >>>> > >>>> When the msg flag returns with no MSG_EOR, it means there's more data to read. > >>>> > >>>> Thanks. > >>>>> - Erin > >>>>> > >>>>>> Signed-off-by: Xin Long <luc...@gm...> > >>>>>> --- > >>>>>> net/tipc/socket.c | 30 +++++++++++++++++++++--------- > >>>>>> 1 file changed, 21 insertions(+), 9 deletions(-) > >>>>>> > >>>>>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c > >>>>>> index 34a97ea36cc8..504e59838b8b 100644 > >>>>>> --- a/net/tipc/socket.c > >>>>>> +++ b/net/tipc/socket.c > >>>>>> @@ -1880,6 +1880,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>>>> bool connected = !tipc_sk_type_connectionless(sk); > >>>>>> struct tipc_sock *tsk = tipc_sk(sk); > >>>>>> int rc, err, hlen, dlen, copy; > >>>>>> + struct tipc_skb_cb *skb_cb; > >>>>>> struct sk_buff_head xmitq; > >>>>>> struct tipc_msg *hdr; > >>>>>> struct sk_buff *skb; > >>>>>> @@ -1903,6 +1904,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>>>> if (unlikely(rc)) > >>>>>> goto exit; > >>>>>> skb = skb_peek(&sk->sk_receive_queue); > >>>>>> + skb_cb = TIPC_SKB_CB(skb); > >>>>>> hdr = buf_msg(skb); > >>>>>> dlen = msg_data_sz(hdr); > >>>>>> hlen = msg_hdr_sz(hdr); > >>>>>> @@ -1922,18 +1924,27 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>>>> > >>>>>> /* Capture data if non-error msg, otherwise just set return value */ > >>>>>> if (likely(!err)) { > >>>>>> - copy = min_t(int, dlen, buflen); > >>>>>> - if (unlikely(copy != dlen)) > >>>>>> - m->msg_flags |= MSG_TRUNC; > >>>>>> - rc = skb_copy_datagram_msg(skb, hlen, m, copy); > >>>>>> + int offset = skb_cb->bytes_read; > >>>>>> + > >>>>>> + copy = min_t(int, dlen - offset, buflen); > >>>>>> + rc = skb_copy_datagram_msg(skb, hlen + offset, m, copy); > >>>>>> + if (unlikely(rc)) > >>>>>> + goto exit; > >>>>>> + if (unlikely(offset + copy < dlen)) { > >>>>>> + if (!(flags & MSG_PEEK)) > >>>>>> + skb_cb->bytes_read = offset + copy; > >>>>>> + } else { > >>>>>> + m->msg_flags |= MSG_EOR; > >>>>>> + skb_cb->bytes_read = 0; > >>>>>> + } > >>>>>> } else { > >>>>>> copy = 0; > >>>>>> rc = 0; > >>>>>> - if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) > >>>>>> + if (err != TIPC_CONN_SHUTDOWN && connected && !m->msg_control) { > >>>>>> rc = -ECONNRESET; > >>>>>> + goto exit; > >>>>>> + } > >>>>>> } > >>>>>> - if (unlikely(rc)) > >>>>>> - goto exit; > >>>>>> > >>>>>> /* Mark message as group event if applicable */ > >>>>>> if (unlikely(grp_evt)) { > >>>>>> @@ -1956,9 +1967,10 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m, > >>>>>> tipc_node_distr_xmit(sock_net(sk), &xmitq); > >>>>>> } > >>>>>> > >>>>>> - tsk_advance_rx_queue(sk); > >>>>>> + if (!skb_cb->bytes_read) > >>>>>> + tsk_advance_rx_queue(sk); > >>>>>> > >>>>>> - if (likely(!connected)) > >>>>>> + if (likely(!connected) || skb_cb->bytes_read) > >>>>>> goto exit; > >>>>>> > >>>>>> /* Send connection flow control advertisement when applicable */ > >>>>>> -- > >>>>>> 2.27.0 > >>>>>> > >>>>>> > >>>>>> > >>>>>> _______________________________________________ > >>>>>> tipc-discussion mailing list > >>>>>> tip...@li... > >>>>>> https://lists.sourceforge.net/lists/listinfo/tipc-discussion > >>> _______________________________________________ > >>> tipc-discussion mailing list > >>> tip...@li... > >>> https://lists.sourceforge.net/lists/listinfo/tipc-discussion > >>> > |
From: Jon M. <jm...@re...> - 2021-06-30 14:44:47
|
On 29/06/2021 16:10, Erin Shepherd wrote: > Jon Maloy <jm...@re...> writes: >> In the latter case we would be on the 100% safe side, although I have a >> real hard time to see that this could be a real issue. Why would anybody >> deliberately design an application for having messages truncated. > My concern would be more around people using the new behavior on > unsuspecting programs to do "packet smuggling" attacks > > Lets say you have Program A sending messages to Program B which contain > a header followed by some (variable length) data which can be controlled > by a third party. Program B reads messages into a 1024B buffer. > > If I were a malicious attacker, I might try to craft some data for > Program B to send which places a packet header just to appear just after > the new split point. With the new behavior, Program B would think this > was a header (legitimatedly) crafted by Program A. It is entirely > plausible that this header contains identity/trust information which > shouldn't be controllable by external third parties > > With the existing behavior, Program B probably discards these overlong > packets either becasue it sees the truncated flag is set, or because > they are malfformed. With the new behaviour, the first segment would > probably still get discarded, but follow up segments might look > plausible > > - Erin I think a sufficiently dedicated attacker always can find ways to inject packets as when the links are not encrypted. But given Xin's new suggestion we should be as safe as we can be regarding this scenario. ///jon > |