#6 Implement adaptive message length/FD vector length for recvmsg

0.1
open
Tetsujin
None
2017-07-08
2017-07-08
Tetsujin
No

Presently the recvmsg builtin is designed more or less as a direct interface to the recvmsg() system call: in particular, the command has a compiled-in maximum message size and a compiled-in maximum number of file descriptors per message. If these limits are exceeded, data may be truncated or the command may fail. This gives the caller little control over how much data is pulled from a SOCK_STREAM socket, and no way to receive a message from a SOCK_DGRAM or SOCK_SEQPACKET socket whose size exceeds the built-in limit.

Some details of how the different socket types behave are likely to be very platform-specific. However, I believe some common ground can be established.

SOCK_STREAM sockets mostly act like simple byte-streams, except that when file descriptors are sent over the socket, the point in the stream where the file descriptors were attached becomes a kind of message boundary (at least, on Linux it does). If recvmsg() is called with a buffer large enough to cross one of these boundaries, the received data will not include anything beyond the last byte of data that accompanied the file descriptors when they were sent over the socket.

This creates a bit of an awkward situation. On the one hand, using recvmsg to read data from a SOCK_STREAM socket should probably work similarly to the read builtin, and pull data from the socket only until some delimiter character is encountered or some other condition is satisfied. On the other hand, if a single invocation of recvmsg results in multiple calls to the underlying system call, then the sets of file descriptors collected in these calls will be grouped together, and information about where they were placed in the stream will effectively be lost.

I think these relationships are important: normally one would expect file descriptors sent over the channel to be accompanied by data describing their significance or how they should be used. Associating each file descriptor with the proper piece of descriptive data then becomes a synchronization problem. If recvmsg for SOCK_STREAM is changed to behave more like read, and to disregard the "message boundaries" established by attached file descriptors, care should be taken to provide a way to re-establish the position of file descriptors in the stream.
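The boundary behavior described above can be observed with a small experiment. This is only a sketch in Python (not the builtin's own code); `send_with_fd`, `recv_chunk`, and `demo` are hypothetical helper names. It sends four bytes with a pipe descriptor attached, queues four more plain bytes behind them, and records what each individual recvmsg() call returns. On Linux the first call is expected to stop after the data that accompanied the descriptor, but other platforms may behave differently, so the loop simply reads until all eight bytes have arrived.

```python
import array
import os
import socket


def send_with_fd(sock, payload, fd):
    """Send payload with one file descriptor attached via SCM_RIGHTS."""
    fds = array.array("i", [fd])
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_chunk(sock, bufsize=4096, max_fds=8):
    """Perform exactly one recvmsg() call and return (data, list_of_fds)."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(max_fds * fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore a trailing partial descriptor, if any.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return data, list(fds)


def demo():
    """Return a list of (data, number_of_fds) pairs, one per recvmsg() call."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    chunks = []
    try:
        send_with_fd(a, b"AAAA", w)   # data accompanied by a descriptor
        a.sendall(b"BBBB")            # plain data queued right behind it
        total = 0
        while total < 8:
            data, fds = recv_chunk(b)
            for fd in fds:
                os.close(fd)          # this demo only counts descriptors
            chunks.append((data, len(fds)))
            total += len(data)
    finally:
        os.close(r)
        os.close(w)
        a.close()
        b.close()
    return chunks
```

Printing the result of `demo()` shows where the platform splits the stream. If the first chunk contains only the bytes sent with the descriptor, the boundary described above is in effect.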

SOCK_DGRAM for Unix domain sockets is described in manuals as an "unordered, unreliable" connection scheme, much like UDP in the internet domain. In practice, however, there is no reason for Unix domain sockets ever to be unordered or unreliable. On Linux, each send() produces a distinct message in the socket's receive queue, and removing messages from the queue with recv() or recvmsg() is all-or-nothing: if the caller reads the first 10 bytes of a 100-byte message, the remaining 90 bytes are simply lost. For this reason, the normal mode of operation for the recvmsg command on a datagram socket should be to read the entire message.
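The all-or-nothing behavior is easy to reproduce. The following is a minimal sketch in Python, with a hypothetical helper name `demo_truncation`: it sends a 100-byte datagram followed by a short second one, reads only 10 bytes, and then reads again.

```python
import socket


def demo_truncation():
    """Read 10 bytes of a 100-byte datagram, then read whatever comes next."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        a.send(b"x" * 100)    # first message: 100 bytes
        a.send(b"second")     # second message
        head = b.recv(10)     # takes 10 bytes; the rest of message 1 is dropped
        nxt = b.recv(4096)    # starts at the next message, not at byte 11
        return head, nxt
    finally:
        a.close()
        b.close()
```

The second read returns the second message in full, with none of the 90 unread bytes of the first message in front of it.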

SOCK_SEQPACKET may be a problem. Some sources describe it as being like SOCK_STREAM but with message boundaries, with the apparent implication that it is conceptually like SOCK_DGRAM but with the possibility of reading part of a message and leaving the rest for later. However, on Linux it appears that SEQPACKET for Unix domain sockets simply works identically to DGRAM: each read is all-or-nothing. The best strategy may therefore be to adopt the same semantics for both socket types: attempt to read a whole message unless the caller specifies other arguments establishing how much data should be read. (In any case, a single recvmsg from a DGRAM or SEQPACKET socket should never yield data that crosses message boundaries.)

There are mechanisms on Linux for finding out how large a message is without consuming it. On Linux 3.4 and higher, calling recvmsg() on a Unix datagram socket with MSG_TRUNC returns the real size of the next packet even when it is larger than the supplied buffer; combined with MSG_PEEK, the packet stays in the queue. (But this may not be portable enough to consider using.) Another alternative may be ioctl(FIONREAD). If the packet size cannot be determined in advance, the fallback is to read the message with MSG_PEEK, and if the message winds up truncated (MSG_TRUNC set in the returned flags), increase the buffer size and try again.
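The peek-and-retry fallback might look roughly like the sketch below. It is written in Python for brevity and is not the builtin's implementation; the function name `recv_whole_message`, the starting size, and the doubling policy are all assumptions made for illustration. It relies only on MSG_PEEK and on the MSG_TRUNC flag reported in the returned message flags, not on the Linux-specific size query.

```python
import socket


def recv_whole_message(sock, initial_size=64):
    """Receive one complete datagram/packet, growing the buffer as needed.

    Hypothetical helper: peeks at the next message, doubles the buffer
    while the kernel reports MSG_TRUNC, then consumes the message for real.
    Not safe if another process reads from the same socket concurrently.
    """
    size = initial_size
    while True:
        # Peek only: the message stays queued, so a retry sees it again.
        _data, _anc, msg_flags, _addr = sock.recvmsg(size, 0, socket.MSG_PEEK)
        if not (msg_flags & socket.MSG_TRUNC):
            break
        size *= 2
    # The buffer is now large enough for the message that was peeked.
    data, _anc, _flags, _addr = sock.recvmsg(size)
    return data
```

Doubling keeps the number of peeks logarithmic in the message size. A platform-specific size query, where available, would replace the loop with a single probe.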

Likewise, if a message contains more file descriptors than the caller is prepared to accept, recvmsg() will set the MSG_CTRUNC flag in the returned message flags. If this is encountered during a MSG_PEEK, the limit can be expanded and another attempt made.
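A sketch of the same retry idea applied to the file descriptor limit follows, again in Python with hypothetical names (`recv_with_all_fds`, `_fds_from`). One detail is an assumption based on my understanding of Linux rather than anything stated above: a peek that returns SCM_RIGHTS data hands the receiver duplicate descriptors, so any descriptors obtained while peeking are closed before retrying to avoid leaking them.

```python
import array
import os
import socket

FD_SIZE = array.array("i").itemsize


def _fds_from(ancdata):
    """Collect descriptors from SCM_RIGHTS control messages."""
    fds = array.array("i")
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(cdata) - (len(cdata) % FD_SIZE)
            fds.frombytes(cdata[:usable])
    return list(fds)


def recv_with_all_fds(sock, bufsize, max_fds=1):
    """Receive one message, enlarging the FD limit while MSG_CTRUNC is set.

    Hypothetical helper; the doubling policy is an arbitrary choice.
    Not safe with concurrent readers on the same socket.
    """
    while True:
        _data, ancdata, msg_flags, _addr = sock.recvmsg(
            bufsize, socket.CMSG_LEN(max_fds * FD_SIZE), socket.MSG_PEEK)
        # Assumption: peeked descriptors are duplicates owned by this
        # process, so close them here rather than leak them.
        for fd in _fds_from(ancdata):
            os.close(fd)
        if not (msg_flags & socket.MSG_CTRUNC):
            break
        max_fds *= 2
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(max_fds * FD_SIZE))
    return data, _fds_from(ancdata)
```

The caller owns the returned descriptors and is responsible for closing them.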

It should be noted that these various strategies are all vulnerable to race conditions if there are multiple processes reading data from the socket: another reader may consume the message between the peek and the read that follows it. It may be worth providing a non-adaptive mode for receiving datagrams/packets that prioritizes atomicity.
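One possible shape for such a non-adaptive mode, sketched in Python with a hypothetical function name `recv_atomic`: a single recvmsg() call with caller-supplied limits and no peeking, with truncation reported to the caller rather than retried.

```python
import socket


def recv_atomic(sock, max_bytes, anc_bytes=0):
    """Receive with exactly one recvmsg() call and fixed limits.

    Hypothetical helper: returns (data, ancdata, data_truncated,
    control_truncated). Nothing is peeked, so no other reader can slip
    in between a probe and the read; the cost is possible truncation.
    """
    data, ancdata, msg_flags, _addr = sock.recvmsg(max_bytes, anc_bytes)
    data_truncated = bool(msg_flags & socket.MSG_TRUNC)
    control_truncated = bool(msg_flags & socket.MSG_CTRUNC)
    return data, ancdata, data_truncated, control_truncated
```

The two flags let the caller decide whether a truncated message is an error or acceptable.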

Acceptance Criteria
When using recvmsg with a DGRAM or SEQPACKET socket, the default behavior should be to read the entire message.
When using recvmsg with a STREAM socket, behavior should be similar to the "read" shell builtin: in particular, the caller should be able to specify an exact number of characters or bytes to read, or specify a delimiter that terminates the read when it is encountered; without these options, the default behavior should be to read a line.
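For the STREAM criterion, one way the read-like behavior could be combined with file descriptor tracking is sketched below in Python. Everything here is an assumption for illustration: the name `read_until`, the single-byte delimiter restriction, keeping the delimiter in the result (the read builtin strips it), and recording each descriptor set against the byte offset of the chunk that delivered it. The sketch peeks to locate the delimiter, then consumes exactly that many bytes with recvmsg() so that attached descriptors are collected rather than dropped.

```python
import array
import socket

FD_SIZE = array.array("i").itemsize


def _fds_from(ancdata):
    """Collect descriptors from SCM_RIGHTS control messages."""
    fds = array.array("i")
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(cdata) - (len(cdata) % FD_SIZE)
            fds.frombytes(cdata[:usable])
    return list(fds)


def read_until(sock, delim=b"\n", count=None, chunk=4096, max_fds=16):
    """Read up to and including delim, or at most count bytes if given.

    Returns (data, fd_marks), where fd_marks is a list of
    (offset, [fds]) pairs: offset is the position in data of the chunk
    that delivered those descriptors. Hypothetical helper; single
    reader only.
    """
    if len(delim) != 1:
        raise ValueError("this sketch supports single-byte delimiters only")
    out = bytearray()
    fd_marks = []
    while count is None or len(out) < count:
        limit = chunk if count is None else min(chunk, count - len(out))
        peeked = sock.recv(limit, socket.MSG_PEEK)   # look, do not consume
        if not peeked:
            break                                    # end of stream
        idx = peeked.find(delim)
        want = len(peeked) if idx < 0 else idx + 1
        data, ancdata, _flags, _addr = sock.recvmsg(
            want, socket.CMSG_LEN(max_fds * FD_SIZE))
        if not data:
            break
        fds = _fds_from(ancdata)
        if fds:
            fd_marks.append((len(out), fds))
        out += data
        if data.endswith(delim):
            break
    return bytes(out), fd_marks
```

The offsets are only as precise as the chunks the kernel hands back, so a real design would still need to decide how descriptor positions are exposed to the caller.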
