Re: [Nbd] NBD wishlist items?
From: Mike S. <sn...@gm...> - 2007-06-21 14:16:20
On 6/21/07, Wouter Verhelst <w...@ut...> wrote:
> [Paul: This is in response to a mail I sent, asking for wishlist items.
> Should probably have Cc'ed you on the original mail, but you may want to
> check this and give your opinion]
>
> On Tue, Jun 19, 2007 at 07:22:59PM -0400, Mike Snitzer wrote:
> > Wouter,
> >
> > I've got one I'd like to run by you: have the nbd-client detect that
> > the kernel's nbd connection to the nbd-server is unresponsive.
> >
> > There is a window of time where a potentially long TCP timeout
> > prevents the nbd (in kernel) from erroring out back to userspace
> > (nbd-client). But if the nbd-client could feel that the nbd isn't
> > behaving, the nbd-client could send a SIGKILL down to the kernel (the
> > nbd driver already aborts TCP transmit if a SIGKILL is received).
> >
> > Any ideas on how we might pull this off?
>
> Yeah, the question of reliability of the connection is a big one, and
> one I'm not sure can be implemented properly without protocol changes.
>
> Currently, the client doesn't have a real ability to send a packet to
> the server to ask "are you still alive". Worse, the server doesn't have
> any ability to send an unsolicited message to the client; if it believes
> the client is dead, there currently is no way to check.
>
> I'm thinking it would be good to extend the protocol with two packets,
> one PING and one PONG (or so), that could be sent by either the client
> or the server, and that could allow either of them to check whether the
> other is still there. It should include a timeout of (say) 60 seconds
> (this could probably be negotiated during the handshake) during which
> the other side has to reply with the appropriate packet; if it doesn't,
> it is assumed dead and the connection will be killed.
>
> I don't think any other way can reliably allow either the client or the
> server to detect the other end's death.
> We're using TCP keepalive probes
> right now already, and there's the -a option to nbd-server, but both are
> not really a good solution -- the former because it takes literally days
> to discover a lost connection, the latter because it a) assumes that
> there is never a good reason for a client to be inactive for more than
> the time given on the nbd-server command line, b) only allows the server
> to detect the death of the client, never the other way around, and,
> well, c) because the implementation is broken currently :)
>
> Implementing this backwards-compatibly is going to be the hardest part,
> I guess. Perhaps opening one of the NBD devices to call a specific ioctl
> to verify whether it supports this interface, and then setting a bit in
> the field of 'reserved' bits in the handshake could work, but I'm not
> sure how this would be best done.

Thanks for the detailed response. Yesterday I said that this nbd-client
connection monitoring likely doesn't belong in the nbd-client, but I was
looking at the problem in a completely different way than what you
suggested with protocol-level changes. I think the PING and PONG
protocol for both client and server could be very good. I look forward
to Paul's thoughts on this suggestion.

For existing nbd installations, I was looking for a less intrusive way
to check that the nbd-client (and, by association, /dev/nbd<x>) is still
fully connected and functional. So as to avoid changing the kernel et
al., the following utility obviously wouldn't require protocol changes,
but it only addresses a subset of the concerns about nbd connection
reliability.

This utility could be as simple as an nbd-client -monitor (or
nbd-monitor) that upon start (via: nbd-monitor -t <timeout> [-p
<nbdClientPid>] /dev/nbd<x>) would look up the nbd-client process (with
the new /sys/block/nbd<x>/pid interface) or just use the specified
nbdClientPid. It would then go on to use multiple processes to
periodically perform a timed read from /dev/nbd<x>.
Each iteration would fork a child process that performs a read from
/dev/nbd<x>, and the parent process would wait the specified timeout
before killing the child's pid. If the read times out or fails (and the
nbd-client's pid is still running), the determined nbd-client pid gets a
SIGKILL.

Please let me know if what I'm suggesting is fundamentally flawed in
some way. I'm going to take a stab at prototyping this nbd-monitor in
perl and give it a shot on SLES10 (where I happen to hit the TCP timeout
issue more frequently/reliably than on any other kernel).

Mike
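[For illustration only: the fork/timed-read scheme described above could be
sketched roughly as follows. This is Python rather than the Perl prototype
Mike mentions; the names timed_read and monitor are hypothetical, and the
/sys/block/nbd<x>/pid lookup and option parsing are left out.]

```python
import os
import signal
import time

def timed_read(path, timeout, size=512):
    """Fork a child that reads from `path`; return True if the read
    completes successfully within `timeout` seconds, False otherwise."""
    pid = os.fork()
    if pid == 0:
        # Child: a read from a wedged /dev/nbd<x> would block here.
        try:
            with open(path, "rb") as f:
                f.read(size)
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: poll for the child's exit instead of blocking in waitpid().
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
        time.sleep(0.05)
    os.kill(pid, signal.SIGKILL)  # read timed out; reap the stuck child
    os.waitpid(pid, 0)
    return False

def monitor(device, timeout, nbd_client_pid, interval=10):
    """Probe `device` every `interval` seconds; on a hung or failed read,
    SIGKILL the nbd-client (the in-kernel nbd driver already aborts its
    TCP transmit when it receives a SIGKILL)."""
    while True:
        if not timed_read(device, timeout):
            os.kill(nbd_client_pid, signal.SIGKILL)
            return
        time.sleep(interval)
```

The child does the potentially blocking read so that the parent never
hangs on a dead device; only the parent decides, by wall-clock timeout,
whether the probe counts as a failure.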