Re: [Nbd] NBD wishlist items?
From: Mike S. <sn...@gm...> - 2007-06-21 14:16:20
On 6/21/07, Wouter Verhelst <w...@ut...> wrote:
> [Paul: This is in response to a mail I sent, asking for wishlist items.
> Should probably have Cc'ed you on the original mail, but you may want to
> check this and give your opinion]
>
> On Tue, Jun 19, 2007 at 07:22:59PM -0400, Mike Snitzer wrote:
> > Wouter,
> >
> > I've got one I'd like to run by you: have the nbd-client detect that
> > the kernel's nbd connection to the nbd-server is unresponsive.
> >
> > There is a window of time where a potentially long TCP timeout
> > prevents the nbd (in kernel) from erroring out back to userspace
> > (nbd-client). But if the nbd-client could feel that the nbd isn't
> > behaving, the nbd-client could send a SIGKILL down to the kernel (the
> > nbd driver already aborts TCP transmit if a SIGKILL is received).
> >
> > Any ideas on how we might pull this off?
>
> Yeah, the question of reliability of the connection is a big one, and
> one I'm not sure can be implemented properly without protocol changes.
>
> Currently, the client doesn't have a real ability to send a packet to
> the server to ask "are you still alive". Worse, the server doesn't have
> any ability to send an unsolicited message to the client; if it believes
> the client is dead, there currently is no way to check.
>
> I'm thinking it would be good to extend the protocol with two packets,
> one PING and one PONG (or so), that could be sent by either the client
> or the server, and that could allow either of them to check whether the
> other is still there. It should include a timeout of (say) 60 seconds
> (this could probably be negotiated during the handshake) during which
> the other side has to reply with the appropriate packet; if it doesn't,
> it is assumed dead and the connection will be killed.
>
> I don't think any other way can reliably allow either the client or the
> server to detect the other end's death.
> We're using TCP keepalive probes
> right now already, and there's the -a option to nbd-server, but both are
> not really a good solution -- the former because it takes literally days
> to discover a lost connection, the latter because it a) assumes that
> there is never a good reason for a client to be inactive for more than
> the time given on the nbd-server command line, b) only allows the server
> to detect the death of the client, never the other way around, and,
> well, c) because the implementation is broken currently :)
>
> Implementing this backwards-compatibly is going to be the hardest part,
> I guess. Perhaps opening one of the NBD devices to call a specific ioctl
> to verify whether it supports this interface, and then setting a bit in
> the field of 'reserved' bits in the handshake could work, but I'm not
> sure how this would be best done.

Thanks for the detailed response. Yesterday I said that this nbd-client
connection monitoring likely doesn't belong in the nbd-client, but I was
looking at the problem in a completely different way than what you
suggested with protocol-level changes. I think the PING and PONG
protocol for both client and server could be very good. I look forward
to Paul's thoughts on this suggestion.

For existing nbd installations, I was looking for a less intrusive way
to check that the nbd-client (and, by association, /dev/nbd<x>) is still
fully connected and functional. So as to avoid changing the kernel et
al., the following utility obviously wouldn't require protocol changes,
but it only addresses a subset of the concerns about nbd connection
reliability.

This utility could be as simple as an nbd-client -monitor (or
nbd-monitor) that upon start (via: nbd-monitor -t <timeout> [-p
<nbdClientPid>] /dev/nbd<x>) would look up the nbd-client process (with
the new /sys/block/nbd<x>/pid interface) or just use the specified
nbdClientPid. It would then go on to use multiple processes to
periodically perform a timed read from /dev/nbd<x>.
Each iteration would fork a child process that performs a read from
/dev/nbd<x>, and the parent process would wait the specified timeout
before killing the child's pid. If the read times out or fails (and the
nbd-client's pid is still running), the determined nbd-client pid gets a
SIGKILL.

Please let me know if what I'm suggesting is fundamentally flawed in
some way. I'm going to take a stab at prototyping this nbd-monitor in
perl and give it a shot on SLES10 (where I happen to hit the TCP timeout
issue more frequently/reliably than on any other kernel).

Mike
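[For illustration only: the fork/timed-read scheme described above could be
sketched roughly as follows. This is Python rather than the Perl prototype
Mike mentions; the names timed_read and monitor are hypothetical, and the
/sys/block/nbd<x>/pid lookup and option parsing are left out.]

```python
import os
import signal
import time

def timed_read(path, timeout, size=512):
    """Fork a child that reads from `path`; return True if the read
    completes successfully within `timeout` seconds, False otherwise."""
    pid = os.fork()
    if pid == 0:
        # Child: a read from a wedged /dev/nbd<x> would block here.
        try:
            with open(path, "rb") as f:
                f.read(size)
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: poll for the child's exit instead of blocking in waitpid().
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
        time.sleep(0.05)
    os.kill(pid, signal.SIGKILL)  # read timed out; reap the stuck child
    os.waitpid(pid, 0)
    return False

def monitor(device, timeout, nbd_client_pid, interval=10):
    """Probe `device` every `interval` seconds; on a hung or failed read,
    SIGKILL the nbd-client (the in-kernel nbd driver already aborts its
    TCP transmit when it receives a SIGKILL)."""
    while True:
        if not timed_read(device, timeout):
            os.kill(nbd_client_pid, signal.SIGKILL)
            return
        time.sleep(interval)
```

The child does the potentially blocking read so that the parent never
hangs on a dead device; only the parent decides, by wall-clock timeout,
whether the probe counts as a failure.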