Re: [Nbd] NBD wishlist items?
From: Wouter V. <w...@ut...> - 2007-06-21 11:23:49
[Paul: This is in response to a mail I sent, asking for wishlist items.
Should probably have Cc'ed you on the original mail, but you may want to
check this and give your opinion]

On Tue, Jun 19, 2007 at 07:22:59PM -0400, Mike Snitzer wrote:
> Wouter,
>
> I've got one I'd like to run by you: have the nbd-client detect that
> the kernel's nbd connection to the nbd-server is unresponsive.
>
> There is a window of time where a potentially long TCP timeout
> prevents the nbd (in kernel) from erroring out back to userspace
> (nbd-client). But if the nbd-client could feel that the nbd isn't
> behaving the nbd-client could send a SIGKILL down to the kernel (nbd
> driver already aborts TCP transmit if a SIGKILL is received).
>
> Any ideas on how we might pull this off?

Yeah, the question of reliability of the connection is a big one, and
one I'm not sure can be implemented properly without protocol changes.

Currently, the client doesn't have a real ability to send a packet to
the server to ask "are you still alive". Worse, the server doesn't have
any ability to send an unsolicited message to the client; if it believes
the client is dead, there currently is no way to check.

I'm thinking it would be good to extend the protocol with two packets,
one PING and one PONG (or so) that could be sent by either the client or
the server, and that could allow either of them to check whether the
other is still there. It should include a timeout of (say) 60 seconds
(this could probably be negotiated during the handshake) during which
the other side has to reply with the appropriate packet; if it doesn't,
it is assumed dead and the connection will be killed.

I don't think any other way can reliably allow either the client or the
server to detect the other end's death.
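
As a rough illustration of the PING/PONG idea above, here is a minimal Python sketch of the timeout bookkeeping either end would need. Everything named here is an assumption made for illustration: the `PING` and `PONG` values, the `LivenessMonitor` class and its methods are hypothetical and not part of the NBD protocol or of any existing nbd code. It only shows the logic of "send a PING, expect a PONG within the timeout, otherwise assume the peer is dead".

```python
import time

# Hypothetical packet types (ASCII "PING" / "PONG"); the NBD protocol
# defines no such values. They exist only for this sketch.
PING = 0x50494E47
PONG = 0x504F4E47


class LivenessMonitor:
    """Tracks one outstanding PING and decides when the peer is dead."""

    def __init__(self, timeout=60.0, clock=time.monotonic):
        self.timeout = timeout    # seconds the peer has to answer
        self.clock = clock        # injectable so the logic can be tested
        self.ping_sent_at = None  # None means no PING is outstanding

    def send_ping(self):
        """Note when a PING went out; returns the packet type to send."""
        if self.ping_sent_at is None:
            self.ping_sent_at = self.clock()
        return PING

    def on_packet(self, packet_type):
        """Handle an incoming packet; returns a reply type or None."""
        if packet_type == PING:
            return PONG           # either side answers a PING with a PONG
        if packet_type == PONG:
            self.ping_sent_at = None  # peer answered, clear the deadline
        return None

    def peer_is_dead(self):
        """True once an outstanding PING has gone unanswered too long."""
        if self.ping_sent_at is None:
            return False
        return self.clock() - self.ping_sent_at > self.timeout
```

Since both ends would run the same logic, the check works in either direction. When `peer_is_dead()` turns true, the caller would tear down the connection, for example by the SIGKILL route Mike describes.
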

We're using TCP keepalive probes right now already, and there's the -a
option to nbd-server, but both are not really a good solution -- the
former because it takes literally days to discover a lost connection,
the latter because it

a) assumes that there is never a good reason for a client to be inactive
   for more than the time given on the nbd-server command line,
b) only allows the server to detect the death of the client, never the
   other way around, and, well,
c) because the implementation is broken currently :)

Implementing this backwards-compatibly is going to be the hardest part,
I guess. Perhaps opening one of the NBD devices to call a specific ioctl
to verify whether it supports this interface, and then setting a bit in
the field of 'reserved' bits in the handshake could work, but I'm not
sure how this would be best done.

-- 
<Lo-lan-do> Home is where you have to wash the dishes.
  -- #debian-devel, Freenode, 2004-09-22
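
The "bit in the reserved field" idea from the last paragraph could look roughly like the Python sketch below. The field width (`RESERVED_LEN`), the bit position (`FLAG_PING_PONG`) and all function names are assumptions chosen for illustration, not values from the NBD handshake. The ioctl check against the kernel driver is not modelled here; `local_supports` stands in for its result.

```python
# Hypothetical values: neither the bit position nor the field width is
# taken from the NBD protocol; they only illustrate the idea.
RESERVED_LEN = 4         # assumed width in bytes of the reserved field
FLAG_PING_PONG = 1 << 0  # assumed bit meaning "I understand PING/PONG"


def encode_reserved(supports_ping):
    """Build the reserved field, setting the feature bit if supported."""
    flags = FLAG_PING_PONG if supports_ping else 0
    return flags.to_bytes(RESERVED_LEN, "big")


def peer_supports_ping(reserved):
    """Check whether the peer set the feature bit in its reserved field."""
    flags = int.from_bytes(reserved[:RESERVED_LEN], "big")
    return bool(flags & FLAG_PING_PONG)


def ping_enabled(local_supports, reserved_from_peer):
    """PING/PONG is used only when both ends advertise support."""
    return local_supports and peer_supports_ping(reserved_from_peer)
```

An older peer that still sends all zeroes in the reserved field never sets the bit, so PING/PONG simply stays off for that connection, which is what would make the scheme backwards-compatible.
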