From: Steve S. <ste...@ho...> - 2002-10-11 20:44:00
Attachments:
net_rx.diff
|
Steve wrote: >The attached patch moves reactivate_fd so that it gets called only >after the fd has been drained, and no error condition has been >reported on the fd. Whoops, missed one... Steve Schmidtke _________________________________________________________________ Join the worlds largest e-mail service with MSN Hotmail. http://www.hotmail.com |
From: Jeff D. <jd...@ka...> - 2002-10-11 23:32:28
|
ste...@ho... said: > The attached patch moves reactivate_fd so that it gets called only > after the fd has been drained, and no error condition has been > reported on the fd. What problem are solving with this patch? The first chunk is definitely wrong. It can permanently shut down a device if a memory allocation fails. > Is there a way to ensure a fast writer can't keep the UML stuck in > uml_net_rx indefinitly? No. Physical boxes have the same problem, BTW. Jeff |
From: Steve S. <ste...@ho...> - 2002-10-12 01:41:58
|
jd...@ka... wrote: >ste...@ho... said: > > The attached patch moves reactivate_fd > >What problem are solving with this patch? I was tracking down a problem where if my slirp process died unexpectedly, I would get a firestorm of "Device umn read returned -9, shutting it down". I suspected that the reactivate_fd was causing, in that case, sigio_handler() to fire off interrupts, due to POLLERR being flagged, faster than dev_close() could shut the interface down. With the patch in place, that problem hasn't resurfaced. Whether or not my suspicions are true, it just seemed like the right thing to do for everyone as it makes networking slightly faster with reactivate_fd out of the loop; it only gets called once per SIGIO instead of at least twice (once for the packet data, then again for the EAGAIN (mapped to 0, of course)). >The first chunk is definitely wrong. It can permanently shut down >a device if a memory allocation fails. if dev_alloc_skb() fails, uml_net_rx() returns 0, which then falls through the else clause of 'if(err < 0)' in uml_net_interrupt() and thus reactivate_fd() gets called. Am I missing some other memory allocation? Steve Schmidtke _________________________________________________________________ MSN Photos is the easiest way to share and print your photos: http://photos.msn.com/support/worldwide.aspx |
From: Jeff D. <jd...@ka...> - 2002-10-12 02:56:45
|
ste...@ho... said: > if dev_alloc_skb() fails, uml_net_rx() returns 0, which then falls > through the else clause of 'if(err < 0)' in uml_net_interrupt() and > thus reactivate_fd() gets called. Where's that reactivate_fd? If it's not called in uml_net_rx, then it's not called at all. And the next poll() won't pick up a packet even if one's waiting, so the device is permanently dead. > I would get a firestorm of "Device umn read returned -9, shutting it > down". I suspected that the reactivate_fd was causing, in that case, > sigio_handler() to fire off interrupts, due to POLLERR being flagged, > faster than dev_close() could shut the interface down. We need better error handling, but that's not it. A permanent error like EBADF should either totally shut down the device (rather than just leaving the file descriptor disabled) or reconnect to the host. Jeff |
From: Steve S. <ste...@ho...> - 2002-10-12 03:41:23
|
Hi, jd...@ka... wrote: >Where's that reactivate_fd? If it's not called in uml_net_rx, then it's >not called at all. Jeff, are you reading your code? :) uml_net_rx() is static; it is, and pretty much can only be called by uml_net_interrupt(). These are all the exit paths I see out of uml_net_rx as orig: 1) dev_alloc_skb fails. reactivate_fd is called and 0 is returned to uml_net_interrupt, which terminates the while loop, and branches past the if, out of the function. 2) reactivate_fd is called and pkt_len > 0. the packet is injected into the IP subsystem and pkt_len (>0) is returned, restarting the while loop, and calling uml_net_rx again (choose 1,2,3,or 4 again): 3) reactivate_fd is called and pkt_len = 0. 0 is returned to uml_net_interrupt, which terminates the loop and branches past the if, out of the function. 4) reactivate_fd is called and pkt_len < 0. pkt_len (<0) is returned, terminating the while loop, and branching *into* the if, shutting down the device, and exiting the function. With the patch, these are the exit paths: 1) dev_alloc_skb fails. 0 is returned to uml_net_interrupt, which terminates the while loop, and branches into the else where reactivate_fd is called, and exits out of the function. 2) pkt_len > 0. the packet is injected into the IP subsystem and pkt_len (>0) is returned, restarting the while loop, and calling uml_net_rx again (choose 1,2,3, or 4 again): 3) pkt_len = 0. 0 is returned to uml_net_interrupt, which terminates the loop and branches into the else where reactivate_fd is called, and exits out of the function. 4) pkt_len < 0. pkt_len (<0) is returned, terminating the while loop, and branching *into* the if, shutting down the device, and exiting the function. What am I missing? How is this substantially different than what was there, besides the upside that reactivate_fd is called only if the fd is drained and no error condition occured. This means it's called once, and once only per interrupt, unless there's an error, in which case we wouldn't want to know about the fd until we've fixed whatever is causing the error in the first place. The only downside I can see is if the error is recoverable. In that case we *would* want to reactivate the fd. Are there any recoverable errors besides EINTR and EAGAIN (which are already taken care of) that should be handled? Steve Schmidtke And the next poll() won't pick up a packet even if one's >waiting, so the device is permanently dead. > > > I would get a firestorm of "Device umn read returned -9, shutting it > > down". I suspected that the reactivate_fd was causing, in that case, > > sigio_handler() to fire off interrupts, due to POLLERR being flagged, > > faster than dev_close() could shut the interface down. > >We need better error handling, but that's not it. A permanent error like >EBADF should either totally shut down the device (rather than just leaving >the file descriptor disabled) or reconnect to the host. > > Jeff _________________________________________________________________ MSN Photos is the easiest way to share and print your photos: http://photos.msn.com/support/worldwide.aspx |
From: Steve S. <ste...@ho...> - 2002-10-12 06:07:07
|
Hi, jd...@ka... wrote >A permanent error like EBADF should either totally shut down the device >(rather than just leaving the file descriptor disabled) or reconnect to the >host. just before uml_net_rx() would call dev_close(), it does: dev->flags &= ~IFF_UP; and the first thing dev_close() does is: if (!(dev->flags&IFF_UP)) return(0); which means uml_net_close doesn't get called. Is this what you mean by "rather than just leaving the descriptor disabled"? I'm guessing you are saying it's something can't be called from interrupt context, and it's gonna take hoops to shutdown a wayward device... Steve Schmidtke _________________________________________________________________ Join the worlds largest e-mail service with MSN Hotmail. http://www.hotmail.com |
From: Jeff D. <jd...@ka...> - 2002-10-13 03:17:07
|
ste...@ho... said: > Jeff, are you reading your code? :) Yeah, and that turns out to be the problem. I was reading my code, plus that first chunk without the rest of your patch :-) > Whether or not my suspicions are true, it just seemed like the right > thing to do for everyone as it makes networking slightly faster with > reactivate_fd out of the loop; it only gets called once per SIGIO > instead of at least twice (once for the packet data, then again for > the EAGAIN (mapped to 0, of course)). OK, I can buy that. We should fix that dev_close() thing. If it can't be called from interrupt context (and I see know reason offhand why it wouldn't), it would be simple to make it happen in kcontextd. Jeff |