From: Jeff D. <jd...@ka...> - 2002-09-27 00:39:17
net...@li... said:
> similar to "init=111 failed" over and over again.

"Similar to"?

Surely you realize that the exact error message would be somewhat useful.

> Is this a known bug with UML?

It's tough to tell since you haven't said what it is.

> Is there a fix in a later UML kernel?

No, because it's a bug in the switch, not the kernel.

Jeff

From: Jeff D. <jd...@ka...> - 2002-09-27 01:33:07
net...@li... said:
> And since uml_switch still refuses to redirect its output to a file,
> i had no way of logging the behavior.

Hmmm, can you be more specific about that? I also recall that you couldn't
background it, which I couldn't reproduce. More details on these would
help get them fixed.

> Either you're aware of this type of thing happening on occasion, or
> you're not, right?

On umlcoop, we saw the switch spitting out lots of 'Resource unavailable'
errors, but they didn't kill the network. That's the only switch problem
that I'm aware of.

Jeff

From: Net Llama! <net...@li...> - 2002-09-27 01:38:03
Jeff Dike wrote:
> net...@li... said:
>
>>And since uml_switch still refuses to redirect its output to a file,
>>i had no way of logging the behavior.
>
> Hmmm, can you be more specific about that? I also recall that you couldn't
> background it, which I couldn't reproduce. More details on these would
> help get them fixed.

Let me know what kind of details you need, and i'll gladly provide them.
The only workaround that i was able to find to even get uml_switch to run
was in a screen session.

>>Either you're aware of this type of thing happening on occasion, or
>>you're not, right?
>
> On umlcoop, we saw the switch spitting out lots of 'Resource unavailable'
> errors, but they didn't kill the network. That's the only switch problem
> that I'm aware of.

BINGO! That's what it was doing. I saw a lot of "Resource
unavailable....init=111" type errors flying down the screen. I guess maybe
you were lucky that the network didn't die on yours, because i assure you,
it was completely dead on mine. Couldn't ping any of the UMLs at all.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman net...@li...
Linux Step-by-step & TyGeMo: http://netllama.ipfox.com
6:30pm up 52 days, 2:49, 3 users, load average: 0.07, 0.09, 0.08

From: Jeff D. <jd...@ka...> - 2002-09-27 03:38:46
net...@li... said:
> Let me know what kind of details you need, and i'll gladly provide
> them.

On the backgrounding thing, I asked for a strace of the switch
refusing to be backgrounded.

On this one, how did you try to log it, and how did it not work?

> The only workaround that i was able to find to even get uml_switch
> to run was in a screen session.

Why couldn't you get the errors out of the screen session?

> BINGO! That's what it was doing. I saw a lot of "Resource
> unavailable....init=111" type errors flying down the screen.

Except that "init" doesn't appear in any string in the switch.

Jeff

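One way to keep a copy of whatever a program prints while still watching it inside a screen session is to pipe it through tee. The following is only a sketch: run_switch is a hypothetical stand-in for the real command line (for example uml_switch -tap tap0), the log path is made up for illustration, and it assumes the program behaves normally when its stdout is a pipe, which the report above suggests may not hold for uml_switch.

```shell
# Hypothetical stand-in for the real command (e.g. uml_switch -tap tap0);
# it writes one line to stdout and one line to stderr.
run_switch() {
    echo "switch: normal output"
    echo "switch: error output" >&2
}

# Assumed log location, for illustration only.
LOG=$(mktemp)

# Merge stderr into stdout, then let tee display the stream
# and append a copy of every line to the log file.
run_switch 2>&1 | tee -a "$LOG"

# Count the lines that reached the log.
logged=$(wc -l < "$LOG" | tr -d ' ')
echo "lines logged: $logged"
```

Because both streams are merged before the pipe, error messages end up in the log alongside normal output, so the exact text can be quoted later.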
From: Net Llama! <net...@li...> - 2002-09-30 01:37:16
Jeff Dike wrote:
> net...@li... said:
>
>>Let me know what kind of details you need, and i'll gladly provide
>>them.
>
> On the backgrounding thing, I asked for a strace of the switch
> refusing to be backgrounded.

OK, well it happened again today, except it seems to have just died, as
in, the process wasn't even running all of a sudden. So i tried to start
it like this:

    strace uml_switch -tap tap0 > /dev/null 2>&1

but it never started, it just hung (no output whatsoever, and hitting
enter didn't give me a shell prompt). Am i doing it incorrectly?

> On this one, how did you try to log it, and how did it not work?

I didn't have time to log it. I just never expected it to die like it
did, so when it did die, i kinda freaked, and was too busy trying to get
everything back on the network.

>> The only workaround that i was able to find to even get uml_switch
>>to run was in a screen session.
>
> Why couldn't you get the errors out of the screen session?

See above.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman net...@li...
Linux Step-by-step & TyGeMo: http://netllama.ipfox.com
6:25pm up 55 days, 2:44, 5 users, load average: 0.26, 0.11, 0.03

From: Jeff D. <jd...@ka...> - 2002-09-27 04:26:27
net...@li... said:
> This has been working fine now for about 2 weeks until all of a sudden
> today, uml_switch went into a death spiral, spitting out an error
> similar to "init=111 failed" over and over again. All networking to/
> from the UML instances was lost at that point.

We may have just seen something similar on umlcoop. One UML lost its network,
but got it back after logging in to it, bringing its eth0 down, and bringing
it back up.

The switch is spitting out

    send_sock sending to fd 4 Resource temporarily unavailable

at a rate of one every few seconds or so. That appears to be unrelated to
the UML that lost its network, so I think it's irrelevant.

There were no other signs of trouble, so I don't have any idea what happened
there.

Jeff

From: Net L. <net...@li...> - 2002-09-27 13:23:30
On Fri, 27 Sep 2002, Jeff Dike wrote:

> net...@li... said:
> > This has been working fine now for about 2 weeks until all of a sudden
> > today, uml_switch went into a death spiral, spitting out an error
> > similar to "init=111 failed" over and over again. All networking to/
> > from the UML instances was lost at that point.
>
> We may have just seen something similar on umlcoop. One UML lost its network,
> but got it back after logging in to it, bringing its eth0 down, and bringing
> it back up.
>
> The switch is spitting out
>     send_sock sending to fd 4 Resource temporarily unavailable
>
> at a rate of one every few seconds or so. That appears to be unrelated to
> the UML that lost its network, so I think it's irrelevant.
>
> There were no other signs of trouble, so I don't have any idea what happened
> there.

Hrmmm...interesting. This sounds somewhat similar to what i experienced,
however it was all of the UMLs that lost their network at the same time (I
use nmap to portscan them, via a cronjob).

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lonni J Friedman net...@li...
Linux Step-by-step & TyGeMo http://netllama.ipfox.com

From: Jeff D. <jd...@ka...> - 2002-09-30 15:28:29
net...@li... said:
> strace uml_switch -tap tap0 > /dev/null 2>&1
>
> but it never started, it just hung (no output whatsoever, and hitting
> enter didn't give me a shell prompt).

That's because you dumped the strace output to /dev/null. Try

    strace -f bash -c 'uml_switch -tap tap0 > /dev/null 2>&1'

Jeff

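The difference between the two command lines comes down to which process the redirection applies to. The sketch below illustrates that without needing strace or uml_switch installed: tracer and NOISY are hypothetical stand-ins (tracer mimics a tool that writes its own report to stderr before running a command, NOISY mimics a program that writes to both stdout and stderr), so this shows the shell behaviour only, not anything specific to the switch.

```shell
# Hypothetical stand-ins so the sketch runs anywhere:
# tracer() mimics strace by writing one line to stderr, then running
# the given command; NOISY mimics a program that prints to both streams.
tracer() { echo "trace: running $1" >&2; "$@"; }
NOISY='echo switch-stdout; echo switch-stderr >&2'

# Redirection on the outer command line applies to the tracer too,
# so the trace is thrown away together with the program's output.
outer=$( { tracer sh -c "$NOISY" > /dev/null 2>&1; } 2>&1 )

# Redirection inside the child shell applies only to the traced program;
# the tracer's own stderr is left alone and can still be seen or captured.
inner=$( { tracer sh -c "{ $NOISY; } > /dev/null 2>&1"; } 2>&1 )

echo "outer: [$outer]"
echo "inner: [$inner]"
```

In the first case nothing survives the redirection; in the second only the tracer's line does. With the real tool, strace's -o option can also be used to send the trace to a file of its own rather than to stderr.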
From: Net Llama! <net...@li...> - 2002-09-27 00:44:16
Jeff Dike wrote:
> net...@li... said:
>
>>similar to "init=111 failed" over and over again.
>
> "Similar to"?
>
> Surely you realize that the exact error message would be somewhat useful.

Yes, i realize that. However, since this was on a production server, i
didn't really have time to fully document what was occurring. And since
uml_switch still refuses to redirect its output to a file, i had no way of
logging the behavior. My primary concern was restoring network
accessibility to/from the UML instances, so i killed it, and restarted it.

>>Is this a known bug with UML?
>
> It's tough to tell since you haven't said what it is.
>
>>Is there a fix in a later UML kernel?
>
> No, because it's a bug in the switch, not the kernel.

Good point. So are you aware of any bugs in uml_switch which would lead to
this type of behavior? Either you're aware of this type of thing happening
on occasion, or you're not, right?

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman net...@li...
Linux Step-by-step & TyGeMo: http://netllama.ipfox.com
5:40pm up 52 days, 1:59, 3 users, load average: 0.11, 0.09, 0.03