From: Marc G. <gr...@at...> - 2009-12-28 11:29:11
Gordan, thanks for finding this out. I think your analysis is right. I'll check this and then send the patch upstream ASAP.

Thanks again.

Marc

P.S. Congratulations on getting the "2009 Gluster Hacker Award".

----- "Gordan Bobic" <go...@bo...> wrote:
> Gordan Bobic wrote:
> > Gordan Bobic wrote:
> >> Marc Grimme wrote:
> >>
> >>>> And just to confirm, I used the same binary on another machine
> >>>> (standalone, no OSR or clustering), and it works exactly as expected
> >>>> (prints out what processes it is killing). That means that whatever
> >>>> causes killall5 to go away and never return is specific to glfs+OSR
> >>>> (since killall5 works fine on my gfs+OSR clusters). I'm not sure where
> >>>> to even begin debugging this, though, so any ideas would be welcome.
> >>>
> >>> You might want to try to start it with strace. I recall that in some
> >>> environments the walk through /proc that killall5 does freezes, and I
> >>> think this happens before anything is killed. What fails is a stat
> >>> call on some files within /proc/<pid>. I don't recall exactly, but I
> >>> have something like this in mind.
> >>>
> >>> If you have found the pid that causes the problem, perhaps we'll get
> >>> some new ideas on how to handle this behaviour.
> >>
> >> OK, I have straced killall5, and the last few things it does are stat
> >> /proc/version (twice, it seems) and set up the SIGTERM, SIGSTOP and
> >> SIGKILL signal handling. This appears to correspond to lines 682-692
> >> in killall5.c:
> >>
> >>     mount_proc();
> >>     ...
> >>     signal(SIGTERM, SIG_IGN);
> >>     signal(SIGSTOP, SIG_IGN);
> >>     signal(SIGKILL, SIG_IGN);
> >>
> >> The last thing strace reports is:
> >>
> >>     kill(4294967295,SIGSTOP
> >>
> >> (note: no closing bracket)
> >>
> >> which seems to correspond to line 695:
> >>
> >>     if (TEST == 0) kill(-1, SIGSTOP);
> >>
> >> Reading what "man 2 kill" says:
> >>
> >>     POSIX.1-2001 requires that kill(-1,sig) send sig to all processes
> >>     that the current process may send signals to, except possibly for
> >>     some implementation-defined system processes.
> >>
> >> I have a suspicion that this may well be the cause of the problems.
> >> killall5 doesn't iterate through all the processes to kill! According
> >> to this, "kill(-1, <signal>)" sends the signal to all the processes
> >> that we have permission to terminate, without the processes having to
> >> be specified explicitly! Since killall5 is running as root at this
> >> point, this means all processes, with the possible exception of "some
> >> implementation-defined system processes". Right now my bet would be on
> >> this killing glusterfsd (which is in fact running in userspace, and
> >> thus is extremely unlikely to be exempt).
> >>
> >> This brings up another issue: it sounds like the -x option may be
> >> ineffective, too, even for the normal GFS-related processes. If the
> >> signals get sent to all processes, then this would include the
> >> processes specified by -x, regardless. This leads me to suspect that
> >> unless these processes are explicitly excluded in the kernel
> >> implementation, they are not spared the killing at this stage. Looking
> >> at the ps output, fenced, groupd, aisexec and ccsd, for example, don't
> >> show up in square brackets, which implies they aren't running in kernel
> >> space (although that isn't really definitive, only indicative, AFAIK).
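An aside on the signal() calls quoted above: POSIX does not allow SIGKILL or SIGSTOP to be caught or ignored, so of the three SIG_IGN calls only the SIGTERM one can take effect (on Linux the other two fail with EINVAL). The short C sketch below checks this with sigaction(); the helper names can_ignore() and report_ignorable() are illustrative only and are not part of killall5.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Illustrative helper (not part of killall5): try to set sig's disposition
 * to SIG_IGN and report whether that was accepted. The previous disposition
 * is restored, so the caller's signal handling is left unchanged. */
int can_ignore(int sig)
{
    struct sigaction ign, old;

    memset(&ign, 0, sizeof ign);
    sigemptyset(&ign.sa_mask);
    ign.sa_handler = SIG_IGN;

    if (sigaction(sig, &ign, &old) == -1)
        return 0;                  /* rejected: EINVAL for SIGKILL/SIGSTOP */

    sigaction(sig, &old, NULL);    /* restore the original disposition */
    return 1;
}

/* Report the result for the three signals killall5 sets to SIG_IGN. */
void report_ignorable(void)
{
    const int sigs[] = { SIGTERM, SIGSTOP, SIGKILL };
    const char *const names[] = { "SIGTERM", "SIGSTOP", "SIGKILL" };

    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        printf("%s: %s\n", names[i],
               can_ignore(sigs[i]) ? "can be ignored" : "cannot be ignored");
}
```

On Linux, calling report_ignorable() from main() should report that only SIGTERM can be ignored. Because the helper puts the old disposition back, it is safe to call from a running program.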
> >> So, this may be affected by the bug, too, but this may not be obvious
> >> because once they die, the node will get fenced by the other nodes,
> >> which will end up doing something similar. Or maybe these processes
> >> simply catch and ignore the signals if they are being used (e.g. if gfs
> >> is mounted), or something like that. Anyway, that is just a hypothesis
> >> at this point, but it's probably worth checking if you have a suitable
> >> test environment handy (I don't have a non-production gfs cluster handy
> >> at the moment).
> >>
> >> Anyway, I'm going to comment out line 695 and see how that goes. In
> >> theory, it seems superfluous anyway, since the iteration through /proc
> >> for processes to kill should catch everything, and in fact, it is this
> >> iteration that -x relies on for its functionality! Otherwise kill(-1)
> >> will just blow everything away and preempt anything -x might do in the
> >> first place!
> >>
> >> Am I missing something obvious here? Is there a flaw in my analysis?
> >
> > Sorry, small amendment: line 695 only sends SIGSTOP. Since it resumes
> > the processes afterwards, this may not affect all processes, e.g. those
> > required by gfs. But if it sends a stop to glusterfsd, it's almost
> > certain that rootfs will in fact block, so it is definitely an issue for
> > that. Since SIGSTOP cannot be caught or ignored by the process itself,
> > killall5 will have to be explicitly modified to do this differently,
> > e.g. using a double pass through /proc that specifically leaves
> > glusterfsd out of the list of processes to signal.
>
> Attached is a proposed patch that tries to work around this specific
> issue. It seems to work: the machine no longer locks up on killall5,
> which is a good sign, and a definite improvement. :^)
>
> Please review.
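The double-pass approach described above could look roughly like the following C sketch (killall5 is itself written in C). It walks /proc, reads each process's command name from /proc/<pid>/stat, and signals only processes whose name is not on an exclusion list such as glusterfsd. This is not the attached killall5.c.patch, whose contents are not shown here; the function names, the exclusion list and the dry_run flag are hypothetical, and unlike killall5 the sketch does not skip kernel threads or the caller's own session.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* PID named by a /proc entry, or -1 for non-numeric entries like "self". */
long proc_entry_pid(const char *name)
{
    long pid = 0;

    if (*name == '\0')
        return -1;
    for (; *name != '\0'; name++) {
        if (!isdigit((unsigned char)*name))
            return -1;
        pid = pid * 10 + (*name - '0');
    }
    return pid;
}

/* Copy the command name out of a /proc/<pid>/stat line such as
 * "1234 (glusterfsd) S 1 ...". The name may itself contain ')', so take
 * everything between the first '(' and the last ')'. Returns 0 on success. */
int stat_comm(const char *line, char *out, size_t outlen)
{
    const char *lp = strchr(line, '(');
    const char *rp = strrchr(line, ')');
    size_t n;

    if (lp == NULL || rp == NULL || rp < lp || outlen == 0)
        return -1;
    n = (size_t)(rp - lp - 1);
    if (n >= outlen)
        n = outlen - 1;            /* truncate overly long names */
    memcpy(out, lp + 1, n);
    out[n] = '\0';
    return 0;
}

/* Nonzero if comm is on the (hypothetical) exclusion list. */
int is_excluded(const char *comm, const char *const excl[], size_t nexcl)
{
    for (size_t i = 0; i < nexcl; i++)
        if (strcmp(comm, excl[i]) == 0)
            return 1;
    return 0;
}

/* One pass over /proc: send sig to every process except PID 1, ourselves
 * and excluded names. With dry_run set, nothing is sent and the candidates
 * are only counted. Returns the count, or -1 if /proc cannot be opened. */
int signal_all_except(int sig, const char *const excl[], size_t nexcl,
                      int dry_run)
{
    DIR *dir = opendir("/proc");
    struct dirent *de;
    int count = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        long pid = proc_entry_pid(de->d_name);
        char path[64], line[512], comm[64];
        FILE *fp;

        if (pid <= 1 || pid == (long)getpid())
            continue;
        snprintf(path, sizeof path, "/proc/%ld/stat", pid);
        fp = fopen(path, "r");
        if (fp == NULL)
            continue;              /* process may have exited meanwhile */
        if (fgets(line, sizeof line, fp) == NULL) {
            fclose(fp);
            continue;
        }
        fclose(fp);
        if (stat_comm(line, comm, sizeof comm) != 0 ||
            is_excluded(comm, excl, nexcl))
            continue;
        if (!dry_run)
            kill((pid_t)pid, sig); /* errors such as ESRCH are ignored */
        count++;
    }
    closedir(dir);
    return count;
}
```

In killall5's flow, the kill(-1, SIGSTOP) at line 695 would become something like signal_all_except(SIGSTOP, excl, 1, 0) with excl = { "glusterfsd" }, and the later resume would send SIGCONT with the same list, so glusterfsd is never stopped. Passing dry_run = 1 only counts the candidates, which makes the sketch safe to try on a live machine.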
>
> Now the problem is that md devices get stopped shortly afterwards, just
> after the "INIT: no more processes left in this run-level" message. Now
> I have to figure out what does that, since these must remain running
> until the shutdown sequence reaches the OSR initroot... But that's
> something for a separate thread.
>
> Gordan
>
> [Text File:killall5.c.patch]

--
Marc Grimme

Tel: +49 89 4523538-14
Fax: +49 89 9901766-0
E-Mail: gr...@at...

ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 |
85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org

Commercial register: Amtsgericht Muenchen, Registration no.: HRB 168930,
VAT ID: DE209485962 | Executive Board: Marc Grimme, Mark Hlawatschek,
Thomas Merz (Chairman) | Chairman of the Supervisory Board: Dr. Martin Buss