From: Tim M. <Tim...@nd...> - 2014-09-05 16:54:28
In regard to: Re: [Lprng-devel] [PATCH] Forget pid of child that have...:

> our customer reported an error: on LPRng exit some totally unrelated
> processes got killed. The only explanation for that I was able to find was
> the LPRng way of handling child processes. It seems to me, that though the
> children have been waited for, i.e. their pids are free to be reused, their
> pids are still stored in Process_list. And if they are (and the LPRng
> daemon has the right to kill them), they will be killed at exit.

That perfectly describes a scenario we've seen many times in the last few
years.  When LPRng has been running a few weeks or months, doing a

	sudo /etc/init.d/lpd stop

will often kill dozens of unrelated processes, including my shell, my sshd,
etc.

You've found, and hopefully fixed, a problem that has been an annoyance for
us for a long time.  Thanks!

Tim

> On 2014-9-5 10:26, walter harms wrote:
>
>> thx for the patch,
>> how did you notice that bug ?
>>
>> re,
>>  wh
>>
>> On 03.09.2014 10:51, Ales Novak wrote:
>>> We need to forget about the child that had exited and had already
>>> been successfully waited for. Otherwise the LPRng will try to kill
>>> that pid on its exit - but the pid may be assigned to completely
>>> another process.
>>>
>>> The patch is adding the function forget_child(pid_t), which walks
>>> through the list of children and tries to find the specified one
>>> and remove it. The function is called after successful waitpid()
>>> call.
>>>
>>> Signed-off-by: Ales Novak <al...@su...>
>>> ---
>>>  src/common/child.c | 31 +++++++++++++++++++++++++++++++
>>>  1 file changed, 31 insertions(+)
>>>
>>> diff --git a/src/common/child.c b/src/common/child.c
>>> index 8c4ee3d..5476c75 100644
>>> --- a/src/common/child.c
>>> +++ b/src/common/child.c
>>> @@ -34,6 +34,36 @@
>>>  # include <sys/ttold.h>
>>>  #endif
>>>
>>> +/*
>>> + * When the child was successfully waited on, it stayed in the
>>> + * Process_list and henceforth the lpd tried to kill it when
>>> + * cleaning up. But the pid may have been assigned to another
>>> + * process!
>>> + *
>>> + * So what is necessary is to walk through the process list,
>>> + * and remove the pid which has just exited (or has successfully
>>> + * been waited for, to be precise).
>>> + *
>>> + * The removal is done by replacing the record by the last one in
>>> + * the list and decrementing the record count.
>>> + */
>>> +static void forget_child(pid_t pid)
>>> +{
>>> +	int i;
>>> +
>>> +	for( i = 0; i < Process_list.count; ++i ){
>>> +		if (pid == Cast_ptr_to_int(Process_list.list[i]))
>>> +			break;
>>> +	}
>>> +
>>> +	if (i < Process_list.count) {
>>> +		DEBUG2("forget_child: found the child with pid %d", pid);
>>> +		Process_list.list[i] = Process_list.list[Process_list.count-1];
>>> +		Process_list.count --;
>>> +	} else {
>>> +		DEBUG2("forget_child: child with pid %d not found", pid);
>>> +	}
>>> +}
>>>
>>>  /*
>>>   * Patrick Powell
>>> @@ -51,6 +81,7 @@ pid_t plp_waitpid (pid_t pid, plp_status_t *statusPtr, int options)
>>>  	report = waitpid(pid, statusPtr, options );
>>>  	DEBUG2("plp_waitpid: returned %d, status %s", report,
>>>  		Decode_status( statusPtr ) );
>>> +	if (report > 0) forget_child(pid);
>>>  	return report;
>>>  }
>>>
>>
>
>

-- 
Tim Mooney                                             Tim...@nd...
Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
Room 242-J6, Quentin Burdick Building                  701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164
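
For anyone who wants to try the idea outside the LPRng tree, here is a minimal standalone sketch of the same swap-remove technique. It is not LPRng code: struct pid_list, MAX_CHILDREN, remember_child() and wait_and_forget() are hypothetical names invented for this example, and the plain pid_t array only stands in for Process_list. One deliberate difference from the quoted hunk is that the sketch forgets the pid that waitpid() returned rather than the pid that was requested, on the assumption that a caller may pass -1 to wait for any child.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64   /* hypothetical fixed capacity for this sketch */

/* Hypothetical stand-in for Process_list: an unordered array of pids. */
struct pid_list {
    pid_t pids[MAX_CHILDREN];
    int count;
};

/* Record a child pid; returns 0 on success, -1 if the list is full. */
static int remember_child(struct pid_list *l, pid_t pid)
{
    if (l->count >= MAX_CHILDREN)
        return -1;
    l->pids[l->count++] = pid;
    return 0;
}

/*
 * Swap-remove: overwrite the matching entry with the last one and
 * shrink the list by one.  Order is not preserved.
 * Returns 1 if the pid was found and removed, 0 otherwise.
 */
static int forget_child(struct pid_list *l, pid_t pid)
{
    for (int i = 0; i < l->count; ++i) {
        if (l->pids[i] == pid) {
            l->pids[i] = l->pids[l->count - 1];
            l->count--;
            return 1;
        }
    }
    return 0;
}

/*
 * waitpid() wrapper that drops whichever child was actually reaped,
 * so a later cleanup pass cannot signal a recycled pid.
 */
static pid_t wait_and_forget(struct pid_list *l, pid_t pid,
                             int *status, int options)
{
    pid_t reaped = waitpid(pid, status, options);
    if (reaped > 0)
        forget_child(l, reaped);   /* use the returned pid, not the request */
    return reaped;
}
```

A cleanup routine that later walks the list and calls kill() on each entry would then only see children that have not yet been waited for.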