From: Ales N. <al...@su...> - 2014-09-03 08:51:55
|
We need to forget about the child that had exited and had already been successfully waited for. Otherwise the LPRng will try to kill that pid on its exit - but the pid may be assigned to completely another process. The patch is adding the function forget_child(pid_t), which walks through the list of childs and tries to find the specified one and remove it. The function is called after successfull waitpid() call. Signed-off-by: Ales Novak <al...@su...> --- src/common/child.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/src/common/child.c b/src/common/child.c index 8c4ee3d..5476c75 100644 --- a/src/common/child.c +++ b/src/common/child.c @@ -34,6 +34,36 @@ # include <sys/ttold.h> #endif +/* + * When the child was successfully waited on, it stayed in the + * Process_list and henceforth the lpd tried to kill it when + * cleaning up. But the pid may have been assigned to another + * process! + * + * So what is neccessary is to walk through the process list, + * and remove the pid which has just exited (or have successfully + * been waited for, to be precise). + * + * The removal is done by replacing the record by the last one in + * the list and decrementing the record count. + */ +static void forget_child(pid_t pid) +{ + int i; + + for( i = 0; i < Process_list.count; ++i ){ + if (pid == Cast_ptr_to_int(Process_list.list[i])) + break; + } + + if (i < Process_list.count) { + DEBUG2("forget_child: found the child with pid %d", pid); + Process_list.list[i] = Process_list.list[Process_list.count-1]; + Process_list.count --; + } else { + DEBUG2("forget_child: child with pid %d not found", pid); + } +} /* * Patrick Powell @@ -51,6 +81,7 @@ pid_t plp_waitpid (pid_t pid, plp_status_t *statusPtr, int options) report = waitpid(pid, statusPtr, options ); DEBUG2("plp_waitpid: returned %d, status %s", report, Decode_status( statusPtr ) ); + if (report > 0) forget_child(pid); return report; } -- 1.8.1.4 |
From: Ales N. <al...@su...> - 2014-09-05 08:34:16
|
Hello, our customer reported an error: on LPRng exit some totally unrelated processes got killed. The only explanation for that I was able to find was the LPRng way of handling child processes. It seems to me, that though the children have been waited for, i.e. their pids are free to be reused, thir pids are still stored in Process_list. And if they are (and the LPRng daemon has the right to kill them), they will be killed at exit. On 2014-9-5 10:26, walter harms wrote: > thx for the patch, > how did you notice that bug ? > > re, > wh > > Am 03.09.2014 10:51, schrieb Ales Novak: >> We need to forget about the child that had exited and had already >> been successfully waited for. Otherwise the LPRng will try to kill >> that pid on its exit - but the pid may be assigned to completely >> another process. >> >> The patch is adding the function forget_child(pid_t), which walks >> through the list of childs and tries to find the specified one >> and remove it. The function is called after successfull waitpid() >> call. >> >> Signed-off-by: Ales Novak <al...@su...> >> --- >> src/common/child.c | 31 +++++++++++++++++++++++++++++++ >> 1 file changed, 31 insertions(+) >> >> diff --git a/src/common/child.c b/src/common/child.c >> index 8c4ee3d..5476c75 100644 >> --- a/src/common/child.c >> +++ b/src/common/child.c >> @@ -34,6 +34,36 @@ >> # include <sys/ttold.h> >> #endif >> >> +/* >> + * When the child was successfully waited on, it stayed in the >> + * Process_list and henceforth the lpd tried to kill it when >> + * cleaning up. But the pid may have been assigned to another >> + * process! >> + * >> + * So what is neccessary is to walk through the process list, >> + * and remove the pid which has just exited (or have successfully >> + * been waited for, to be precise). >> + * >> + * The removal is done by replacing the record by the last one in >> + * the list and decrementing the record count. >> + */ >> +static void forget_child(pid_t pid) >> +{ >> + int i; >> + >> + for( i = 0; i < Process_list.count; ++i ){ >> + if (pid == Cast_ptr_to_int(Process_list.list[i])) >> + break; >> + } >> + >> + if (i < Process_list.count) { >> + DEBUG2("forget_child: found the child with pid %d", pid); >> + Process_list.list[i] = Process_list.list[Process_list.count-1]; >> + Process_list.count --; >> + } else { >> + DEBUG2("forget_child: child with pid %d not found", pid); >> + } >> +} >> >> /* >> * Patrick Powell >> @@ -51,6 +81,7 @@ pid_t plp_waitpid (pid_t pid, plp_status_t *statusPtr, int options) >> report = waitpid(pid, statusPtr, options ); >> DEBUG2("plp_waitpid: returned %d, status %s", report, >> Decode_status( statusPtr ) ); >> + if (report > 0) forget_child(pid); >> return report; >> } >> > -- Ales Novak |
From: Tim M. <Tim...@nd...> - 2014-09-05 16:54:28
|
In regard to: Re: [Lprng-devel] [PATCH] Forget pid of child that have...: > our customer reported an error: on LPRng exit some totally unrelated > processes got killed. The only explanation for that I was able to find was > the LPRng way of handling child processes. It seems to me, that though the > children have been waited for, i.e. their pids are free to be reused, thir > pids are still stored in Process_list. And if they are (and the LPRng > daemon has the right to kill them), they will be killed at exit. That perfectly describes a scenario we've seen many times in the last few years. When LPRng has been running a few weeks or months, doing an sudo /etc/init.d/lpd stop will often kill dozens of unrelated processes, including my shell, my sshd, etc. You've found, and hopefully fixed, a problem that has been an annoyance for us for a long time. Thanks! Tim > On 2014-9-5 10:26, walter harms wrote: > >> thx for the patch, >> how did you notice that bug ? >> >> re, >> wh >> >> Am 03.09.2014 10:51, schrieb Ales Novak: >>> We need to forget about the child that had exited and had already >>> been successfully waited for. Otherwise the LPRng will try to kill >>> that pid on its exit - but the pid may be assigned to completely >>> another process. >>> >>> The patch is adding the function forget_child(pid_t), which walks >>> through the list of childs and tries to find the specified one >>> and remove it. The function is called after successfull waitpid() >>> call. >>> >>> Signed-off-by: Ales Novak <al...@su...> >>> --- >>> src/common/child.c | 31 +++++++++++++++++++++++++++++++ >>> 1 file changed, 31 insertions(+) >>> >>> diff --git a/src/common/child.c b/src/common/child.c >>> index 8c4ee3d..5476c75 100644 >>> --- a/src/common/child.c >>> +++ b/src/common/child.c >>> @@ -34,6 +34,36 @@ >>> # include <sys/ttold.h> >>> #endif >>> >>> +/* >>> + * When the child was successfully waited on, it stayed in the >>> + * Process_list and henceforth the lpd tried to kill it when >>> + * cleaning up. But the pid may have been assigned to another >>> + * process! >>> + * >>> + * So what is neccessary is to walk through the process list, >>> + * and remove the pid which has just exited (or have successfully >>> + * been waited for, to be precise). >>> + * >>> + * The removal is done by replacing the record by the last one in >>> + * the list and decrementing the record count. >>> + */ >>> +static void forget_child(pid_t pid) >>> +{ >>> + int i; >>> + >>> + for( i = 0; i < Process_list.count; ++i ){ >>> + if (pid == Cast_ptr_to_int(Process_list.list[i])) >>> + break; >>> + } >>> + >>> + if (i < Process_list.count) { >>> + DEBUG2("forget_child: found the child with pid %d", pid); >>> + Process_list.list[i] = Process_list.list[Process_list.count-1]; >>> + Process_list.count --; >>> + } else { >>> + DEBUG2("forget_child: child with pid %d not found", pid); >>> + } >>> +} >>> >>> /* >>> * Patrick Powell >>> @@ -51,6 +81,7 @@ pid_t plp_waitpid (pid_t pid, plp_status_t *statusPtr, int options) >>> report = waitpid(pid, statusPtr, options ); >>> DEBUG2("plp_waitpid: returned %d, status %s", report, >>> Decode_status( statusPtr ) ); >>> + if (report > 0) forget_child(pid); >>> return report; >>> } >>> >> > > -- Tim Mooney Tim...@nd... Enterprise Computing & Infrastructure 701-231-1076 (Voice) Room 242-J6, Quentin Burdick Building 701-231-8541 (Fax) North Dakota State University, Fargo, ND 58105-5164 |