|
From: Tim M. <Tim...@nd...> - 2014-09-05 16:54:28
|
In regard to: Re: [Lprng-devel] [PATCH] Forget pid of child that have...:
> our customer reported an error: on LPRng exit some totally unrelated
> processes got killed. The only explanation for that I was able to find was
> the LPRng way of handling child processes. It seems to me, that though the
> children have been waited for, i.e. their pids are free to be reused, thir
> pids are still stored in Process_list. And if they are (and the LPRng
> daemon has the right to kill them), they will be killed at exit.
That perfectly describes a scenario we've seen many times in the last
few years. When LPRng has been running a few weeks or months, doing an
sudo /etc/init.d/lpd stop
will often kill dozens of unrelated processes, including my shell, my
sshd, etc.
You've found, and hopefully fixed, a problem that has been an annoyance
for us for a long time.
Thanks!
Tim
> On 2014-9-5 10:26, walter harms wrote:
>
>> thx for the patch,
>> how did you notice that bug ?
>>
>> re,
>> wh
>>
>> Am 03.09.2014 10:51, schrieb Ales Novak:
>>> We need to forget about the child that had exited and had already
>>> been successfully waited for. Otherwise the LPRng will try to kill
>>> that pid on its exit - but the pid may be assigned to completely
>>> another process.
>>>
>>> The patch is adding the function forget_child(pid_t), which walks
>>> through the list of childs and tries to find the specified one
>>> and remove it. The function is called after successfull waitpid()
>>> call.
>>>
>>> Signed-off-by: Ales Novak <al...@su...>
>>> ---
>>> src/common/child.c | 31 +++++++++++++++++++++++++++++++
>>> 1 file changed, 31 insertions(+)
>>>
>>> diff --git a/src/common/child.c b/src/common/child.c
>>> index 8c4ee3d..5476c75 100644
>>> --- a/src/common/child.c
>>> +++ b/src/common/child.c
>>> @@ -34,6 +34,36 @@
>>> # include <sys/ttold.h>
>>> #endif
>>>
>>> +/*
>>> + * When the child was successfully waited on, it stayed in the
>>> + * Process_list and henceforth the lpd tried to kill it when
>>> + * cleaning up. But the pid may have been assigned to another
>>> + * process!
>>> + *
>>> + * So what is neccessary is to walk through the process list,
>>> + * and remove the pid which has just exited (or have successfully
>>> + * been waited for, to be precise).
>>> + *
>>> + * The removal is done by replacing the record by the last one in
>>> + * the list and decrementing the record count.
>>> + */
>>> +static void forget_child(pid_t pid)
>>> +{
>>> + int i;
>>> +
>>> + for( i = 0; i < Process_list.count; ++i ){
>>> + if (pid == Cast_ptr_to_int(Process_list.list[i]))
>>> + break;
>>> + }
>>> +
>>> + if (i < Process_list.count) {
>>> + DEBUG2("forget_child: found the child with pid %d", pid);
>>> + Process_list.list[i] = Process_list.list[Process_list.count-1];
>>> + Process_list.count --;
>>> + } else {
>>> + DEBUG2("forget_child: child with pid %d not found", pid);
>>> + }
>>> +}
>>>
>>> /*
>>> * Patrick Powell
>>> @@ -51,6 +81,7 @@ pid_t plp_waitpid (pid_t pid, plp_status_t *statusPtr, int options)
>>> report = waitpid(pid, statusPtr, options );
>>> DEBUG2("plp_waitpid: returned %d, status %s", report,
>>> Decode_status( statusPtr ) );
>>> + if (report > 0) forget_child(pid);
>>> return report;
>>> }
>>>
>>
>
>
--
Tim Mooney Tim...@nd...
Enterprise Computing & Infrastructure 701-231-1076 (Voice)
Room 242-J6, Quentin Burdick Building 701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164
|