
#143 Orphan background processes

workingwiki
open
None
6
2013-07-10
2011-12-05
Lee Worden
No

There seem to be background jobs in which make and the processor-intensive programs it calls are still running after pe-make has terminated. I think this is not supposed to happen. It may leave us with background processes that are no longer visible in the background jobs listings on the wiki, or the wiki may be reporting them as terminated when they are not. This looks like a bug that should be fixed.

Discussion

  • Lee Worden

    Lee Worden - 2012-02-02

    Confirmed. I'm looking at one right now, in which make is running and pe-make isn't. WW reports the job as "not running (status unknown)". I think it'll continue to do that after make finishes, because pe-make is expected to record the exit status and it won't.

    So two questions:
    1. What happened to pe-make? Was it the reaper, or does pe-make have a flaw that should be eliminated? If it isn't a flaw in pe-make, and I knew what was killing it, I might be able to catch the condition and at least leave a record and/or kill the make process on the way down.

    2. If it's going to keep happening like this, I should improve the interface. I could check for the make process as well as pe-make, so that I could report "is running" while make is running. When make finishes, pe-make won't be there to record the exit status, so I'd still have to report "not running (status unknown)", but at least in that case it would be true.
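The reporting rule in point 2 can be sketched roughly as follows. This is only an illustration in Python, not WW's actual code: the function names, the `exit_status` argument, and the "finished (exit status N)" wording are hypothetical, and only the two quoted status strings come from the comment above.

```python
import os
from typing import Optional


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def job_status(pe_make_alive: bool, make_alive: bool,
               exit_status: Optional[int]) -> str:
    """Hypothetical status rule: look at make as well as pe-make."""
    if pe_make_alive or make_alive:
        return "is running"
    if exit_status is not None:
        # Assumed wording; pe-make recorded the status before exiting.
        return f"finished (exit status {exit_status})"
    # Neither process is left and nothing was recorded.
    return "not running (status unknown)"
```

With this rule, an orphaned make is reported as "is running" until it exits, and only then falls back to "not running (status unknown)".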

     

    Last edit: Lee Worden 2013-04-11
  • Lee Worden

    Lee Worden - 2012-04-19

    We're getting something like this on our SGE setup as well: processes that are running even though the background session directory does not exist. I presume this is because WW thinks it killed the job and destroyed the session directory. This may be a separate bug, in which we need better handling for the case where we don't successfully kill the job. But I don't actually know what the cause is.

     
  • Lee Worden

    Lee Worden - 2013-05-07

    We have these on the cluster nodes now (we're running both foreground and background WW processes distributed across the yushan cluster).

     
  • Lee Worden

    Lee Worden - 2013-05-07

    On another ticket, Andrei A. wrote:

    Update: if one clears the working directory or tries to change the text of source files while the WW page is being frozen and an incorrect foreground job is running, "forever" jobs might appear on the server.

    Andrei: why do you say so? It seems more likely to me that they happen at random times. Did you have experience suggesting that they are connected to clear or edit operations, freezes, or background jobs? All these things are supposed to be protected from each other by a locking algorithm.

     
  • Anonymous

    Anonymous - 2013-05-07

    Is it possible to just "collect" the problem? I.e., have some sort of background process that scans for jobs that have lost their pe-make and kills them?
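A scan of this kind might look something like the sketch below. It is a rough illustration only, and it assumes several things that are not established in this ticket: that the processes show up in `ps` under the command names `pe-make` and `make`, that every make on the machine was launched by pe-make, and that listing the orphans (rather than killing them) is the first step. The function names are hypothetical.

```python
import subprocess
from typing import List, Tuple

Proc = Tuple[int, int, str]  # (pid, parent pid, command name)


def read_process_table() -> List[Proc]:
    """Read pid, ppid and command name for every process via POSIX ps."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=,ppid=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    table = []
    for line in out.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            table.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return table


def find_orphaned_makes(table: List[Proc], wrapper: str = "pe-make",
                        target: str = "make") -> List[int]:
    """Return pids of `target` processes with no `wrapper` among their ancestors."""
    by_pid = {pid: (ppid, comm) for pid, ppid, comm in table}
    orphans = []
    for pid, ppid, comm in table:
        if comm != target:
            continue
        supervised = False
        seen = set()
        parent = ppid
        # Walk up the parent chain until it leaves the table or loops.
        while parent in by_pid and parent not in seen:
            seen.add(parent)
            grandparent, parent_comm = by_pid[parent]
            if parent_comm == wrapper:
                supervised = True
                break
            parent = grandparent
        if not supervised:
            orphans.append(pid)
    return orphans
```

Calling `find_orphaned_makes(read_process_table())` would give a list of candidate pids to log or inspect. A make that has lost its pe-make is normally re-parented to init, so it has no pe-make ancestor and is included in the list.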

     
  • Lee Worden

    Lee Worden - 2013-05-07

    Sure, but it wouldn't help us kill the bug.

     
  • Lee Worden

    Lee Worden - 2013-07-10
    • Priority: 5 --> 6
     
  • Lee Worden

    Lee Worden - 2013-07-10

    I have a test program that produces a ridiculously large amount of output, and apparently fails to die when pe-make kills its children. Useful for testing. Need to sit down with it and test it until I see how to fix it.
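One technique that could be tried against a test program like this is to start the job in its own process group and signal the whole group, so that grandchildren are reached as well as direct children. The sketch below is Python and is not how pe-make works, nor a diagnosis of this bug; the function names and the grace period are hypothetical.

```python
import os
import signal
import subprocess
from typing import List


def start_in_own_group(argv: List[str]) -> subprocess.Popen:
    """Start argv as the leader of a new session and process group."""
    return subprocess.Popen(
        argv,
        start_new_session=True,      # the child calls setsid() before exec
        stdout=subprocess.DEVNULL,   # discard output so a full pipe cannot block it
        stderr=subprocess.DEVNULL,
    )


def signal_group(pgid: int, sig: int) -> None:
    """Send sig to every process in the group, ignoring an already empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def kill_group(proc: subprocess.Popen, grace: float = 2.0) -> int:
    """Send SIGTERM to the whole group, then SIGKILL to whatever is left."""
    pgid = proc.pid  # a session leader's process group id equals its pid
    signal_group(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass
    # Descendants that ignored SIGTERM are still in the group.
    signal_group(pgid, signal.SIGKILL)
    proc.wait()
    return proc.returncode
```

Descendants that move themselves into a different process group or session would escape this, so it narrows the problem rather than ruling it out.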

     
