Menu

#290 ocount leaves orphan process on error

None
closed-fixed
nobody
None
5
2020-07-20
2019-07-11
Carl Love
No

If ocount encounters and error, for example

ocount ./test64
perf_event_open failed with Permission denied
Caught runtime error while setting up counters
Internal Error. Perf event setup failed.
Error running ocount

it leaves behind an orphaned process.

The issue seems to be related to commit

commit 249fe0a4bb69e5bd2e9ee0a0667d925a86d4337c
Author: William Cohen wcohen@redhat.com
Date: Tue Aug 9 22:25:52 2016 -0400

    Only start the application if the perf events setup was successful

The commit changed the order of operations. Do to the change variable app_started is not set to true unless the perf event stuff succeeds. I believe it needs to be set right after the fork in function start_counting(void), in file pe_counting/ocount.cpp. Note I posted a message and patch about the issue to the oprofile-list@lists.sf.net on Wed, 10 Jul 2019.

Discussion

  • Carl Love

    Carl Love - 2019-07-11

    I have attached the message/patch sent to the mailing list.

     
  • William Cohen

    William Cohen - 2019-07-23

    Could you separate the the test and the actual fix in the patch into two separate patches? Right now they are all combined and done want that for the actual commit. The logic looks reasonable but would like to test thing a bit with the split patches.

     
  • Carl Love

    Carl Love - 2019-07-24

    Per Will's request, I am attaching the two patches separately. The first patch is the fix for the issue.

     
  • Carl Love

    Carl Love - 2019-07-24

    The second patch can be used to force the error seen by the user. It is for testing purposes only and is not be committted to the repository.

    Will, let me know if you can read the patches OK or not. Thanks.

     
  • William Cohen

    William Cohen - 2019-07-25
    • status: open --> open-fixed
    • Group: -->
     
    • Carl Love

      Carl Love - 2019-07-25

      On Thu, 2019-07-25 at 14:39 +0000, William Cohen wrote:

      • status: open --> open-fixed
      • Group:  --> 
      • Comment:

      I massaged the patch as it wasn't formatted quite right and didn't
      apply with a "git am".  The logic seems reasonable and I verified
      that the patch fixed the stated problem.  It is now in the upstrem
      oprofile git repository. Thanks.

      Will:

      Thanks. 

      It was generated via git gui so not sure why it wouldn't apply with git
      am.

                      Carl Love
      
       
  • William Cohen

    William Cohen - 2019-07-25

    I massaged the patch as it wasn't formatted quite right and didn't apply with a "git am". The logic seems reasonable and I verified that the patch fixed the stated problem. It is now in the upstrem oprofile git repository. Thanks.

     
  • Carl Love

    Carl Love - 2019-07-25

    Patch was committed by Will Cohen. The commit entry is:

    commit 8aacb573cd2c4a2f8160d99ff100ad0aa5e7859d
    Author: Carl Love cel@us.ibm.com
    Date: Thu Jul 25 10:24:16 2019 -0400

    Only start the application if the perf events setup was successful
    
    Changes the order of starting the application and performance events.
    
    Given this change we have a new issue.  The issue is the routine
    start_counting() calls fork, creating app_PID process.
    
    The parent then tries to setup the performance events, then if the
    performance events were setup correctly, app_PID is then told to start
    before exiting.  If the performance counter setup fails, the app_PID is
    left running.  The app_PID is never told to start the workload, which is
    correct but we don't record the fact that app_PID is running.  The
    error path then fails to kill app_PID in routine main(), in file
    oprofile-git/pe_counting, at about lie 909 because the if statement
    
     if (startApp && app_started && (run_result != APP_ABNORMAL_END)) {
    
    is false because app_started is False.
    
    The fix, I believe, is to set app_started to True in the parent code if
    the fork was successful.  With this fix, there is no orphan processes
    left after ocount exits.
    

    The issue can be closed.

     
  • William Cohen

    William Cohen - 2020-07-20
    • status: open-fixed --> closed-fixed
     

Log in to post a comment.