#2420 imm: IMMND on PL hangs when headless

5.17.07
fixed
None
defect
imm
nd
major
2017-07-27
2017-04-11
Hung Nguyen
No

IMMND on PL hangs at waitpid() after coordinator removal.

When the PBE process is in D state (uninterruptible sleep, usually I/O), it cannot be killed until it leaves that state, so waitpid() hangs unless WNOHANG is specified.

    LOG_WA("SC were absent and PBE appears hung, sending SIGKILL");
    kill(cb->pbePid, SIGKILL);
    waitpid(cb->pbePid, NULL, 0);

The bug was introduced by [#2296].

Solution: Call waitpid() with WNOHANG. While headless, check that the PBE and sync processes have exited before sending the intro message.
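
The idea can be illustrated with a minimal sketch. This is not the code from the actual commit: the helper names child_has_exited() and ready_to_send_intro() are hypothetical, and the sketch only assumes that the PIDs of the PBE and sync children are tracked somewhere (as cb->pbePid is in the excerpt above). The point is that waitpid() with WNOHANG returns 0 immediately while the child still exists, so the caller can retry later instead of blocking.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper (not the OpenSAF implementation): returns true once
 * the child identified by *pid is gone, without ever blocking the caller.
 * On success the stored PID is reset to 0 so it is not waited for again. */
static bool child_has_exited(pid_t *pid)
{
    if (*pid <= 0)
        return true;              /* no child is being tracked */

    int status = 0;
    pid_t rc = waitpid(*pid, &status, WNOHANG);

    if (rc == 0)
        return false;             /* child still exists, e.g. stuck in D state */
    if (rc == -1 && errno == EINTR)
        return false;             /* interrupted; try again on the next poll */

    /* rc == *pid: child was reaped. rc == -1 with ECHILD: no such child. */
    *pid = 0;
    return true;
}

/* Hypothetical gate: the intro message may be sent only when both the PBE
 * and the sync process have exited. Both PIDs are always polled so that
 * each finished child is reaped as early as possible. */
static bool ready_to_send_intro(pid_t *pbe_pid, pid_t *sync_pid)
{
    bool pbe_done = child_has_exited(pbe_pid);
    bool sync_done = child_has_exited(sync_pid);
    return pbe_done && sync_done;
}
```

A caller would send SIGKILL once and then invoke ready_to_send_intro() from its periodic processing, sending the intro message only on the first call that returns true. A child stuck in D state then delays the intro message but no longer hangs the daemon.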

Related

Tickets: #2296
Tickets: #2420
Wiki: ChangeLog-5.17.07

Discussion

  • Hung Nguyen

    Hung Nguyen - 2017-04-11
    • status: accepted --> review
     
  • Hung Nguyen

    Hung Nguyen - 2017-04-20
    • status: review --> fixed
    • Milestone: 5.0.2 --> 5.17.06
     
  • Hung Nguyen

    Hung Nguyen - 2017-04-20

    5.17.08 (develop) [code:11325e]

    commit 11325e3b7643c4d0500771ef7e022fcc47f1d31a
    Author: Hung Nguyen <hung.d.nguyen@dektech.com.au>
    Date:   Thu Apr 20 14:37:18 2017 +0700
    
        imm: Use waitpid with WNOHANG to check for sync process and pbe process [#2420]
    
        Use waitpid with WNOHANG to check for sync process and pbe process.
        The processes are checked before resending the intro message.
        The intro message is only sent when those processes exit.
    



    5.17.06 (release) [code:51233a]

    commit 51233a54a11809ac48e27c043361b0ac95c5b71a
    Author: Hung Nguyen <hung.d.nguyen@dektech.com.au>
    Date:   Thu Apr 20 14:37:18 2017 +0700
    
        imm: Use waitpid with WNOHANG to check for sync process and pbe process [#2420]
    
        Use waitpid with WNOHANG to check for sync process and pbe process.
        The processes are checked before resending the intro message.
        The intro message is only sent when those processes exit.
    



    default (mercurial) [staging:2aa1ed]

    changeset:   8773:2aa1edbd41e9
    user:        Hung Nguyen <hung.d.nguyen@dektech.com.au>
    date:        Tue Apr 11 19:05:48 2017 +0700
    summary:     imm: Use waitpid with WNOHANG to check for sync process and pbe process [#2420]
    
     

    Related

    Commit: [2aa1ed]
    Tickets: #2420
    Commit: [11325e]
    Commit: [51233a]

  • Anders Widell

    Anders Widell - 2017-07-01
    • Milestone: 5.17.06 --> 5.17.08
     