#170 From time to time /proc deadlocks

Status: open-fixed
Priority: 5
Updated: 2010-03-13
Created: 2008-09-05
Creator: John Hughes
Private: No

For quite some time (maybe even on the old 2.4-based kernel) we've occasionally seen a problem where the /proc filesystem seems to be deadlocked: any attempt to read /proc hangs.

When this happens, rebooting one of the nodes (not just any node; it has to be the "right" one) frees up the system and things continue as normal.

Today I noticed that when I rebooted the node that was "causing" the problem, the following messages appeared on the init node:

Node 6 has gone down!!!
Assertion failed! origin_lock != ((void *)0), cluster/ssi/vproc/dvp_pvpsops.c, pvpsop_get_execnode, line=376
nm_add_node: Node 6 added

Is this a clue?

Discussion

  • Roger Tsang - 2008-09-24

    Related to this origin_lock assertion is a possible race in the vproc_origin_list traversal, supposedly fixed since SSI-1.9.x by the code under #ifdef VOD_HLIST. However, that fix introduced a possible deadlock, which should be fixed in 1.9.6.

    Lock ordering pre-1.9.6 (with #ifdef VOD_HLIST):
    -> vproc_origin_cleanup (down_read origin list)
    -> vproc_origin_fgpgrp_cleanup
    -> pvpop_getctty
    -> rpvpop_start_op
    -> pvpsop_get_execnode
    -> vproc_lock_origin_node
    -> vproc_origin_find (down_read origin list)
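    The ordering above takes the origin-list read lock twice on one call path: first in vproc_origin_cleanup, then again in vproc_origin_find. With a fair read/write semaphore such as the Linux rw_semaphore, a queued writer blocks new readers, so a nested down_read can deadlock if a writer arrives between the two acquisitions. This is presumably the deadlock referred to above. The following is a minimal single-threaded Python model of that interleaving, not SSI code. The RwSem class, its try_* methods, the writer "task B" and the "_locked" lookup variant are all hypothetical.

```python
# Single-threaded model of the pre-1.9.6 lock ordering. RwSem, try_down_read,
# try_down_write and simulate() are hypothetical illustrations, not SSI code.

class RwSem:
    """Fair read/write semaphore: once a writer is queued, new readers wait."""

    def __init__(self) -> None:
        self.readers = 0           # current read holders
        self.writer = False        # True while a writer holds the lock
        self.waiting_writers = 0   # writers sleeping until readers drain

    def try_down_read(self) -> bool:
        # A real down_read() would sleep here; the model reports False instead.
        if self.writer or self.waiting_writers:
            return False
        self.readers += 1
        return True

    def try_down_write(self) -> bool:
        if self.writer or self.readers:
            self.waiting_writers += 1  # writer queues behind current readers
            return False
        self.writer = True
        return True

    def up_read(self) -> None:
        self.readers -= 1
        if self.readers == 0 and self.waiting_writers:
            # Last reader out hands the lock to the first queued writer.
            self.waiting_writers -= 1
            self.writer = True


def simulate(nested_read: bool) -> str:
    origin_list = RwSem()

    # Task A: vproc_origin_cleanup -> down_read(origin list)
    assert origin_list.try_down_read()

    # Task B (hypothetical): a path that modifies the origin list queues
    # for the write lock while A still holds its read lock.
    assert not origin_list.try_down_write()

    if nested_read:
        # Task A, deeper in the chain: vproc_origin_find -> down_read again.
        # A now waits behind B, and B waits for A's first hold: deadlock.
        if not origin_list.try_down_read():
            return "deadlock"
        origin_list.up_read()
    # Otherwise the inner lookup is assumed to run under the read lock A
    # already holds (a hypothetical "_locked" variant) and takes no new lock.

    # A finishes the cleanup and drops its read lock, so B can proceed.
    origin_list.up_read()
    return "ok" if origin_list.writer else "stuck"
```

    Here simulate(True) returns "deadlock" and simulate(False) returns "ok". The model only shows why taking the same read lock twice is unsafe once a writer can queue in between; it says nothing about how the actual 1.9.6 fix was done.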

     
  • Roger Tsang - 2008-10-02
    • status: open --> open-accepted
     
  • Roger Tsang - 2008-10-02

    A fix is going into CVS.

     
  • John Hughes - 2008-11-20

    With current CVS (20/11/2008) I still see this bug. On coming in to work I found my (non-init) node stuck, apparently in the screensaver. When I tried to see what was going on from the init node, every stat on "/proc/1" hung, while stat on other entries in /proc worked (stat /proc/self or stat /proc/$$, for example).

    When I turned off the stuck node, the hung "stat" operations on the init node sprang back to life, and messages like these appeared in the log:

    Node 6 has gone down!!!
    Assertion failed! origin_lock != ((void *)0), cluster/ssi/vproc/dvp_pvpsops.c, pvpsop_get_execnode, line=379
    Assertion failed! origin_lock != ((void *)0), cluster/ssi/vproc/dvp_pvpsops.c, pvpsop_get_execnode, line=379
    Assertion failed! origin_lock != ((void *)0), cluster/ssi/vproc/dvp_pvpsops.c, pvpsop_get_execnode, line=379
    nm_add_node: Node 6 added
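    To tell which /proc entries hang (here /proc/1 but not /proc/self or /proc/$$) without wedging the shell, a small probe can stat each path with a timeout. This is a minimal sketch assuming Python 3 is available on the init node. The probe() helper, its 2-second default timeout and the script name are hypothetical, not part of SSI.

```python
import os
import sys
import threading


def probe(path: str, timeout: float = 2.0) -> str:
    """Return "ok", "error" or "hung" for os.stat(path).

    A stat() blocked inside the kernel cannot be interrupted from Python, so
    the call runs in a daemon thread that is abandoned if it does not finish
    within `timeout` seconds.
    """
    outcome = {}

    def worker() -> None:
        try:
            os.stat(path)
            outcome["result"] = "ok"
        except OSError:
            outcome["result"] = "error"

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return "hung"  # still blocked in the kernel after the timeout
    return outcome["result"]


if __name__ == "__main__":
    # Default targets mirror the report: /proc/1 hung, /proc/self did not.
    paths = sys.argv[1:] or ["/proc/self", "/proc/1"]
    for p in paths:
        print(f"{p}: {probe(p)}")
```

    In the situation described above, running it as `python3 probe.py` on the init node should report /proc/1 as hung and /proc/self as ok. A thread stuck in the kernel may still keep the process from exiting until the hang clears.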

     
  • John Hughes - 2008-11-20

    Sorry, I wasn't clear above: this is not straight CVS but my port of current CVS to 2.6.12. However, I'm pretty sure this part of the port is good.

     
  • John Hughes - 2008-12-03

    Well, I finally found a (crazy) way to reproduce this: launch a Windows app with Wine and hit Ctrl-C before it gets going. Eventually this provokes the hang.

    A "bta A" trace of the running processes is attached.

     
  • John Hughes - 2008-12-03

    output of bta A when /proc is hung

     
  • Roger Tsang - 2009-03-28

    Please try the latest CVS (March 24th).

     
  • Roger Tsang - 2009-10-27

    bug fixed?

     
  • Roger Tsang - 2010-03-13
    • assigned_to: nobody --> rogertsang
    • status: open-accepted --> open-fixed
     
