#98 Cluster Hangs on write()

Milestone: v1.9.1
Status: closed-fixed
Owner: nobody
Labels: Filesystem (49)
Priority: 5
Updated: 2007-08-13
Created: 2005-09-07
Creator: Anonymous
Private: No

On my 11-node cluster running FC3 and OpenSSI 1.9.1,
the cluster hangs on write() if the file is bigger than
approximately 300 MB. Mounting the CFS disk with the sync
option avoids the hang, but performance is poor. I found
another entry on the list saying that a manual sync brings
everything back to life.
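
The sync mount option makes every write on the mount synchronous,
which is why it sidesteps the hang at the cost of throughput. A
per-file approximation is to open with O_SYNC; treating that as
equivalent on a CFS mount is an assumption here, not something the
ticket confirms:

    #include <fcntl.h>

    /* Per-file stand-in for the sync mount option: with O_SYNC every
     * write(2) blocks until the data is written through (here, to the
     * CFS server).  Equivalence on CFS is an assumption. */
    int open_sync(const char *path)
    {
        return open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
    }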

Discussion

  • Nobody/Anonymous

    Logged In: NO

    What is the underlying filesystem and device that you were
    writing to? Is this a remote CFS write or a write to a local
    device? Are you writing to a device-mapper device? Can you
    print out a backtrace of the process from kdb on the node
    that you are writing from?

     
  • Nobody/Anonymous

    Logged In: NO

    An occasional sync solves the problem, so a possible
    workaround is to run a sync every minute or so (sketched
    below), but that is not very convenient.
    Any ideas?
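
    A minimal sketch of that workaround, assuming a standalone C
    helper is acceptable (the 60-second period comes from the
    comment above; the file name syncd.c is hypothetical):

        /* syncd.c - hypothetical periodic-sync workaround: flush
         * dirty buffers once a minute so the CFS client never sits
         * above the dirty-page threshold long enough to hang. */
        #include <unistd.h>

        int main(void)
        {
            for (;;) {
                sync();    /* schedule writeback of all dirty data */
                sleep(60); /* once a minute, per the workaround */
            }
        }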

     
  • Nobody/Anonymous

    Logged In: NO

    >What is the underlying filesystem and device that you were
    >writing to? Is this a remote CFS write or a write to a local
    >device?
    Usually it's a remote device; a node writes back to the
    masternode. It always happens when two nodes write at the
    same time.

    > Are you writing to a device-mapper device?
    If you ask whether the device is a distributed block device,
    the answer is no.

    >Can you print out a backtrace of the process from kdb on the
    >node that you are writing from?

    Since it usually happens with parallel processes, I will see
    what I can do.

     
  • Roger Tsang

    Roger Tsang - 2005-09-29
    • labels: --> Filesystem
    • milestone: --> v1.9.1
     
  • Nobody/Anonymous

    Logged In: NO

    It seems the problem stems from the pdflush kthread not being
    woken on non-initnodes when the CFS server (CFS mount) is not
    on the initnode. In fact, it looks like pdflush is woken for
    background_writeout on the initnode when it should be woken on
    the CFS server (a non-initnode). So when writing very large
    files - bigger than the vm.dirty_threshold of the remote node -
    a number of icssvr_daemons writing to the same file hang the
    cluster waiting for IO in balance_dirty_pages, and the CFS
    client hangs waiting for rcfsd_write to complete. I can't
    reproduce this when writing to the CFS server on the initnode.
    The failure is either in the writeback_inodes call or in the
    pdflush_operation call to pdflush's background_writeout
    function.
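
    For reference, the throttling path in the stock 2.6 kernel,
    condensed from mm/page-writeback.c (this is the vanilla code
    path, not the OpenSSI-patched source, and it compiles only
    inside a kernel tree):

        static void balance_dirty_pages(struct address_space *mapping)
        {
                struct writeback_state ws;
                long background_thresh, dirty_thresh;

                for (;;) {
                        struct writeback_control wbc = {
                                .bdi         = mapping->backing_dev_info,
                                .sync_mode   = WB_SYNC_NONE,
                                .nr_to_write = sync_writeback_pages(),
                        };

                        get_dirty_limits(&ws, &background_thresh,
                                         &dirty_thresh);
                        if (ws.nr_dirty + ws.nr_unstable +
                            ws.nr_writeback <= dirty_thresh)
                                break;              /* under the limit */

                        writeback_inodes(&wbc);     /* flush some pages */
                        /* ...the reported hang point: */
                        blk_congestion_wait(WRITE, HZ/10);
                }

                /* Kick pdflush for background writeback.  Per the
                 * analysis above, on OpenSSI this wakeup appears to
                 * land on the initnode rather than on the CFS server
                 * node that actually holds the dirty pages. */
                if (ws.nr_dirty + ws.nr_unstable > background_thresh)
                        pdflush_operation(background_writeout, 0);
        }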

     
  • Nobody/Anonymous

    Logged In: NO

    One way to reproduce this is to mount a physical partition on
    a non-initnode, non-chard node. Copy to it a large file twice
    the size of that node's vm.dirty_threshold. Wait until dirty
    pages in /proc/meminfo rise above the vm.dirty_threshold limit
    and the copy process hangs at blk_congestion_wait. Do a sync
    on that node. Checking /proc/meminfo on that node, dirty pages
    will still be above vm.dirty_threshold.
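
    A sketch that automates these steps from the writing node; the
    mount point is a placeholder, and the 1 GB size assumes it is
    roughly twice vm.dirty_threshold on that node:

        /* repro.c - hypothetical driver for the steps above: stream
         * a large file onto the non-initnode mount and report the
         * Dirty: line from /proc/meminfo as it climbs. */
        #include <stdio.h>
        #include <string.h>
        #include <fcntl.h>
        #include <unistd.h>

        static void print_dirty(void)
        {
            char line[128];
            FILE *f = fopen("/proc/meminfo", "r");

            if (!f)
                return;
            while (fgets(line, sizeof line, f))
                if (strncmp(line, "Dirty:", 6) == 0)
                    fputs(line, stdout);
            fclose(f);
        }

        int main(void)
        {
            static char buf[1 << 20];          /* 1 MiB per write */
            long mb, total_mb = 1024;          /* ~2x dirty threshold */
            int fd = open("/mnt/test/bigfile", /* placeholder path */
                          O_WRONLY | O_CREAT | O_TRUNC, 0644);

            if (fd < 0) {
                perror("open");
                return 1;
            }
            memset(buf, 'x', sizeof buf);
            for (mb = 0; mb < total_mb; mb++) {
                if (write(fd, buf, sizeof buf) != sizeof buf) {
                    perror("write"); /* or we hang here, per the bug */
                    break;
                }
                if (mb % 100 == 0)
                    print_dirty();   /* watch dirty pages climb */
            }
            close(fd);
            return 0;
        }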

     
  • Roger Tsang

    Roger Tsang - 2007-03-25
    • status: open --> open-fixed
     
  • Roger Tsang

    Roger Tsang - 2007-08-13
    • status: open-fixed --> closed-fixed
     
