#4659 Crash in Tcl_FSGetFileSystemForPath

Milestone: obsolete: 8.5.8
Status: pending-fixed
Priority: 8
Updated: 2012-06-25
Created: 2010-07-02
Private: No

We have a multi-threaded tclkit application that runs several scripts on different threads and it crashes in Tcl_FSGetFileSystemForPath on the line "fsRecPtr = fsRecPtr->nextPtr".

From investigation, it appears that 'theFilesystemEpoch' is being updated within the loop and the thread's filesystem list is being re-cached, leaving fsRecPtr pointing at freed memory. It is very difficult to track down, but it appears that Tcl_FSMountsChanged is being called from within Tcl_FSGetFileSystemForPath. The path being accessed is the 'wrapped' init.tcl.
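
The failure mode described above can be sketched with a small, self-contained model. This is only an illustration under stated assumptions, not Tcl's actual code: FsRecord, GetFirstFilesystem, PathInFilesystem, FindFilesystem, globalEpoch, cachedEpoch and pendingMountChanges are all hypothetical stand-ins. The per-thread list is freed and rebuilt whenever the epoch moves, so a loop that holds a record pointer across a re-entrant call must notice the epoch change and restart instead of following nextPtr.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a filesystem record; not the real Tcl struct. */
typedef struct FsRecord {
    const char *name;
    struct FsRecord *nextPtr;
} FsRecord;

static int globalEpoch = 1;          /* models the shared filesystem epoch */
static int cachedEpoch = 0;          /* models the per-thread epoch        */
static FsRecord *cachedList = NULL;  /* models the per-thread list copy    */
static int pendingMountChanges = 1;  /* how many epoch bumps to simulate   */

static void FreeList(FsRecord *p) {
    while (p != NULL) { FsRecord *n = p->nextPtr; free(p); p = n; }
}

/* Rebuild the per-thread copy when the epoch moved; old records are freed. */
static FsRecord *GetFirstFilesystem(void) {
    if (cachedEpoch != globalEpoch) {
        static const char *names[] = { "vfs", "native" };
        FreeList(cachedList);
        cachedList = NULL;
        for (int i = 1; i >= 0; i--) {
            FsRecord *r = malloc(sizeof *r);
            if (r == NULL) abort();
            r->name = names[i];
            r->nextPtr = cachedList;
            cachedList = r;
        }
        cachedEpoch = globalEpoch;
    }
    return cachedList;
}

/* Models a per-record check that may re-enter GetFirstFilesystem(). */
static int PathInFilesystem(const FsRecord *rec, const char *path) {
    int match = (strcmp(rec->name, "native") == 0); /* read rec before re-entry */
    (void) path;
    if (pendingMountChanges > 0) {
        pendingMountChanges--;
        globalEpoch++;                /* simulated mount change            */
        (void) GetFirstFilesystem();  /* frees the list that rec lives in  */
    }
    return match;
}

/* Walk the list, restarting whenever the epoch moved under the loop. */
static const char *FindFilesystem(const char *path, int *restarts) {
    *restarts = 0;
    for (;;) {
        FsRecord *rec = GetFirstFilesystem();
        int startEpoch = cachedEpoch;
        int stale = 0;
        assert(cachedEpoch == globalEpoch);
        while (rec != NULL) {
            int match = PathInFilesystem(rec, path);
            if (cachedEpoch != startEpoch) { stale = 1; break; } /* rec dangles */
            if (match) return rec->name;
            rec = rec->nextPtr;       /* safe only because epoch is unchanged */
        }
        if (!stale) return NULL;
        (*restarts)++;
    }
}
```

Calling FindFilesystem("/app/lib/init.tcl", &restarts) from a main() exercises one simulated epoch bump during the first pass. Without the epoch check, the loop would read nextPtr from a freed record, which is the crash pattern reported here.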

Discussion

  • Donal K. Fellows

    • priority: 5 --> 8
     
  • Don Porter

    Don Porter - 2012-05-29

    This is reported against 8.5.8. Has it been reconfirmed
    with either the 8.5.11 release or the current state of core-8-5-branch?

     
  • Don Porter

    Don Porter - 2012-05-29
    • assigned_to: vincentdarley --> dgp
     
  • Don Porter

    Don Porter - 2012-06-11
    • status: open --> pending-out-of-date
     
  • Don Porter

    Don Porter - 2012-06-11

    The issue appears to be care needed by callers of
    FsGetFirstFilesystem(). Any caller that keeps and
    makes use of the value it returns needs to take care
    not to continue using that value whenever another
    call to FsGetFirstFilesystem() might be made in the
    same thread. Since TclFSEnsureEpochOk() does
    just that, it's clear that Tcl_FSGetFileSystemForPath()
    has the potential for this kind of trouble. The need to
    make both those calls and make sure they both report
    results on the same epoch is a bit of a puzzle.

     
  • Don Porter

    Don Porter - 2012-06-11
    • status: pending-out-of-date --> open
     
  • Don Porter

    Don Porter - 2012-06-11

    First draft fix committed to branch bug-3024359. Needs tests and testing.

     
  • Don Porter

    Don Porter - 2012-06-12

    Second draft committed, but it's now clear these patches
    are broken. Although they preserve the FilesystemRecord
    linked lists while they are in use, avoiding the reported crashes,
    the result is that when a search of these preserved records
    succeeds, the incorrect epoch value is recorded, so that a
    record retrieved from an outdated epoch is stored as if it
    were current.

     
  • Don Porter

    Don Porter - 2012-06-18

    This seems to be a controlled way to test
    for the bug. Doesn't easily convert to a test
    for the test suite.

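    package require Thread
    # Background thread: change env(HOME) continuously.
    thread::create {while 1 {set env(HOME) /home/user[incr i]}}
    # Main thread: look up the filesystem of a path continuously.
    while 1 {file system foo}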

     
  • Don Porter

    Don Porter - 2012-06-18

    Next draft fix committed. This one appears to solve
    the problem, at least as far as that test demonstrates.

     
  • Don Porter

    Don Porter - 2012-06-21

    Some simplifications of the internals of fs path values
    just committed to the 8.5 and trunk branches have the
    effect of also making the posted test pass.

    I have my doubts these changes are a complete fix for
    the issue of managing the mount epoch properly (still
    pursuing that on the bug-3024359 branch) but they may
    be sufficient to repair the problems observed by the
    original poster.

     
  • Don Porter

    Don Porter - 2012-06-21

    Aha! Strike that last comment. Still seeing
    crashes on 8.5 branch. Developing fix is
    still on the bug-3024359 branch.

     
  • Don Porter

    Don Porter - 2012-06-21

    The crashing is much less frequent though, which
    suggests that the FsGetFirstFilesystem() call in the
    (now deleted) helper routine TclFSInternalToNormalized()
    was the one most frequently causing a bump in the
    per-thread filesystemEpoch value at a time that could cause
    trouble.

     
  • Don Porter

    Don Porter - 2012-06-22

    Revised the code to restore the crashing. The bug is still
    present on 8-5-branch, but was able to hide by getting lucky
    with memory allocation patterns. Now it cannot hide. Bugs
    that don't hide are easier to fix.

     
  • Don Porter

    Don Porter - 2012-06-25

    Fix committed to core-8-5-branch for release in 8.5.12.

    Confirmation that it fixes the reported bug would be welcome.

     
  • Don Porter

    Don Porter - 2012-06-25
    • status: open --> pending-fixed
     
  • Don Porter

    Don Porter - 2012-06-25

    Fix merged to trunk as well.
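
The point raised in the 2012-06-12 comment, that preserving a list while it is in use avoids the crash but the epoch recorded with a result must be the epoch of the list actually searched, can be sketched as follows. This is a simplified model under stated assumptions and not the committed fix: Snapshot, Acquire, Release, MountsChanged, PathCache, CacheIsFresh and Lookup are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted copy of the filesystem list. */
typedef struct Snapshot {
    int epoch;       /* epoch at which this copy was built       */
    int refCount;    /* cache reference plus any active users    */
    int numRecords;  /* stand-in for the actual list of records  */
} Snapshot;

static int currentEpoch = 1;      /* models the shared mount epoch */
static Snapshot *current = NULL;  /* models the per-thread cache   */
static int liveSnapshots = 0;     /* bookkeeping for the demo only */

static Snapshot *NewSnapshot(void) {
    Snapshot *s = malloc(sizeof *s);
    if (s == NULL) abort();
    s->epoch = currentEpoch;
    s->refCount = 1;              /* reference held by the cache */
    s->numRecords = 2;
    liveSnapshots++;
    return s;
}

static void Release(Snapshot *s) {
    if (--s->refCount == 0) { free(s); liveSnapshots--; }
}

/* Return an up-to-date snapshot; an outdated one survives while in use. */
static Snapshot *Acquire(void) {
    if (current == NULL || current->epoch != currentEpoch) {
        if (current != NULL) Release(current);  /* drop only the cache's ref */
        current = NewSnapshot();
    }
    current->refCount++;
    return current;
}

static void MountsChanged(void) { currentEpoch++; }

/* Hypothetical per-path cache of a lookup result. */
typedef struct PathCache { int fsIndex; int epoch; } PathCache;

static int CacheIsFresh(const PathCache *c) { return c->epoch == currentEpoch; }

static void Lookup(PathCache *cache, int mountsChangeDuringSearch) {
    Snapshot *snap = Acquire();
    if (mountsChangeDuringSearch) {
        MountsChanged();
        Release(Acquire());       /* models a nested call refreshing the cache */
    }
    /* snap is still valid here because this function holds a reference. */
    cache->fsIndex = snap->numRecords - 1;
    cache->epoch = snap->epoch;   /* the searched epoch, NOT currentEpoch */
    Release(snap);
}
```

Storing snap->epoch rather than currentEpoch means a result found in an outdated list is recognized as stale on the next freshness check, instead of being treated as current.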

     
