Menu

#3028 Do not use readdir_r

obsolete: 8.5a3
closed-accepted
5
2005-01-09
2005-01-04
Joe English
No

If Tcl is compiled with --enable-threads, it uses
readdir_r() (or readdir64_r) instead of readdir().
This is unnecessary: according to the Single Unix
Specification, "The pointer returned by readdir()
points to data which may be overwritten by another call
to readdir() on the same directory stream. This data is
not overwritten by another call to readdir() on a
different directory stream." In other words: readdir()
is perfectly thread-safe as long as you do not share
the same DIR pointer across different threads (which
Tcl does not do).

In addition, Tcl does not allocate the correct amount
of buffer space for the path name (in fact, this is
not even possible in general): the officially
recommended mechanism is to set name_max =
pathconf(dirname, _PC_NAME_MAX), then allocate a buffer
of 'sizeof(struct dirent) + name_max + 1' bytes, since
the maximum length of a filename (if there is a
maximum) is filesystem-specific.

This should prevent buffer overruns (assuming
readdir_r() is implemented according to spec), but it
can still truncate the filename -- on Linux, for
example, it is documented that "[f]iles with name
lengths longer than [...] _PC_NAME_MAX may exist in the
given directory".

Related issues: 1071701, 1001325, 723502.

Attached patch (to follow) gets rid of the whole mess
and just uses readdir().

Discussion

  • Joe English

    Joe English - 2005-01-04

    Logged In: YES
    user_id=68433

    More notes:

    In the current codebase, TclpReaddir() calls
    readdir/readdir_r/readdir64/readdir64_r, depending on -D
    flags determined at configure-time. TclpReaddir() appears
    in the internal unix-specific stubs table, but is not
    referenced directly in the current codebase. Instead,
    tclUnixFile.c and tclUnixFCmd.c call TclOSreaddir(), which
    is #defined as one of: TclpReaddir() (if TCL_THREADS
    defined), readdir64() (if TCL_THREADS undefined and
    HAVE_STRUCT_DIRENT64 defined), or readdir (if both symbols
    are undefined).

    In addition, unix/tclUnixPort.h has #define readdir(x)
    TclpReaddir(x) if TCL_THREADS is defined. The only place
    "readdir" appears literally in the current codebase is in
    the implementation of TclpReaddir (which must take care to
    #undefine it first).

    Proposed simplification: don't re-#define readdir at all,
    and only use TclOSreaddir to select readdir() / readdir64()
    (IOW, -DTCL_THREADS makes no difference). TclpReaddir
    still appears in the internal stubs table, but is never
    referenced (directly or indirectly) by anything else in the
    core.

     
  • Joe English

    Joe English - 2005-01-04

    Logged In: YES
    user_id=68433

    More notes:

    It's not clear that the core needs to distinguish between
    readdir() and readdir64(), either. The only struct dirent
    field the core looks at is d_name, so LFS issues (like
    64-bit inodes or file sizes) aren't applicable.

    OTOH, there may be systems where readdir() does not work at
    all on large files (GLIBC's readdir was buggy in this regard
    at one point, and might still be).

     
  • Joe English

    Joe English - 2005-01-05

    Patch to CVS HEAD (4 Jan 2005), remove readdir_r and related #ifdeffery

     
  • Joe English

    Joe English - 2005-01-05

    Logged In: YES
    user_id=68433

    Patch attached; tested with CVS HEAD, on Mandrake Linux
    10.0, --enable-symbols --enable-threads --disable-shared; no
    test failures.

    One more note: The SunOS man pages insist that readdir() is
    not MT-safe and should not be used in multithreaded
    applications. As near as I am able to determine, this is
    simply not true.

    The API itself has no thread-safety issues as long as the
    program does not use the same DIR pointer in multiple
    threads (i.e., it's no more MT-unsafe than, say, lseek() or
    write()). The SuS requires that independant calls to
    readdir() on distinct DIR pointers not interfere with one
    another; all of the readdir() implementations I've been able
    to find (glibc, various BSDs, IRIX) are thread-safe subject
    to the above constraint; according to Google the traditional
    implementation is MT-safe as well; and if you think about
    it, it would be more work to implement readdir() in a way
    that's *not* MT-safe (subject to the above constraint) than
    it would be to implement one that is.

    Considering the problems with readdir_r, both historical and
    inherent, using it in favor of the traditional readdir()
    seems quite strongly contraindicated. Let's lose the
    #ifdeffery here; it's doing more harm than good.

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    IIRC, readdir64 is needed even though we never look at a
    field where it matters, because the OS call doesn't know
    that we won't look. :^/ The alternative (partial failure)
    isn't common in Unix system calls AIUI...

    As the author of some of the readdir swamp, if you want to
    clear some of the alligators out, be my guest.

     
  • Joe English

    Joe English - 2005-01-09

    Logged In: YES
    user_id=68433

    Committed to CVS HEAD to get more widespread testing.

    Refreshed patch attached (just added some comments) in the
    unlikely event that it needs to be reverted.

     
  • Joe English

    Joe English - 2005-01-09
    • assigned_to: mdejong --> jenglish
    • status: open --> closed-accepted
     
  • Joe English

    Joe English - 2005-01-09

    Patch to CVS HEAD (9 Jan 2005), remove readdir_r and related #ifdeffery

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2005-07-08

    Logged In: YES
    user_id=72656

    Backported to 8.4 branch port 8.4.11. Found the
    incorrectness of the previous version's m4 screwed up AIX
    threaded builds, so correcting the whole use of readdir_r
    seemed best.