
find always accessing (stat'ing?) all files

Help
Cataddict
2006-04-27
2012-07-26
  • Cataddict

    Cataddict - 2006-04-27

    My version of "find" always accesses (stat's?) every file, not just directories, no matter what.

    I'm using GNU find version 4.2.20 on Windows 2000.

    For testing purposes I'm using the simple mode of:

    find f:\ -name junk.txt

    and it is extremely slow. It ends up opening every file on f:. The documentation implies that execution is faster when only the -name parameter is used, and points out that stat'ing is only done when necessary. I'm NOT proficient enough in C to debug this, but is it possibly always calling stat() no matter what?

    Sysinternals "filemon" confirms OPEN and QUERY
    INFORMATION for all files during the "find" run.

    Essentially this behavior makes find impractical for searching over very large directory structures on Win32.

    FYI: Both the Cygwin version and an old unxutils version of find do the same thing. I've also tried this on two systems.

    Any ideas?

    (Apologies as a first time poster - I entered this as a "bug" last night because the "Start a New Thread" menu on this forum didn't appear.)

     
    • Nobody/Anonymous

      > Essentially this behavior makes find impractical for searching over very large directory structures on Win32.

      I also struggled with this problem a couple of years ago. I needed to search for files over huge directories and return size/mtime/ctime/attribs/etc on the matches. It was literally taking hours to run. GnuWin32 find/cygwin find/perl were all very slow.

      I wound up doing a little reverse engineering of Windows Explorer to see how it returned the file sizes/times/etc so quickly (~10 seconds). Turns out that the Win32 API calls FindFirstFileA() & FindNextFileA() in kernel32.dll are much faster than stat() on Win32.

      I ended up writing a little module in Perl to simply wrap those API functions. I could try to find it (pun intended) if you want to see it.

      If you discover a better solution, please post and share it. I'd like to know it.
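
For anyone who wants to try the directory-enumeration approach described above without writing a wrapper, here is a minimal sketch in Python rather than Perl. It is not the module mentioned in the reply. It relies on `os.scandir`, which on Windows is built on the wide-character variants of the same calls (FindFirstFileW / FindNextFileW) and returns the file type along with each name, so the walk should not need a separate stat() per file. The function name `find_by_name` is made up for illustration, and whether this is fast enough on a given volume is something to measure, not assume.

```python
import fnmatch
import os


def find_by_name(root, pattern):
    """Yield paths under root whose base name matches a shell-style pattern.

    Hypothetical helper: a rough stand-in for `find ROOT -name PATTERN`.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Case-sensitive match on the base name only.
                    if fnmatch.fnmatchcase(entry.name, pattern):
                        yield entry.path
                    # On Windows the file type comes back with the directory
                    # listing, so this check needs no extra per-file call.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Skip directories that are missing or cannot be read.
            continue
```

Under these assumptions, `list(find_by_name("f:\\", "junk.txt"))` would correspond to the command in the original post. Matching is deliberately case-sensitive here; swap in `fnmatch.fnmatch` if the platform's own case rules are preferred.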

       