My version of "find" always accesses (stat()s?) every file, not just directories, no matter what.
I'm using GNU find version 4.2.20 on Windows 2000.
For testing purposes I'm using the simple mode of:
find f:\ -name junk.txt
and it is extremely slow. It ends up opening every file on f:. The documentation implies that execution is faster when only the -name parameter is used, and it points out that stat'ing is done only when necessary. I'm NOT proficient enough in C to debug this, but is it possibly always calling stat() no matter what?
Sysinternals "filemon" confirms OPEN and QUERY INFORMATION operations for all files during the "find" run.
Essentially this behavior makes find impractical for searching over very large directory structures on Win32.
FYI: Both the Cygwin and an old unxutils version of find do the same thing. Also I've tried this on two systems.
Any ideas?
(Apologies as a first time poster - I entered this as a "bug" last night because the "Start a New Thread" menu on this forum didn't appear.)
> Essentially this behavior makes find impractical for searching over very large directory structures on Win32.
I also struggled with this problem a couple of years ago. I needed to search for files over huge directories and return size/mtime/ctime/attribs/etc on the matches. It was literally taking hours to run. GnuWin32 find/cygwin find/perl were all very slow.
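I wound up doing a little reverse engineering of Windows Explorer to see how it returned the file sizes/times/etc so quickly (~10 seconds). It turns out that the Win32 API calls FindFirstFileA() & FindNextFileA() in kernel32.dll are much faster than stat() on Win32: they hand back each entry's size, timestamps and attributes as part of the directory listing itself, so no file has to be opened individually.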
I ended up writing a little module in Perl to simply wrap those API functions. I could try to find it (pun intended) if you want to see it.
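For anyone who wants to try the same idea without Perl, here is a rough sketch in Python. It is not my original module, and the function name find_by_name is made up for illustration. It relies on os.scandir(), which on Windows is built on the FindFirstFile/FindNextFile family, so the name, the directory flag, the size and the times come from the directory listing instead of a separate stat() per file. On Unix the entry.stat() call still costs one system call, but only for the entries that match.

```python
import os
from typing import Iterator, Tuple


def find_by_name(root: str, name: str) -> Iterator[Tuple[str, int, float]]:
    """Yield (path, size, mtime) for every entry under root called `name`.

    Hypothetical helper, roughly equivalent to `find root -name name`
    with an exact (non-wildcard) name.
    """
    stack = [root]  # directories still to be listed
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.name == name:
                        # On Windows this uses data already returned by
                        # the directory listing; no extra call is made.
                        info = entry.stat(follow_symlinks=False)
                        yield entry.path, info.st_size, info.st_mtime
                    # Descend into real directories only, never through
                    # symlinks, to avoid loops.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable directory (permissions, vanished): skip it.
            continue


if __name__ == "__main__":
    for path, size, mtime in find_by_name("f:\\", "junk.txt"):
        print(path, size, mtime)
```

Only the entries whose name matches ever have their metadata looked at, which is the behavior the original question expected from a plain -name search.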
If you discover a better solution, please post and share it. I'd like to know it.