fast directory enumerator?

James King
2020-06-21
2020-06-23
  • James King

    James King - 2020-06-21

    Can this be adapted to also provide a faster alternative to GetDirectories()?

     
  • Opulos Inc.

    Opulos Inc. - 2020-06-22

    Looking at the source code for Directory.GetDirectories, it calls the same method, Win32Native.FindFirstFile(searchPath, ref data), in the internal class FileSystemEnumerableIterator, so a significant performance gain is unlikely.

    Have you timed the duration of Directory.GetDirectories in your own project? It makes sense to first confirm that this is actually a bottleneck in your code.
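
    One way to take that measurement is sketched below. It only uses System.Diagnostics.Stopwatch and Directory.GetDirectories; the class and method names (DirectoryTiming, TimeGetDirectories) are made up for illustration, and the default path is a placeholder to be replaced with the real target folder.

    ```csharp
    using System;
    using System.Diagnostics;
    using System.IO;

    static class DirectoryTiming
    {
        // Hypothetical helper: runs Directory.GetDirectories once on the given
        // path and reports how long the call took and how many folders it found.
        public static TimeSpan TimeGetDirectories(string path, out int count)
        {
            Stopwatch sw = Stopwatch.StartNew();
            string[] dirs = Directory.GetDirectories(path);
            sw.Stop();
            count = dirs.Length;
            return sw.Elapsed;
        }

        static void Main(string[] args)
        {
            // Placeholder default; pass the folder to measure as the first argument.
            string path = args.Length > 0 ? args[0] : Path.GetTempPath();
            int count;
            TimeSpan elapsed = TimeGetDirectories(path, out count);
            Console.WriteLine("{0} folders in {1:F1} ms", count, elapsed.TotalMilliseconds);
        }
    }
    ```

    Running it a few times is worthwhile, since the first call against a network share may include connection setup that later calls do not pay for.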

     
  • James King

    James King - 2020-06-22

    That's useful to know.
    I was picking over the FastFileInfo code last night and noticed that it actually gets all files and folders from the target, then skips over the folders and returns only the files.

    My project needs information for all the files and all the folders. Currently I'm using FastFileInfo to enumerate the files, then System.IO to enumerate the folders. It's very clearly fetching the same set of information twice (this is a UNC share over a VPN over the internet, with folders containing thousands of entries, so it's slow), and it very clearly could be reduced to a single fetch. If the folder I'm scanning contains 5000 files and 2 folders, the folder call still takes a long time despite only returning 2 folders. I think you're right though, they do both seem to complete in roughly the same amount of time now (and when I was using System.IO to get the files it was a LOT slower; the speed improvement from FastFileInfo is incredible, it's reduced the function time to around 30% of what it used to take).

    I originally tried to speed this up using System.IO.FileSystemInfo to wrap both fetches into a single call, but this doesn't return the file size, which is needed, and subsequently fetching it for each file takes just as long again if not longer, so it doesn't help overall. FastFileInfo speeds up the file fetch enormously, but I still have to fetch the folders separately, so each folder is essentially taking twice as long as it needs to.

    So now I'm thinking if FastFileInfo can be adapted to FastFileSystemInfo which would fetch all files and folders in a single call, iterating over the target folder just once and returning every entry in one enumerable list or array, and which would improve upon System.IO.FileSystemInfo by also including the filesize for the files (which would necessarily be zero for folders), meaning you could genuinely retrieve all information for the entire folder contents in a single call. Folders would be identified by setting the file attribute accordingly.

    This is something I could possibly do myself, but the windows API calls and general structuring of the code is quite a lot more advanced than I'm used to, and this is likely to take a heck of a long time to figure out. If it's something you're able to put together quickly (it seems like a simple modification), I'd really absolutely love it. It'd likely benefit others as well.
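
    For comparison, the shape of result described above (one list, files and folders together, size zero for folders) can be sketched with System.IO alone. This is not FastFileInfo and makes no claim about speed: Entry and SinglePassListing are hypothetical names, and whether reading FileInfo.Length costs an extra round trip to the server depends on the runtime, so that part is an assumption to measure rather than rely on.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;

    static class SinglePassListing
    {
        // Hypothetical row type: one instance per file or folder.
        public sealed class Entry
        {
            public string Name;
            public bool IsDirectory;
            public long Length;            // always 0 for folders
            public DateTime CreationTime;
            public DateTime LastWriteTime;
        }

        // Enumerates only the top level of 'path' and returns files and
        // folders together in a single list.
        public static List<Entry> List(string path)
        {
            var result = new List<Entry>();
            foreach (FileSystemInfo info in new DirectoryInfo(path).EnumerateFileSystemInfos())
            {
                // Entries that are files come back as FileInfo, which exposes Length.
                FileInfo file = info as FileInfo;
                result.Add(new Entry
                {
                    Name = info.Name,
                    IsDirectory = (info.Attributes & FileAttributes.Directory) == FileAttributes.Directory,
                    Length = file != null ? file.Length : 0,
                    CreationTime = info.CreationTime,
                    LastWriteTime = info.LastWriteTime
                });
            }
            return result;
        }

        static void Main(string[] args)
        {
            // Placeholder default; pass the folder to list as the first argument.
            string path = args.Length > 0 ? args[0] : Path.GetTempPath();
            foreach (Entry e in List(path))
                Console.WriteLine("{0}\t{1}\t{2}", e.IsDirectory ? "DIR" : "FILE", e.Length, e.Name);
        }
    }
    ```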

     
  • Opulos Inc.

    Opulos Inc. - 2020-06-22

    Rather than making an additional call to System.IO, have you considered extracting the folders from the file paths? Note that only folders containing a matching file are returned, so empty folders are not listed. Does this solve your problem? Sample code:

    String path = "c:\\temp\\";
    String pattern = "*.txt";
    SearchOption so = SearchOption.AllDirectories;
    IList<FastFileInfo> files = FastFileInfo.GetFiles(path, pattern, so);
    // Collect each distinct containing folder once, in first-seen order.
    HashSet<String> seen = new HashSet<String>();
    List<String> folders = new List<String>();
    foreach (FastFileInfo f in files) {
        String folder = f.DirectoryName;
        if (seen.Add(folder)) // Add returns false if the folder was already seen
            folders.Add(folder);
    }
    
     
  • James King

    James King - 2020-06-22

    Thanks, but my function doesn't scan into subfolders at all. It only scans the top-level folder and returns all of its files and folders, so this isn't going to work, as no matching files would be returned from subfolders. If I did try to go down that road, the function would be scanning many thousands of additional files for no purpose other than to find their containing folders. In fact it would also produce many layers of subfolders when I'm only looking for the subfolders of the top level, so this would be a LOT of additional unwanted data and network traffic. The second System.IO call is clearly going to be faster than that.

    Think of the behaviour of a file browser: it needs to gather all info on all files and folders in the current folder without going into each subfolder. What I'm doing is pretty similar.

    I'm going to have another look over the code. I reckon I can transpose it to inherit from FileSystemInfo instead of FileInfo, add a read-only property for Length (as this is missing from FileSystemInfo), and then make it not skip over the folder entries but instead set the Directory attribute and keep them in the list... it's definitely beyond my current skills but it's also a good learning experience... I think I can do this xD

     
  • James King

    James King - 2020-06-22

    Hmm, one thought that instantly comes to mind: I need the same info from each folder as from the files, in particular the CreationTime and LastWriteTime.

    Does the existing FindNextFile(), which fills WIN32_FIND_DATA, get this same information for folders as it does for files? If so, then I think I should be okay adapting this.

     
  • Opulos Inc.

    Opulos Inc. - 2020-06-22

    Yes, the information is contained in the findData variable. In the MoveNext() method, look at the code:

                    if (hasCurrent || !advanceNext) {
                        // first skip over any directories, but store them if usePendingFolders is true
                        while (((FileAttributes) findData.dwFileAttributes & FileAttributes.Directory) == FileAttributes.Directory) {
                            if (usePendingFolders) {
                                String c = findData.cFileName;
                                if (!(c[0] == '.' && (c.Length == 1 || c[1] == '.' && c.Length == 2))) // skip folders '.' and '..'
                                    pendingFolders.Add(Path.Combine(currentFolder, c));
                            }
                            hasCurrent = FindNextFile(hndFile, findData);
                            if (!hasCurrent)
                                break;
                        }
                    }
    

    If pendingFolders.Add(...) were changed to create a new FastFileInfo(currentFolder, findData), that object would contain the information you need. Its Attributes property would include Directory, Length = 0, Name = name of the sub-folder, and all the DateTime properties would be assigned.
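
    As background on why the same data is available for folders: WIN32_FIND_DATA carries each timestamp as a FILETIME (two 32-bit halves) and the size as nFileSizeHigh/nFileSizeLow, for every entry type. The sketch below shows how such raw halves map to a DateTime and a 64-bit length. The thread does not show how FastFileInfo declares or converts these fields, so the helper names here (FindDataFields, Combine, ToUtc) and the choice to pass raw uint halves are assumptions made for illustration.

    ```csharp
    using System;

    static class FindDataFields
    {
        // Joins the high and low 32-bit halves into one 64-bit value.
        // Used for both the file size and FILETIME timestamps.
        public static long Combine(uint high, uint low)
        {
            return ((long)high << 32) | low;
        }

        // A FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC,
        // which is exactly what DateTime.FromFileTimeUtc expects.
        public static DateTime ToUtc(uint high, uint low)
        {
            return DateTime.FromFileTimeUtc(Combine(high, low));
        }

        static void Main()
        {
            // A 5 GiB size split across the two halves.
            Console.WriteLine(Combine(1, 0x40000000));        // 5368709120

            // A zero FILETIME is the start of the FILETIME epoch.
            Console.WriteLine(ToUtc(0, 0).ToString("o"));     // 1601-01-01T00:00:00.0000000Z
        }
    }
    ```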

     
  • James King

    James King - 2020-06-22

    Okay, well, I'm not really sure I understand that specific change, but I commented out the segment of code that skips over directories, and that works: I'm getting directories and files all in one go with all the info attached. Sorted!

     
  • James King

    James King - 2020-06-23

    Realised that was giving me the "." and ".." folders, so rather than deal with those in my application loop, I have modified that block of code as follows. This seems to be working, and my file/folder enumeration is absolutely racing through now. Honestly, this is taking something like 15-20% of the time it used to. Incredible! Thanks so much for your help on this.

                        if (hasCurrent || !advanceNext)
                        {
                            while ((((FileAttributes)findData.dwFileAttributes & FileAttributes.Directory) == FileAttributes.Directory) && 
                                (findData.cFileName[0] == '.' && (findData.cFileName.Length == 1 || findData.cFileName[1] == '.' && findData.cFileName.Length == 2)))
                            {
                                hasCurrent = FindNextFile(hndFile, findData);
    
                                if (!hasCurrent)
                                    break;
                            }
                        }
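
    The name test in that condition could also be pulled out into a small helper, which makes the loop condition shorter and lets the check be tried on its own. A minimal sketch follows; DotEntries and IsDotOrDotDot are hypothetical names, not part of FastFileInfo.

    ```csharp
    using System;

    static class DotEntries
    {
        // True only for the two special entries "." and "..".
        // Names such as ".git" or "..a" are ordinary entries and return false.
        public static bool IsDotOrDotDot(string name)
        {
            return name == "." || name == "..";
        }

        static void Main()
        {
            foreach (string name in new[] { ".", "..", ".git", "..a", "docs" })
                Console.WriteLine("{0} -> {1}", name, IsDotOrDotDot(name));
        }
    }
    ```

    With a helper like this, the while condition would read as the directory-attribute check combined with IsDotOrDotDot(findData.cFileName).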
    
     