dir_refresh vs file_refresh

monstruooo
2009-03-25
2013-04-16
  • monstruooo
    2009-03-25

    I would like to draw your attention to one setting in davfs2.conf that causes some confusion: dir_refresh. According to the man page, it defines how long information about directories is stored.

    In reality, what is stored is not only information about the directories themselves but also about their contents. It is less a refresh of the directory than a refresh of its listing (what the dir command shows). The way we use these settings illustrates the confusion. We have a directory tree four levels deep, and files are stored only in the directories of the fourth level. We don't use the refresh settings because new files are created at quite a high rate. However, we initially set dir_refresh to infinity because our directory structure is fixed. It took us a while to figure out that dir_refresh applies to directory listings, not only to information about the directory itself, and that this is why newly created files were not visible until dir_refresh expired.
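    The effect described here can be shown with a small model in C. This is a minimal sketch, assuming a plain time-to-live rule: a cached listing is trusted while it is younger than the refresh time, and only a stale listing is replaced by the server's current one. struct cached_listing, lookup_visible and the other names are hypothetical, not davfs2's real data structures.

```c
#include <stdbool.h>
#include <string.h>
#include <time.h>

#define MAX_NAMES 8

/* Hypothetical cached directory listing (not davfs2's real structure). */
struct cached_listing {
    time_t fetched;               /* when the listing was last fetched */
    const char *names[MAX_NAMES]; /* entries known at fetch time */
    int count;
};

/* The cached listing is trusted while it is younger than ttl seconds. */
static bool listing_is_fresh(const struct cached_listing *l, time_t now,
                             time_t ttl)
{
    return now - l->fetched < ttl;
}

static bool listing_contains(const struct cached_listing *l, const char *name)
{
    for (int i = 0; i < l->count; i++)
        if (strcmp(l->names[i], name) == 0)
            return true;
    return false;
}

/* Look up one name. A stale listing is replaced by the server's current
 * listing (standing in for a new PROPFIND); a fresh one is used as-is,
 * so files created on the server after `fetched` are not seen. */
static bool lookup_visible(struct cached_listing *cache,
                           const struct cached_listing *server,
                           const char *name, time_t now, time_t ttl)
{
    if (!listing_is_fresh(cache, now, ttl)) {
        *cache = *server;
        cache->fetched = now;
    }
    return listing_contains(cache, name);
}
```

    With a very large ttl (our "infinity"), a file added on the server after the first fetch stays invisible to lookup_visible; with a short ttl the listing is refetched and the file appears.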

    In practical terms, the effective dir_refresh is determined not only by how stable the directory structure is, but also by how often new files are created, which means that in many cases dir_refresh has to follow file_refresh. In my view, the correct implementation would treat dir_refresh as information about whether a directory exists, not about what is inside it.

    Another solution might be to configure a depth within which directories are treated as directories, while beyond it they are treated with file_refresh. Or perhaps there could be a setting that makes davfs treat directory paths as given and check whether they actually exist only when the web server returns a certain error.
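    The depth-based idea could be expressed as a rule that picks the refresh time for a directory listing from the directory's depth. This is a hypothetical sketch: davfs2 has no such option, and listing_ttl and listing_depth are illustrative names only.

```c
#include <time.h>

/* Hypothetical: davfs2 has no such option. Listings of directories at
 * depths below listing_depth belong to the fixed structure and keep the
 * long dir_refresh; listings at or beyond it, where files are created,
 * use the short file_refresh instead. Depth 0 is the mount point. */
static time_t listing_ttl(int depth, int listing_depth,
                          time_t dir_refresh, time_t file_refresh)
{
    return depth < listing_depth ? dir_refresh : file_refresh;
}
```

    For our four-layer tree, listing_depth would be 4: the first three layers keep the long dir_refresh, while listings of the fourth layer, where files appear, follow file_refresh.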

     
    • Werner Baumann
      2009-03-25

      I agree, the term "information about a directory" is misleading. It should say something like "directory information, meaning the list of files and directories as well as some meta-information like size and modification date". I will change this in the manpage.
      file_refresh refers to the content of a file.
      A special case is when an application opens a directory to list the contents. In this case file_refresh is used.
      "that's why new files created were not visible until dir_refresh expired"
      Are you sure about this? This would be a bug. When an application wants to know about newly created files, it has to do a directory listing. For this the directory has to be opened and file_refresh applies. Could your observation be caused by a slow connection under heavy load with requests frequently timing out? When a PROPFIND-request to get directory information times out, davfs2 will not report an error but reuse the cached information.
      I don't think a special refresh time to check for the existence of directories would be generally useful. In general, directories may be created and deleted just as often and unpredictably as files.
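      The rules described above can be restated in C. This is a minimal sketch: enum dir_access, enum info_source, refresh_for and resolve are hypothetical names, and the real davfs2 code is structured differently. It only encodes three statements: a lookup of a single name uses dir_refresh, opening a directory to list it uses file_refresh, and a timed-out PROPFIND falls back to the cached information without reporting an error.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical names; this only restates the rules in C. */
enum dir_access { ACCESS_LOOKUP, ACCESS_OPENDIR };
enum info_source { FROM_CACHE, FROM_SERVER, STALE_AFTER_TIMEOUT };

/* Looking up a single name uses dir_refresh; opening a directory to
 * list its contents uses file_refresh. */
static time_t refresh_for(enum dir_access how, time_t dir_refresh,
                          time_t file_refresh)
{
    return how == ACCESS_OPENDIR ? file_refresh : dir_refresh;
}

/* Cached information younger than the applicable refresh time is used
 * directly. Otherwise a PROPFIND is sent; if it times out, the stale
 * cached information is reused and no error is reported. */
static enum info_source resolve(time_t age, enum dir_access how,
                                time_t dir_refresh, time_t file_refresh,
                                bool propfind_timed_out)
{
    if (age < refresh_for(how, dir_refresh, file_refresh))
        return FROM_CACHE;
    return propfind_timed_out ? STALE_AFTER_TIMEOUT : FROM_SERVER;
}
```

    With dir_refresh = 60 and file_refresh = 1, 30-second-old information satisfies a lookup from the cache, while a directory listing triggers a request; on a slow connection that request may time out and the old listing is shown, which would look exactly like a new file being invisible.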

      Cheers
      Werner

       
    • monstruooo
      2009-03-26

      Hi Werner

      Yes, I am sure about it. I may recheck it, but we ran into a few problems with this particular setting, and our tests showed that new files are not visible until dir_refresh expires.

      Regarding the directory structure: we, for example, have 65536 directories, but they come with the installation; we don't create new ones. Anyway, I think that if there are two different refresh settings for files and directories, dir_refresh should ignore directory listings. Otherwise it is both impractical to use and liable to cause confusion and misconfiguration.

      Cheers

       
    • monstruooo
      2009-03-26

      Does running ls on a system with davfs count as an application doing a directory listing? If so, then yes, it's there. It's very easy to reproduce.

       
    • monstruooo
      2009-03-26

      Actually, I now see that it's OK. Dunno why I decided it was so. It was probably something in one of the previous versions; I don't see it in 1.3.3. Forget it.

      :D  :D

       
    • monstruooo
      2009-03-26

      Sorry for the mistake. The problem is that this application does not do an ls (a directory listing).

       
    • monstruooo
      2009-03-30

      By the way, thanks for the information. I straced their application and told them exactly how they should access files so that davfs refreshes the listing when it reports "file not found".

       
    • Werner Baumann
      2009-03-30

      When an application tries to access a file that has only recently been created on the remote server without first scanning the directory, it must have some other channel to learn that the file was created. This is unusual but valid.
      If you need this, you could force davfs2 to update its information more often by setting "dir_refresh" to a short time. But this will increase traffic significantly, because there are many upcalls and davfs2 would have to send a request for each of them.

      There is probably another solution:
      If an application tries to access a file, davfs2 could serve the information from the cache (if there is any) and send a request only if it cannot find any information about that file in the cache. I made another patch and did a *short* successful test. If you are interested, you might try it:

      At about line 850 in file cache.c, change the function "dav_lookup" like this:

      int dav_lookup(dav_node **nodep, dav_node *parent, const char *name,
                     uid_t uid) {

          if (!is_valid(parent))
              return ENOENT;
          if (debug)
              syslog(LOG_MAKEPRI(LOG_DAEMON, LOG_DEBUG), "lookup %s%s", parent->path,
                     name);
          if (!is_dir(parent))
              return ENOTDIR;
          if (!has_permission(parent, uid, X_OK | R_OK))
              return EACCES;

          /* If the child cannot be found, update the dir information if it is
              older than the short file_refresh time; otherwise update the dir
              information only when it is older than the long dir_refresh time
              (stored in retry).
              After updating the dir information, check again whether the child
              now (still) exists. */
          *nodep = get_child(parent, name);
          if (*nodep == NULL) {
              update_directory(parent, file_refresh);
          } else {
              update_directory(parent, retry);
          }
          *nodep = get_child(parent, name);
          if (*nodep == NULL)
              return ENOENT;

          if (is_dir(*nodep)) {
              if ((*nodep)->utime == 0)
                  update_directory(*nodep, retry);
              if (create_dir_cache_file(*nodep) != 0)
                  return EIO;
          } else if (is_open(*nodep)) {
              attr_from_cache_file(*nodep);
          }

          return 0;
      }

      Probably the forum software will again remove all indentation. If you want to be really sure, you could use 0 instead of file_refresh, but this might cause several requests for the same information within one second.
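      To follow the patched policy outside the davfs2 sources, here is a self-contained model in C. struct dir_cache, maybe_refresh and lookup are hypothetical names, and the ">=" age comparison is an assumption; the real dav_lookup calls update_directory, whose exact timing test is not reproduced here. A cached child uses the long dir_refresh; a missing child uses the short file_refresh.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of the patched lookup; not the davfs2 source. */
struct dir_cache {
    time_t fetched; /* time of the last PROPFIND for this directory */
    int requests;   /* PROPFINDs sent so far */
};

/* Refetch the listing when it is at least max_age seconds old.
 * (Using >= rather than > is an assumption of this model.) */
static void maybe_refresh(struct dir_cache *d, time_t now, time_t max_age)
{
    if (now - d->fetched >= max_age) {
        d->fetched = now;
        d->requests++;
    }
}

/* Cache hit: use the long dir_refresh. Cache miss: use the short
 * file_refresh, so a miss triggers a refetch unless the listing was
 * fetched less than file_refresh seconds ago. */
static void lookup(struct dir_cache *d, bool child_cached, time_t now,
                   time_t dir_refresh, time_t file_refresh)
{
    maybe_refresh(d, now, child_cached ? dir_refresh : file_refresh);
}
```

      In this model, three misses within the same second cost one request with file_refresh = 1 but three requests with 0, which is the trade-off mentioned above.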

      Cheers
      Werner

       
    • monstruooo
      2009-03-30

      WOW!!! You are very cool!!

      Yes, of course. If you can add such an option, then it just rocks. Nobody should have to negotiate with these software providers; davfs should simply have an option to do a directory listing on "file not found" errors, and the problem is solved.

      Very good !!!

      Cheers !!!

       
    • monstruooo
      2009-03-31

      By the way, do you know why I could not apply the fix with patch? Could it have something to do with the way the forum reformats the text?

       
    • monstruooo
      2009-03-31

      "If you want to be really sure you could use 0 instead of file_refresh. But this might cause several requests for the same information within one second. "

      If I do that, will these files stop being added to the cache? (I would prefer that.)

       
    • monstruooo
      2009-03-31

      Hi Werner

      There is one thing about davfs that I forgot to ask you. Today I was working with it and noticed that the cache grew to more than 200 MB, although I am using the default limit of 50 MB. I can't figure out when davfs decides to clean its cache, because it looks as if the cache can grow indefinitely as long as davfs is kept busy. In fact, it can stay that large even after nothing has passed through davfs for quite a while. What are the rules for keeping the cache size down?

       
    • monstruooo
      2009-04-01

      Let me put it more practically. I would like to control how often, or when, davfs starts purging its cache, mainly to prevent it from growing too large. I would like it to purge more often.

       
    • Werner Baumann
      2009-04-01

      davfs2 is single threaded and it will clean the cache whenever it is not busy serving requests.
      But this is probably not the real problem. There are files that davfs2 *must* keep in cache:
      - files that are held open by an application
      - files that have been changed and could not yet be uploaded
      - files that cannot be uploaded at all for some reason and that are therefore
        moved into lost+found to not lose your work.
      If your applications cause more traffic than your connection can transport, there is no help.
      But probably it will help to look at lost+found from time to time. This will also show whether there is any problem with file upload.

      Cheers
      Werner

       
    • monstruooo
      2009-04-04

      What I see is that when there are many requests, the cache does grow beyond the limit, and even when the load stops, davfs makes no attempt to clean it up. There are no files in lost+found, and in general everything looks perfectly OK: no files are missing or left not updated.

      However, though I am not 100% sure about it, at some point davfs does clear this stuff out of the cache. It happens once a day or every few days. At that point davfs goes to 100% CPU and becomes unresponsive. I think that at this moment it is doing something that it also does during every umount/mount, because during umount/mount davfs can also go to 100% CPU for a while and successfully rids itself of these extra megabytes.

      My interest in this comes from the observation that if the cache grows too big, davfs becomes unresponsive during mounts and during these 100% CPU spikes, which I suspect happen when davfs undertakes more aggressive cleaning of the cache.

       
    • Werner Baumann
      2009-04-04

      davfs2 does cache maintenance whenever it has time to (with a delay of up to 10 seconds). When the cache size is too big, it scans the hash table for the file that has not been accessed for the longest time and removes it from the cache. It repeats this search until the cache is within its limit or there is no file left in the cache that it is allowed to remove.
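      The maintenance loop described above can be sketched in C. This is a minimal model, assuming a flat array instead of davfs2's hash table; struct cache_entry, shrink_cache and the must_keep flag (standing for open, dirty, or in lost+found) are hypothetical names. Each pass scans every entry for the least recently accessed removable file and removes it, until the cache fits or nothing removable is left.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical model of a cached file; not davfs2's real structure. */
struct cache_entry {
    size_t size;    /* bytes used by the cached copy */
    time_t atime;   /* last access time */
    bool present;   /* still in the cache */
    bool must_keep; /* open, dirty, or in lost+found */
};

/* Repeatedly remove the least recently accessed removable file until
 * used <= limit or nothing more can be removed. Each removal costs one
 * full linear scan of all entries. Returns the new cache size. */
static size_t shrink_cache(struct cache_entry *files, size_t n,
                           size_t used, size_t limit)
{
    while (used > limit) {
        struct cache_entry *oldest = NULL;
        for (size_t i = 0; i < n; i++) {
            struct cache_entry *f = &files[i];
            if (f->present && !f->must_keep &&
                (oldest == NULL || f->atime < oldest->atime))
                oldest = f;
        }
        if (oldest == NULL)
            break; /* everything left must be kept */
        oldest->present = false;
        used -= oldest->size;
    }
    return used;
}
```

      In this model, removing k files from a cache of n entries takes about k * n comparisons, all done in one call, which is one plausible reason a single-threaded process stalls after a burst of uploads.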

      When you have a huge repository, a high load, and in addition a slow or unreliable connection, you will get situations with a lot of dirty files that have not been uploaded. When your connection comes up again and these files are uploaded, davfs may have to remove a lot of files from the cache. This may take some time, and as davfs2 is single threaded, it will block. Additionally, davfs2 keeps a record in memory for every file and directory ever visited. If there is not enough working memory and the system starts swapping, it gets worse. You will need adequate CPU power and working memory for this.

      davfs2 was not designed as a network file system. It was designed with one or a few users editing resources on a WebDAV server in mind.

      Maybe some day (not soon) I will make davfs2 multithreaded and also design a more sophisticated algorithm for maintaining the disk and memory cache.

      Cheers
      Werner