lots of coda_venus_readdir: Invalid dir: xxxx

Help
butchie55
2006-09-07
2013-04-16
  • butchie55
    2006-09-07

    I just updated davfs2 to 1.0.2 (from 0.2.8) and my kernel to 2.6.17-gentoo-r7.
    It seems to work faster and better. However, I get a lot of "coda_venus_readdir: Invalid dir XXXXXXX" messages in my logs and in dmesg.
    I have 2 davfs mounts: one is 11 GB, 30000 files, 3700 folders. The second is only 760 MB, 2600 files and 1100 folders.
    They are accessed by Windows WebDAV clients for remote access, and by Windows apps on the local stations through Samba shares mapped to the davfs2 folders (and through Konqueror or other tools from my machine on the mounted davfs).
    It seems to work great (for about 4 hours now) apart from a few messages.
    The first ones appeared when I mounted the davfs folders: I got a message that davfs.conf.template and secrets.template could not be found (which is probably a distribution-only problem). I configured the /etc/davfs2/davfs.conf and secrets files as best I could, and it seems to be working OK.
    The second set of messages is much more annoying (for me at least):
    Sep  7 12:09:38 nuxdev coda_venus_readdir: Invalid dir: 2750452
    Sep  7 12:09:38 nuxdev coda_venus_readdir: Invalid dir: 2750452
    Sep  7 12:09:38 nuxdev coda_venus_readdir: Invalid dir: 2750140
    Sep  7 12:09:38 nuxdev coda_venus_readdir: Invalid dir: 2750140
    Sep  7 12:09:38 nuxdev coda_venus_readdir: Invalid dir: 2750085
    Sep  7 12:09:38 nuxdev coda_venus_readdir: Invalid dir: 2750085
    Sep  7 12:09:40 nuxdev coda_venus_readdir: Invalid dir: 2750275
    Sep  7 12:09:40 nuxdev coda_venus_readdir: Invalid dir: 2750275
    Sep  7 12:09:40 nuxdev coda_venus_readdir: Invalid dir: 2750077
    Sep  7 12:09:40 nuxdev coda_venus_readdir: Invalid dir: 2750077
    Sep  7 12:09:42 nuxdev coda_venus_readdir: Invalid dir: 2750275
    on and on, filling my logs.
    Now, lots of the file names contain special characters (French accents etc.): could that be the problem?
    They appear correctly on the screen.
    The folders are on a reiserfs3 partition, which otherwise seems clean. Any known incompatibility?
    Given the number of files in the davfs2 mounts, should some parameters be set very differently from the defaults?
    Any suggestions?
    Yours,
    Butchie

     
    • butchie55
      2006-09-07

      More on the problem:
      We just got disconnected from one of the davfs folders. The logs are full of:
      "No pseudo device in upcall comms at c04c8940"
      with, from time to time:
      "Sep  7 13:46:51 nuxdev coda_statfs: Venus returns: -6"
      Nothing could make the folder accessible again (umount impossible: busy; restarting apache2 did nothing). I had to reboot the server. It is now working, but the log is again full of:
      "Sep  7 14:11:52 nuxdev coda_venus_readdir: Invalid dir: 2749920
      Sep  7 14:12:17 nuxdev coda_venus_readdir: Invalid dir: 165082"
      I also noticed that when the davfs user runs
      "ls -la my_davfs_folder", he gets
      "ls: lecture du répertoire mnt/clients/: Mauvais descripteur de fichier" (in English: "ls: reading directory mnt/clients/: Bad file descriptor"), then the correct list of files.
      But when root runs "ls -la real_folder" he gets no error message and the same list of files. So it looks like a davfs2 problem.
      Hoping for some help.
      Yours,
      Butchie

       
    • Werner Baumann
      2006-09-08

      Hello Butchie,

      thanks for the bug report.

      Both errors ("coda_venus_readdir: Invalid dir: .." and "No pseudo device in upcall comms at ..") look like bugs in davfs2. But before I get to the difficult part (finding the bugs and how you can help with this), some easy hints:

      Optimizing for big directories:
      -------------------------------
      Whenever a file is opened, davfs2 stores it on the local disk so it can reuse it later (it only has to download it again if it has changed remotely). But when the size of the cache reaches "cache_size", it deletes old entries. So you might increase "cache_size" (up to the size of your directory) to reduce traffic, if you have enough free space on your local disk.

      davfs2 also stores information about files and directories in memory. A hash table is used to look up entries quickly. You might enlarge this table with the parameter "table_size". Given your number of files, 32768 should be a reasonable value.

      Non-ascii-characters:
      ---------------------
      This should not be any problem. davfs2 treats names just as a sequence of 8-bit-characters and does not care about the meaning. Displaying the names is up to your environment settings and it seems to get it right.

      Unmounting impossible, busy:
      ----------------------------
      You know about the beginner's trap of cd-ing into a directory and then trying to unmount that same directory. But there are more subtle traps of this kind, e.g. Nautilus (the Gnome file browser) is very lazy in releasing files and directories. So I sometimes have to stop Nautilus or even Gnome to be able to unmount (cdrom as well as davfs2 mounts). I don't know how Samba behaves in this respect, but if the Samba share is in use, this might prevent unmounting. So you might look for applications that use the file system and stop them. You may try the "fuser" utility to find the processes that use the file system. It should be possible to unmount almost anything without shutting down the machine.

      In your case mount.davfs seems to have crashed. If you get problems with unmounting while mount.davfs is still running, it should respond to "kill -SIGTERM".

      Reiser:
      -------
      I have no experience with ReiserFS, but davfs2 just creates ordinary files and directories and I can't think of any problems that could arise with ReiserFS.

      "ls: lecture du répertoire mnt ..."
      -----------------------------------
      With the help of my daughter's dictionary I assume this is the "ls" version of the "coda_venus_readdir: Invalid dir:" bug. But I have no clue why davfs2 behaves differently for normal users and root. If they ls different directories, this may be because the error occurs only with certain directories.
      - By the way: are there any exceptionally long file names? File names (not the path, only the last component) as long as 128 or even 252 characters?

      Now the difficult part: Debugging.
      As I made a lot of changes recently that may also affect the bugs you reported, there are two options:
      - try to debug davfs2-1.0.2
      - test the latest version from CVS
      It is up to you which one you prefer. Both might be successful or frustrating.

      Debugging davfs2-1.0.2:

      coda_venus_readdir: Invalid dir:
      --------------------------------
      The Coda kernel module claims that davfs2 calculated the size of the dir entries wrongly. I looked through my code, but it is hard to find the reason this way, because I think I did it right. So it would be best to look at the actual data davfs2 created. These dir entries are stored by mount.davfs in files starting with "dir_" in the cache directory. But there are very many of them.
      One way to get these files:
      - read the file <your-cache-dir>/index to get the name of the cache sub-directory for your mount.
      - run davfs2 with as little load as possible, but as much load as necessary to get the "coda_venus_readdir: Invalid dir" messages.
      - copy all the files in the cache sub-directory that start with "dir_" into some temporary directory:
      cp -p <cache-sub-directory>/dir_* <temp-directory>
      - copy the "coda_venus_readdir: Invalid dir" messages into a text file and add it to that directory.
      - send all of it to me as a .tar.gz.
      I hope there is no sensitive information in it. I will not abuse it, but it is better if it is not there. These files just contain file and directory names and some numbers (name length, length of the dir entry, node number). You may have a look at them, but they are a mix of binary data and ASCII.

      No pseudo device in upcall comms at ..:
      ---------------------------------------
      This usually means that mount.davfs has been killed because of a serious bug, mostly an invalid pointer. But a memory leak might also be the reason. It is hard to find remotely, and I also don't know how to work with "dumps".
      - Maybe there is some entry in the log files about this?

      You might help testing for memory leaks:
      ----------------------------------------
      From time to time run
      pmap <process id of running mount.davfs>
      The output should start like this:

      4577:   /sbin/mount.davfs http://localhost/apache2-default/davfs2 /home/werner/local -o rw nosuid nodev noauto user
      08048000     84K r-x--  /mount.davfs
      0805d000      4K rw---  /mount.davfs
      0805e000    140K rw---    [ anon ]
      40000000     88K r-x--  /ld-2.3.2.so
      40016000      4K rw---  /ld-2.3.2.so
      ...

      The line with [ anon ] shows the amount of dynamically allocated memory. davfs2 uses this to hold information about files and directories. It needs approx. 250 bytes for every file or directory, so in your case this may add up to about 10 MByte. After mounting it will slowly increase (as new directories are visited) but should finally settle at about 10 MByte. If it exceeds this value and keeps on increasing all the time, there is a memory leak.
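
      The estimate above can be written down as a rough sketch. This is not davfs2 code: the function names and the slack percentage are made up for illustration, and the 250 bytes per entry is only the approximate figure given in this post.

```c
#include <stddef.h>

/* Assumption from the post: roughly 250 bytes of heap per cached file or
 * directory. The real cost per entry depends on things like name length. */
#define APPROX_BYTES_PER_NODE 250UL

/* Hypothetical helper: rough steady-state heap size in KiB for a mount
 * holding the given number of files and directories. */
unsigned long estimate_anon_kib(unsigned long files, unsigned long dirs)
{
    return ((files + dirs) * APPROX_BYTES_PER_NODE) / 1024UL;
}

/* Hypothetical helper: returns 1 if the observed heap size (the large
 * [ anon ] line in the pmap output) exceeds the estimate by more than
 * slack_percent, which may hint at a leak; returns 0 otherwise. */
int looks_like_leak(unsigned long observed_kib,
                    unsigned long estimate_kib,
                    unsigned long slack_percent)
{
    return observed_kib > estimate_kib + (estimate_kib * slack_percent) / 100UL;
}
```

      For the larger mount in this thread (30000 files, 3700 folders) the estimate comes out at about 8 MiB, the same order of magnitude as the 10 MByte mentioned above. A single reading above the estimate proves little; what matters is whether the value keeps growing over time.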

      Alternative:

      You might instead try the latest version from CVS. There are a lot of changes, that will hopefully fix some bugs, but might also introduce new ones.
      If you don't like compiling the sources from CVS I could make a prerelease package for you.

      Here is my email address if you want to send debug data that is too big or too sensitive for the forum:
      werner.baumann@onlinehome.de

      I hope this long message does not discourage you. Your information is important to me, because I cannot test with such heavy loads as you do.

      Greetings
      Werner

       
    • butchie55
      2006-09-08

      Thank you for your reply. I really would like to help. I will re-read your post to see what I can do easily (or not) and let you know the results.

      As to testing a CVS version: great, but not on my production server. My boss really likes having remote access to all these files, so I activated this WebDAV and davfs system even though it is not yet stable (I do very regular backups, just in case...).
      But to install and run a CVS version, I would like to install everything on a test server with the same setup and data... Give me a couple of days at least, as I also have to get familiar with CVS...

      In the meantime I started to look at the pmap output. I will monitor it to see if it increases...
      Here is the current state on one of the mounts :
      nuxdev apache2 # pmap 7097
      7097:   /sbin/mount.davfs http://localhost/matrix /home/fb/mnt/matrix -o rw noexec nosuid nodev noauto user
      08048000     64K r-x--  /usr/sbin/mount.davfs
      08058000      4K rw---  /usr/sbin/mount.davfs
      08059000   8148K rw---    [ anon ]
      b7b67000    296K rw---    [ anon ]
      b7bd2000    296K rw---    [ anon ]
      b7c1c000     32K r-x--  /lib/libnss_files-2.4.so
      b7c24000      8K rw---  /lib/libnss_files-2.4.so
      b7c26000     32K r-x--  /lib/libnss_nis-2.4.so
      b7c2e000      8K rw---  /lib/libnss_nis-2.4.so
      b7c30000     68K r-x--  /lib/libnsl-2.4.so
      b7c41000      8K rw---  /lib/libnsl-2.4.so
      b7c43000      8K rw---    [ anon ]
      b7c45000     24K r-x--  /lib/libnss_compat-2.4.so
      b7c4b000      8K rw---  /lib/libnss_compat-2.4.so
      b7c4d000      4K rw---    [ anon ]
      b7c4e000   1116K r-x--  /lib/libc-2.4.so
      b7d65000      8K r----  /lib/libc-2.4.so
      b7d67000      8K rw---  /lib/libc-2.4.so
      b7d69000     16K rw---    [ anon ]
      b7d6d000    116K r-x--  /usr/lib/libexpat.so.0.5.0
      b7d8a000      8K rw---  /usr/lib/libexpat.so.0.5.0
      b7d8c000      8K r-x--  /lib/libdl-2.4.so
      b7d8e000      8K rw---  /lib/libdl-2.4.so
      b7d90000   1192K r-x--  /usr/lib/libcrypto.so.0.9.8
      b7eba000     84K rw---  /usr/lib/libcrypto.so.0.9.8
      b7ecf000     12K rw---    [ anon ]
      b7ed2000    232K r-x--  /usr/lib/libssl.so.0.9.8
      b7f0c000     16K rw---  /usr/lib/libssl.so.0.9.8
      b7f10000     68K r-x--  /lib/libz.so.1.2.3
      b7f21000      4K rw---  /lib/libz.so.1.2.3
      b7f22000    112K r-x--  /usr/lib/libneon.so.26.0.1
      b7f3e000      4K rw---  /usr/lib/libneon.so.26.0.1
      b7f3f000      4K rw---    [ anon ]
      b7f59000    104K r-x--  /lib/ld-2.4.so
      b7f73000      4K r----  /lib/ld-2.4.so
      b7f74000      4K rw---  /lib/ld-2.4.so
      bfa5d000     88K rw---    [ stack ]
      ffffe000      4K -----    [ anon ]
      total    12228K

      (the other davfs mount is the same but with a total of 5672K)

      I will let you know how it goes.
      Yours,

      Butchie

       
    • Werner Baumann
      2006-09-13

      Hello Butchie,

      your last email shows there is definitely a serious bug in davfs2. It tries to free a pointer that is no longer valid and consequently crashes.

      Finding this bug will take some time (I am not very experienced in debugging tasks like this). And I need to know the exact version of davfs2 you are running.
      - Is it the source package or the binary package from this site?
      - If it is from some other origin (maybe your distribution), I need either the address where I can get exactly the same version, or you can send me the package.
      - If you compiled the package on your machine, it is best to send me your binary too.

      Some minor problems:
      If you add or delete files on the server, this will only be reflected on your local machine after time "expire" (this may be changed in davfs2.conf). There is also a bug concerning notification of the kernel that is fixed in CVS but not in the package davfs2-1.0.2.

      '~'-character:
      Apache (and maybe other web servers) uses this character to access a user's home directory; this will interfere with file names starting with '~'. If you don't need this feature, you should disable it on the server (most of this aliasing stuff causes trouble when used on a WebDAV server).

      Greetings
      Werner

       
    • butchie55
      2006-09-14

      OK, here are my specs :
      davfs2-1.0.2_p20060820, last unstable package from portage (the package management system of Gentoo. You can get the ebuild there : http://gentoo-portage.com/AJAX/Ebuild/32507 ).
      Compiled on my machine (2xPIII 700, Kernel 2.6.17 -r8, GCC 4.1.1 ...)
      I can send you the package and maybe also the binary if you think it is useful. But maybe that is a waste of time if the CVS version has already corrected a few things. Let me know if you want me to install the CVS version.
      Yours,
      Butchie

       
    • butchie55
      2006-09-14

      Regarding the "~" character:
      Thanks for the tip. It will be useful.
      These are temp files generated by M$Word, and they are correctly removed (even on a davfs-samba share) when the file is closed properly (which is not always the case with M$ software). I have not yet succeeded in getting everyone here to use OOo, so I guess I will have to do some clean-up regularly ;-)
      Yours
      Butchie

       
    • Werner Baumann
      2006-09-15

      Hello Butchie,

      thanks for the package. Now I have the code to search for the bug in. What would help further is to find the function that is responsible. The error report you sent me refers to "/sbin/mount.davfs[0x804af20]". There are two ways to find the function from this address:

      - you send me the binary too (mount.davfs). It should be in /sbin (this may be a symbolic link), /usr/sbin or even /usr/local/sbin.

      - If you have the program "readelf" installed, you may instead run
      "readelf -s /sbin/mount.davfs | grep FUNC | grep 804" (maybe you will have to adjust the path).
      This will print a list of functions and their addresses. You could send me this list instead of the binary.

      I will try to fix this bug first. The "coda_venus_readdir: Invalid dir" bug still remains. I will analyse the files you sent me as soon as I have time.

      Greetings
      Werner

       
    • butchie55
      2006-09-15

      Hello Werner.
      Here is what I got from readelf:
      nuxdev dav # readelf -s /sbin/mount.davfs | grep FUNC | grep 804
          49: 08049ff4     0 FUNC    GLOBAL DEFAULT   10 _init
          79: 0804a4dc    15 FUNC    GLOBAL DEFAULT  UND ne_accept_2xx
      I will also send you the binary.
      Yours,
      Butchie

       
    • Werner Baumann
      2006-09-17

      Hello Butchie,

      the gentoo package of davfs2 is actually the latest CVS version. Now I remember: gentoo wants support for gnutls instead of openssl, and this cvs version is the first one to support it. But it is not much tested.

      Unfortunately your binary does not include local symbols, so I could not directly find the offending function. Nevertheless I found four programming errors that create dangling pointers and may be the reason for the crashes.
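
      The kind of fix applied in the patch further down (clearing a pointer right after freeing it) can be sketched in isolation. The struct and function names here are simplified stand-ins chosen for illustration, not the real davfs2 definitions.

```c
#include <stdlib.h>

/* Simplified stand-in for a cache node. The real dav_node in davfs2 has
 * many more fields; only cache_path matters for this sketch. */
typedef struct {
    char *cache_path;
} sketch_node;

/* Free the cached path and clear the pointer. Later checks such as
 * "cache_path == NULL" then see the node as uncached, and releasing it a
 * second time is harmless because free(NULL) does nothing. */
void release_cache_path(sketch_node *node)
{
    free(node->cache_path);
    node->cache_path = NULL;
}
```

      Without the assignment to NULL, the node would keep pointing at freed memory, and a later free() or use of that pointer could crash the process.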

      The "coda_venus_readdir: Invalid dir" errors seem to be caused by an unexpectedly large size of IO blocks on your system that davfs2 cannot handle (the directories have a size of 131072 bytes, which is far too high).

      A patch for these bugs is appended. Please copy it into your top-level source directory (davfs2) and issue the command:
      "patch -p1 < davfs2-cvs1.0.2-badpointer.patch"
      Now you can build the package again.

      The size of most directories should now be 4096 bytes (some may be bigger), the "coda_venus_readdir: Invalid dir" message should not show up again, and hopefully it will not crash.

      Please carefully watch it as it is really beta-code.

      It would be nice if you could check the size of IO blocks on the file system where the davfs2 cache is stored. Just run
      "stat <name of any file or directory>"
      The output should show the size of IO blocks. I would like to see whether my assumptions about the cause of the bug are true.
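
      The same value that "stat" prints as "IO Block" can also be read from a program through the st_blksize field. The following is only a sketch of the block-size limit that the patch further down introduces; the function names are invented for illustration, while the 4096 limit and the 1024 fallback are taken from the patch.

```c
#include <sys/stat.h>

/* Upper bound the patch in this thread uses for the directory block size. */
#define MAX_DIR_BLOCKSIZE 4096L

/* Clamp a preferred I/O block size to at most MAX_DIR_BLOCKSIZE. */
long clamp_blocksize(long preferred)
{
    return (preferred > MAX_DIR_BLOCKSIZE) ? MAX_DIR_BLOCKSIZE : preferred;
}

/* Return the clamped preferred block size reported by stat() for path,
 * or 1024 if stat() fails (the fallback visible in the patch context). */
long dir_blocksize_for(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0)
        return clamp_blocksize((long)st.st_blksize);
    return 1024L;
}
```

      With a file system that reports an IO block size of 131072, clamp_blocksize() yields 4096, while smaller values pass through unchanged.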

      Greetings
      Werner

      Begin patch "davfs2-cvs1.0.2-badpointer.patch":

      diff -Naur davfs2/src/cache.c davfs2-new/src/cache.c
      --- davfs2/src/cache.c    2006-08-21 22:51:51.000000000 +0200
      +++ davfs2-new/src/cache.c    2006-09-17 10:29:06.720200399 +0200
      @@ -236,7 +236,7 @@

      static int is_busy(const dav_node *node);

      -static inline int is_cached(const dav_node *node) {
      +static inline int is_cached(dav_node *node) {

           if (!S_ISREG(node->mode) || node->cache_path == NULL)
               return 0;
      @@ -244,6 +244,7 @@
               return 1;
           } else {
               free(node->cache_path);
      +        node->cache_path = NULL;
               return 0;
           }
      }
      @@ -299,6 +300,8 @@

      static inline void set_cache_file_times(dav_node *node) {

      +    if (node->cache_path == NULL)
      +        return;
           struct utimbuf t;
           t.actime = node->atime;
           t.modtime = node->mtime;
      @@ -1790,6 +1793,7 @@
               syslog(LOG_MAKEPRI(LOG_DAEMON, LOG_ERR),
                      "Error %i creating: %s", errno, node->cache_path);
               free(node->cache_path);
      +        node->cache_path = NULL;
               return EIO;
           }

      @@ -1817,6 +1821,7 @@
               syslog(LOG_MAKEPRI(LOG_DAEMON, LOG_ERR),
                      "Error %i creating: %s", errno, node->cache_path);
               free(node->cache_path);
      +        node->cache_path = NULL;
               return EIO;
           }

      @@ -1839,6 +1844,7 @@
                      "Error writing directory %s", node->cache_path);
               remove(node->cache_path);
               free(node->cache_path);
      +        node->cache_path = NULL;
               return EIO;
           }
      }
      @@ -1973,7 +1979,7 @@

           struct stat st;
           if (stat(index, &st) == 0) {
      -        blocksize = st.st_blksize;
      +        blocksize = (st.st_blksize > 4096) ? 4096 : st.st_blksize;
           } else {
               blocksize = 1024;
           }

      End of patch.

      I will also send you the patch file directly by email, as there may be incorrect line breaks on this web page.

       
    • butchie55
      2006-09-18

      Dear Werner,
      Thanks for the patch, I will apply it this afternoon.
      In the meantime, here is the result of the stat command on the davfs mount folders :
      nuxdev mnt # stat matrix
        File: `matrix'
        Size: 131072          Blocks: 256        IO Block: 131072 répertoire
      Device: 10h/16d Inode: 3834217831  Links: 35
      Access: (0755/drwxr-xr-x)  Uid: ( 1000/      fb)   Gid: (  100/   users)
      Access: 2006-09-18 10:34:21.000000000 +0000
      Modify: 2006-09-18 10:12:45.000000000 +0000
      Change: 2006-09-18 10:12:45.000000000 +0000
      nuxdev mnt # stat clients/
        File: `clients/'
        Size: 131072          Blocks: 256        IO Block: 131072 répertoire
      Device: 11h/17d Inode: 3834144103  Links: 9
      Access: (0755/drwxr-xr-x)  Uid: ( 1000/      fb)   Gid: (  100/   users)
      Access: 2006-09-14 12:23:39.000000000 +0000
      Modify: 2006-09-13 10:38:47.000000000 +0000
      Change: 2006-09-13 10:38:47.000000000 +0000

      When I do stat on the local file system :
      nuxdev dav # stat matrix
        File: `matrix'
        Size: 1088            Blocks: 2          IO Block: 131072 répertoire
      Device: 301h/769d       Inode: 2           Links: 35
      Access: (0777/drwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
      Access: 2005-12-08 13:43:31.000000000 +0000
      Modify: 2006-09-12 21:37:57.000000000 +0000
      Change: 2006-09-12 21:37:57.000000000 +0000
      nuxdev dav # stat clients
        File: `clients'
        Size: 208             Blocks: 0          IO Block: 131072 répertoire
      Device: 302h/770d       Inode: 2           Links: 9
      Access: (0777/drwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
      Access: 2005-12-08 13:43:52.000000000 +0000
      Modify: 2006-09-13 09:54:17.000000000 +0000
      Change: 2006-09-13 09:54:17.000000000 +0000

      The IO Block values are the same (131072 ??) for the 2 folders, whether through the davfs mount or through the local file system,
      but the sizes are different:
      131072 for both matrix and clients through the davfs mount, versus 1088 for matrix and 208 for clients on the local file system.

      I now go on to patch davfs ...

      Yours,

      Butchie

       
    • Werner Baumann
      2006-09-21

      Hello Butchie,

      thanks for the output of stat. It really shows a preferred IO-blocksize of 131072, which davfs2 can't handle. But the patch should solve this. I will have to restrict blocksize for directories to 4096; it would be a waste of disk space anyway.

      Could you apply the patch? Does it solve the problems?

      Greetings
      Werner

       
    • butchie55
      2006-09-21

      Sorry for the delay, but I had to get help from the Gentoo forum to apply the patch, using an overlay etc.
      So I finally got it.
      On mounting a davfs mount, I got this in the logs:
      Sep 21 08:50:17 nuxdev mount.davfs: Error parsing index file:   XML parse error at line 643: parsing finished
      Sep 21 08:50:23 nuxdev coda_read_super: device index: 0
      Sep 21 08:50:23 nuxdev coda_read_super: rootfid is (01234567.ffffffff.080696b0.00000000)
      Sep 21 08:50:23 nuxdev coda_read_super: rootinode is -460749465 dev coda
      I got the same message on mounting the second mount point :
      Sep 21 09:00:00 nuxdev mount.davfs: Error parsing index file:   XML parse error at line 657: parsing finished
      Sep 21 09:00:00 nuxdev coda_read_super: device index: 1
      Sep 21 09:00:00 nuxdev coda_read_super: rootfid is (01234567.ffffffff.080696c8.00000000)
      Sep 21 09:00:00 nuxdev coda_read_super: rootinode is -460823193 dev coda

      But when I access the davfs folders, e.g. with "ls",
      I no longer get the "bad file descriptor" error message, and the logs no longer fill up with "coda_venus_readdir: Invalid dir: xxxx",
      which is very well done of you.

      I will continue testing and let you know of any surprises; I hope the "Error parsing index file:..." above is not too bad.

      Thanks again,
      Butchie

       
    • Werner Baumann
      2006-09-21

      Hello Butchie,

      "Error parsing ..." usually means that davfs2 will have to build up the cache again. It might be caused by previous crashes of davfs2. If this error occurs even when the file system has been unmounted without error, it might be a good idea to send me one of these index files. In your cache directory there will be a subdirectory for every mounted file system. Each of these will contain one file called index.

      Currently I am doing a lot of testing to get a new, better tested release ready. I continue to detect more serious bugs, some of which might be in the version you are using. So please don't trust this CVS version.

      Greetings
      Werner