#35 NFS4 doesn't work properly - hangs mounting

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2012-09-10
Created: 2010-12-16
Creator: Anonymous
Private: No

Hi,
NFS4 isn't working correctly: there is a long hang when mounting or unmounting. If you turn on debugging with
echo 1 > /proc/sys/sunrpc/rpc_debug
echo 1 > /proc/sys/sunrpc/nfs_debug
you can see what looks like a problem with RPC, but I can't work out what the issue is.

Other live systems can mount from my server OK. I have also tried several different server boxes, all running Ubuntu, and get the same issue.
NFS3 etc. are all OK; only NFS4 hangs.

Any ideas?!

cheers

rich

Discussion

  • I've tested NFS with a Fedora server, but don't regularly use it.
    In the past I have used strace along with commands to get more detailed info on what is going on.
    I do have a couple of old machines that would not run Fedora 10 or above, but do have Ubuntu 10.04 running on them.
    I don't have NFS set up on them, so I would have to do some setup.

    I have been creating new images, and have moved to Fedora 14 as the build system, but the kernels are from kernel.org.

    I use ldd to make sure that all the libraries are available for the programs, but have sometimes found with strace that a library that doesn't show up is needed.

    I'm currently uploading a new version that includes strace.

    ftp://amd64gcc.dyndns.org/g4l-v0.36alpha18.iso

    Could you put strace in front of the command you use to do the mount, and see if it shows more info?
    You might want to redirect the output to a file. Let me know if you see something, or send me the data.
    Be sure to remove anything with passwords.

    You can contact me directly at mikes@kuentos.guam.net
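
The request above (trace the mount, save the output to a file, and strip passwords before sharing it) could look roughly like the sketch below. The `-f` and `-o` options are standard strace options; the helper names, the file paths, and the `password=` / `pass=` pattern are assumptions for illustration, not part of g4l.

```shell
# Trace a command, following child processes (-f) and writing the trace to a file (-o).
# Usage: trace_to_file /tmp/mount.trace mount -t nfs4 server:/ /mnt/local
trace_to_file() {
    out=$1
    shift
    strace -f -o "$out" "$@"
}

# Print the trace with the value of anything that looks like password=... or
# pass=... replaced by a placeholder. The pattern is only a guess at what a
# password might look like in the trace; check the file by hand as well.
redact_trace() {
    sed -E 's/(pass(word)?=)[^", ]*/\1REDACTED/g' "$1"
}
```

For example, `trace_to_file /tmp/mount.trace mount -t nfs4 server:/ /mnt/local` followed by `redact_trace /tmp/mount.trace > /tmp/mount.trace.clean` would leave a file that is safer to post.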

     
  • One other issue: do you have any characters in the password that would be interpreted by the shell script? If so, you would need to put a \ in front of them.
    For example, if the password was pa$$word you would need to enter pa\$\$word
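
To illustrate the escaping point: inside double quotes the shell expands `$$` to its own process ID, so an unescaped pa$$word is silently changed. The sketch below uses the example password from the comment; the variable names are made up, and how g4l itself passes the password along is not shown here.

```shell
# Unescaped: "$$" expands to the shell's PID, so the password is mangled.
unescaped="pa$$word"

# Escaped with backslashes, as suggested above.
escaped="pa\$\$word"

# Single quotes also keep every character literal.
quoted='pa$$word'

printf 'unescaped: %s\n' "$unescaped"
printf 'escaped:   %s\n' "$escaped"
printf 'quoted:    %s\n' "$quoted"
```

Only the last two lines print the password as intended; the first prints it with a number in place of `$$`.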

     
  • chud
    2010-12-16

    Same issue in the latest version you kindly uploaded (thanks for strace, it's very useful to have it included!).
    This is from mounting:

    execve("/bin/mount", ["mount", "-t", "nfs4", "clonemod.le.ac.uk:/", "/tmp/t"], [/* 10 vars */]) = 0
    brk(0) = 0x82b2000
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77db000
    access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY) = -1 ENOENT (No such file or directory)
    open("/lib/tls/i686/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/tls/i686/sse2", 0xbfd3fac4) = -1 ENOENT (No such file or directory)
    open("/lib/tls/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/tls/i686", 0xbfd3fac4) = -1 ENOENT (No such file or directory)
    open("/lib/tls/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/tls/sse2", 0xbfd3fac4) = -1 ENOENT (No such file or directory)
    open("/lib/tls/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/tls", 0xbfd3fac4) = -1 ENOENT (No such file or directory)
    open("/lib/i686/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/i686/sse2", 0xbfd3fac4) = -1 ENOENT (No such file or directory)
    open("/lib/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/i686", 0xbfd3fac4) = -1 ENOENT (No such file or directory)
    open("/lib/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/sse2", 0xbfd3fac4) = -1 ENOENT (No such file or directory)
    open("/lib/libc.so.6", O_RDONLY) = 3
    read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\357\30\0004\0\0\0"..., 512) = 512
    fstat64(3, {st_mode=S_IFREG|0755, st_size=1893912, ...}) = 0
    mmap2(0x178000, 1653288, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x178000
    mprotect(0x305000, 4096, PROT_NONE) = 0
    mmap2(0x306000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18d) = 0x306000
    mmap2(0x309000, 10792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x309000
    close(3) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77da000
    set_thread_area({entry_number:-1 -> 6, base_addr:0xb77da6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
    mprotect(0x306000, 8192, PROT_READ) = 0
    mprotect(0x174000, 4096, PROT_READ) = 0
    getuid32() = 0
    brk(0) = 0x82b2000
    brk(0x82d3000) = 0x82d3000
    brk(0) = 0x82d3000
    getuid32() = 0
    geteuid32() = 0
    stat64("clonemod.le.ac.uk:/", 0xbfd3ff44) = -1 ENOENT (No such file or directory)
    mount("clonemod.le.ac.uk:/", "/tmp/t", "nfs4", MS_SILENT, "") = -1 EINVAL (Invalid argument)
    vfork() = 1861
    waitpid(1861, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 1861
    --- SIGCHLD (Child exited) @ 0 (0) ---
    exit_group(0) = ?

    This is from unmounting:

    execve("/bin/umount", ["umount", "/tmp/t"], [/* 10 vars */]) = 0
    brk(0) = 0x8c50000
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb785a000
    access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY) = -1 ENOENT (No such file or directory)
    open("/lib/tls/i686/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/tls/i686/sse2", 0xbfb45f14) = -1 ENOENT (No such file or directory)
    open("/lib/tls/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/tls/i686", 0xbfb45f14) = -1 ENOENT (No such file or directory)
    open("/lib/tls/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/tls/sse2", 0xbfb45f14) = -1 ENOENT (No such file or directory)
    open("/lib/tls/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/tls", 0xbfb45f14) = -1 ENOENT (No such file or directory)
    open("/lib/i686/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/i686/sse2", 0xbfb45f14) = -1 ENOENT (No such file or directory)
    open("/lib/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/i686", 0xbfb45f14) = -1 ENOENT (No such file or directory)
    open("/lib/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
    stat64("/lib/sse2", 0xbfb45f14) = -1 ENOENT (No such file or directory)
    open("/lib/libc.so.6", O_RDONLY) = 3
    read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\357\30\0004\0\0\0"..., 512) = 512
    fstat64(3, {st_mode=S_IFREG|0755, st_size=1893912, ...}) = 0
    mmap2(0x178000, 1653288, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x178000
    mprotect(0x305000, 4096, PROT_NONE) = 0
    mmap2(0x306000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18d) = 0x306000
    mmap2(0x309000, 10792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x309000
    close(3) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7859000
    set_thread_area({entry_number:-1 -> 6, base_addr:0xb78596c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
    mprotect(0x306000, 8192, PROT_READ) = 0
    mprotect(0x174000, 4096, PROT_READ) = 0
    getuid32() = 0
    brk(0) = 0x8c50000
    brk(0x8c73000) = 0x8c73000
    brk(0) = 0x8c73000
    open("/proc/mounts", O_RDONLY) = 3
    fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7858000
    read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 1024) = 394
    read(3, "", 1024) = 0
    close(3) = 0
    munmap(0xb7858000, 4096) = 0
    lstat64("/tmp", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
    lstat64("/tmp/t", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    oldumount("/tmp/t") = 0
    exit_group(0) = ?

     
  • From the look of this line:
    execve("/bin/mount", ["mount", "-t", "nfs4", "clonemod.le.ac.uk:/", "/tmp/t"], [/* 10 vars */]) = 0

    You are entering the command mount -t nfs4 clonemod.le.ac.uk:/ /tmp/t

    The mount point for g4l needs to be the fixed /mnt/local

    Can you try it with /mnt/local for the mount point?

     
  • Can you try the mount as just -t nfs and use -o nolock as well?

    mount -t nfs -o nolock ipaddress_of_nfs:/directory /mnt/local

    I set up NFS on my Fedora 14 machine, and had to turn off the firewall on that machine to get the connection.

    The /etc/exports file contained
    /home/g4l/img2 192.168.7.220(rw,sync)

    I used the command
    mount -t nfs -o nolock 192.168.7.219:/home/g4l/img2 /mnt/local

    and it was able to mount and worked correctly.

    I tried with -t nfs4 and it failed to connect.

    strace didn't show anything other than it just waiting for something.

     
  • chud
    2010-12-17

    Hi,
    The mount point makes no difference (nor does nolock).
    It sounds like you have the same problem as me then: if you leave it as nfs it uses nfs3 and works fine, but with -t nfs4 it hangs. (If you leave it a few minutes it will work eventually; your strace will probably look the same as mine.) If you turn on debugging for rpc and nfs with echo 1 > ... as I mentioned initially, you can see it making repeated attempts and running into RPC issues (you can see the repetition in strace at the same points in time).
    Any ideas? If I use busybox's mount command in Ubuntu it seems to work OK.

     
    1. Can you do an strace with one of the live CDs that work with nfs4 and see if it reports something we don't see?
    2. The mount in g4l is the busybox 1.17.4 mount. I could try the mount from Fedora 14, which is 70+K in size, but I don't know if it would do anything different.
    3. Since I am not a user of NFS, I'm not sure what the difference between using nfs and nfs4 would be.

    The g4l is only going to be doing a write or read with the connection. Might also be interesting to do a wireshark capture to see what the network traffic shows.

     
  • http://wiki.linux-nfs.org/wiki/index.php/Comparison_of_NFS_vs._others#Strengths_3

    This seems to show that nfs4 has a higher level of authentication using Kerberos and SPKM-3, so it may be that the server is requesting some info that g4l doesn't include.

    G4L has about a 6M kernel and then a 12M filesystem to support everything, so there may be something that the other livecds have that is handling this extra stuff.

     
  • chud
    2010-12-21

    I used your kernel and recreated my own initrd from scratch with the latest busybox and nfs-utils (no gss), and I have the same problem with a hang. I am sure it relates to RPC somehow, but I don't know enough about what is going on to understand it. I think it's timing out on part of the process. Perhaps a service isn't running?

    It worked ok when I used it with puppy linux, I downloaded the NFS part on-the-fly, did insmod and then it worked fine without delay. Will have more of a fiddle later.

     
  • I'm thinking that the client may be making the initial connection, but then the server is requesting some additional authentication. That would be why the strace doesn't show anything, since the command itself is working fine. In this case running wireshark on the server might show what is going on. From my reading of the material on nfs3 versus nfs4, I don't see where it would make a difference.

    At the moment, I've discovered some of the kernels no longer work. I am getting a "kernel too old" message. It looks like a glibc issue with the newer Fedora 14 builds. I went to rebuild the kernels with the new Fedora; some work, and others get a make error. It seems to be something with the new make that doesn't work with older makefiles.

     
  • chud
    2010-12-21

    I had a look with wireshark (in Ubuntu) before building my own initrd, but didn't see anything obviously wrong, although I didn't really understand much of what I was looking at. I could just see it hang waiting for something to happen that didn't.

    Unless you tell nfs4 specifically to use Kerberos, it shouldn't be doing anything more than nfs3 does (the server just verifies the IP). I am wondering if it somehow relates to portmapper on the client for a back connection? I also tried with portmapper running on the client, but it didn't seem to make a difference; I may have been doing something wrong.

     
  • chud
    2010-12-21

    Yay! Fixed it, and it has only taken 20 hours of effort :)
    The rpcbind service needs to be added and running.

    I downloaded rpcbind-0.2.0.tar.bz2 and just copied the two binaries it makes.
    As I was using my own initrd build, I only needed to add libnss_files.so.2, as I had already added all the other libraries (I could see it was missing from strace rather than ldd).

    I used my /etc/netconfig from Ubuntu and removed the ipv6 bits.

    It still errors when it runs with
    'cannot get local address for udp: servname not supported for ai_socktype'
    'cannot get local address for tcp: servname not supported for ai_socktype'

    ...I don't know what this means, but it runs OK and fixes the nfs4 hang!

    rpcinfo also works too.
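
The "removed the ipv6 bits" step could be done by hand in an editor, or with a small filter like the sketch below. It assumes the usual netconfig layout in which the fourth field of each entry is the protocol family (`inet`, `inet6`, `loopback`); the function name and file paths are hypothetical.

```shell
# Print a netconfig file without its IPv6 entries.
# Comment lines are kept; any entry whose fourth field is "inet6" is dropped.
strip_ipv6_netconfig() {
    awk '/^#/ || $4 != "inet6"' "$1"
}
```

A possible use would be `strip_ipv6_netconfig /path/to/ubuntu/netconfig > etc/netconfig` inside the initrd tree before repacking it.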

     
  • I have just added rpcbind and rpcinfo plus the extra libraries that seemed to be missing.
    I'm currently building the new alpha 27, but it will be later before I can upload it.
    It takes some time to upload from my home machine to the college site, but it should be there later today.

     
  • ftp://amd64gcc.dyndns.org/g4l-v0.36alpha27.iso

    That has the rpcbind and rpcinfo, so unless I missed something it should work, but have not yet tried it.

     
  • ftp://amd64gcc.dyndns.org/g4l-v0.36alpha28.iso

    It needed a couple of other little things, like creating an rpc user and /etc/netconfig.

    So, I think it should work, but I don't have an nfs4 setup.

    I am still working on getting the latest partimage to work, and have gotten close, but am currently getting to the final step after password, and it then gives a version mismatch.?? The original one works fine.

    Also, removed all the older kernels that were failing with kernel too old messages.

     
  • chud
    2010-12-22

    (You need to remove the ipv6 bit from netconfig.)
    I'm still getting the hang on yours. There are some libraries I have that you don't, but I am at work so I can't be more specific, as it's not in front of me. Cheers

     
  • Thanks for the info. I have all the libraries that ldd reported for both rpcbind and rpcinfo, and they both ran with no errors, but I don't have an nfs4 setup to fully test it on. So I don't know whether it is a library or some file like the netconfig that is missing.

    Again, thanks.

     
  • One other thing that comes to mind. Perhaps the rpcbind needs to be run before doing the nfs4 mount?
    I recall running rpcinfo and got an error, then ran rpcbind and then it gave results?

     
  • chud
    2010-12-23

    yeah you need to start rpcbind first.

    ..brought my initrd into work today and tried it here on imaging system - worked fine with no hang.

     
  • Just to confirm: does the regular g4l work with the nfs4 mount if you run rpcbind before doing the mount?
    If so, I could have it do that as part of the script when one does the mount nfs option, or do it at the beginning of the boot in the rcS.
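
The "run rpcbind as part of the mount script" idea could be sketched as below. This is only an illustration under stated assumptions, not the actual g4l script: the function name is hypothetical, and it assumes `pidof` and `rpcbind` are on the image's PATH.

```shell
# Start rpcbind if it is not already running, then do the NFSv4 mount.
# Usage: mount_nfs4 server:/export /mnt/local
mount_nfs4() {
    if ! pidof rpcbind >/dev/null 2>&1; then
        # Give up if rpcbind cannot be started; the nfs4 mount would likely hang.
        rpcbind || return 1
    fi
    mount -t nfs4 "$1" "$2"
}
```

Starting rpcbind once from rcS instead would make the `pidof` check unnecessary, at the cost of running the daemon even when NFS is never used.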

     
  • chud
    2010-12-23

    On the regular one, whether rpcbind is started or not, it hangs mounting nfs4; on mine it only hangs if rpcbind is not started.

    I just did a transfer test with my initrd, using 'time': a ~3.5 Gb image over Gb ethernet to an Intel DQ45 with the hdd formatted ext2
    NFS4 (tcp)
    user time 0.1
    system time 3.84
    elapsed 1.41:07

    NFS3 (udp)
    user time 0.13
    system time 4.56
    elapsed 0.51:75

     
  • chud
    2010-12-23

    NFS3 (tcp)
    user time 0.1
    system time 3.73
    elapsed 1.43:57
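
To compare the three runs, the elapsed times can be turned into an average throughput. The sketch below rests on two assumptions that the thread does not confirm: that an elapsed value such as 1.41:07 means 1 minute 41.07 seconds, and that the "~3.5 Gb" image is about 3500 MB. The function names are made up for the example.

```shell
# Convert an elapsed time written as M.SS:CC (assumed to mean
# minutes.seconds:hundredths) to seconds.
elapsed_to_seconds() {
    printf '%s\n' "$1" | awk -F'[.:]' '{ printf "%.2f\n", $1 * 60 + $2 + $3 / 100 }'
}

# Average throughput in MB/s for a transfer of size_mb megabytes.
throughput_mb_s() {
    size_mb=$1
    secs=$(elapsed_to_seconds "$2")
    awk -v mb="$size_mb" -v s="$secs" 'BEGIN { printf "%.1f\n", mb / s }'
}

# The three elapsed times reported in the thread, with the assumed image size.
for run in "NFS4-tcp 1.41:07" "NFS3-udp 0.51:75" "NFS3-tcp 1.43:57"; do
    set -- $run
    printf '%s: %s MB/s\n' "$1" "$(throughput_mb_s 3500 "$2")"
done
```

Under those assumptions the two TCP runs come out close to each other and the UDP run is roughly twice as fast, which matches the raw elapsed times.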

     
  • In looking at the .config for the linux 2.6.36.2 these are the NFS settings.

    CONFIG_NFS_FS=y
    CONFIG_NFS_V3=y
    # CONFIG_NFS_V3_ACL is not set
    CONFIG_NFS_V4=y
    # CONFIG_NFS_V4_1 is not set
    # CONFIG_NFS_USE_LEGACY_DNS is not set
    CONFIG_NFS_USE_KERNEL_DNS=y
    # CONFIG_NFSD is not set
    CONFIG_NFS_COMMON=y

    The ones that are not set are listed as experimental in the config, except for nfsd, which isn't set since g4l isn't running an NFS server, but perhaps nfs4 might do something with this.

    How are you building the initrd?
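
A quick way to pull the same NFS settings out of any kernel .config, for comparing the g4l kernel against one from a live CD where nfs4 mounts work, is sketched below. The helper names and the .config paths are hypothetical; the `# CONFIG_... is not set` form is the standard way a .config records a disabled option.

```shell
# List every NFS-related option in a kernel .config, including the
# "# CONFIG_... is not set" lines for disabled options.
nfs_config_options() {
    grep -E '^(# )?CONFIG_NFS' "$1"
}

# Succeed only if the given option is built in (=y) or built as a module (=m).
# Usage: config_enabled /path/to/.config CONFIG_NFS_V4
config_enabled() {
    grep -Eq "^$2=(y|m)\$" "$1"
}
```

Running `nfs_config_options` on both .config files and comparing the two listings side by side would show any NFS option that differs between the kernels.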

     
  • CNTI0O xqcborjldtmj, [url=http://cpuyevbqmjqt.com/]cpuyevbqmjqt[/url], [link=http://qtyfcizwgbas.com/]qtyfcizwgbas[/link], http://vykdtukfdhwx.com/

     
  • The latest comment from nobody doesn't make much sense. It is some kind of links that don't actually seem to go anywhere. It is my understanding that if one uses just nfs instead of nfs4, the connection to the nfs4 server does work.

    If anyone has more info on what might need to be added to get the nfs4 connection to work directly rather than having it fall back to nfs3, please post it here.

    Note: Build system and libraries are from Fedora 14 at the moment with Kernel build from kernel.org source.