On 10/28/09 7:29 PM, sfjro@... wrote:
> Hello michael,
>
> michael rodriguez:
>> I netboot debian linux (etch). I mount the OS filesystem over nfs with
>> aufs. kernel version 2.6.29.6 using the aufs patches. I am seeing
>> situations where processes pile up on aufs read locks. The problem
>> occurs on active web servers with io utilization of about 10-20%. The
>> machine stays responsive and can often recover, but the load spikes into
>> the hundreds.
>>
>> Here is a partial kernel stack trace from when the problem is occurring.
>> Is there an easy way to avoid this bottleneck?
>
> I've checked the stacktraces and translated these addresses into symbols
> by simple '#define' and cpp(1).
> It shows that nfs_permission() calls aufs_read_lock(), which is impossible.
> How did you mount aufs (on nfs client), and how did you mount and export
> (on nfs server)?
> The stacktrace might be incorrect, depending on your kernel
> configuration.
>
> Finally I want you to read the aufs README file, and provide the
> information it asks for. While you wrote your kernel is 2.6.29.6,
> ksymoops says 2.6.24.4. Which is correct?
2.6.29.6 is correct; I was just running ksymoops on a different machine,
sorry. Here is the information requested:
- /proc/mounts (instead of the output of mount(8))
rootfs / rootfs rw 0 0
none /mnt/root_base/sys sysfs rw 0 0
none /mnt/root_base/proc proc rw 0 0
udev /mnt/root_base/dev tmpfs rw,size=10240k,mode=755 0 0
10.106.0.5:/vol/boot/netboot/etch64-peon /mnt/root_base nfs ro,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=300,acregmax=600,acdirmin=300,acdirmax=600,hard,nointr,nolock,proto=tcp,timeo=7,retrans=3,sec=sys,addr=10.106.0.5 0 0
10.106.0.5:/vol/boot/netboot/etch64-peon /mnt/root_base/dev/.static/dev nfs ro,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=300,acregmax=600,acdirmin=300,acdirmax=600,hard,nointr,nolock,proto=tcp,timeo=7,retrans=3,sec=sys,addr=10.106.0.5 0 0
/dev/sda1 /mnt/local ext3 rw,errors=continue,data=ordered 0 0
none / aufs rw,si=679a245f70c6e951 0 0
tmpfs /lib/init/rw tmpfs rw,nosuid,mode=755 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec 0 0
none /dev/.static/dev aufs rw,si=679a245f70c6e951 0 0
tmpfs /dev tmpfs rw,size=10240k,mode=755 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,gid=5,mode=620 0 0
/dev/sda2 /tmp ext3 rw,noexec,noatime,nodiratime,errors=continue,commit=300,data=ordered 0 0
/dev/sda6 /usr/local/var/spool/cron/crontabs ext3 rw,nosuid,nodev,errors=continue,data=ordered 0 0
/dev/sdb1 /home ext3 rw,nosuid,nodev,noatime,nodiratime,errors=remount-ro,data=writeback 0 0
rpc_pipefs /usr/local/var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
- /sys/module/aufs/*
indians:~# find /sys/module/aufs -name "*" -type f -print -exec cat {} \;
/sys/module/aufs/parameters/brs
1
/sys/module/aufs/parameters/nwkq
4
- /sys/fs/aufs/* (if you have them)
indians:~# find /sys/fs/aufs -name "*" -type f -print -exec cat {} \;
/sys/fs/aufs/si_679a245f70c6e951/xi_path
/mnt/local/.aufs.xino
/sys/fs/aufs/si_679a245f70c6e951/br0
/mnt/local=rw
/sys/fs/aufs/si_679a245f70c6e951/br1
/mnt/root_base=ro
- /debug/aufs/* (if you have them)
Don't have.
- linux kernel version
if your kernel is not plain, for example modified by distributor,
the url where i can download its source is necessary too.
vanilla 2.6.29.y.git tag v2.6.29.6, merged with aufs2-2.6 tag aufs2-29,
patched with
http://blackprecipice.com/dl/grsecurity-2.1.14-2.6.29.6-200908252018.patch
Complete sources with patches applied can be downloaded here:
http://blackprecipice.com/dl/ndn-2.6.29.6-aufs2-grsec-v1.5.tar.bz2
- aufs version which was printed at loading the module or booting the
system, instead of the date you downloaded.
aufs2-29
- configuration (define/undefine CONFIG_AUFS_xxx)
indians:~# zgrep AUFS /proc/config.gz
CONFIG_AUFS_FS=y
CONFIG_AUFS_BRANCH_MAX_127=y
# CONFIG_AUFS_BRANCH_MAX_511 is not set
# CONFIG_AUFS_BRANCH_MAX_1023 is not set
# CONFIG_AUFS_BRANCH_MAX_32767 is not set
CONFIG_AUFS_HINOTIFY=y
# CONFIG_AUFS_EXPORT is not set
# CONFIG_AUFS_RDU is not set
# CONFIG_AUFS_SHWH is not set
# CONFIG_AUFS_BR_RAMFS is not set
# CONFIG_AUFS_BR_FUSE is not set
# CONFIG_AUFS_DEBUG is not set
CONFIG_AUFS_BDEV_LOOP=y
- kernel configuration or /proc/config.gz (if you have it)
http://blackprecipice.com/dl/dotconfig
- behaviour which you think to be incorrect
I am trying to understand what would cause so many processes to be stuck
waiting.
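To put a number on "stuck waiting", the blocked tasks can be grouped by the kernel function they are sleeping in. This is only a rough sketch: the function name blocked_by_wchan is made up for illustration, it assumes ps(1) from procps, and whether the WCHAN column shows useful symbol names depends on the kernel configuration.

```shell
# Summarize uninterruptible (D state) tasks by kernel wait channel.
# Reads "STAT WCHAN" lines on stdin; a header line is ignored because
# its first field does not start with "D".
# blocked_by_wchan is a hypothetical helper name, not part of any tool.
blocked_by_wchan() {
    awk '$1 ~ /^D/ { n[$2]++ }
         END { for (w in n) print n[w], w }' | sort -k1,1nr -k2,2
}
```

It would be fed with something like "ps -eo stat,wchan:32 | blocked_by_wchan" while the load is high; the output is one "count symbol" line per wait channel, busiest first.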
Here is the System.map file:
http://blackprecipice.com/dl/System.map
And here is the trace of some waiting processes:
http://blackprecipice.com/dl/indians.kernel.trace
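For checking an address from the trace against the System.map by hand, a lookup along these lines may help. It is a sketch and not the '#define' and cpp(1) method mentioned above: addr2sym is a made-up name, and it assumes the map is sorted by address with fixed-width lowercase hex in the first column, as a kernel build normally produces.

```shell
# Print the System.map symbol at or just below a hex address.
# usage: addr2sym ADDRESS SYSTEM_MAP
# addr2sym is a hypothetical helper name, not an existing tool.
addr2sym() {
    awk -v a="$1" '
        # Normalize the address: lowercase, no 0x prefix.
        BEGIN { a = tolower(a); sub(/^0x/, "", a) }
        # Zero-pad to the width used in the map (taken from line 1).
        NR == 1 { while (length(a) < length($1)) a = "0" a }
        # Equal-width lowercase hex sorts correctly as strings; the ""
        # concatenation forces a string (not numeric) comparison.
        ($1 "") <= (a "") { sym = $3; base = $1 }
        END { if (sym != "") print sym " (base " base ")" }
    ' "$2"
}
```

For example, "addr2sym 0xffffffff80210010 System.map" prints the last symbol whose start address is not above the given one, together with that start address. Whether the enclosing frame in the trace is reliable still depends on the kernel configuration.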
Thanks!
michael
http://blackprecipice.com/dl/