From https://bugs.launchpad.net/ubuntu/+source/aufs-tools/+bug/1442892/
Core was generated by `auplink /var/lib/docker/aufs/mnt/94c22127479d09c2ba17287431da3a794c8ecd48a1898c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ftw_startup (
dir=dir@entry=0x1fd2010 "/var/lib/docker/aufs/mnt/94c22127479d09c2ba17287431da3a794c8ecd48a1898ce8615e36c197691057", is_nftw=is_nftw@entry=1, func=func@entry=0x40140a <ftw_cpup>, descriptors=1048566, flags=flags@entry=19)
at ../sysdeps/wordsize-64/../../io/ftw.c:656
656 ../sysdeps/wordsize-64/../../io/ftw.c: No such file or directory.
(gdb) bt
#0 ftw_startup (
dir=dir@entry=0x1fd2010 "/var/lib/docker/aufs/mnt/94c22127479d09c2ba17287431da3a794c8ecd48a1898ce8615e36c197691057", is_nftw=is_nftw@entry=1, func=func@entry=0x40140a <ftw_cpup>, descriptors=1048566, flags=flags@entry=19)
at ../sysdeps/wordsize-64/../../io/ftw.c:656
#1 0x00007fbe94e7050a in __new_nftw (
path=path@entry=0x1fd2010 "/var/lib/docker/aufs/mnt/94c22127479d09c2ba17287431da3a794c8ecd48a1898ce8615e36c197691057", func=func@entry=0x40140a <ftw_cpup>, descriptors=<optimized out>, flags=flags@entry=19)
at ../sysdeps/wordsize-64/../../io/ftw.c:859
#2 0x0000000000401cfa in do_plink (br=<optimized out>, nbr=<optimized out>, cmd=0,
cwd=0x1fd2010 "/var/lib/docker/aufs/mnt/94c22127479d09c2ba17287431da3a794c8ecd48a1898ce8615e36c197691057")
at plink.c:303
#3 au_plink (
cwd=0x1fd2010 "/var/lib/docker/aufs/mnt/94c22127479d09c2ba17287431da3a794c8ecd48a1898ce8615e36c197691057",
cmd=cmd@entry=0, flags=flags@entry=1, fd=fd@entry=0x0) at plink.c:364
#4 0x0000000000401356 in main (argc=<optimized out>, argv=0x7ffc39d91948) at auplink.c:64
There's a crash when docker uses auplink.
The crash occurs in glibc's ftw_startup(), in io/ftw.c.
The crashing line is memset (data.dirstreams, '\0', data.maxdir * sizeof (struct dir_data *));
data.maxdir is 1048566, and data.dirstreams is the result of an alloca of 1048566 * sizeof (pointer) bytes, i.e. about 8 MiB.
So alloca takes almost 8 MiB from the stack, but my stack limit is only 8 MiB. Hence the stack overflows and the program crashes.
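To put numbers on this, the sketch below computes the size of that alloca'd array and how much of a given stack limit would remain. The function names are illustrative, not glibc's. As described above, it assumes the array holds one pointer per allowed open directory, and it ignores whatever is already on the stack.

```c
#include <sys/resource.h>

/* Bytes requested via alloca() for the dirstreams array:
 * one pointer per directory nftw() may hold open (data.maxdir). */
unsigned long long dirstreams_bytes(unsigned long long maxdir)
{
    return maxdir * sizeof(void *);
}

/* Stack bytes left for everything else once that array is allocated.
 * A negative result means the array alone exceeds the limit. */
long long stack_headroom(unsigned long long maxdir,
                         unsigned long long stack_limit)
{
    return (long long)stack_limit - (long long)dirstreams_bytes(maxdir);
}

/* Soft RLIMIT_STACK of the current process. Returns 0 if it cannot be
 * read or is unlimited; do not pass 0 on to stack_headroom(). */
unsigned long long current_stack_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (unsigned long long)rl.rlim_cur;
}
```

With this report's numbers (maxdir = 1048566, 8-byte pointers, an 8 MiB = 8,388,608-byte stack), dirstreams_bytes() gives 8,388,528 bytes and stack_headroom() gives 80 bytes. Those 80 bytes would have to hold the existing frames of main, au_plink and do_plink plus argv and the environment, which is impossible. The memset therefore writes past the end of the stack.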
Looking at "man nftw", setting nopenfd to such a huge number makes little sense:
"nopenfd specifies the maximum number of directories that nftw() will hold open simultaneously. When the search depth exceeds this, nftw() will become slower because directories have to be closed and reopened. nftw() uses at most one file descriptor for each level in the directory tree."
Why would I want to keep a million directories open?
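The man page's claim is easy to check. nftw() still visits every entry when the tree is deeper than nopenfd; it only has to close and reopen directory streams along the way. The sketch below builds a 50-level chain of directories under /tmp and walks it twice, once with nopenfd = 1 and once with nopenfd = 1024. It uses the FTW_PHYS | FTW_MOUNT flags that plink.c also passes. The helper names count_tree() and make_chain() are hypothetical. It assumes a POSIX system with a writable /tmp and does not clean up after itself.

```c
#define _XOPEN_SOURCE 700   /* expose nftw(), FTW_PHYS, FTW_MOUNT and mkdtemp() */
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

static long visited;   /* entries seen by the current walk */

/* nftw() callback: count every entry and keep walking. */
static int count_entry(const char *path, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
{
    (void)path; (void)sb; (void)typeflag; (void)ftwbuf;
    visited++;
    return 0;   /* a non-zero return would stop the walk */
}

/* Count all entries under root while holding at most nopenfd
 * directories open at the same time. Returns -1 on error. */
long count_tree(const char *root, int nopenfd)
{
    visited = 0;
    if (nftw(root, count_entry, nopenfd, FTW_PHYS | FTW_MOUNT) != 0)
        return -1;
    return visited;
}

/* Create a fresh directory under /tmp containing a chain d/d/.../d that is
 * depth levels deep. The new directory's path is written to root, which
 * must hold at least 22 bytes. Returns 0 on success, -1 on failure. */
int make_chain(char *root, size_t rootlen, int depth)
{
    char path[4096];

    if (snprintf(root, rootlen, "/tmp/nftw-demo-XXXXXX") >= (int)rootlen)
        return -1;
    if (mkdtemp(root) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s", root);
    for (int i = 0; i < depth; i++) {
        if (strlen(path) + sizeof "/d" > sizeof path)
            return -1;
        strcat(path, "/d");
        if (mkdir(path, 0700) != 0)
            return -1;
    }
    return 0;
}
```

Both walks should report 51 entries: the top directory plus 50 nested ones. Only the number of directory streams held open differs, and with it the size of the array that, per the analysis above, glibc reserves up front.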
aufs-tools's plink.c calculates this value:
getrlimit(RLIMIT_NOFILE, &rlim);
rlim.rlim_cur - 10
So RLIMIT_NOFILE must be 1048576 (1048566 + 10). That looks a bit high, but not entirely unreasonable. I assume docker sets this value, so the problem occurs only under docker.
One solution is to set a reasonable upper limit. How about 1024?
The attached patch works on debian/jessie.
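A cap along those lines might look like the sketch below. It is not the attached patch, whose body is not reproduced in this thread. nopenfd_from_limit() is a hypothetical name, 1024 is the cap proposed above, and the reserve of 10 descriptors mirrors the rlim.rlim_cur - 10 in plink.c. The sketch also guards the conversion from rlim_t to the int that nftw() expects, which matters for RLIM_INFINITY and for limits of 10 or less.

```c
#include <sys/resource.h>

#define NOPENFD_CAP     1024  /* upper bound suggested in the report */
#define NOPENFD_RESERVE 10    /* descriptors kept free, as in plink.c */

/* Derive nftw()'s nopenfd from a RLIMIT_NOFILE soft limit without letting
 * it grow with the limit, underflow, or overflow the int parameter. */
int nopenfd_from_limit(rlim_t cur)
{
    if (cur == RLIM_INFINITY || cur > NOPENFD_CAP + NOPENFD_RESERVE)
        return NOPENFD_CAP;
    if (cur <= NOPENFD_RESERVE)
        return 1;             /* nftw() still works, just slower */
    return (int)(cur - NOPENFD_RESERVE);
}
```

Under docker's limit of 1048576 this yields 1024 instead of 1048566. On a 64-bit system, the array ftw_startup() puts on the stack therefore shrinks from about 8 MiB to 8 KiB.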
Hello jheissler,
Thanks for the report and the patch.
I am surprised that docker sets RLIMIT_NOFILE to such a huge number (1M).
Here is my patch, refined from yours. Does it look good to you?
diff --git a/plink.c b/plink.c
index b8891bb..adacb91 100644
--- a/plink.c
+++ b/plink.c
@@ -249,6 +249,7 @@ static int do_plink(char *cwd, int cmd, int nbr, union aufs_brinfo *brinfo)
{
int err, i, l;
struct rlimit rlim;
__nftw_func_t func;
char *p;
@@ -296,7 +297,13 @@ static int do_plink(char *cwd, int cmd, int nbr, union aufs_brinfo *brinfo)
err = getrlimit(RLIMIT_NOFILE, &rlim);
if (err)
AuFin("getrlimit");
+
FTW_PHYS | FTW_MOUNT | FTW_ACTIONRETVAL);
/* ignore */
J. R. Okajima
Hi sfjro,
I forgot to look up what AuFin does. It doesn't always terminate the program, so I think it's better not to use it here. Errors caused by too few file descriptors should be caught elsewhere anyway.
What I don't like is writing into rlim. What's wrong with using a new variable?
Hmm, I am afraid that I don't fully understand you.
AuFin must terminate the program here. Needless to say, if getrlimit(2)
doesn't set errno correctly on error, AuFin won't terminate,
but that is a problem outside auplink.
What is your point? Are you worrying about the type of the variable?
Do you mean
?
Maybe you are right, although I guess it won't be a problem in the real
world.
Anyway here is my latest fix.
J. R. Okajima
commit a4bb87bbb4d0a0b2eb45a8433f0742759a2c4db1
Author: J. R. Okajima <hooanon05g@gmail.com>
Date: Mon Jul 11 01:56:53 2016 +0900
diff --git a/plink.c b/plink.c
index b8891bb..c0ed303 100644
--- a/plink.c
+++ b/plink.c
@@ -1,5 +1,5 @@
/*
*
@@ -247,8 +247,9 @@ void au_clean_plink(void)
static int do_plink(char *cwd, int cmd, int nbr, union aufs_brinfo *brinfo)
{
struct rlimit rlim;
__nftw_func_t func;
char *p;
@@ -296,7 +297,14 @@ static int do_plink(char *cwd, int cmd, int nbr, union aufs_brinfo *brinfo)
err = getrlimit(RLIMIT_NOFILE, &rlim);
if (err)
AuFin("getrlimit");
FTW_PHYS | FTW_MOUNT | FTW_ACTIONRETVAL);
/* ignore */
My point is that rlim contains a specific value, the maximum number of open files.
I would treat this variable as a constant and not change it. If I want a new value, I use a new variable. There's no technical reason for it; it's just my coding style.
Current code looks fine to me. Thanks!