|
From: Vallevand, M. K <Mar...@UN...> - 2014-09-29 14:42:17
|
Valgrind meet containers.
Containers meet valgrind.

I've found what lxc doesn't like when running valgrind.

The lxc_start() checks to see if there are extra file descriptors open and won't call __lxc_start().
vdr1: inherited fd 1024 on /home/vallevand/trunk_s4m/s4m-appliance/src/vdrd/vgVdrTest
vdr1: inherited fd 1025 on /tmp/valgrind_proc_24989_cmdline_4fbfb9a5 (deleted)VdrTest
vdr1: inherited fd 1026 on /dev/pts/1ind_proc_24989_cmdline_4fbfb9a5 (deleted)VdrTest
vdr1: inherited fd 1027 on pipe:[768863]_proc_24989_cmdline_4fbfb9a5 (deleted)VdrTest
vdr1: inherited fd 1028 on pipe:[768863]_proc_24989_cmdline_4fbfb9a5 (deleted)VdrTest

Vdr1 is the name of my container. All those open files in the child process are related to valgrind.

If I call __lxc_start() rather than lxc_start(), I see this:
vdr1: sync wake failure : Broken pipe
vdr1: failed to spawn 'vdr1'
And, just before that there is some complaining from valgrind:
==25086== Syscall param clone(child_tidptr) contains uninitialised byte(s)
==25086==    at 0x56622E1: clone (clone.S:84)
==25086==    by 0x4E3BD38: __lxc_start (in /usr/lib/lxc/liblxc.so.0.7.5)
==25086==    by 0x4014C9: vgVdrStartClone (vgVdrTest.c:88)
==25086==    by 0x400F0A: main (vgVdrTest.c:337)
==25086==
==1== Syscall param wait4(status) points to unaddressable byte(s)
==1==    at 0x53607C4: wait (wait.c:32)
==1==    by 0x4E3A400: ??? (in /usr/lib/lxc/liblxc.so.0.7.5)
==1==    by 0x566231C: clone (clone.S:112)
==1==  Address 0xffffffffffffffd4 is not stack'd, malloc'd or (recently) free'd
==1==
==1== Invalid write of size 4
==1==    at 0x4E3A4FF: ??? (in /usr/lib/lxc/liblxc.so.0.7.5)
==1==    by 0x566231C: clone (clone.S:112)
==1==  Address 0xffffffffffffffc0 is not stack'd, malloc'd or (recently) free'd
==1==
==1==
==1== Process terminating with default action of signal 11 (SIGSEGV)
==1==  Access not within mapped region at address 0xFFFFFFFFFFFFFFC0
==1==    at 0x4E3A4FF: ??? (in /usr/lib/lxc/liblxc.so.0.7.5)
==1==    by 0x566231C: clone (clone.S:112)

Our program is designed to close all open file descriptors in the child process before calling lxc_start(). That code can try to close all file descriptors to make sure something doesn't sneak through. However, closing the file descriptors associated with valgrind does not work. I get errno=0 Bad File Descriptor. Valgrind really has them held open. I am running as root in all these tests.

I've also reproduced the problem using the 'lxc-' programs. If you do something like 'lxc-create -n XXX' and then something like 'valgrind lxc-start -n XXX -- ls' you'll see it. Well, the flavor of the error with open file descriptors.

My hopes aren't high, but any ideas are very welcome.

Regards.
Mark K Vallevand
"If there are no dogs in Heaven, then when I die I want to go where they went."
-Will Rogers

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.

From: lxc-users [mailto:lxc...@li...] On Behalf Of Vallevand, Mark K
Sent: Thursday, September 25, 2014 09:19 AM
To: lxc...@li...
Subject: [lxc-users] Using valgrind with lxc

In our program, we do a fork() and in the child process the lxc library is called to start a program in a container using lxc_start().

We don't care about valgrind in the child process. You can disable valgrind messages from child processes, but you cannot detach valgrind unless you exec() a new binary on top. However, valgrind and lxc do not play nicely, at least with the versions in Ubuntu 12.04 LTS. I'm getting an error back from lxc_start(). I'm having trouble getting logs to see why it's failing, so I don't know exactly what's failing, yet.

But, I'm looking for any ideas for getting valgrind to work with programs that use lxc_start().
Any suggestions will be welcome. And, thanks! |
|
From: Vallevand, M. K <Mar...@UN...> - 2014-10-01 21:25:36
|
I did this by calling __lxc_start(). So, lxc_check_inherited() didn't get called. That was this:

> If I call __lxc_start() rather than lxc_start(), I see this:
> vdr1: sync wake failure : Broken pipe
> vdr1: failed to spawn 'vdr1'
> And, just before that there is some complaining from valgrind:
> ==25086== Syscall param clone(child_tidptr) contains uninitialised byte(s)
> ==25086==    at 0x56622E1: clone (clone.S:84)
> ==25086==    by 0x4E3BD38: __lxc_start (in /usr/lib/lxc/liblxc.so.0.7.5)
> ==25086==    by 0x4014C9: vgVdrStartClone (vgVdrTest.c:88)
> ==25086==    by 0x400F0A: main (vgVdrTest.c:337)
> ==25086==
> ==1== Syscall param wait4(status) points to unaddressable byte(s)
> ==1==    at 0x53607C4: wait (wait.c:32)
> ==1==    by 0x4E3A400: ??? (in /usr/lib/lxc/liblxc.so.0.7.5)
> ==1==    by 0x566231C: clone (clone.S:112)
> ==1==  Address 0xffffffffffffffd4 is not stack'd, malloc'd or (recently) free'd
> ==1==
> ==1== Invalid write of size 4
> ==1==    at 0x4E3A4FF: ??? (in /usr/lib/lxc/liblxc.so.0.7.5)
> ==1==    by 0x566231C: clone (clone.S:112)
> ==1==  Address 0xffffffffffffffc0 is not stack'd, malloc'd or (recently) free'd
> ==1==
> ==1==
> ==1== Process terminating with default action of signal 11 (SIGSEGV)
> ==1==  Access not within mapped region at address 0xFFFFFFFFFFFFFFC0
> ==1==    at 0x4E3A4FF: ??? (in /usr/lib/lxc/liblxc.so.0.7.5)
> ==1==    by 0x566231C: clone (clone.S:112)

Regards.
Mark K Vallevand

-----Original Message-----
From: lxc-users [mailto:lxc...@li...] On Behalf Of Serge Hallyn
Sent: Wednesday, October 01, 2014 04:18 PM
To: LXC users mailing-list
Cc: val...@li...
Subject: Re: [lxc-users] Using valgrind with lxc

Hi,

For the sake of testing I'd go ahead and just 'return 0' at the
top of lxc_check_inherited.

We can talk about adding an option to do this, i.e.
lxc.close_all_fds = -1 maybe. It's a very rare case where
that should be done, though.

-serge

_______________________________________________
lxc-users mailing list
lxc...@li...
http://lists.linuxcontainers.org/listinfo/lxc-users |
|
From: Vallevand, M. K <Mar...@UN...> - 2014-10-02 12:44:52
|
From my test program, which is trying to recreate the issue we see when running valgrind against our application.

Our application, running on Ubuntu 12.04 LTS, is a complicated control program that is having a memory leak or corruption. The program manages multiple containers which are running a mature program that doesn't need any valgrinding. The program does a fork() and in the child process the lxc library is called to start the mature program in a container using lxc_start().

My test program is a very simple thing that just does an lxc_start() or __lxc_start() against an existing container. (The __lxc_start() and some supporting code were copied into my test program and compiled there. It was simpler. But, it's still __lxc_start().)

Regards.
Mark K Vallevand

-----Original Message-----
From: lxc-users [mailto:lxc...@li...] On Behalf Of Serge Hallyn
Sent: Wednesday, October 01, 2014 04:59 PM
To: LXC users mailing-list
Cc: val...@li...
Subject: Re: [lxc-users] Using valgrind with lxc

You called __lxc_start() from where, how?

Quoting Vallevand, Mark K (Mar...@UN...):
> I did this by calling __lxc_start(). So, lxc_check_inherited() didn't get called. That was this:
> > If I call __lxc_start() rather than lxc_start(), I see this:
> > vdr1: sync wake failure : Broken pipe
> > vdr1: failed to spawn 'vdr1'
> > And, just before that there is some complaining from valgrind:
> > ==25086== Syscall param clone(child_tidptr) contains uninitialised byte(s)

lxc uses the libc clone wrappers and does not pass in a tidptr...

> > ==25086==    at 0x56622E1: clone (clone.S:84)
> > ==25086==    by 0x4E3BD38: __lxc_start (in /usr/lib/lxc/liblxc.so.0.7.5)
> > ==25086==    by 0x4014C9: vgVdrStartClone (vgVdrTest.c:88)
> > ==25086==    by 0x400F0A: main (vgVdrTest.c:337)
> > ==25086==
> > ==1== Syscall param wait4(status) points to unaddressable byte(s)
> > ==1==    at 0x53607C4: wait (wait.c:32)
> > ==1==    by 0x4E3A400: ??? (in /usr/lib/lxc/liblxc.so.0.7.5)
> > ==1==    by 0x566231C: clone (clone.S:112)
> > ==1==  Address 0xffffffffffffffd4 is not stack'd, malloc'd or (recently) free'd

Would help to see file:line in the lxc code (as would using a newer lxc :)

> > ==1== Invalid write of size 4
> > ==1==    at 0x4E3A4FF: ??? (in /usr/lib/lxc/liblxc.so.0.7.5)
> > ==1==    by 0x566231C: clone (clone.S:112)
> > ==1==  Address 0xffffffffffffffc0 is not stack'd, malloc'd or (recently) free'd
> > ==1==
> > ==1==
> > ==1== Process terminating with default action of signal 11 (SIGSEGV)
> > ==1==  Access not within mapped region at address 0xFFFFFFFFFFFFFFC0
> > ==1==    at 0x4E3A4FF: ??? (in /usr/lib/lxc/liblxc.so.0.7.5)
> > ==1==    by 0x566231C: clone (clone.S:112) |
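Mark's first message notes that valgrind cannot be detached from a child unless the child exec()s a new binary. A possible way to use that, sketched below, is to move the lxc_start() call into a small separate helper program and have the forked child exec it. This assumes valgrind is run with its default of not following children across exec (the --trace-children option left off). The function name run_helper and the idea of a dedicated helper binary are assumptions for illustration, not something the thread confirms works with lxc, and whether valgrind's descriptors are gone after the exec would still need to be checked.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: fork, exec argv[0] in the child, wait for it.
 * Returns the helper's exit status, 127 if the exec failed, or -1 on
 * fork/wait errors or abnormal termination.
 */
int run_helper(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the helper binary. */
        execvp(argv[0], argv);
        _exit(127);             /* only reached if the exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;          /* retry only when interrupted by a signal */
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The control program would stay under valgrind while something like run_helper((char *[]){"start-container-helper", "vdr1", NULL}) launches the container; the helper name here is made up.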
|
From: Tom H. <to...@co...> - 2014-09-29 15:07:45
|
On 29/09/14 15:41, Vallevand, Mark K wrote:

> Our program is designed to close all open file descriptors in the child
> process before calling lxc_start(). That code can try to close all file
> descriptors to make sure something doesn’t sneak through. However,
> closing the file descriptors associated with valgrind does not work. I
> get errno=0 Bad File Descriptor. Valgrind really has them held open. I
> am running as root in all these tests.

Yes, we refuse to let them be closed because that would, for example, break logging as it would close our log stream.

We do however also lie when asked with getrlimit how many file descriptors there are, so lxc is obviously just guessing some high upper limit rather than actually asking what the limit is.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/ |
|
From: John R. <jr...@Bi...> - 2014-09-29 15:13:13
|
> Valgrind meet containers.
>
> Containers meet valgrind.

Unless the container implementation has a way to "whitelist" some open fd then the container mechanism is to blame for not providing functionality that is reasonably required for debugging. If the container mechanism does have such a "fd whitelist" then *you* should figure out how valgrind can use it, and submit the patches (or documentation, strategy, etc.) to valgrind.

Either way, please visit https://bugs.kde.org/enter_bug.cgi?product=valgrind and enter a report so that the issue does not get lost. Specify which container environment, etc. Give enough details so that somebody else can reproduce your observations starting "from scratch".

Then in the short run, the container mechanism probably has the attitude "I provide an encapsulated, protected environment for running ["mature"] programs", so you should run valgrind in an environment that is more friendly to debugging. [Get another x86_64 computer and run Linux on it.] |