From: Thomas C. <cal...@gm...> - 2015-06-02 12:23:49
Hi,

After further investigation, I can reproduce it on a standard Linux kernel. Here is the gdb backtrace when the controller is stuck (it seems fork() fails at a low level):

(gdb) bt
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007f8e37e91eeb in _L_lock_13840 () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f8e37e8ffb8 in __GI___libc_realloc (oldmem=0x15b2260, bytes=574) at malloc.c:3025
#3  0x00007f8e37e7f2db in _IO_vasprintf (result_ptr=0x7fff5a19b2f0, format=<optimized out>, args=args@entry=0x7fff5a19b1c8) at vasprintf.c:84
#4  0x00007f8e37e61657 in ___asprintf (string_ptr=string_ptr@entry=0x7fff5a19b2f0, format=format@entry=0x7f8e37f8d830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n") at asprintf.c:35
#5  0x00007f8e37e3cae2 in __assert_fail_base (fmt=0x7f8e37f8d830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f8e37f90a38 "({ __typeof (self->tid) __value; if (sizeof (__value) == 1) asm volatile (\"movb %%fs:%P2,%b0\" : \"=q\" (__value) : \"0\" (0), \"i\" (__builtin_offsetof (struct pthread, tid))); else if (sizeof (__value) == "..., file=file@entry=0x7f8e37f90a00 "../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c", line=line@entry=141, function=function@entry=0x7f8e37f8b38d <__PRETTY_FUNCTION__.11207> "__libc_fork") at assert.c:57
#6  0x00007f8e37e3cc32 in __GI___assert_fail (assertion=0x7f8e37f90a38 "({ __typeof (self->tid) __value; if (sizeof (__value) == 1) asm volatile (\"movb %%fs:%P2,%b0\" : \"=q\" (__value) : \"0\" (0), \"i\" (__builtin_offsetof (struct pthread, tid))); else if (sizeof (__value) == "..., file=0x7f8e37f90a00 "../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c", line=141, function=0x7f8e37f8b38d <__PRETTY_FUNCTION__.11207> "__libc_fork") at assert.c:101
#7  0x00007f8e37ece252 in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:141
#8  0x000000000062b7a6 in unix_fork ()
#9  0x00000000004b88b4 in camlNetplex_mp__fun_1577 () at netplex_mp.ml:80
#10 0x00000000004d54ef in camlNetplex_controller__fun_3861 () at netplex_controller.ml:359
#11 0x00000000004d5dd8 in camlNetplex_controller__fun_3830 () at netplex_controller.ml:265
#12 0x00000000004c8355 in camlNetplex_workload__fun_2015 () at netplex_workload.ml:332
#13 0x00000000004c8b06 in camlNetplex_workload__fun_1982 () at netplex_workload.ml:230
#14 0x00000000004d445e in camlNetplex_controller__fun_4158 () at netplex_controller.ml:665
#15 0x00000000004d339d in camlNetplex_controller__fun_4275 () at netplex_controller.ml:896
#16 0x000000000050fbdc in camlRpc_server__protect_1582 () at rpc_server.ml:504
#17 0x000000000050fbdc in camlRpc_server__protect_1582 () at rpc_server.ml:504
#18 0x00000000005144fa in camlRpc_server__handle_incoming_message_1710 () at rpc_server.ml:889
#19 0x00000000005392ba in camlUq_multiplex__anyway_1042 () at uq_multiplex.ml:20
#20 0x0000000000531abe in camlUq_multiplex__fun_3594 () at uq_multiplex.ml:464
#21 0x000000000051b33e in camlUnixqueue_pollset__forward_event_to_1567 () at unixqueue_pollset.ml:768
#22 0x0000000000517f88 in camlEqueue__fun_1262 () at equeue.ml:166
#23 0x00000000005eaea9 in camlQueue__iter_1048 () at queue.ml:135
#24 0x0000000000518a02 in camlEqueue__run_1070 () at equeue.ml:159
#25 0x000000000051cdc9 in camlUnixqueue_pollset__fun_3391 () at unixqueue_pollset.ml:999
#26 0x00000000004e54a4 in camlNetplex_main__run_controller_1077 () at netplex_main.ml:130
#27 0x00000000004e4348 in camlNetplex_main__fun_1295 () at netplex_main.ml:312
#28 0x00000000004e574d in camlNetplex_main__redirect_logger_1094 () at netplex_main.ml:187
#29 0x00000000004e4a20 in camlNetplex_main__fun_1287 () at netplex_main.ml:294
#30 0x000000000043c2f4 in camlServer__entry ()
#31 0x00000000004063e9 in caml_program ()
#32 0x0000000000642934 in caml_start_program ()
#33 0x0000000000630a0a in caml_main ()
#34 0x0000000000630a4c in main ()

Reading line 141 of glibc's fork.c, the failing assert appears to be:

    assert (THREAD_GETMEM (self, tid) != ppid);

It is triggered when the PID of a worker process inside the namespace reaches the PID that the forking ancestor has on the host.
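For reference, the standalone test I am playing with looks roughly like the sketch below. This is my own reconstruction, not verified code: it needs root for unshare(CLONE_NEWPID), and whether the assert really fires at the PID collision is exactly what I am trying to confirm.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* glibc caches this PID in the thread descriptor of the process. */
    pid_t host_pid = getpid();
    fprintf(stderr, "host pid: %ld\n", (long) host_pid);

    /* From here on, children are created in a fresh PID namespace,
       while this process keeps its host PID. Needs CAP_SYS_ADMIN. */
    if (unshare(CLONE_NEWPID) != 0) {
        perror("unshare(CLONE_NEWPID)");
        return 1;
    }

    /* The first child becomes PID 1 of the new namespace; keep it
       alive, because the namespace refuses new processes once its
       init exits. */
    pid_t ns_init = fork();
    if (ns_init == 0) {
        pause();
        _exit(0);
    }

    /* Subsequent children get namespace PIDs 2, 3, 4, ...  When one
       of them reaches host_pid, glibc's fork-time assert
       (THREAD_GETMEM (self, tid) != ppid) should fail in that child,
       as in the backtrace above. */
    for (;;) {
        pid_t child = fork();
        if (child == 0)
            _exit(0);
        if (child < 0) {
            perror("fork");
            break;
        }
        waitpid(child, NULL, 0);
    }

    kill(ns_init, SIGKILL);
    return 0;
}

If the assert does fire, the affected child never exits and the parent blocks in waitpid() forever, which would match the futex(..., FUTEX_WAIT_PRIVATE, ...) hang seen in the strace.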
Cheers,

Thomas

On Mon, Jun 1, 2015 at 6:10 PM, Thomas Calderon <cal...@gm...> wrote:
> OK,
>
> I am investigating further; I have some more hints that it might be
> related to the Linux PID namespace implementation.
> I am trying to reproduce the issue outside OCaml/Ocamlnet.
>
> I will keep you posted.
>
> Thanks
>
> On Mon, Jun 1, 2015 at 6:05 PM, Gerd Stolpmann <in...@ge...> wrote:
>
>> Just a guess: There is a Unix.getpid call in Netplex_mp. This call
>> returns the PID in the new PID space, and this PID is different from
>> the PID returned by fork() (look into the sources of Netplex_mp,
>> where both are done). I do not remember what the PID is used for,
>> but it is probably a key in a management data structure. Then (and
>> this is the unverified part of my guess), some lookup fails that
>> normally cannot fail, and the controller gets confused.
>>
>> Note that you do not see the getpid() calls in the strace because it
>> is not a real syscall (afaik the kernel just writes the PID into
>> some memory location after fork/clone, where glibc expects it).
>>
>> I don't know whether this is really the problem, but if so, the fix
>> is probably not trivial. The controller would have to tell the
>> container what the PID is from the view of the controller, either
>> via the control socket or via the pipe that is used inside
>> Netplex_mp for synchronization.
>>
>> The restart_syscall thing you observed is just a poll waiting for an
>> event. strace just doesn't print it cleanly.
>>
>> Gerd
>>
>> On Monday, 2015-06-01 at 13:03 +0200, Thomas Calderon wrote:
>> > Hello Gerd,
>> >
>> > I do not think I reached the limit on the maximum number of
>> > processes, since I have at most 3 defunct processes.
>> > I would also be likely to see some other message indicating I
>> > reached this limit (GrSecurity would leave a trace).
>> >
>> > When attaching to stalled instances, the controller and worker
>> > instances (except one) are blocked on:
>> > restart_syscall(<... resuming interrupted call ...>
>> >
>> > As mentioned, one of the worker processes is blocked on:
>> > futex(0x...., FUTEX_WAIT_PRIVATE, 2, NULL
>> >
>> > You will find the strace -f as an attachment.
>> >
>> > Cheers,
>> >
>> > Thomas
>> >
>> > On Mon, Jun 1, 2015 at 12:13 PM, Gerd Stolpmann
>> > <in...@ge...> wrote:
>> > On Monday, 2015-06-01 at 11:06 +0200, Thomas Calderon wrote:
>> > > Hi,
>> > >
>> > > We are observing an issue when using OCamlnet netplex in
>> > > combination with VServer PID namespaces.
>> > > We are using Netplex in the multi-process mode.
>> > >
>> > > Here is what we are doing:
>> > > - start our netplex controller
>> > > - use the post_add_hook to enter a new PID namespace
>> > > - use the dynamic workload manager to spawn child workers,
>> > >   configured with conn_limit=1
>> > > - launch a loop of client connections; this spawns a new worker
>> > >   process for each connection
>> > >
>> > > After several successful connections of the loop, clients cannot
>> > > connect anymore.
>> > > We observe some worker processes in a defunct/zombie state.
>> > > The controller and the running worker processes seem to be
>> > > deadlocked.
>> > >
>> > > When we do not use the post_add_hook to enter a new PID
>> > > namespace, the problem cannot be triggered anymore.
>> > >
>> > > Do you have any hint on this?
>> >
>> > The controller of course runs waitpid() on the terminated
>> > processes to un-zombie them, and obviously this does not work. I
>> > guess you then reach the maximum number of processes after some
>> > time.
>> >
>> > You say "vServer", but there are several such technologies (Linux
>> > containers, Virtuozzo, maybe some derived products). I also don't
>> > know much about this corner of the OS.
>> >
>> > What would definitely help is an strace -f of the server.
>> >
>> > Gerd
>> >
>> > > Many thanks.
>> > >
>> > > Thomas
>> >
>> > --
>> > ------------------------------------------------------------
>> > Gerd Stolpmann, Darmstadt, Germany ge...@ge...
>> > My OCaml site: http://www.camlcity.org
>> > Contact details: http://www.camlcity.org/contact.html
>> > Company homepage: http://www.gerd-stolpmann.de
>> > ------------------------------------------------------------
>>
>> --
>> ------------------------------------------------------------
>> Gerd Stolpmann, Darmstadt, Germany ge...@ge...
>> My OCaml site: http://www.camlcity.org
>> Contact details: http://www.camlcity.org/contact.html
>> Company homepage: http://www.gerd-stolpmann.de
>> ------------------------------------------------------------
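As a footnote to Gerd's guess in the quoted message above: the disagreement between the PID that fork() returns to the controller and the PID the child sees via Unix.getpid can be demonstrated in isolation with a few lines of C. This is a minimal hypothetical sketch, not code from the thread; it assumes root and a kernel with CLONE_NEWPID support.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Children created after this call live in a new PID namespace. */
    if (unshare(CLONE_NEWPID) != 0) {
        perror("unshare(CLONE_NEWPID)");   /* needs CAP_SYS_ADMIN */
        return 1;
    }

    pid_t child = fork();
    if (child == 0) {
        /* The child is PID 1 of the new namespace: this prints 1. */
        printf("child:  getpid() = %ld\n", (long) getpid());
        return 0;
    }

    /* fork() reports the child's PID as seen from the parent's (host)
       namespace, so the two printed numbers differ. */
    printf("parent: fork() returned %ld\n", (long) child);
    waitpid(child, NULL, 0);
    return 0;
}

If Netplex_mp keys its management data structure by the fork() result while the container identifies itself with Unix.getpid, the lookup mismatch Gerd suspects would follow directly.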