From: Christopher S. A. <ca...@th...> - 2003-08-07 21:58:28
Hello,

I have a function written in perl that, once initiated, sets up some timers and calls "mconsole cad" to shut down a UML. It then waits 60 seconds while checking for shutdown. If it's still alive after 60 seconds it then does "mconsole sysrq sync" (waits a few more seconds), then calls "mconsole halt" (waits a few more) and finally resorts to a kill -9. Each "call" is a separate exec of mconsole.

I've seen mconsole hang, along with at least one of the UML processes/threads sticking around (skas mode). What I suspect is some kind of race with the mconsole "socket" file going away, because this has only occurred while the UML was powering off on its own && my perl function was running.

I think what is happening is that between the time mconsole starts and the time it does what I've asked, the UML has already shut down. Or, perhaps the problem is in the UML kernel itself, since I do recall one of the UML's processes hanging, too. Or, the most obvious, perhaps I am the problem.

Any thoughts?

Thanks,
-Chris
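Since each step is a separate exec of mconsole and any one of them can hang, one way to keep the escalation moving is to bound every invocation. The following is only a minimal sketch in C of that idea, not the perl function described above: `run_with_timeout`, its return codes, and the 50 ms polling interval are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Run argv[0] with its arguments, waiting at most roughly timeout_ms.
 * Returns the exit status (0-255) on a normal exit, -1 on a fork/wait
 * error or if the child died from a signal, and -2 if the child was
 * killed here because it exceeded the timeout.
 * (Hypothetical helper; names and return codes are illustrative.) */
int run_with_timeout(char *const argv[], int timeout_ms)
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                      /* exec failed */
    }

    const struct timespec tick = { 0, 50 * 1000 * 1000 };  /* 50 ms */
    int waited_ms = 0;

    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0 && errno != EINTR)
            return -1;

        if (waited_ms >= timeout_ms) {
            /* Kills the hung helper process only, not the UML itself. */
            kill(pid, SIGKILL);
            while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
                ;                        /* reap the child */
            return -2;
        }

        /* Approximate timing: an interrupted sleep still counts as 50 ms. */
        nanosleep(&tick, NULL);
        waited_ms += 50;
    }
}
```

With something like this, each of the cad, sysrq sync, and halt steps could be wrapped individually, and a return of -2 treated as "this step failed, move on to the next one" rather than blocking the whole shutdown sequence.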
From: Matthew B. <ma...@by...> - 2003-08-08 07:43:26
On Thu, Aug 07, 2003 at 04:56:33PM -0500, Christopher S. Aker wrote:
> Hello,
>
> I have a function written in perl that once initiated, sets up some timers
> and calls "mconsole cad" to shutdown an UML. It then waits 60 seconds
> while checking for shutdown. If it's still alive after 60 seconds it then
> does "mconsole sysrq sync" (waits a few more seconds) then calls "mconsole
> halt" (waits a few more) and finally resorts to a kill -9. Each "call" is
> a separate exec of mconsole.
>
> I've seen mconsole hang, along with at least one of the UML
> processes/threads stick around (skas mode). What I suspect is some kind
> of race with the mconsole "socket" file going away, because this has only
> occurred while the UML was powering off on its own && my perl function was
> running.

Hmm, snap :) I did something similar, but re-implemented the mconsole functionality natively to avoid needing the binary dependency-- part of this re-implementation involves bailing out of waiting for a response from the socket after two seconds and assuming failure; uml_mconsole.c doesn't do this:

	do {
		int len = sizeof(sun);
		n = recvfrom(fd, &reply, sizeof(reply), 0, (struct sockaddr *) &sun, &len);
		if(n < 0){
			perror("recvmsg");
			return;
		}
		if(reply.err) printf("ERR ");
		else printf("OK ");
		printf("%s", reply.data);
	} while(reply.more);

i.e. it just hangs around until it gets a complete response. Given the volatile nature of a shutting-down or crashed UML, and the fact that there's no "connection broken" signal from a UNIX socket, you have no guarantee of a response.

You probably want to ensure that every invocation of mconsole has a timeout on it to avoid this kind of problem, and go to killing processes if an mconsole fails for this reason.

cheers,

-- 
Matthew Bloch
Bytemark Hosting
tel. +44 (0) 8707 455026
http://www.bytemark-hosting.co.uk/
Dedicated Linux hosts from 15ukp ($26) per month
From: Jeff D. <jd...@ad...> - 2003-08-08 21:35:32
ma...@by... said:
> i.e. it just hangs around until it gets a complete response. Given
> the volatile nature of a shutting-down or crashed UML, and the fact
> that there's no "connection broken" signal from a UNIX socket, you
> have no guarantee of a response.

It's actually that mconsole is using SOCK_DGRAM rather than SOCK_STREAM. Otherwise, it would get an EOF.

This is a possible cause of hangs. For other reasons, we want to know when the UML has disappeared, and there's a patch in my queue that implements that.

				Jeff
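The SOCK_DGRAM versus SOCK_STREAM difference can be seen in a small C experiment. It is a simplification and only a sketch: it uses socketpair() to get two connected AF_UNIX sockets, whereas mconsole talks to a named socket with sendto/recvfrom, and `recv_after_peer_close` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a connected AF_UNIX socket pair of the given type, close one
 * end, and attempt a non-blocking recv() on the surviving end.
 * Returns recv()'s result: 0 means end-of-file was reported, -1 means
 * no data and no EOF (or an error). Returns -2 if the pair could not
 * be created. (Hypothetical helper for illustration only.) */
ssize_t recv_after_peer_close(int type)
{
    int sv[2];
    char buf[16];

    assert(type == SOCK_STREAM || type == SOCK_DGRAM);

    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -2;

    close(sv[1]);                        /* the "UML" end goes away */

    /* MSG_DONTWAIT keeps the demo from hanging in the datagram case. */
    ssize_t n = recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT);

    close(sv[0]);
    return n;
}
```

With SOCK_STREAM the call returns 0, the EOF described above. With SOCK_DGRAM on Linux I would expect -1 with errno set to EAGAIN: the reader is told nothing about the peer having gone, so a blocking recvfrom() would simply keep waiting.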