#2508 zoneadm boot should use ctrun

1.270
closed
5
2006-06-07
2006-05-22
Mike Gerdts
No

I upgraded webmin from the version that ships with
Solaris 10 to 1.270, now I see this bug...

If the following sequence happens, a zone will get
stuck in the "shutting_down" state:

1) Boot a zone with webmin
2) svcadm restart webmin
3) zlogin <zone> halt

This happens because:

1) When webmin issues the zoneadm -z <zone> boot, the
zone's zoneadmd process is in the same contract as the
webmin service. This can be confirmed with "svcs -p
webmin".
2) When webmin restarts (svcadm restart webmin), the
contract associated with the webmin service is killed.
This causes the zone's zoneadmd to get a SIGKILL.
3) When a halt (and presumably a reboot, shutdown, init
0, etc.) is issued from within the zone, zoneadmd, having
been killed, does not do the cleanup that it should do. As
such, the kernel thread associated with the zone's zsched
process is stuck waiting for an event that never happens.
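
The check in step 1 can be scripted. This is a minimal sketch, assuming that "svcs -p" lists each process of the service on its own line with the command name as the last field. The helper name has_zoneadmd is hypothetical; it reads the listing from standard input so it can be tried on saved output.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a `svcs -p` listing on stdin
# contains a zoneadmd process.
# Assumption: the command name is the last field on the process line.
has_zoneadmd() {
    grep ' zoneadmd$' >/dev/null
}

# Usage on a Solaris 10 global zone (not executed here):
#   svcs -p webmin | has_zoneadmd && echo "zoneadmd is in webmin's contract"
```

If the helper succeeds, restarting webmin can be expected to take the zone's zoneadmd down with it.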

There are two bugs:

1) webmin should use ctrun(1) when booting the zones.
See /lib/svc/method/svc-zones on a Solaris 10 box for
an example.
2) Solaris has a kernel, zoneadmd, or zsched bug. This
is being raised with Sun.
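
For the first bug, the fix amounts to prefixing the boot command with ctrun(1) so that zoneadmd ends up in its own process contract. The following is a minimal sketch, not Webmin's actual change. The function name zone_boot_cmd is hypothetical, and the "-l child" lifetime option (ctrun exits when the command it started exits) is an assumption about suitable options, not necessarily what svc-zones uses. The function only prints the command line so that it can be inspected before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print a zoneadm boot command wrapped in ctrun(1).
# Assumption: "-l child" makes ctrun return once zoneadm itself exits.
zone_boot_cmd() {
    zone=$1
    if [ -z "$zone" ]; then
        echo "usage: zone_boot_cmd <zone>" >&2
        return 1
    fi
    echo "ctrun -l child zoneadm -z $zone boot"
}

# Usage on a Solaris 10 global zone (not executed here):
#   eval "$(zone_boot_cmd myzone)"
```

After booting this way, "svcs -p webmin" should no longer list the zone's zoneadmd.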

If you are in this state and need to recover without a
reboot, issue the following command from the global zone:

/usr/lib/zones/zoneadmd -z <zone>

Discussion

  • Jamie Cameron

    Jamie Cameron - 2006-06-07

    Using ctrun seems reasonable - I will do this in the next
    Webmin release. Thanks for the bug report. I had no idea
    Solaris contracts worked like this.

     
  • Jamie Cameron

    Jamie Cameron - 2006-06-07
    • status: open --> closed
     
  • Mike Gerdts

    Mike Gerdts - 2006-06-08

    This likely represents an entire class of problems with
    Solaris 10: if webmin starts a long-running process that
    should keep running when webmin is restarted, that process
    should be started in a new contract.

    FWIW, my conversations with Sun have led to them deciding
    that there is a problem with zoneadmd as well. The
    following bug has been filed against Solaris by Sun.

    6431807 zoneadmd should daemonize itself into a new contract

    The following code *seems* to fix the problem in zoneadmd:

    Index: zoneadmd.c

    --- zoneadmd.c (revision 247)
    +++ zoneadmd.c (working copy)
    @@ -1051,6 +1051,8 @@
         zlog_t errlog;
         zlog_t *zlogp;

    +    int ctfd;
    +
         progname = get_execbasename(argv[0]);

         /*
    @@ -1205,6 +1207,12 @@
         (void) sigaddset(&block_cld, SIGCHLD);
         (void) sigprocmask(SIG_BLOCK, &block_cld, NULL);

    +    if ( (ctfd = init_template()) == -1 ) {
    +        zerror(zlogp, B_TRUE, "failed to create contract");
    +        return(1);
    +    }
    +    close(ctfd);
    +
         /*
          * Do not let another thread localize a message while we are forking.
          */

     
