Oddjob to centralise job execution

General
2012-08-25
2013-05-28
  • Hi All,

    I came across Oddjob while trying to find something that might help me centrally manage a task I have to perform once per quarter within my organisation.

    Essentially, at a given (or scheduled) time I need to stop an application running on 200+ servers so that maintenance activities can be performed. At the moment I'm using crontab to do this. However, because the maintenance time changes each quarter, I have to log on to each server beforehand and update its crontab accordingly, which is very labour intensive!

    What I'd like to do is use Oddjob to store ONE version of a Job either locally or on a central "server" and then execute that Job on 200+ servers. That way I only need to make modifications to ONE job definition each quarter and everything else works automatically.

    I've been able to set up a local client that points to a "master" and a "slave" server, and I've almost got my "slave" connecting back to the "master" to execute a job. But that's when it dawned on me that this would execute the job on the "master" and not on the "slave".

    Another option I thought about was to use Oddjob to FTP an XML file containing the job definition to each server (giving it the same name each time), and then define one job on each "slave" that uses that XML file. The problem is that I can only FTP into my own "area" on the box, and then I have to "su" to the main user running the application in order to get the right permissions.

    Does anyone have any ideas how I can achieve this with Oddjob? Does anyone have an example of calling a Linux shell script remotely?

    Many thanks in advance for your help; it's very much appreciated.

    - Martin

  • Rob Gordon
    2012-08-26

    Hi - Here's a very simple example of how to get Oddjob to configure itself using a centrally held configuration:

    First, the 'master':

    <oddjob id="oddjob">
        <job>
            <sequential id="root" name="main">
                <jobs>
                    <rmireg id="rmireg" name="RMI Registry"/>
                    <jmx:server id="server" name="Oddjob Server" root="${vars}" url="service:jmx:rmi://ignored/jndi/rmi://localhost/oddjob-server" xmlns:jmx="http://rgordon.co.uk/oddjob/jmx"/>
                    <sequential>
                        <jobs>
                            <variables id="vars">
                                <buffer>
                                    <buffer/>
                                </buffer>
                            </variables>
                            <copy name="Copy File To Buffer">
                                <from>
                                    <file file="my-job.xml"/>
                                </from>
                                <output>
                                    <value value="${vars.buffer}"/>
                                </output>
                            </copy>
                        </jobs>
                    </sequential>
                </jobs>
            </sequential>
        </job>
    </oddjob>

    This simply copies the contents of a file my-job.xml into a buffer that is exposed via the server.

    Now the 'slave':

    <oddjob>
        <job>
            <sequential>
                <jobs>
                    <jmx:client id="client" name="Oddjob Client" url="service:jmx:rmi:///jndi/rmi://localhost/oddjob-server" xmlns:jmx="http://rgordon.co.uk/oddjob/jmx"/>
                    <oddjob>
                        <configuration>
                            <arooa:configuration xmlns:arooa="http://rgordon.co.uk/oddjob/arooa">
                                <xml>
                                    <value value="${client/vars.buffer.text}"/>
                                </xml>
                            </arooa:configuration>
                        </configuration>
                    </oddjob>
                </jobs>
            </sequential>
        </job>
    </oddjob>

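    This creates a client that the nested Oddjob uses as the source of its configuration: the nested Oddjob reads the XML text held in the master's buffer (${client/vars.buffer.text}) and runs it on the slave.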
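    To tie this back to stopping the application, the centrally held my-job.xml could itself contain the job each slave should run. Below is a minimal sketch, assuming Oddjob's exec job with its command attribute; the script path /opt/myapp/bin/stop.sh is a hypothetical placeholder for the real stop script on each server:

    ```xml
    <oddjob>
        <job>
            <!-- Hypothetical script path: each slave runs the script found on its own file system -->
            <exec name="Stop Application" command="/opt/myapp/bin/stop.sh"/>
        </job>
    </oddjob>
    ```

    Because the slave's nested Oddjob loads this configuration and runs it in the slave's own JVM, the script would run on the slave, as whichever user started the slave's Oddjob.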

    Another option would be for the slave to fetch the configuration from the master by FTP.

    But could you not use SSH to execute the same command on all the servers?

    Anyway, hope this helps.

    Rob.

  • Hi Rob,

    Many thanks for the suggestions, and apologies for not replying sooner, but I've been on annual leave and have only just found time to look at this again.

    I tried what you suggested and got a very simple 'Hello World' to appear in the console of my 'slave' on startup, which proves it was reading its configuration from the 'master'. Excellent!

    Ultimately I'd like to use Oddjob not only to run a job that stops and starts my applications on demand across our 200+ servers, but also to centrally manage the many other daily jobs that run on each of these boxes. Ideally I'd like one tool that does both and lets me store and manage the job definitions in one central place rather than on each server.

    So with that in mind I'm trying to figure out the best "topology" to use to set that up.

    For example, I'd need to be able to click one button (or run one job) to have the "stop" command run across all servers on demand, and then a "start" to bring everything back up at the desired time (whether manually or on a schedule). I can now see how a 'slave' can read its configuration from a 'master', but what's the best way to trigger the job to run at a given time on the 'slave' (preferably from an Oddjob Explorer sitting somewhere else) while still being able to see what's happening on each box?
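    For the daily jobs, and for running at a set time rather than on startup, the job held on the master could be wrapped in a timer so that each slave schedules it locally. Below is a minimal sketch, assuming Oddjob's scheduling:timer job and its schedules:daily schedule with an at attribute (check these names against the Oddjob reference for your version); the 02:00 time and the script path are hypothetical:

    ```xml
    <oddjob>
        <job>
            <!-- Runs the nested job every day at 02:00 on whichever server loads this configuration -->
            <scheduling:timer id="timer" name="Nightly Stop" xmlns:scheduling="http://rgordon.co.uk/oddjob/scheduling">
                <schedule>
                    <schedules:daily at="02:00" xmlns:schedules="http://rgordon.co.uk/oddjob/schedules"/>
                </schedule>
                <job>
                    <!-- Hypothetical script path -->
                    <exec name="Stop Application" command="/opt/myapp/bin/stop.sh"/>
                </job>
            </scheduling:timer>
        </job>
    </oddjob>
    ```

    Changing the time would then mean editing only this one file on the master; a slave would pick up the change the next time its nested Oddjob is run (with the slave example above, when the slave starts).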

    I tried adjusting the 'slave' example you provided so that it registers itself as a server, and then connected to it from an Oddjob Explorer. It sort of worked: I saw both the master and slave servers, but I also saw a "…Server loopback detected…" error in the log of my 'slave' within Oddjob Explorer.

    Here's the code:

    <oddjob id="oddjob">
        <job>
            <sequential id="root" name="main">
                <jobs>
                    <rmireg id="rmireg" name="RMI Registry"/>
                    <jmx:server xmlns:jmx="http://rgordon.co.uk/oddjob/jmx" id="slave1" name="Slave 1" root="${client}" url="service:jmx:rmi://ignored/jndi/rmi://localhost/oddjob-server"/>
                    <sequential>
                        <jobs>
                            <jmx:client id="client" name="Oddjob Client" url="service:jmx:rmi:///jndi/rmi://us-sc4-jobsch1.sc4.niceondemand.com/oddjob-server" xmlns:jmx="http://rgordon.co.uk/oddjob/jmx"/>
                            <oddjob>
                                <configuration>
                                    <arooa:configuration xmlns:arooa="http://rgordon.co.uk/oddjob/arooa">
                                        <xml>
                                            <value value="${client/vars.buffer.text}"/>
                                        </xml>
                                    </arooa:configuration>
                                </configuration>
                            </oddjob>
                        </jobs>
                    </sequential>
                </jobs>
            </sequential>
        </job>
    </oddjob>

    As for using SSH to execute a command: yes, this could indeed be a simpler option, as I already have stop/start scripts on each server. However, could I achieve this in Oddjob with "execute"? Do you have an example of calling a shell script on a remote server using Oddjob?

    Thanks,
    Martin

  • Rob Gordon
    2012-09-05

    Hi Martin

    Oddjob won't do the distributed ad hoc command thing easily. You could have the slaves poll the master and execute when the command changes, but that's not very elegant. You could have the master push the command out to the slaves, but that is no better than SSH, except that Oddjob could execute the command asynchronously.

    I would publish commands to a JMS Topic and write a Consumer for Oddjob, but this means writing code; Oddjob doesn't have this out of the box.

    With regard to SSH - Oddjob doesn't have this either but it can execute any Ant task, and Ant does - so this is the way to do SSH from Oddjob.
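    Below is a minimal sketch of that approach, assuming Oddjob's ant job with its tasks supplied inline as XML, and Ant's optional sshexec task (which needs the JSch library and Ant's ant-jsch.jar on Oddjob's classpath). The host name, user, key file and script path are hypothetical placeholders:

    ```xml
    <oddjob>
        <job>
            <ant name="Stop App on server01">
                <tasks>
                    <xml>
                        <tasks>
                            <!-- Hypothetical host, user, key file and script path.
                                 trust="true" accepts hosts not listed in known_hosts, so only use it on a trusted network. -->
                            <sshexec host="server01.example.com" username="appuser" keyfile="/home/martin/.ssh/id_rsa" trust="true" command="/opt/myapp/bin/stop.sh"/>
                        </tasks>
                    </xml>
                </tasks>
            </ant>
        </job>
    </oddjob>
    ```

    For 200+ servers, one such job per host would be needed; placing them under a parallel job rather than a sequential one would let them run concurrently, with each job's state visible in Oddjob Explorer.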

    Have you looked at Puppet? - http://puppetlabs.com/ - I think it does distributed sys admin tasks.

    You have an interesting problem, but I can't believe it hasn't been solved already!

    Rob.