Hi all,

Now that you have all voted for Shinken at http://www.linuxquestions.org/questions/2011-linuxquestions-org-members-choice-awards-95/network-monitoring-application-of-the-year-919908/ it's time to look a bit more closely at the next big thing: the service packs.

You can think of them as sets of services attached to host templates. The more packs we ship by default, the easier it is for beginners, and so the better for the project's future :)

But how can we create such "packs"? In fact it's quite easy. A pack is divided into several parts :
* the discovery part, only needed if you want a discovery phase to "auto-tag" hosts
* the host template definition
* the commands used for these checks
* the macros, if needed
* the services that are really checked :)

Let's walk through a full pack creation. The goal is to create a new pack for NetApp appliances. They can be detected by Nmap, and a plugin already exists for them (check_netapp2). So here we go :)

Phase 1 : Nmap scan
The goal is to create a discovery rule. The end-user discovery will use nmap (and others, but nmap is the default), called by the nmap_discovery_runner.py script. So to know what to "filter" in a discovery rule, we need to reproduce what this plugin will output. We launch an nmap scan like the script does, and dump the result in an xml file :

$ nmap -T4 -O -oX /tmp/netapp.xml nasip

Starting Nmap 5.00 ( http://nmap.org ) at 2012-01-10 13:27 CET
Not shown: 844 closed ports, 144 filtered ports
22/tcp    open  ssh
80/tcp    open  http
[...] lot of ports
OS detection performed. Please report any incorrect results at http://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 39.01 seconds

nmap has already put some useful information in the /tmp/netapp.xml file. We give it to nmap_discovery_runner.py to output it in a human-readable way :p
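To get a feel for what such an xml file contains, here is a minimal Python sketch that pulls the host name and the OS vendor/family out of an nmap -oX report. It is not the code of nmap_discovery_runner.py, just an illustration: the sample xml, its address and its attribute values are hand-written assumptions in the general shape of nmap's output, and summarize_hosts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of an "nmap -oX" report.
# Illustrative only: this is NOT the output of a real scan.
SAMPLE_XML = """
<nmaprun>
  <host>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <hostnames><hostname name="nas.mydomain.com" type="PTR"/></hostnames>
    <os>
      <osclass type="storage-misc" vendor="NetApp" osfamily="Data ONTAP" accuracy="100"/>
    </os>
  </host>
</nmaprun>
"""


def summarize_hosts(xml_text):
    """Return one dict per host with its name and best-guess OS vendor/family."""
    root = ET.fromstring(xml_text.strip())
    hosts = []
    for host in root.iter("host"):
        hostname = host.find("hostnames/hostname")
        address = host.find("address")
        if hostname is not None:
            name = hostname.get("name")
        elif address is not None:
            name = address.get("addr")
        else:
            name = None
        # Keep the osclass guess with the highest accuracy, if there is one.
        # iter() finds osclass elements wherever they sit under the host.
        classes = sorted(
            host.iter("osclass"),
            key=lambda c: int(c.get("accuracy", "0")),
            reverse=True,
        )
        best = classes[0] if classes else None
        hosts.append({
            "name": name,
            "osvendor": best.get("vendor", "").lower() if best is not None else None,
            "osfamily": best.get("osfamily", "").lower() if best is not None else None,
        })
    return hosts


print(summarize_hosts(SAMPLE_XML))
```

Lower-casing the values is a choice of this sketch, made so they are easy to compare with the lower-case strings used in the discovery rule.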

Phase 2 : discovery rule creation
nmap_discovery_runner.py outputs the interesting fields from the nmap xml file :

$ ./libexec/nmap_discovery_runner.py -s ../netapp.xml
nas.mydomain.com::os=data ontap

Like every "discovery runner" command, it outputs each piece of data in 3 parts, in the form key::field=value. The key will be the host_name.
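Lines such as nas.mydomain.com::os=data ontap are easy to read back into a structure. Here is a small Python sketch that does it; parse_discovery_output is a hypothetical helper written for this post, not a function of Shinken, and it assumes that each field appears only once per key.

```python
def parse_discovery_output(text):
    """Group 'key::field=value' lines into {key: {field: value}}."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "::" not in line:
            continue  # skip blank or malformed lines
        key, _, rest = line.partition("::")
        field, sep, value = rest.partition("=")
        if not sep:
            continue  # no '=' means no value to store
        data.setdefault(key, {})[field] = value
    return data


sample = "nas.mydomain.com::os=data ontap\n"
print(parse_discovery_output(sample))
```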
Now we can write the host creation rule in the file etc/packs/storage/netapp/discovery.cfg (each "pack" lives in its own directory) :

define discoveryrule {
       discoveryrule_name       NetApp
       creation_type            host

       osvendor                 netapp
       osfamily                 data ontap

       +use                     netapp
}

The creation_type says that we create/update a host. The osvendor and osfamily entries are the keys that must match to activate this rule, and so create/update the host. The +use entry appends the "netapp" string to the "use" property of the host. Without the "+", it would overwrite the value instead of appending to it.
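The difference between "+use" and "use" can be sketched in a few lines of Python. This is only an illustration of the append/overwrite semantics described above, not Shinken's implementation: rule_matches and apply_rule are hypothetical names, and the matching here is a plain case-insensitive comparison chosen to keep the sketch simple.

```python
def rule_matches(conditions, facts):
    """True when every condition of the rule equals the discovered value."""
    return all(
        facts.get(key, "").lower() == wanted.lower()
        for key, wanted in conditions.items()
    )


def apply_rule(host, writes):
    """Return a copy of host with the rule applied ('+prop' appends, 'prop' overwrites)."""
    updated = dict(host)
    for prop, value in writes.items():
        if prop.startswith("+"):
            name = prop[1:]
            current = updated.get(name, "")
            updated[name] = current + "," + value if current else value
        else:
            updated[prop] = value
    return updated


facts = {"osvendor": "netapp", "osfamily": "data ontap"}
conditions = {"osvendor": "netapp", "osfamily": "data ontap"}
host = {"host_name": "nas.mydomain.com", "use": "generic-host"}

if rule_matches(conditions, facts):
    print(apply_rule(host, {"+use": "netapp"})["use"])  # appended to the existing value
    print(apply_rule(host, {"use": "netapp"})["use"])   # existing value overwritten
```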

So at this point, every netapp appliance detected by shinken-discovery will produce a host with the netapp template (and generic-host, which always applies to hosts in the default discovery_rules).

Now we need to do something with all these "netapp" hosts :)

Phase 3 : host template declaration in etc/packs/storage/netapp/templates.cfg

It's a standard host template :

define host{
   name                         netapp
   use                          generic-host
   check_command                check_ping
   register                     0
}

There is just one small trick here : we want to allow a host to set its own snmp community if needed, but if the admin doesn't override it, the host should use the default one from the etc/resource.cfg file. So we use a custom host macro (read in commands as $_HOSTSNMPCOMMUNITY$) whose default value is the resource.cfg one :)

All thanks to configuration inheritance.
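Here is a rough Python sketch of how such an inherited default behaves. It is not Shinken's code: resolve is a hypothetical helper, the macro name _SNMPCOMMUNITY is inferred from the $_HOSTSNMPCOMMUNITY$ macro used by the check command, and the hosts nas1/nas2 and both community values are made up for the example.

```python
# Hypothetical objects: names and values are illustrative,
# not taken from a real configuration.
OBJECTS = {
    "generic-host": {},
    "netapp": {"use": ["generic-host"], "_SNMPCOMMUNITY": "default-from-resource"},
    "nas1": {"use": ["netapp"]},                              # keeps the default
    "nas2": {"use": ["netapp"], "_SNMPCOMMUNITY": "secret"},  # overrides it
}


def resolve(name, prop, objects=OBJECTS):
    """Return prop from the object itself, else from the first template defining it."""
    obj = objects[name]
    if prop in obj:
        return obj[prop]
    for parent in obj.get("use", []):
        value = resolve(parent, prop, objects)
        if value is not None:
            return value
    return None


print(resolve("nas1", "_SNMPCOMMUNITY"))
print(resolve("nas2", "_SNMPCOMMUNITY"))
```

The host's own value always wins, and the template only fills in what the host leaves unset.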

Phase 4 : check command
A classic command. It just uses the host macro value. The declaration is in etc/packs/storage/netapp/commands.cfg
define command {
       command_name     check_netapp_cpu
       command_line     $PLUGINSDIR$/check_netapp2 -H $HOSTADDRESS$ -C $_HOSTSNMPCOMMUNITY$ -v CPULOAD
}

Phase 5 : services on the netapp template :) in the file etc/packs/storage/netapp/cpu.cfg
We got hosts, but now we want real services :)

It's just a service template applied over a host template.

define service{
   service_description          Cpu
   use                          generic-service
   register                     0
   host_name                    netapp
   check_command                check_netapp_cpu
}

One important point : whenever possible, avoid hard-coding values such as warning and critical thresholds in the service. If needed, use custom host macros, so that a template can override the values without duplicating the service.
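To show why host macros avoid duplicating services, here is a small Python sketch of macro expansion. It is a simplification, not Shinken's macro resolver: expand_macros is a hypothetical helper, the _CPU_WARN and _CPU_CRIT macros and their values are invented for the example, and the -w/-c options are assumed to follow the usual plugin convention for thresholds rather than checked against check_netapp2.

```python
import re


def expand_macros(command_line, host):
    """Replace $HOSTADDRESS$ and $_HOSTxxx$ custom macros using a host dict."""
    def lookup(match):
        macro = match.group(1)
        if macro == "HOSTADDRESS":
            return host["address"]
        if macro.startswith("_HOST"):
            # $_HOSTFOO$ refers to the custom variable _FOO defined on the host.
            return host["_" + macro[len("_HOST"):]]
        return match.group(0)  # leave unknown macros untouched

    return re.sub(r"\$([A-Z0-9_]+)\$", lookup, command_line)


# Assumed command line: the -w/-c part is hypothetical.
COMMAND = ("$PLUGINSDIR$/check_netapp2 -H $HOSTADDRESS$ -C $_HOSTSNMPCOMMUNITY$ "
           "-v CPULOAD -w $_HOSTCPU_WARN$ -c $_HOSTCPU_CRIT$")

# One service definition, two hosts with different thresholds.
default_nas = {"address": "192.0.2.10", "_SNMPCOMMUNITY": "public",
               "_CPU_WARN": "80", "_CPU_CRIT": "90"}
busy_nas = dict(default_nas, address="192.0.2.11", _CPU_WARN="90", _CPU_CRIT="95")

print(expand_macros(COMMAND, default_nas))
print(expand_macros(COMMAND, busy_nas))
```

The same command and the same service serve both hosts; only the host-level values differ.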

Phase 6 : enjoy, you're done :)

It's quite easy to do, and the discovery rule is optional (for example if the "thing" you want to check is not easily detectable over the network).

One interesting thing here is the nmap xml approach. You don't need direct access to a piece of equipment to create a rule. All you need is someone with that access to run nmap and send you the xml file :)

We already have some "packs" in the sources, for windows (WMI polling), linux (snmp), classic services (HTTP(s), LDAP, DNS, etc.), some network equipment (cisco, nortel, ...), databases (mysql, mssql and oracle), environmental things (temperature with AKCP probes) and hardware checks for HP servers and blade chassis.

We are still missing nmap xml scans for some other classic things :
* Solaris
* NT4 servers (I'm sure there are some out there, don't be shy :) )
* network equipment. There is plenty of it :)

If you monitor an application that you think would be good to have in the packs (I'm thinking about jboss or websphere things) and you have the check commands and/or an nmap scan or a port detection method, we can add it :)

The next step is a configuration UI that uses these packs and the discovery lib and plugins, so the user can just give some dns names or an ip range, and the UI will automatically launch a discovery and "auto-tag" the hosts. The user then only needs to add manual tags and/or change some macro values (like the snmp community) if needed :)

So get out your nmap and your "plugin packs", it's time to get a lot of ready-to-run templates for the next 1.0 version :)