Herve Nicol Alan Brenner

Answers to Frequently Asked Questions about nagiosgraph

What is nagiosgraph?


nagiosgraph is an add-on to Nagios. nagiosgraph does two things: (1) collect performance data from Nagios plugins into RRD files, and (2) generate graphs and web-based reports of the performance data. nagiosgraph is written in Perl. nagiosgraph is almost entirely self-contained; it requires only RRDs, the Perl interface to rrdtool. Graphs are generated and managed via CGI scripts, with a small amount of JavaScript and CSS.

nagiosgraph uses a parametric approach to configuration rather than a template approach.

nagiosgraph was first released in 2004.

nagiosgraph is free. The source code is distributed under the terms of the Artistic License.

The official nagiosgraph site is http://nagiosgraph.sourceforge.net/

Where is the nagiosgraph documentation?

The README and INSTALL files contain all of the nagiosgraph documentation. The configuration files (etc/*.conf) contain descriptions of syntax and examples.

How do I install nagiosgraph?

There are a few ways to install nagiosgraph.

  • Manually from source. Follow the recipe in the INSTALL file, or read the details in the README file.
  • Automatically from source. Simply run 'install.pl'. Run 'install.pl --help' to see all of the options.
  • Using a package. RPM packages are available for SUSE and Red Hat/Fedora/CentOS, and a DEB package is available for Debian/Ubuntu.

When installing from source (not from a deb or rpm package), the standalone layout is recommended as it is easier to update than the overlay layout.

The install script and packages were introduced in 1.4.4.

How do I find the solution to a Nagios and/or nagiosgraph problem?

First identify whether the problem is a data collection problem or a data display problem. Data collection involves Nagios, rrdtool, the nagiosgraph map file, and the nagiosgraph insert.pl script. Data display involves the web server, rrdtool, and the nagiosgraph CGI scripts.

Scan the Frequently Asked Questions on this page and the Help Forum to see if you are facing a problem already encountered by someone else. If that does not yield a solution, follow the instructions in the Troubleshooting section of the README document. If that does not yield a solution, please post a description of your problem to the nagiosgraph Help Forum.

The nagiosgraph Help Forum is located at:

https://sourceforge.net/projects/nagiosgraph/forums/forum/394748

Where does nagiosgraph report status and errors?

The log file for data collection is specified by the logfile directive in the nagiosgraph.conf file. The log file must be writable by the nagios user. If no log file is specified, or if the file cannot be written, log entries go to the nagios log file.

The log file for data display is specified by the cgilogfile directive in the nagiosgraph.conf file. The CGI log file must be writable by the web server user. If no log file is specified, or if the file cannot be written, log entries go to the web server log file.

The debug directives in the nagiosgraph.conf file control what information is logged. There are 5 log levels, from debug to critical. There are also mechanisms to specify different log levels for each host and/or service. This makes it easy to find out what is happening for a specific host and/or service, even if your installation has thousands of hosts or services.

If GD is installed, rrdtool errors will be displayed directly in the CGI output. If GD is not installed, look in the nagiosgraph CGI log or the web server error log.

What is a host? service? database? data source?

Hosts and services are defined in Nagios as host_name and service_desc, respectively. A database is a single RRD file. Each database contains one or more data sources. Databases and data sources are defined in the map file by the rules that extract data from Nagios output and performance data. A single graph can display one or more data sources from one or more databases.

What do the configuration files (*.conf) do?

The configuration files control the behavior of data collection and data display. The syntax for each file is spelled out in the sample .conf files. nagiosgraph.conf is the only required configuration file.

  • nagiosgraph.conf

    • This is the main configuration file. It is used by all of the CGI scripts as well as the data processor, insert.pl. This file is required for the CGI scripts and the data processor to function. All of the other configuration files are optional.
  • datasetdb.conf

    • Determines which data sources will be shown when a service is specified with no database/datasource. If defined, this is used by every CGI script, including showgraph.cgi. Use this to get a subset or superset of data sources for a service instead of having to specify them explicitly each time you graph the service.
  • groupdb.conf

    • Defines groups of graphs and which data sources should be displayed in each of the graphs. This is used only by showgroup.cgi.
  • hostdb.conf

    • Defines which services should be displayed, and their order. This is used only by showhost.cgi. If this file is empty or not defined, then every available service will be shown for the specified host.
  • servdb.conf

    • Defines which hosts should be displayed, and their order. This is used only by showservice.cgi. If this file is empty or not defined, then every host with the specified service will be shown in alphabetic order.
  • rrdopts.conf

    • Defines RRD graphing options for specific services. Use this to add labels to the vertical axis, control graph scaling, specify axis and graph styles, and other parameters used by the rrdgraph function.
  • labels.conf

    • Defines labels for services and databases/datasources.
  • access.conf

    • Defines which users have permission to see graphs.

Everything is configured, but nothing seems to happen. What do I do?

First see whether insert.pl is being invoked properly. To do this, increase the logging level in Nagios. In the nagios.cfg file, set debug_level=256 and set debug_verbosity and debug_file. Then look at the Nagios log file for clues.

If you see messages such as 'Can't locate object method "croak" via package "..."', the embedded PERL interpreter (ePN) is not able to execute insert.pl. To fix this, use non-embedded PERL by invoking insert.pl directly in the Nagios configuration, for example:

/usr/bin/perl /usr/local/nagiosgraph/libexec/insert.pl

Nagios must be restarted after any change to the Nagios configuration files (nagios.cfg, commands.cfg). Changes to the map file or nagiosgraph.conf do not require a restart of Nagios. Changes to nagiosgraph.conf might require a restart of the web server, for example if mod_perl or mod_cgi is in use and is caching the configuration.

RRD files are not being created or updated. Why not?

Ensure that you have configured Nagios to process performance data as detailed in the Configuring Data Processing section of the README file. Be sure that process_performance_data=1 for Nagios (typically in the nagios.cfg file) and that process_perf_data=1 for any service you want to record (typically in a service template).
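
The first of these settings can be checked with a few lines of Python. This is a minimal sketch and not part of nagiosgraph: the function name perfdata_enabled and the sample text are hypothetical, and treating the last uncommented occurrence of the directive as the effective one is an assumption.

```python
import re

def perfdata_enabled(cfg_text):
    """Return True if process_performance_data is set to 1.

    Assumption: the last uncommented occurrence of the directive
    is the one that takes effect.
    """
    value = None
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip comment lines
        match = re.match(r"process_performance_data\s*=\s*(\d+)", line)
        if match:
            value = match.group(1)
    return value == "1"

# Hypothetical excerpt from a nagios.cfg file.
sample = """
# performance data processing
process_performance_data=1
"""
print(perfdata_enabled(sample))
```

To check a real installation, pass the contents of your nagios.cfg file instead of the sample text. The per-service process_perf_data setting lives in the object definitions and is not covered by this sketch.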

Permissions

Does the nagios user have write access to the directory in which the performance data log file is located? When nagiosgraph parses performance data, it creates a temporary file next to the performance data log file, so it needs write access to the containing directory.

Does the nagios user have write access to the RRD directory? Look for messages about 'cannot create rrd directory' or 'cannot create directory' in the nagiosgraph log file. Log in as the nagios user and ensure that you can create a file in the RRD directory.

Is SELinux interfering? Use setenforce 0 to temporarily disable SELinux. If that is the cause of the problem, use setenforce 1 to re-enable SELinux, then see the SELinux documentation to configure policies that do not interfere with Nagios, the web server, or nagiosgraph.

Performance Data

Are the perfdata recognized? Look for messages about 'output/perfdata not recognized' in the nagiosgraph log file. If no map rule matches the plugin output and/or perfdata, no RRD file will be created/updated.

Are the data source names valid? Monitor the nagiosgraph log file for messages about 'ds-name is not valid'. Ensure that each map rule uses valid data source names.
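
A data source name must be 1 to 19 characters long and use only the characters [a-zA-Z0-9_]. The following Python sketch tests candidate names against that rule before they go into a map rule. The function name is_valid_ds_name and the candidate names are hypothetical.

```python
import re

# Data source names: 1 to 19 characters from [a-zA-Z0-9_].
DS_NAME = re.compile(r"[a-zA-Z0-9_]{1,19}")

def is_valid_ds_name(name):
    """Return True if name is an acceptable data source name."""
    return DS_NAME.fullmatch(name) is not None

for candidate in ["losspct", "rta", "bytes-in", "a_very_long_data_source_name"]:
    print(candidate, is_valid_ds_name(candidate))
```

The first two names pass. The third fails because of the hyphen, and the fourth fails because it is longer than 19 characters.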

Does the plugin not return output or perfdata? In nagiosgraph 1.4.3 and earlier, insert.pl would silently abort if a plugin did not return output or performance data. See this thread for the symptoms and the fix:

https://sourceforge.net/projects/nagiosgraph/forums/forum/394748/topic/3764827

With logging set to INFO (debug_insert=4 in nagiosgraph.conf), nagiosgraph 1.4.4 and later makes log entries about the number of lines of perfdata available from nagios, followed by information about how many of those lines were recognized and processed by nagiosgraph.

Data are not showing up for any host or service. Why not?

Ensure that permissions are set correctly. The nagios user must have write access to the RRD directory and to the directory in which the performance data log file resides. The web server user must have read access to the RRD directory and its contents.

For example, with the RRD directory at /var/nagios/rrd and the performance data log file at /var/nagios/perfdata.log, this will not work:

[user@host]% ls -la /var/nagios/
total 620
drwxr-xr-x  3 root   root         4096 Mar 21 12:39 .
drwxr-xr-x 28 root   root         4096 Dec 14 16:22 ..
-rw-r--r--  2 nagios nagioscmd  610996 Mar 23 11:31 perfdata.log
drwxr-xr-x  2 nagios nagioscmd    4096 Mar 22 16:22 rrd

but this will work:

[user@host]% chown nagios:nagioscmd /var/nagios
[user@host]% ls -la /var/nagios/
total 620
drwxr-xr-x  3 nagios nagioscmd    4096 Mar 21 12:39 .
drwxr-xr-x 28 root   root         4096 Dec 14 16:22 ..
-rw-r--r--  2 nagios nagioscmd  611089 Mar 23 11:33 perfdata.log
drwxr-xr-x  2 nagios nagioscmd    4096 Mar 22 16:22 rrd
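
The write-access requirements can also be checked from a script. The following Python sketch must be run as the nagios user to be meaningful. It is not part of nagiosgraph, the function name check_access is hypothetical, and the paths are the ones from the example above.

```python
import os

def check_access(rrd_dir, perflog):
    """Return a list of write-access problems for the current user."""
    problems = []
    # nagiosgraph creates a temporary file next to the perfdata log,
    # so the directory that contains the log must be writable.
    perflog_dir = os.path.dirname(os.path.abspath(perflog))
    if not os.access(rrd_dir, os.W_OK | os.X_OK):
        problems.append("cannot write to RRD directory " + rrd_dir)
    if not os.access(perflog_dir, os.W_OK | os.X_OK):
        problems.append("cannot write to perfdata log directory " + perflog_dir)
    return problems

if __name__ == "__main__":
    for problem in check_access("/var/nagios/rrd", "/var/nagios/perfdata.log"):
        print(problem)
```

An empty result means that the current user can write to both directories. The read access needed by the web server user is not checked here.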

Data from service X are not showing up. Why not?

Services may emit output, performance data, or both output and performance data. Make sure there is a rule in the map file that matches the service output and/or performance data. If the service does not emit performance data, you will have to create a rule to parse the service output. See the section Adding Service Types in the README file for details.

nagiosgraph 1.4.4 and later includes a map rule that captures perfdata from any standards-compliant plugin. It should also work as the last rule in the map file for earlier releases.

http://nagiosgraph.svn.sourceforge.net/viewvc/nagiosgraph/trunk/nagiosgraph/etc/map

In nagiosgraph 1.4.3 and earlier, if no perfdata and no output were emitted, processing of data would stop. See this thread for details and the fix (included in 1.4.4):

https://sourceforge.net/projects/nagiosgraph/forums/forum/394748/topic/3967175

Data from Windows hosts are not showing up. Why not?

First ensure that the data are being collected. Look in the RRD directory (specified in nagiosgraph.conf) and see if there are RRD files for the Windows hosts. If there are no RRD files, you probably need to add one or more rules to the map file (also specified in nagiosgraph.conf). Here are some examples:

http://nerhood.wordpress.com/2004/09/22/nagiosgraph-with-windows-support/

http://rambling-techie.blogspot.com/2009/02/nagiosgraph-windows-clients.html

http://www.claudiokuenzler.com/blog/120/nagiosgraph-map-windows-nsclient-memory-cpu-disk

If you are using nagiosgraph 1.4.2 or earlier, please upgrade. The 1.4.3 release of nagiosgraph contains a bugfix for the use of backslashes and colons in service and database names (this often shows up when using Windows disk and directory names directly, such as c:).

I updated to Nagios 3.3.1 and now nagiosgraph does not work for some plugins. Why not?

A bug was introduced in Nagios 3.3.1. If a plugin does not emit performance data, nothing is emitted to the performance data file, even if the plugin does emit output. As a result, any plugin that does not emit performance data will be ignored, even if there is a map rule to parse its output.

See this thread:

http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg36835.html

One workaround is to create a plugin wrapper that captures the output from a plugin and formats the output as performance data in the standard format.
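
The formatting step of such a wrapper might look like the following Python sketch. It is illustrative only: the function name add_perfdata, the regular expression, and the sample plugin output are hypothetical. A complete wrapper would also have to run the plugin and pass its exit status through unchanged, which is omitted here.

```python
import re

def add_perfdata(output, pattern, label):
    """Append 'label=value' perfdata to plugin output that has none.

    pattern is a regular expression with one capture group that
    extracts the numeric value from the plugin's text output.
    """
    if "|" in output:
        return output  # perfdata already present; leave untouched
    match = re.search(pattern, output)
    if match is None:
        return output  # nothing to extract; pass the output through
    return "{} | {}={}".format(output.strip(), label, match.group(1))

# Hypothetical plugin output without performance data.
print(add_perfdata("USERS OK - 3 users currently logged in",
                   r"(\d+) users", "users"))
```

For the sample line, the result is the original text followed by " | users=3", which the Nagios performance data file will then include.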

How do I make popup graphs appear on mouseovers in Nagios?

See the section Graphs in Nagios Mouseovers in the README file. Explicit support for popups on mouseovers was introduced in version 1.4.1.

How do I display graphs in the Nagios frame?

See the section Graphs in Nagios Frames in the README file.

How do I zoom into a graph? How do I zoom out?

Simply click and drag to zoom in on a section of data in a graph. To revert back to the original zoom level, right-click anywhere on the graph. Zooming was introduced in version 1.4.3.

How do I limit access to graphs?

There are two mechanisms for controlling access: the Nagios configuration files or a standalone nagiosgraph access control file. See the section Configuring Access Controls in the README file. Access control was introduced in version 1.4.2.

Can nagiosgraph display multiple Y axes?

nagiosgraph 1.4.x and earlier can display only one vertical (Y) axis.

Embedded PERL (ePN) returns errors. What is wrong?

The Nagios embedded PERL (ePN) does not understand all PERL idioms. If you see errors such as this:

ePN failed to compile /usr/local/nagiosgraph/libexec/insert.pl: "Missing right curly or square bracket at (eval 1) line 45, at end of line: syntax error at (eval 1) line 52, at EOF" at /usr/local/nagios/bin/p1.pl line 159.

you must invoke insert.pl explicitly with non-embedded PERL, for example:

/usr/bin/perl /usr/local/nagiosgraph/libexec/insert.pl

See this thread:
https://sourceforge.net/projects/nagiosgraph/forums/forum/394748/topic/3616208

The CGI scripts fail, but the web server is configured properly. Why?

Is SELinux enabled? If the web server error log contains errors such as this:

Permission denied: exec of '/usr/lib/nagiosgraph/cgi-bin/show.cgi' failed

then try temporarily disabling SELinux with setenforce 0. If that is the problem, re-enable SELinux with setenforce 1, then see the SELinux documentation to create a policy that does not interfere with the web server.

My graphs are fragmented/spotty. Why?

Check the data sampling rate. The stepsize (specified in nagiosgraph), heartbeat (specified in nagiosgraph) and sampling interval (specified in Nagios) must be coordinated.

For example, if the stepsize is 300 (5 minutes - the default) and the heartbeat is 600 (10 minutes - the default), but data are sampled every 20 minutes, then every other data point in the RRD will be undefined (a value of NaN in the RRD file), resulting in fragmented graphs.

Gaps can also happen when the sampling interval is equal to the heartbeat, but sampling is delayed. For example, with a stepsize of 300 (5 minutes), a heartbeat of 600 (10 minutes), and a sampling interval of 10 minutes (specified in Nagios), any delay due to Nagios processing will result in NaN values in the RRD file and gaps in the graphs.

A good rule of thumb is to use a heartbeat that is twice the sampling interval, and a stepsize that is the same as the sampling interval.
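
The cases described above can be expressed as a small check. This is a minimal Python sketch, not part of nagiosgraph. The function name check_timing and the wording of the messages are hypothetical, and all values are in seconds.

```python
def check_timing(stepsize, heartbeat, interval):
    """Classify a stepsize/heartbeat/sampling-interval combination."""
    if interval > heartbeat:
        return "gaps: updates arrive less often than the heartbeat allows"
    if interval == heartbeat:
        return "fragile: any sampling delay exceeds the heartbeat"
    if heartbeat == 2 * interval and stepsize == interval:
        return "ok: matches the rule of thumb"
    return "review: within the heartbeat, but not the rule-of-thumb values"

print(check_timing(300, 600, 1200))  # sampled every 20 minutes
print(check_timing(300, 600, 600))   # sampled every 10 minutes
print(check_timing(300, 600, 300))   # sampled every 5 minutes
```

The three calls correspond to the 20-minute example, the delayed 10-minute example, and the default configuration with a 5-minute sampling interval.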

Note that the stepsize and heartbeat are set when an RRD file is created. If you change the stepsize and/or heartbeat, you must either delete the corresponding RRD file(s) so that nagiosgraph can create new ones with the new stepsize/heartbeat, or manually modify the stepsize and/or heartbeat in the RRD file(s) by doing a dump/edit/restore.

How do I record maximum/minimum/last values instead of average values?

rrdtool can record values as AVERAGE, MIN, MAX, or LAST.

By default, RRD files created by nagiosgraph record average values. Use maximums, minimums, or lasts in nagiosgraph.conf to specify the services for which data should be recorded as MAX, MIN, or LAST, respectively.

Note that if the RRD file for a service has already been created using AVERAGE (the default), you must delete the RRD file after changing the service to MAX, MIN, or LAST so that the RRD file can be re-created.

If you want to record maximum and/or minimum values in addition to average values, use withmaximums and/or withminimums in nagiosgraph.conf.

How do I specify the data source type?

rrdtool characterizes data as one of GAUGE, COUNTER, DERIVE, or ABSOLUTE.

The data source type is specified by the rules in the map file. Most data are saved as GAUGE or DERIVE. To specify a different type, create a rule that matches the service check output/perfdata then use the desired type in the update array.

See the rrdtool documentation for details about each type:

http://www.mrtg.org/rrdtool/doc/rrdcreate.en.html

How do I record more data?

In nagiosgraph.conf, modify resolution and step to change the values for all hosts/services/databases, or use resolutions and steps to specify values for a single host/service/database. The resolution determines the number of points that will be saved. The step determines how many values are consolidated.

This will only affect new RRD files; you must manually dump/edit/restore to resize any existing RRD files.

The default settings are:

resolution=600 700 775 797
step=1 6 24 288

To record twice as much data, use:

resolution=1200 1400 1550 1594
step=1 6 24 288

What settings control the amount and frequency of data stored?

The stepsize, in seconds, defines the nominal amount of time between data points. The default value is 300 (5 minutes), which matches the Nagios sampling interval. The heartbeat, in seconds, defines the amount of time between updates before a data point should be considered unknown. The default is 600 (10 minutes) and is typically set to twice the stepsize. The resolution defines how many data points should be kept. The step defines how data points are consolidated. The xfiles factor defines how unknown data points are considered when consolidating data.

These values are used only when an RRD file is created. To change the stepsize, heartbeat, or resolution of an existing RRD, one must dump the RRD file to XML, modify the data, then restore the RRD file. Alternatively, simply delete the existing RRD file and create a new one with the new settings.

The heartbeat and stepsize must be coordinated with the values in Nagios that specify how often data will be collected, recorded, and processed (the check_interval and the configuration for processing of the perfdata). If these values are not coordinated, RRD files will contain gaps in data and graphs will appear spotty.

As of nagiosgraph 1.4.3, the stepsize, heartbeat, and resolution can be specified per-host, per-service, and/or per-database. For example, data sampled from a wind sensor every 10 seconds could have a stepsize of 10 seconds while data sampled from pinging a host every 10 minutes could have a stepsize of 600 seconds.

As of nagiosgraph 1.4.4, the xff (xfiles factor) and step can be specified per-host, per-service, and/or per-database.

A typo in 1.4.3 and 1.4.4 prevents the specification of stepsizes, heartbeats, and resolutions. It has been fixed in 1.4.5. The problem and solution are described in this thread:

https://sourceforge.net/projects/nagiosgraph/forums/forum/394748/topic/4428502

What parameters are used to create the RRD files?

For nagiosgraph up to 1.4, the following definition is used to create an RRD file:

DS:DSNAME:DST:HEARTBEAT:U:U
RRA:CF:XFF:STEP1:NROWS1
RRA:CF:XFF:STEP2:NROWS2
RRA:CF:XFF:STEP3:NROWS3
RRA:CF:XFF:STEP4:NROWS4

where:

  • DSNAME is the data source name. A data source name must be 1 to 19 characters long and use only the characters [a-zA-Z0-9_]. The data source name is specified in the map rule.

  • DST is the data source type, one of GAUGE, COUNTER, DERIVE, or ABSOLUTE. The data source type is specified in the map rule.

  • HEARTBEAT is the heartbeat specified in the nagiosgraph configuration.

  • CF is the consolidation function, one of AVERAGE, MIN, MAX, or LAST. AVERAGE is the default. MIN, MAX, or LAST is used when minimums, maximums, or lasts is specified. MIN and MAX are used in separate RRD files when withminimums or withmaximums is specified.

  • XFF is the xfiles factor defined as xff in the nagiosgraph configuration. The default value is 0.5.

  • STEP are step values from the nagiosgraph configuration.

  • NROWS are resolution values from the nagiosgraph configuration.

For example, the default configuration and the map rule:

/output:PING.*?(\d+)%.+?([.\d]+)\sms/
and push @s, [ 'pingloss',
               [ 'losspct', GAUGE, $1 ]]
and push @s, [ 'pingrta',
               [ 'rta', GAUGE, $2/1000 ]];

yields the following two files: hostname/PING___pingloss.rrd

DS:losspct:GAUGE:600:U:U
RRA:AVERAGE:0.5:1:600
RRA:AVERAGE:0.5:6:700
RRA:AVERAGE:0.5:24:775
RRA:AVERAGE:0.5:288:797

and hostname/PING___pingrta.rrd

DS:rta:GAUGE:600:U:U
RRA:AVERAGE:0.5:1:600
RRA:AVERAGE:0.5:6:700
RRA:AVERAGE:0.5:24:775
RRA:AVERAGE:0.5:288:797
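
The way the configuration values are combined into a definition can be sketched in Python. This is an illustration of the template above, not the code that nagiosgraph itself uses; the function name rrd_definition is hypothetical.

```python
def rrd_definition(ds_name, ds_type, heartbeat, xff, steps, rows, cf="AVERAGE"):
    """Build the DS and RRA lines for one data source."""
    lines = ["DS:{}:{}:{}:U:U".format(ds_name, ds_type, heartbeat)]
    # One RRA per (step, resolution) pair, all with the same CF and XFF.
    for step, nrows in zip(steps, rows):
        lines.append("RRA:{}:{}:{}:{}".format(cf, xff, step, nrows))
    return lines

# Default configuration, losspct data source from the PING map rule.
for line in rrd_definition("losspct", "GAUGE", 600, 0.5,
                           [1, 6, 24, 288], [600, 700, 775, 797]):
    print(line)
```

With the default values, the sketch prints the same five lines shown for hostname/PING___pingloss.rrd.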

The RRA definitions are confusing. Could you provide some examples?

The default nagiosgraph configuration uses these parameters:

heartbeat=600
stepsize=300
resolution=600 700 775 797
step=1 6 24 288
xff=0.5

which results in a single data source RRD file with size 24124 bytes and RRA definition:

DS:datasourcename:GAUGE:600:U:U     seconds  hours    days  years
RRA:AVERAGE:0.5:1:600                180000     50    2.08
RRA:AVERAGE:0.5:6:700               1260000    350   14.58
RRA:AVERAGE:0.5:24:775              5580000   1550   64.58
RRA:AVERAGE:0.5:288:797            68860800  19128  797.00   2.18

To record twice the historical and four times the daily data using the same consolidation factors, use these parameters:

heartbeat=600
stepsize=300
resolution=2400 1400 1550 1594
step=1 6 24 288
xff=0.5

which results in a single data source RRD file with size 56700 bytes and RRA definition:

DS:datasourcename:GAUGE:600:U:U     seconds  hours    days  years
RRA:AVERAGE:0.5:1:2400               720000    200    8.33
RRA:AVERAGE:0.5:6:1400              2520000    700   29.17
RRA:AVERAGE:0.5:24:1550            11160000   3100  129.17
RRA:AVERAGE:0.5:288:1594          137721600  38256 1594.00   4.37
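
The time span covered by each RRA is the stepsize multiplied by the step and by the number of rows. The following Python sketch reproduces the seconds, hours, and days columns of the tables above. The function name rra_coverage is hypothetical.

```python
def rra_coverage(stepsize, steps, rows):
    """Return (seconds, hours, days) covered by each RRA."""
    coverage = []
    for step, nrows in zip(steps, rows):
        # Each row consolidates `step` samples taken `stepsize` seconds apart.
        seconds = stepsize * step * nrows
        coverage.append((seconds, seconds / 3600, seconds / 86400))
    return coverage

# Default configuration: stepsize=300, step=1 6 24 288, resolution=600 700 775 797
for seconds, hours, days in rra_coverage(300, [1, 6, 24, 288],
                                         [600, 700, 775, 797]):
    print("{:>10d} {:>7.0f} {:>8.2f}".format(seconds, hours, days))
```

Changing the resolution list to 2400, 1400, 1550, 1594 gives the values in the second table.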

How do I inspect the contents of RRD files?

Use rrdtool to manipulate the contents of RRD files. For example, to export to XML use the dump option:

rrdtool dump filename.rrd > filename.xml

To import from XML use the restore option:

rrdtool restore filename.xml filename.rrd
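
Once a file has been dumped to XML, the step and heartbeat can be read with a short script. The following Python sketch assumes that the dump contains a top-level step element and, inside each ds element, the elements name, type, and minimal_heartbeat. Verify these element names against your own dump output. The function name summarize_dump and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

def summarize_dump(xml_text):
    """Return (step, [(ds_name, ds_type, heartbeat), ...]) from dumped XML."""
    root = ET.fromstring(xml_text)
    step = int(root.findtext("step").strip())
    sources = []
    for ds in root.findall("ds"):
        sources.append((ds.findtext("name").strip(),
                        ds.findtext("type").strip(),
                        int(ds.findtext("minimal_heartbeat").strip())))
    return step, sources

# Abbreviated, hand-written stand-in for the output of rrdtool dump.
sample = """<rrd>
  <step>300</step>
  <ds>
    <name> rta </name>
    <type> GAUGE </type>
    <minimal_heartbeat>600</minimal_heartbeat>
  </ds>
</rrd>"""
print(summarize_dump(sample))
```

For a real file, read the XML produced by rrdtool dump and pass its contents to the function.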

What is the best way to organize data into RRD files?

The default map rule creates an RRD file for each performance data metric. Each RRD file contains the metric data and may contain additional data sources for warning, critical, minimum, and maximum values.

For example, a plugin outputs the following:

OK - all is well |size=10 cost=30;100;500 level=2%;80;90;0;100

From this output, nagiosgraph creates 3 RRD files, one each for size, cost, and level. The size RRD file contains a single data source named 'data'. The cost RRD file contains 3 data sources named 'data', 'warn', and 'crit'. The level RRD file contains 5 data sources named 'data', 'warn', 'crit', 'min', and 'max'.
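
The mapping from performance data fields to data sources can be sketched in Python. This is a simplified illustration of the standard perfdata layout (value;warn;crit;min;max), not the actual map rule, which is written in Perl. It does not handle quoted labels that contain spaces, and the function name data_sources is hypothetical.

```python
def data_sources(perfdata):
    """Map each perfdata label to the data source names it would yield."""
    names = ["data", "warn", "crit", "min", "max"]
    result = {}
    for item in perfdata.split():
        label, _, values = item.partition("=")
        fields = values.split(";")
        # Only fields that are present and non-empty become data sources.
        result[label] = [n for n, f in zip(names, fields) if f != ""]
    return result

output = "OK - all is well |size=10 cost=30;100;500 level=2%;80;90;0;100"
perfdata = output.split("|", 1)[1]
print(data_sources(perfdata))
```

For the sample output, size maps to one data source, cost to three, and level to five, matching the description above.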

The graphing portion of nagiosgraph is agnostic with respect to RRD contents. The CGI scripts will display data whether they come from multiple data sources in a single RRD file, or individual data sources each in its own RRD file.
