nagios-devel Mailing List for Nagios Core (Page 2)

nagios-devel — Developers List

You can subscribe to this list here.

2001	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov (4)	Dec
2002	Jan (9)	Feb (11)	Mar (37)	Apr (50)	May (51)	Jun (31)	Jul (35)	Aug (30)	Sep (39)	Oct (85)	Nov (91)	Dec (30)
2003	Jan (65)	Feb (56)	Mar (80)	Apr (73)	May (84)	Jun (108)	Jul (62)	Aug (53)	Sep (46)	Oct (44)	Nov (59)	Dec (91)
2004	Jan (71)	Feb (25)	Mar (87)	Apr (67)	May (65)	Jun (60)	Jul (31)	Aug (100)	Sep (97)	Oct (133)	Nov (87)	Dec (65)
2005	Jan (84)	Feb (133)	Mar (67)	Apr (74)	May (108)	Jun (92)	Jul (54)	Aug (82)	Sep (22)	Oct (10)	Nov (69)	Dec (65)
2006	Jan (89)	Feb (133)	Mar (100)	Apr (69)	May (138)	Jun (85)	Jul (45)	Aug (56)	Sep (33)	Oct (69)	Nov (33)	Dec (78)
2007	Jan (120)	Feb (120)	Mar (136)	Apr (153)	May (75)	Jun (65)	Jul (64)	Aug (129)	Sep (122)	Oct (231)	Nov (145)	Dec (93)
2008	Jan (163)	Feb (157)	Mar (60)	Apr (96)	May (92)	Jun (67)	Jul (90)	Aug (75)	Sep (87)	Oct (81)	Nov (158)	Dec (71)
2009	Jan (83)	Feb (77)	Mar (94)	Apr (78)	May (210)	Jun (146)	Jul (104)	Aug (208)	Sep (80)	Oct (54)	Nov (55)	Dec (84)
2010	Jan (21)	Feb (67)	Mar (85)	Apr (20)	May (29)	Jun (11)	Jul (13)	Aug (48)	Sep (56)	Oct (70)	Nov (100)	Dec (83)
2011	Jan (59)	Feb (36)	Mar (59)	Apr (21)	May (96)	Jun (27)	Jul (17)	Aug (58)	Sep (12)	Oct (15)	Nov (27)	Dec (20)
2012	Jan (36)	Feb (3)	Mar (19)	Apr (9)	May (22)	Jun (34)	Jul (7)	Aug (31)	Sep (65)	Oct (18)	Nov (78)	Dec (24)
2013	Jan (46)	Feb (73)	Mar (50)	Apr (11)	May (32)	Jun (10)	Jul (6)	Aug (18)	Sep (36)	Oct	Nov	Dec

Flat | Threaded

<< < 1 2 3 4 .. 380 > >> (Page 2 of 380)

Re: [Nagios-devel] configuration directory

From: AL13N <al...@rm...> - 2013-09-04 10:55:37

>
> On Sep 3, 2013, at 5:34 PM, AL13N <al...@rm...> wrote:
>
>> Op dinsdag 3 september 2013 17:22:32 schreef Andreas Ericsson:
>>> On 2013-09-02 21:20, AL13N wrote:
>>>> Hi,
>>>>
>>>> i'm a Mageia distribution packager, and i'm maintaining some event
>>>> broker
>>>> modules.
>>>>
>>>> since these are packaged seperately we're trying to get things to work
>>>> out-of- the-box for our users, specifically adding event brokers in
>>>> the
>>>> configuration file for nagios.
>>>>
>>>> the problem we're facing is that: We don't want a nagios addon package
>>>> to
>>>> rewrite the configuration file itself. however, we would like to add
>>>> the
>>>> event broker (or other settings) and then reload/restart nagios.
>>>>
>>>> a beneficial thing would be a configuration directory (much like it
>>>> exists
>>>> with hosts and commands and stuff like that) but for the main nagios
>>>> configuration file.
>>>>
>>>> and if while we're at it, to have multiple directories, where
>>>> /usr/share/nagios/nagios.d/ has the "defaults" which can be overridden
>>>> in
>>>> /etc/nagios.d/ (for instance) (this is a convention that's growing in
>>>> use)
>>>>
>>>> I can try to start making a patch for this, but i'm asking about this
>>>> first, in order to find out if it makes a chance of getting accepted.
>>>
>>> Not only does it stand a chance of getting accepted; It's on the
>>> roadmap. If you start working on it I'll help as much as I can and
>>> I'm willing to accept less-than-awesome quality code, although I'll
>>> polish such later if it turns out to be an issue.
>>>
>>> The only real requirement is that any "main" config file option
>>> should be settable from any file in the dropdir.
>>
>>
>> the idea i thought might be doable, would be to allow multiple -d
>> options to
>> nagios, and allow those to be a file or a directory.
>>
>> this would keep working:
>>
>> [  ]# /usr/sbin/nagios -d /etc/nagios/nagios.cfg
>>
>> and allow
>>
>> [  ]# /usr/sbin/nagios -d /usr/share/nagios/conf.d/ -d
>> /etc/nagios/conf.d/
>>
>> to work too.
>>
>> internally, instead of one file, we'll have to have an array of
>> files/dirs, so
>> that reloading nagios would also expand the config files/dirs into files
>> and
>> parse them all in order...
>>
>>
>
> remember too that -d is daemonize mode so probably need another option…

yes, i guess i meant

[ ]# nagios -d /usr/share/nagios/conf.d/ /etc/nagios/conf.d/

but it seems we're looking into it differently

Re: [Nagios-devel] [oss-security] Security bug or feature? Servicegroups leak hostnames to unauthorized users (Was: [oss-security] CVE request: unauthorized host/service views displayed in servicegroup view)

From: Andreas E. <ae...@op...> - 2013-09-04 10:54:12

On 2013-09-04 11:37, Jochen Bern wrote:
> On 04.09.2013 11:03, Andreas Ericsson wrote:
>> On 2013-09-04 10:31, Jonas Meurer wrote:
>>> The indisputable part of this change is, that users are allowed to see
>>> hostgroups and servicegroups with at least one authorized host or
>>> service. Unclear is, whether this means "group and all its group
>>> members", or "group and only authorized group members".
>>
>> It should mean "group and only authorized group members, except also
>> hosts for services where one is authorized to see the service".
> [...]
>> Well, it *was* by design, but now I'm changing the design. It's a good
>> time for it, since 4.0 is about to come out. I think the security teams
>> can move on and we'll consider this "changed" rather than "fixed" for
>> 4.0, where we do some security tightening.
>
> Since you do seem to be willing to ponder the system of access rights
> and its security implications: I haven't checked the 4.x prereleases
> yet, does being authorized to see a host's information still necessarily
> provide access to *all* services on it?
>

AFAIK, yes. Please understand that I'm very uninterested in changes to
the UI though, and I'd be much (much) happier if UI and core were
split into two different components.

> In the "customers accessing provider's Nagios" scenario, I suppose that
> the customer might be interested in seeing "application is running" but
> not, say, "the snmpd that ties this machine to the provider's NMS is
> acting up" ...
>

I agree. The problem is that with access to the host comes access to
commands that affect all services on that host as well, so it's not
necessarily as clearcut as "disable viewing here and we're done", if
one wants to do things properly.

-- 
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

Re: [Nagios-devel] configuration directory

From: AL13N <al...@rm...> - 2013-09-04 10:51:21

> On 2013-09-04 00:34, AL13N wrote:
>> Op dinsdag 3 september 2013 17:22:32 schreef Andreas Ericsson:
>>> On 2013-09-02 21:20, AL13N wrote:
>>>> Hi,
>>>>
>>>> i'm a Mageia distribution packager, and i'm maintaining some event
>>>> broker
>>>> modules.
>>>>
>>>> since these are packaged seperately we're trying to get things to work
>>>> out-of- the-box for our users, specifically adding event brokers in
>>>> the
>>>> configuration file for nagios.
>>>>
>>>> the problem we're facing is that: We don't want a nagios addon package
>>>> to
>>>> rewrite the configuration file itself. however, we would like to add
>>>> the
>>>> event broker (or other settings) and then reload/restart nagios.
>>>>
>>>> a beneficial thing would be a configuration directory (much like it
>>>> exists
>>>> with hosts and commands and stuff like that) but for the main nagios
>>>> configuration file.
>>>>
>>>> and if while we're at it, to have multiple directories, where
>>>> /usr/share/nagios/nagios.d/ has the "defaults" which can be overridden
>>>> in
>>>> /etc/nagios.d/ (for instance) (this is a convention that's growing in
>>>> use)
>>>>
>>>> I can try to start making a patch for this, but i'm asking about this
>>>> first, in order to find out if it makes a chance of getting accepted.
>>>
>>> Not only does it stand a chance of getting accepted; It's on the
>>> roadmap. If you start working on it I'll help as much as I can and
>>> I'm willing to accept less-than-awesome quality code, although I'll
>>> polish such later if it turns out to be an issue.
>>>
>>> The only real requirement is that any "main" config file option
>>> should be settable from any file in the dropdir.
>>
>>
>> the idea i thought might be doable, would be to allow multiple -d
>> options to
>> nagios, and allow those to be a file or a directory.
>>
>
> I'd much rather see this as "include=/some/path.d" where we read all
> files ending with .cfg and .conf if it's a directory, or the file if
> it's a fail, and then parse each include'd file in order of
>    include-statement -> lexicographical order
> so that
>
> include=/etc/nagios.conf.d/
> include=/etc/nagios.perfdata.d/
>
> would cause all files from /etc/nagios.conf.d to be parsed before all
> files from /etc/nagios.perfdata.d.
>
> The included files can include other files, if they're so inclined.

so, you mean to keep one nagios.cfg file and have include=/path/to/foo.

would this mean that if you have

setting1=value

setting2=value

include=/path/to/foo

setting3=value

that being parsed would:
1) setting1
2) setting2
3) all from /path/to/foo if it's a file or directory
4) setting3


is this the priority that you aim for?

From: Jochen B. <Jochen.Bern@LINworks.de> - 2013-09-04 09:37:42

On 04.09.2013 11:03, Andreas Ericsson wrote:
> On 2013-09-04 10:31, Jonas Meurer wrote:
>> The indisputable part of this change is, that users are allowed to see
>> hostgroups and servicegroups with at least one authorized host or
>> service. Unclear is, whether this means "group and all its group
>> members", or "group and only authorized group members".
> 
> It should mean "group and only authorized group members, except also
> hosts for services where one is authorized to see the service".
[...]
> Well, it *was* by design, but now I'm changing the design. It's a good
> time for it, since 4.0 is about to come out. I think the security teams
> can move on and we'll consider this "changed" rather than "fixed" for
> 4.0, where we do some security tightening.

Since you do seem to be willing to ponder the system of access rights
and its security implications: I haven't checked the 4.x prereleases
yet, does being authorized to see a host's information still necessarily
provide access to *all* services on it?

In the "customers accessing provider's Nagios" scenario, I suppose that
the customer might be interested in seeing "application is running" but
not, say, "the snmpd that ties this machine to the provider's NMS is
acting up" ...

Regards,
								J. Bern
-- 
*NEU* - NEC IT-Infrastruktur-Produkte im <http://www.linworks-shop.de/>:
Server--Storage--Virtualisierung--Management SW--Passion for Performance
Jochen Bern, Systemingenieur --- LINworks GmbH <http://www.LINworks.de/>
Postfach 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, Zentr. -0, Fax -299 - Amtsg. Darmstadt HRB 85202
Unternehmenssitz Weiterstadt, Geschäftsführer Metin Dogan, Oliver Michel

[Nagios-devel] Security bug or feature? Servicegroups leak hostnames to unauthorized users (Was: [oss-security] CVE request: unauthorized host/service views displayed in servicegroup view)

From: Jonas M. <jo...@fr...> - 2013-09-04 09:08:39

Hey list and fellow Nagios developers,

as you might have noticed, there's a discussion ongoing on 
oss-security[1]
regarding bug report #456[2].

I'm the one who discovered the described issue, and I still believe that
it's a bug with security implications, even though not everyone seems to
be convinced.

I'll try to give a brief description of the issue:

The Nagios status.cgi (at all 3.4* and 4.0* versions I checked) leaks
hostnames to unauthorized users as part of servicegroups. All of
servicegroup overview, summary and grid list each and every hostname 
that
is part of a servicegroup, regardless whether the HTTP user is listed in
contacts/contactgroups for this host.

In my opinion this is a security issue - at least on multi-user (e.g.
multi-customer) Nagios-setups. I guess that most ISPs which give their
customers access to the Nagios CGIs don't want to provide a full list
of monitored hosts to their customers as a side-effect.

One reason for confusion is the following entry from Nagios3 
changelog[3]:

3.4.0 - 05/04/2012
ENHANCEMENTS
[...]
- Users can now see hostgroups and servicegroups that contain at least
   one host or service they are authorized for, instead of having to
   be authorized for them all (Ethan Galstad)


The indisputable part of this change is, that users are allowed to see
hostgroups and servicegroups with at least one authorized host or
service. Unclear is, whether this means "group and all its group
members", or "group and only authorized group members".

Unfortunately, no Nagios developer speaked up yet about this issue. Thus
there's still a lot confusion about it.

You can find my patch at the Nagios Issue Tracker. This patch changes
status.cgi behaviour to show only group members (hosts/services) that
the user is authorized to see.

A comment about this issue by the Nagios Developers whould be highly
appreciated. In case that the described (and critizised) behaviour of
status.cgi is intended, the distribution security teams can move on.

If on the other hand you agree with me, that this issue should be
fixed, I'll continue to work with the security teams in order to
provide patched Nagios packages for their distributions.

Thanks for your work on Nagios, it's a very valuable piece of software!

Kind regards,
  jonas

[1] http://www.openwall.com/lists/oss-security/2013/06/26/6
[2] http://tracker.nagios.org/view.php?id=456
[3] http://www.nagios.org/projects/nagioscore/history/core-3x

From: Andreas E. <ae...@op...> - 2013-09-04 09:03:20

On 2013-09-04 10:31, Jonas Meurer wrote:
> Hey list and fellow Nagios developers,
>
> as you might have noticed, there's a discussion ongoing on oss-security[1]
> regarding bug report #456[2].
>
> I'm the one who discovered the described issue, and I still believe that
> it's a bug with security implications, even though not everyone seems to
> be convinced.
>
> I'll try to give a brief description of the issue:
>
> The Nagios status.cgi (at all 3.4* and 4.0* versions I checked) leaks
> hostnames to unauthorized users as part of servicegroups. All of
> servicegroup overview, summary and grid list each and every hostname that
> is part of a servicegroup, regardless whether the HTTP user is listed in
> contacts/contactgroups for this host.
>
> In my opinion this is a security issue - at least on multi-user (e.g.
> multi-customer) Nagios-setups. I guess that most ISPs which give their
> customers access to the Nagios CGIs don't want to provide a full list
> of monitored hosts to their customers as a side-effect.
>
> One reason for confusion is the following entry from Nagios3 changelog[3]:
>
> 3.4.0 - 05/04/2012
> ENHANCEMENTS
> [...]
> - Users can now see hostgroups and servicegroups that contain at least
>    one host or service they are authorized for, instead of having to
>    be authorized for them all (Ethan Galstad)
>
>
> The indisputable part of this change is, that users are allowed to see
> hostgroups and servicegroups with at least one authorized host or
> service. Unclear is, whether this means "group and all its group
> members", or "group and only authorized group members".
>

It should mean "group and only authorized group members, except also
hosts for services where one is authorized to see the service".

> Unfortunately, no Nagios developer speaked up yet about this issue. Thus
> there's still a lot confusion about it.
>

Well, now I have, so confusion dispelled.

> You can find my patch at the Nagios Issue Tracker.

Ah, right. Care to provide a link? Mostly, I prefer to get patches to
this mailing list, since I don't spend a lot of time hunting them down
from the (underused) tracker.

> This patch changes
> status.cgi behaviour to show only group members (hosts/services) that
> the user is authorized to see.
>
> A comment about this issue by the Nagios Developers whould be highly
> appreciated. In case that the described (and critizised) behaviour of
> status.cgi is intended, the distribution security teams can move on.
>

Well, it *was* by design, but now I'm changing the design. It's a good
time for it, since 4.0 is about to come out. I think the security teams
can move on and we'll consider this "changed" rather than "fixed" for
4.0, where we do some security tightening.

> If on the other hand you agree with me, that this issue should be
> fixed, I'll continue to work with the security teams in order to
> provide patched Nagios packages for their distributions.
>
> Thanks for your work on Nagios, it's a very valuable piece of software!
>

Thanks for enjoying it.

-- 
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

Re: [Nagios-devel] configuration directory

From: Andreas E. <ae...@op...> - 2013-09-04 08:57:17

On 2013-09-04 00:34, AL13N wrote:
> Op dinsdag 3 september 2013 17:22:32 schreef Andreas Ericsson:
>> On 2013-09-02 21:20, AL13N wrote:
>>> Hi,
>>>
>>> i'm a Mageia distribution packager, and i'm maintaining some event broker
>>> modules.
>>>
>>> since these are packaged seperately we're trying to get things to work
>>> out-of- the-box for our users, specifically adding event brokers in the
>>> configuration file for nagios.
>>>
>>> the problem we're facing is that: We don't want a nagios addon package to
>>> rewrite the configuration file itself. however, we would like to add the
>>> event broker (or other settings) and then reload/restart nagios.
>>>
>>> a beneficial thing would be a configuration directory (much like it exists
>>> with hosts and commands and stuff like that) but for the main nagios
>>> configuration file.
>>>
>>> and if while we're at it, to have multiple directories, where
>>> /usr/share/nagios/nagios.d/ has the "defaults" which can be overridden in
>>> /etc/nagios.d/ (for instance) (this is a convention that's growing in use)
>>>
>>> I can try to start making a patch for this, but i'm asking about this
>>> first, in order to find out if it makes a chance of getting accepted.
>>
>> Not only does it stand a chance of getting accepted; It's on the
>> roadmap. If you start working on it I'll help as much as I can and
>> I'm willing to accept less-than-awesome quality code, although I'll
>> polish such later if it turns out to be an issue.
>>
>> The only real requirement is that any "main" config file option
>> should be settable from any file in the dropdir.
>
>
> the idea i thought might be doable, would be to allow multiple -d options to
> nagios, and allow those to be a file or a directory.
>

I'd much rather see this as "include=/some/path.d" where we read all
files ending with .cfg and .conf if it's a directory, or the file if
it's a fail, and then parse each include'd file in order of
   include-statement -> lexicographical order
so that

include=/etc/nagios.conf.d/
include=/etc/nagios.perfdata.d/

would cause all files from /etc/nagios.conf.d to be parsed before all
files from /etc/nagios.perfdata.d.

The included files can include other files, if they're so inclined.

-- 
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

Re: [Nagios-devel] configuration directory

From: Daniel W. <dwi...@gm...> - 2013-09-03 23:55:07

On Sep 3, 2013, at 5:34 PM, AL13N <al...@rm...> wrote:

> Op dinsdag 3 september 2013 17:22:32 schreef Andreas Ericsson:
>> On 2013-09-02 21:20, AL13N wrote:
>>> Hi,
>>> 
>>> i'm a Mageia distribution packager, and i'm maintaining some event broker
>>> modules.
>>> 
>>> since these are packaged seperately we're trying to get things to work
>>> out-of- the-box for our users, specifically adding event brokers in the
>>> configuration file for nagios.
>>> 
>>> the problem we're facing is that: We don't want a nagios addon package to
>>> rewrite the configuration file itself. however, we would like to add the
>>> event broker (or other settings) and then reload/restart nagios.
>>> 
>>> a beneficial thing would be a configuration directory (much like it exists
>>> with hosts and commands and stuff like that) but for the main nagios
>>> configuration file.
>>> 
>>> and if while we're at it, to have multiple directories, where
>>> /usr/share/nagios/nagios.d/ has the "defaults" which can be overridden in
>>> /etc/nagios.d/ (for instance) (this is a convention that's growing in use)
>>> 
>>> I can try to start making a patch for this, but i'm asking about this
>>> first, in order to find out if it makes a chance of getting accepted.
>> 
>> Not only does it stand a chance of getting accepted; It's on the
>> roadmap. If you start working on it I'll help as much as I can and
>> I'm willing to accept less-than-awesome quality code, although I'll
>> polish such later if it turns out to be an issue.
>> 
>> The only real requirement is that any "main" config file option
>> should be settable from any file in the dropdir.
> 
> 
> the idea i thought might be doable, would be to allow multiple -d options to 
> nagios, and allow those to be a file or a directory.
> 
> this would keep working:
> 
> [  ]# /usr/sbin/nagios -d /etc/nagios/nagios.cfg
> 
> and allow
> 
> [  ]# /usr/sbin/nagios -d /usr/share/nagios/conf.d/ -d /etc/nagios/conf.d/
> 
> to work too.
> 
> internally, instead of one file, we'll have to have an array of files/dirs, so 
> that reloading nagios would also expand the config files/dirs into files and 
> parse them all in order...
> 
> 

remember too that -d is daemonize mode so probably need another option…

Dan

Re: [Nagios-devel] configuration directory

From: AL13N <al...@rm...> - 2013-09-03 22:34:50

Op dinsdag 3 september 2013 17:22:32 schreef Andreas Ericsson:
> On 2013-09-02 21:20, AL13N wrote:
> > Hi,
> > 
> > i'm a Mageia distribution packager, and i'm maintaining some event broker
> > modules.
> > 
> > since these are packaged seperately we're trying to get things to work
> > out-of- the-box for our users, specifically adding event brokers in the
> > configuration file for nagios.
> > 
> > the problem we're facing is that: We don't want a nagios addon package to
> > rewrite the configuration file itself. however, we would like to add the
> > event broker (or other settings) and then reload/restart nagios.
> > 
> > a beneficial thing would be a configuration directory (much like it exists
> > with hosts and commands and stuff like that) but for the main nagios
> > configuration file.
> > 
> > and if while we're at it, to have multiple directories, where
> > /usr/share/nagios/nagios.d/ has the "defaults" which can be overridden in
> > /etc/nagios.d/ (for instance) (this is a convention that's growing in use)
> > 
> > I can try to start making a patch for this, but i'm asking about this
> > first, in order to find out if it makes a chance of getting accepted.
> 
> Not only does it stand a chance of getting accepted; It's on the
> roadmap. If you start working on it I'll help as much as I can and
> I'm willing to accept less-than-awesome quality code, although I'll
> polish such later if it turns out to be an issue.
> 
> The only real requirement is that any "main" config file option
> should be settable from any file in the dropdir.


the idea i thought might be doable, would be to allow multiple -d options to 
nagios, and allow those to be a file or a directory.

this would keep working:

[  ]# /usr/sbin/nagios -d /etc/nagios/nagios.cfg

and allow

[  ]# /usr/sbin/nagios -d /usr/share/nagios/conf.d/ -d /etc/nagios/conf.d/

to work too.

internally, instead of one file, we'll have to have an array of files/dirs, so 
that reloading nagios would also expand the config files/dirs into files and 
parse them all in order...

Re: [Nagios-devel] configuration directory

From: Andreas E. <ae...@op...> - 2013-09-03 15:22:43

On 2013-09-02 21:20, AL13N wrote:
> Hi,
>
> i'm a Mageia distribution packager, and i'm maintaining some event broker
> modules.
>
> since these are packaged seperately we're trying to get things to work out-of-
> the-box for our users, specifically adding event brokers in the configuration
> file for nagios.
>
> the problem we're facing is that: We don't want a nagios addon package to
> rewrite the configuration file itself. however, we would like to add the event
> broker (or other settings) and then reload/restart nagios.
>
> a beneficial thing would be a configuration directory (much like it exists with
> hosts and commands and stuff like that) but for the main nagios configuration
> file.
>
> and if while we're at it, to have multiple directories, where
> /usr/share/nagios/nagios.d/ has the "defaults" which can be overridden in
> /etc/nagios.d/ (for instance) (this is a convention that's growing in use)
>
> I can try to start making a patch for this, but i'm asking about this first, in
> order to find out if it makes a chance of getting accepted.
>

Not only does it stand a chance of getting accepted; It's on the
roadmap. If you start working on it I'll help as much as I can and
I'm willing to accept less-than-awesome quality code, although I'll
polish such later if it turns out to be an issue.

The only real requirement is that any "main" config file option
should be settable from any file in the dropdir.

-- 
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

[Nagios-devel] configuration directory

From: AL13N <al...@rm...> - 2013-09-02 19:37:23

Hi,

i'm a Mageia distribution packager, and i'm maintaining some event broker 
modules.

since these are packaged seperately we're trying to get things to work out-of-
the-box for our users, specifically adding event brokers in the configuration 
file for nagios.

the problem we're facing is that: We don't want a nagios addon package to 
rewrite the configuration file itself. however, we would like to add the event 
broker (or other settings) and then reload/restart nagios.

a beneficial thing would be a configuration directory (much like it exists with 
hosts and commands and stuff like that) but for the main nagios configuration 
file.

and if while we're at it, to have multiple directories, where 
/usr/share/nagios/nagios.d/ has the "defaults" which can be overridden in 
/etc/nagios.d/ (for instance) (this is a convention that's growing in use)

I can try to start making a patch for this, but i'm asking about this first, in 
order to find out if it makes a chance of getting accepted.

Regards,

Maarten

Re: [Nagios-devel] [PATCH] YES! test: build on machines with only one cpu

From: Mattias R. <mat...@op...> - 2013-08-30 12:41:03

On Fri, Aug 30, 2013 at 2:18 PM, Andreas Ericsson <ae...@op...> wrote:

> On 2013-08-29 14:42, Mattias Ryrlén wrote:
>
>> commit f0054ac856059183f6c7bc3f9e8a80**965a8c0368 (HEAD, master)
>> Author: Mattias Ryrlén <mat...@op...>
>> Date:   Thu Aug 29 14:34:39 2013 +0200
>>
>>      YES! test: build on machines with only one cpu
>>
>>      There might be someone out there who still have only one cpu/core,
>> let
>> those
>>      build too.
>>
>>      Signed-off-by: Mattias Ryrlén <mat...@op...>
>>
>>
> While I sort of agree that tests should pass on single-cpu machines as
> well, this change renders the whole test useless, since we *always*
> return > 0 from online_cpus().
>
> Adding a function (real_online_cpus()) which returns 0 when it can't
> reliably test for number of online cpus, and letting that be wrapped
> by online_cpus() so that the latter can retain its current behaviour
> would solve the problem. online_cpus() should then look like this:
>
> unsigned int online_cpus(void)
> {
>         unsigned int cpus = real_online_cpus();
>         return cpus ? cpus : 1;
> }
>
> while the new function real_online_cpus() must take the body of the
> current online_cpus() function but be changed so that the fallback is
> to return 0. The test must then be changed to use real_online_cpus()
> instead.
>
> Would you like to reroll, or should I make the necessary changes and
> credit you with an "Reported-by"?


I reroll, have enough lvl80 chars anyway..


>
>
>  diff --git a/lib/test-nsutils.c b/lib/test-nsutils.c
>> index 1a23ee1..f835d67 100644
>> --- a/lib/test-nsutils.c
>> +++ b/lib/test-nsutils.c
>> @@ -31,10 +31,10 @@ int main(int argc, char **argv)
>>          asprintf(&s1, "arg varg foo %d", 12);
>>          s2 = mkstr("arg varg foo %d", 12);
>>          ok_str(s1, s2, "mkstr() must build proper strings");
>> -       if (online_cpus() > 1) {
>> +       if (online_cpus() > 0) {
>>                  t_pass("%d online cpus detected", online_cpus());
>>          } else {
>> -               t_fail("Do you really have only one cpu core?");
>> +               t_fail("No online cpus detected");
>>          }
>>          return t_end();
>>   }
>>
>>
>
> --
> Andreas Ericsson                   and...@op...
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
>
> Considering the successes of the wars on alcohol, poverty, drugs and
> terror, I think we should give some serious thought to declaring war
> on peace.
>



-- 
Vänliga hälsningar / Best Regards
Mattias Ryrlén

__________________________
op5 AB
Första Långgatan 19
SE-413 27 Göteborg
Mobil: +46 735-17 70 99
Support: +46 31-774 09 24
www.op5.com

Re: [Nagios-devel] [PATCH] YES! test: build on machines with only one cpu

From: Andreas E. <ae...@op...> - 2013-08-30 12:19:08

On 2013-08-29 14:42, Mattias Ryrlén wrote:
> commit f0054ac856059183f6c7bc3f9e8a80965a8c0368 (HEAD, master)
> Author: Mattias Ryrlén <mat...@op...>
> Date:   Thu Aug 29 14:34:39 2013 +0200
>
>      YES! test: build on machines with only one cpu
>
>      There might be someone out there who still have only one cpu/core, let
> those
>      build too.
>
>      Signed-off-by: Mattias Ryrlén <mat...@op...>
>

While I sort of agree that tests should pass on single-cpu machines as
well, this change renders the whole test useless, since we *always*
return > 0 from online_cpus().

Adding a function (real_online_cpus()) which returns 0 when it can't
reliably test for number of online cpus, and letting that be wrapped
by online_cpus() so that the latter can retain its current behaviour
would solve the problem. online_cpus() should then look like this:

unsigned int online_cpus(void)
{
	unsigned int cpus = real_online_cpus();
	return cpus ? cpus : 1;
}

while the new function real_online_cpus() must take the body of the
current online_cpus() function but be changed so that the fallback is
to return 0. The test must then be changed to use real_online_cpus()
instead.

Would you like to reroll, or should I make the necessary changes and
credit you with an "Reported-by"?

> diff --git a/lib/test-nsutils.c b/lib/test-nsutils.c
> index 1a23ee1..f835d67 100644
> --- a/lib/test-nsutils.c
> +++ b/lib/test-nsutils.c
> @@ -31,10 +31,10 @@ int main(int argc, char **argv)
>          asprintf(&s1, "arg varg foo %d", 12);
>          s2 = mkstr("arg varg foo %d", 12);
>          ok_str(s1, s2, "mkstr() must build proper strings");
> -       if (online_cpus() > 1) {
> +       if (online_cpus() > 0) {
>                  t_pass("%d online cpus detected", online_cpus());
>          } else {
> -               t_fail("Do you really have only one cpu core?");
> +               t_fail("No online cpus detected");
>          }
>          return t_end();
>   }
>

-- 
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

[Nagios-devel] [PATCH 2/2] wproc callbacks: Reimplement logging

From: Max S. <max...@op...> - 2013-08-27 11:39:59

From: Max Sikström <msi...@op...>

Some per-object logging was removed with previous commit, which can be useful
in many cases. This commit reimplements that logging in the related callbacks.

Using logging in the callbacks instead of centrally means that more information
can be outputted in the log, because each callback has much bigger domain
knowledge than what the worker process itself has. Using this structure also
keeps the worker API as clean as possible from domain-specific information.

Signed-off-by: Max Sikström <msi...@op...>
---
 base/checks.c        |    7 +++++
 base/notifications.c |   18 ++++--------
 base/sehandlers.c    |   73 +++++++++++++++++++++++--------------------------
 base/workers.c       |   64 +++++++++++++++++++++++++++++--------------
 include/workers.h    |   10 +++++++
 xdata/xpddefault.c   |   16 +++++++---
 6 files changed, 111 insertions(+), 77 deletions(-)

diff --git a/base/checks.c b/base/checks.c
index c5c1ff2..e81ec35 100644
--- a/base/checks.c
+++ b/base/checks.c
@@ -68,6 +68,13 @@ int reap_check_results(void) {
 static void check_complete_callback(struct wproc_result *wpres, void *data, int flags) {
 	check_result *cr = (check_result*)data;
 	if(wpres) {
+		if (cr->object_check_type == SERVICE_CHECK) {
+			wproc_handle_err(wpres, WPROC_ERROR_ALL & ~WPROC_ERROR_EXITCODE, "Service check", "Host: %s Service: %s", cr->host_name, cr->service_description);
+			}
+		else if(cr->object_check_type == HOST_CHECK) {
+			wproc_handle_err(wpres, WPROC_ERROR_ALL & ~WPROC_ERROR_EXITCODE, "Host check", "Host: %s", cr->host_name);
+			}
+
 		/* Got a result. If wpres == NULL, only clean up */
 
 		memcpy(&cr->rusage, &wpres->rusage, sizeof(wpres->rusage));
diff --git a/base/notifications.c b/base/notifications.c
index e9b6dee..7f85c83 100644
--- a/base/notifications.c
+++ b/base/notifications.c
@@ -73,21 +73,15 @@ const char *notification_reason_name(unsigned int reason_type)
 static void notification_callback(struct wproc_result *wpres, void *data, int flags) {
 	struct notification_job *oj = (struct notification_job *)data;
 	if(wpres) {
-		double runtime = ((double)wpres->runtime.tv_sec) + ((double)wpres->runtime.tv_usec) / 1000000.0;
-		if (wpres->early_timeout) {
-			if (oj->service_description) {
-				logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Notifying contact '%s' of service '%s' on host '%s' by command '%s' timed out after %.2f seconds\n",
-					  oj->contact_name, oj->service_description,
-					  oj->host_name, wpres->command, runtime);
-			} else {
-				logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Notifying contact '%s' of host '%s' by command '%s' timed out after %.2f seconds\n",
-					  oj->contact_name, oj->host_name,
-					  wpres->command, runtime);
+		if (oj->service_description) {
+			wproc_handle_err(wpres, WPROC_ERROR_ALL, "Service notification", "Host: %s Service: %s Contact: %s\n", oj->host_name, oj->service_description, oj->contact_name );
+			}
+		else {
+			wproc_handle_err(wpres, WPROC_ERROR_ALL, "Host notification", "Host: %s Contact: %s\n", oj->host_name, oj->contact_name );
 			}
 		}
-	}
 	free(oj);
-}
+	}
 
 /* notify contacts about a service problem or recovery */
 int service_notification(service *svc, int type, char *not_author, char *not_data, int options) {
diff --git a/base/sehandlers.c b/base/sehandlers.c
index 8df5f25..558eeb8 100644
--- a/base/sehandlers.c
+++ b/base/sehandlers.c
@@ -36,22 +36,42 @@
 #endif
 
 /******************************************************************/
-/************* OBSESSIVE COMPULSIVE HANDLER FUNCTIONS *************/
+/********************* JOB COMPLETE CALLBACKS *********************/
 /******************************************************************/
 
-static void ocsp_complete_callback(struct wproc_result *wpres, void *data, int flags) {
-	service *svc = (service *)data;
+struct eventhandler_job {
+	service *svc;
+	host *hst;
+	const char *msg;
+};
+
+static void eventhandler_job_complete(struct wproc_result *wpres, void *data, int flags) {
+	struct eventhandler_job *job = (struct eventhandler_job*)data;
 	if(wpres) {
-		if (wpres->early_timeout) {
-			double runtime = ((double)wpres->runtime.tv_sec) + ((double)wpres->runtime.tv_usec) / 1000000.0;
-			logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: OCSP command '%s' for service '%s' on host '%s' timed out after %.2f seconds\n",
-				  wpres->command, svc->description, svc->host_name,
-				  runtime);
+		if( job->svc ) {
+			wproc_handle_err(wpres, WPROC_ERROR_ALL, job->msg, "Host: %s Service: %s\n", job->svc->host_name, job->svc->description );
+			}
+		else if( job->hst ) {
+			wproc_handle_err(wpres, WPROC_ERROR_ALL, job->msg, "Host: %s\n", job->hst->name );
+			}
 		}
+	free(job);
 	}
+
+static int eventhandler_run_job(char *command_line, int timeout, host *hst, service *svc, char *msg, nagios_macros *mac) {
+	struct eventhandler_job *job = calloc(1, sizeof(struct eventhandler_job));
+	job->hst = hst;
+	job->svc = svc;
+	job->msg = msg;
+	/* run the command through a worker */
+	return wproc_run_callback(command_line, timeout, eventhandler_job_complete, (void*)job, mac);
 }
 
 
+/******************************************************************/
+/************* OBSESSIVE COMPULSIVE HANDLER FUNCTIONS *************/
+/******************************************************************/
+
 /* handles service check results in an obsessive compulsive manner... */
 int obsessive_compulsive_service_check_processor(service *svc) {
 	char *raw_command = NULL;
@@ -102,7 +122,7 @@ int obsessive_compulsive_service_check_processor(service *svc) {
 	log_debug_info(DEBUGL_CHECKS, 2, "Processed obsessive compulsive service processor command line: %s\n", processed_command);
 
 	/* run the command through a worker */
-	wproc_run_callback(processed_command, ocsp_timeout, ocsp_complete_callback, (void*)svc, &mac);
+	eventhandler_run_job(processed_command, ocsp_timeout, svc->host_ptr, svc, "OCSP", &mac);
 
 	/* free memory */
 	clear_volatile_macros_r(&mac);
@@ -112,16 +132,6 @@ int obsessive_compulsive_service_check_processor(service *svc) {
 	}
 
 
-static void ochp_complete_callback(struct wproc_result *wpres, void *data, int flags) {
-	host *hst = (host *)data;
-	if(wpres) {
-		if (wpres->early_timeout) {
-			double runtime = ((double)wpres->runtime.tv_sec) + ((double)wpres->runtime.tv_usec) / 1000000.0;
-			logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: OCHP command '%s' for host '%s' timed out after %.2f seconds\n",
-				  wpres->command, hst->name, runtime);
-		}
-	}
-}
 
 /* handles host check results in an obsessive compulsive manner... */
 int obsessive_compulsive_host_check_processor(host *hst) {
@@ -167,7 +177,7 @@ int obsessive_compulsive_host_check_processor(host *hst) {
 	log_debug_info(DEBUGL_CHECKS, 2, "Processed obsessive compulsive host processor command line: %s\n", processed_command);
 
 	/* run the command through a worker */
-	wproc_run_callback(processed_command, ochp_timeout, ochp_complete_callback, hst, &mac);
+	eventhandler_run_job(processed_command, ochp_timeout, hst, NULL, "OCHP", &mac);
 
 	/* free memory */
 	clear_volatile_macros_r(&mac);
@@ -183,17 +193,6 @@ int obsessive_compulsive_host_check_processor(host *hst) {
 /**************** SERVICE EVENT HANDLER FUNCTIONS *****************/
 /******************************************************************/
 
-static void evthandler_complete_with_msg(struct wproc_result *wpres, void *data, int flags) {
-	char *msg = (char *)data;
-	if(wpres) {
-		if (wpres->early_timeout) {
-			double runtime = ((double)wpres->runtime.tv_sec) + ((double)wpres->runtime.tv_usec) / 1000000.0;
-			logit(NSLOG_EVENT_HANDLER | NSLOG_RUNTIME_WARNING, TRUE, msg,
-					wpres->command, runtime);
-		}
-	}
-}
-
 
 /* handles changes in the state of a service */
 int handle_service_event(service *svc) {
@@ -313,8 +312,7 @@ int run_global_service_event_handler(nagios_macros *mac, service *svc) {
 #endif
 
 	/* run the command through a worker */
-	result = wproc_run_callback(processed_command, event_handler_timeout, evthandler_complete_with_msg,
-			"Warning: Global service event handler command '%s' timed out after %.2f seconds\n" , mac);
+	result = eventhandler_run_job(processed_command, event_handler_timeout, svc->host_ptr, svc, "global service event handler", mac);
 
 #ifdef USE_EVENT_BROKER
 	/* get end time */
@@ -406,8 +404,7 @@ int run_service_event_handler(nagios_macros *mac, service *svc) {
 #endif
 
 	/* run the command through a worker */
-	result = wproc_run_callback(processed_command, event_handler_timeout, evthandler_complete_with_msg,
-			"Warning: Service event handler command '%s' timed out after %.2f seconds\n" , mac);
+	result = eventhandler_run_job(processed_command, event_handler_timeout, svc->host_ptr, svc, "service event handler", mac);
 
 #ifdef USE_EVENT_BROKER
 	/* get end time */
@@ -543,8 +540,7 @@ int run_global_host_event_handler(nagios_macros *mac, host *hst) {
 #endif
 
 	/* run the command through a worker */
-	result = wproc_run_callback(processed_command, event_handler_timeout, evthandler_complete_with_msg,
-			"Warning: Global host event handler command '%s' timed out after %.2f seconds\n" , mac);
+	result = eventhandler_run_job(processed_command, event_handler_timeout, hst, NULL, "global host event handler", mac);
 
 #ifdef USE_EVENT_BROKER
 	/* get end time */
@@ -634,8 +630,7 @@ int run_host_event_handler(nagios_macros *mac, host *hst) {
 #endif
 
 	/* run the command through a worker */
-	result = wproc_run_callback(processed_command, event_handler_timeout, evthandler_complete_with_msg,
-			"Warning: Host event handler command '%s' timed out after %.2f seconds\n" , mac);
+	result = eventhandler_run_job(processed_command, event_handler_timeout, hst, NULL, "host event handler", mac);
 
 #ifdef USE_EVENT_BROKER
 	/* get end time */
diff --git a/base/workers.c b/base/workers.c
index a6db4f5..2c773b3 100644
--- a/base/workers.c
+++ b/base/workers.c
@@ -492,7 +492,7 @@ static void fo_reassign_wproc_job(void *job_)
 
 static int handle_worker_result(int sd, int events, void *arg)
 {
-	char *buf, *error_reason = NULL;
+	char *buf;
 	unsigned long size;
 	int ret;
 	static struct kvvec kvv = KVVEC_INITIALIZER;
@@ -563,26 +563,6 @@ static int handle_worker_result(int sd, int events, void *arg)
 		if (wpres.error_code == ETIME) {
 			wpres.early_timeout = TRUE;
 		}
-		if (wpres.early_timeout) {
-			asprintf(&error_reason, "timed out after %.2fs", tv_delta_f(&wpres.start, &wpres.stop));
-		}
-		else if (WIFSIGNALED(wpres.wait_status)) {
-			asprintf(&error_reason, "died by signal %d%s after %.2f seconds",
-			         WTERMSIG(wpres.wait_status),
-			         WCOREDUMP(wpres.wait_status) ? " (core dumped)" : "",
-			         tv_delta_f(&wpres.start, &wpres.stop));
-		}
-
-		if (error_reason) {
-			logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc: job %d from worker %s %s",
-			      job->id, wp->name, error_reason);
-			logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   command: %s\n", job->command);
-			logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   early_timeout=%d; exited_ok=%d; wait_status=%d; error_code=%d;\n",
-			      wpres.early_timeout, wpres.exited_ok, wpres.wait_status, wpres.error_code);
-			wproc_logdump_buffer(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   stderr", wpres.outerr);
-			wproc_logdump_buffer(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   stdout", wpres.outstd);
-		}
-		my_free(error_reason);
 
 		/* Run the callback with the wpres struct as argument */
 		run_job_callback(job, &wpres, 0);
@@ -781,6 +761,48 @@ int init_workers(int desired_workers)
 	return 0;
 }
 
+int wproc_handle_err(struct wproc_result *wpres, int errflags,  const char *job_type, const char *fmt, ...) {
+	char *error_reason = NULL;
+	char *buffer = NULL;
+	va_list ap;
+
+	if (wpres->early_timeout && (errflags & WPROC_ERROR_EARLYTIMEOUT)) {
+		asprintf(&error_reason, "timed out after %.2fs", tv_delta_f(&wpres->start, &wpres->stop));
+		}
+	else if (WIFSIGNALED(wpres->wait_status) && (errflags & WPROC_ERROR_SIGNAL)) {
+		asprintf(&error_reason, "died by signal %d%s after %.2f seconds",
+			WTERMSIG(wpres->wait_status),
+			WCOREDUMP(wpres->wait_status) ? " (core dumped)" : "",
+			tv_delta_f(&wpres->start, &wpres->stop));
+		}
+    else if (WEXITSTATUS(wpres->wait_status) != 0 && (errflags & WPROC_ERROR_EXITCODE)) {
+		asprintf(&error_reason, "Exited with return code %d",
+				WEXITSTATUS(wpres->wait_status));
+		}
+
+	if (error_reason) {
+		logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc: %s job %d: %s", job_type, wpres->job_id, error_reason);
+		logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   command: %s\n", wpres->command);
+		logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   early_timeout=%d; exited_ok=%d; wait_status=%d; error_code=%d;\n",
+		wpres->early_timeout, wpres->exited_ok, wpres->wait_status, wpres->error_code);
+
+		if(fmt) {
+			va_start(ap, fmt);
+			if(vasprintf(&buffer, fmt, ap) > 0) {
+				logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   %s\n", buffer);
+				free(buffer);
+				}
+			va_end(ap);
+			}
+
+		wproc_logdump_buffer(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   stderr", wpres->outerr);
+		wproc_logdump_buffer(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   stdout", wpres->outstd);
+		my_free(error_reason);
+		return 1;
+		}
+	return 0;
+	}
+
 /*
  * Handles adding the command and macros to the kvvec,
  * as well as shipping the command off to a designated
diff --git a/include/workers.h b/include/workers.h
index cf28287..172e7b4 100644
--- a/include/workers.h
+++ b/include/workers.h
@@ -5,6 +5,12 @@
 
 #define WPROC_FORCE  (1 << 0)
 
+#define WPROC_ERROR_EXITCODE (1 << 0)
+#define WPROC_ERROR_EARLYTIMEOUT (1 << 1)
+#define WPROC_ERROR_SIGNAL (1 << 3)
+
+#define WPROC_ERROR_ALL ((1 << 4)-1)
+
 NAGIOS_BEGIN_DECL;
 
 /* FIXME: Needs to be somewhere better... */
@@ -36,6 +42,10 @@ extern int wproc_can_spawn(struct load_control *lc);
 extern void free_worker_memory(int flags);
 extern int workers_alive(void);
 extern int init_workers(int desired_workers);
+
+extern int wproc_handle_err(struct wproc_result *wpres, int errflags, const char *job_type, const char *fmt, ...)
+	__attribute__((__format__(__printf__, 4, 5)));
+
 extern int wproc_run_callback(char *command_line, int timeout, void (*cb)(struct wproc_result *, void *, int), void *data, nagios_macros *mac);
 
 NAGIOS_END_DECL;
diff --git a/xdata/xpddefault.c b/xdata/xpddefault.c
index 94bddbf..b8913ef 100644
--- a/xdata/xpddefault.c
+++ b/xdata/xpddefault.c
@@ -42,11 +42,17 @@ static int     host_perfdata_fd = -1;
 static int     service_perfdata_fd = -1;
 
 /******************************************************************/
-/********************** CHECK COMPLETE CALLBACK *******************/
+/********* PERFORMANCE DATA PROCESSING COMPLETE CALLBACK **********/
 /******************************************************************/
 
-static void perfdata_command_callback(struct wproc_result *wpres, void *data, int flags) {
-	/* Dummy method as callback. we don't care about the result for now */
+static void perfdata_service_callback(struct wproc_result *wpres, void *data, int flags) {
+	service *svc = (service*)data;
+	wproc_handle_err(wpres, WPROC_ERROR_ALL, "Service perfdata processing", "Host: %s Service: %s\n", svc->host_name, svc->description );
+}
+
+static void perfdata_host_callback(struct wproc_result *wpres, void *data, int flags) {
+	host *hst = (host*)data;
+	wproc_handle_err(wpres, WPROC_ERROR_ALL, "Host perfdata processing", "Host: %s\n", hst->name );
 }
 
 /******************************************************************/
@@ -345,7 +351,7 @@ int xpddefault_run_service_performance_data_command(nagios_macros *mac, service
 	log_debug_info(DEBUGL_PERFDATA, 2, "Processed service performance data command line: %s\n", processed_command_line);
 
 	/* run the command */
-	wproc_run_callback(processed_command_line, perfdata_timeout, perfdata_command_callback, NULL, NULL);
+	wproc_run_callback(processed_command_line, perfdata_timeout, perfdata_service_callback, (void*)svc, NULL);
 
 	/* free memory */
 	my_free(processed_command_line);
@@ -386,7 +392,7 @@ int xpddefault_run_host_performance_data_command(nagios_macros *mac, host *hst)
 	log_debug_info(DEBUGL_PERFDATA, 2, "Processed host performance data command line: %s\n", processed_command_line);
 
 	/* run the command */
-	wproc_run_callback(processed_command_line, perfdata_timeout, perfdata_command_callback, NULL, NULL);
+	wproc_run_callback(processed_command_line, perfdata_timeout, perfdata_host_callback, (void*)hst, NULL);
 
 	/* free memory */
 	my_free(processed_command_line);
-- 
1.7.1

Re: [Nagios-devel] [PATCH] wproc: Remove depencency for other parts of system

From: Andreas E. <ae...@op...> - 2013-08-26 09:35:23

On 2013-08-23 10:28, Max Sikstrom wrote:
> From: Max Sikström <msi...@op...>
>
> A worker is a closed defined system that executes commands, and distributes it
> over several processes, and does it quite well.
>
> Because of the nature of a job to be asyncrhonous, the simplest way is to let
> worker handler job types to know what should be done when finished; start job,
> do other things, do some stuff when job is done, dependent on what job type it
> was.
>
> But that concept doesn't scale well when new kinds of jobs is needed, for
> example through nebmods, or new systems in the system. The worker code gets
> cluttered with non-worker related functionality, and some logic about the other
> parts functionality needs to be done within the worker module.
>

This part of the message doesn't make much sense without changes being
isolated mostly to lib/worker.c. Everything else relates to how the
scheduler handles communication with the workers, and has nothing to do
with actual workers as such.

> The concept of callbacks and pointers to functions is a much cleaner way to
> handle job results. Each module that uses the worker can define how to handle
> the job result itself instead. No special cases is needed in the worker code.
>

Except there's no single-point-of-entry where we can log errors in case
we do that, so each handler would need its own messages for that. I'm
not exactly thrilled about that.

So while we might be gaining cleaner code, we're losing error logging of
helpers that for various reasons fail to do their job properly, or get
killed by various signals. That's a real benefit to have, and "realler"
than "clean code", so NAK in its current shape and form.

-- 
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

[Nagios-devel] Nagios Core 3.5.1rc1

From: Eric S. <est...@na...> - 2013-08-24 15:05:08

I've created a tarball with a release candidate for Nagios Core 3.5.1. 

The tarball can be downloaded from http://sourceforge.net/projects/nagios/files/nagios-3.x/nagios-3.5.1/nagios-3.5.1rc1.tar.gz/download.

Feel free to compile it and try it (on a test system, please). If I don't hear of any serious issues by the end of next week, I'll release it.

The change log is as follows:

* Added handler for SIGXFSZ signal (Eric Stanley)
* Fixed bug #444: Nagios 3.5.0 problem with macro $ADMINEMAIL$ : @ is converted to %40 after 2 nagios reload (Duplicate of bug #407)
* Fixed bug #407: Reloading nagios config causes spaces in notifications to become plus signs (Alexey Dvoryanchikov)
* Fixed bug #445: Adding triggered downtime for child hosts causes a SIGSEGV on restart/reload (Eric Stanley)
* Fixed bug #375: Freshness expiration never reached and bug #427: freshness threshold doesn't work if it is set long (Scott Wilkerson, Eric Stanley)
* Fixed bug #432: Downtime scheduled as "Nagios Process" and not the Users name (Sam Lansing, Eric Stanley)

-- 
Eric Stanley
___
Developer
Nagios Enterprises, LLC
Email:  est...@na...
Web:    www.nagios.com

[Nagios-devel] [PATCH] wproc: Remove depencency for other parts of system

From: Max S. <max...@op...> - 2013-08-23 08:29:24

From: Max Sikström <msi...@op...>

A worker is a closed defined system that executes commands, and distributes it
over several processes, and does it quite well.

Because of the nature of a job to be asyncrhonous, the simplest way is to let
worker handler job types to know what should be done when finished; start job,
do other things, do some stuff when job is done, dependent on what job type it
was.

But that concept doesn't scale well when new kinds of jobs is needed, for
example through nebmods, or new systems in the system. The worker code gets
cluttered with non-worker related functionality, and some logic about the other
parts functionality needs to be done within the worker module.

The concept of callbacks and pointers to functions is a much cleaner way to
handle job results. Each module that uses the worker can define how to handle
the job result itself instead. No special cases is needed in the worker code.

Reviewed-by: Robin Sonefors <rso...@op...>
Signed-off-by: Max Sikström <msi...@op...>
---
 base/checks.c        |   45 +++++++-
 base/notifications.c |   38 ++++++-
 base/sehandlers.c    |   96 +++++++++--------
 base/workers.c       |  301 +++----------------------------------------------
 include/workers.h    |   24 +----
 xdata/xpddefault.c   |   11 ++-
 6 files changed, 160 insertions(+), 355 deletions(-)

diff --git a/base/checks.c b/base/checks.c
index 76a9b7d..c5c1ff2 100644
--- a/base/checks.c
+++ b/base/checks.c
@@ -61,6 +61,47 @@ int reap_check_results(void) {
 
 
 
+/******************************************************************/
+/********************** CHECK COMPLETE CALLBACK *******************/
+/******************************************************************/
+
+static void check_complete_callback(struct wproc_result *wpres, void *data, int flags) {
+	check_result *cr = (check_result*)data;
+	if(wpres) {
+		/* Got a result. If wpres == NULL, only clean up */
+
+		memcpy(&cr->rusage, &wpres->rusage, sizeof(wpres->rusage));
+		cr->start_time.tv_sec = wpres->start.tv_sec;
+		cr->start_time.tv_usec = wpres->start.tv_usec;
+		cr->finish_time.tv_sec = wpres->stop.tv_sec;
+		cr->finish_time.tv_usec = wpres->stop.tv_usec;
+		if (WIFEXITED(wpres->wait_status)) {
+			cr->return_code = WEXITSTATUS(wpres->wait_status);
+		} else {
+			cr->return_code = STATE_UNKNOWN;
+		}
+
+		if (wpres->outstd && *wpres->outstd) {
+			cr->output = strdup(wpres->outstd);
+		} else if (wpres->outerr) {
+			asprintf(&cr->output, "(No output on stdout) stderr: %s", wpres->outerr);
+		} else {
+			cr->output = NULL;
+		}
+
+		cr->early_timeout = wpres->early_timeout;
+		cr->exited_ok = wpres->exited_ok;
+		cr->engine = NULL;
+		cr->source = "something"; /* wp->name; */
+
+		process_check_result(cr);
+	}
+
+	/* Always cleanup */
+	free_check_result(cr);
+	free(cr);
+}
+
 
 /******************************************************************/
 /****************** SERVICE MONITORING FUNCTIONS ******************/
@@ -293,7 +334,7 @@ int run_async_service_check(service *svc, int check_options, double latency, int
 	svc->latency = old_latency;
 
 	/* paw off the check to a worker to run */
-	runchk_result = wproc_run_check(cr, processed_command, &mac);
+	runchk_result = wproc_run_callback(processed_command, service_check_timeout, check_complete_callback, (void*)cr, &mac);
 	if (runchk_result == ERROR) {
 		logit(NSLOG_RUNTIME_ERROR, TRUE, "Unable to run check for service '%s' on host '%s'\n", svc->description, svc->host_name);
 	}
@@ -2078,7 +2119,7 @@ int run_async_host_check(host *hst, int check_options, double latency, int sched
 	/* reset latency (permanent value for this check will get set later) */
 	hst->latency = old_latency;
 
-	runchk_result = wproc_run_check(cr, processed_command, &mac);
+	runchk_result = wproc_run_callback(processed_command, service_check_timeout, check_complete_callback, (void*)cr, &mac);
 	if (runchk_result == ERROR) {
 		logit(NSLOG_RUNTIME_ERROR, TRUE, "Unable to send check for host '%s' to worker (ret=%d)\n", hst->name, runchk_result);
 	} else {
diff --git a/base/notifications.c b/base/notifications.c
index 13ae651..e9b6dee 100644
--- a/base/notifications.c
+++ b/base/notifications.c
@@ -30,6 +30,13 @@
 #include "../include/neberrors.h"
 #include "../include/workers.h"
 
+
+struct notification_job {
+	char *contact_name;
+	char *host_name;
+	char *service_description;
+};
+
 /*** silly helpers ****/
 static contact *find_contact_by_name_or_alias(const char *name)
 {
@@ -59,11 +66,28 @@ const char *notification_reason_name(unsigned int reason_type)
 	return "(unknown)";
 }
 
-
 /******************************************************************/
 /***************** SERVICE NOTIFICATION FUNCTIONS *****************/
 /******************************************************************/
 
+static void notification_callback(struct wproc_result *wpres, void *data, int flags) {
+	struct notification_job *oj = (struct notification_job *)data;
+	if(wpres) {
+		double runtime = ((double)wpres->runtime.tv_sec) + ((double)wpres->runtime.tv_usec) / 1000000.0;
+		if (wpres->early_timeout) {
+			if (oj->service_description) {
+				logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Notifying contact '%s' of service '%s' on host '%s' by command '%s' timed out after %.2f seconds\n",
+					  oj->contact_name, oj->service_description,
+					  oj->host_name, wpres->command, runtime);
+			} else {
+				logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Notifying contact '%s' of host '%s' by command '%s' timed out after %.2f seconds\n",
+					  oj->contact_name, oj->host_name,
+					  wpres->command, runtime);
+			}
+		}
+	}
+	free(oj);
+}
 
 /* notify contacts about a service problem or recovery */
 int service_notification(service *svc, int type, char *not_author, char *not_data, int options) {
@@ -798,7 +822,11 @@ int notify_contact_of_service(nagios_macros *mac, contact *cntct, service *svc,
 			}
 
 		/* run the notification command */
-		wproc_notify(cntct->name, svc->host_name, svc->description, processed_command, mac);
+		struct notification_job *oj = calloc(1, sizeof(struct notification_job));
+		oj->contact_name = cntct->name;
+		oj->host_name = svc->host_name;
+		oj->service_description = svc->description;
+		wproc_run_callback(processed_command, notification_timeout, notification_callback, (void*)oj, mac);
 
 		/* free memory */
 		my_free(command_name);
@@ -1692,7 +1720,11 @@ int notify_contact_of_host(nagios_macros *mac, contact *cntct, host *hst, int ty
 			}
 
 		/* run the notification command */
-		wproc_notify(cntct->name, hst->name, NULL, processed_command, mac);
+		struct notification_job *oj = calloc(1, sizeof(struct notification_job));
+		oj->contact_name = cntct->name;
+		oj->host_name = hst->name;
+		oj->service_description = NULL;
+		wproc_run_callback(processed_command, notification_timeout, notification_callback, (void*)oj, mac);
 
 		/* @todo Handle nebmod stuff when getting results from workers */
 
diff --git a/base/sehandlers.c b/base/sehandlers.c
index c51d904..8df5f25 100644
--- a/base/sehandlers.c
+++ b/base/sehandlers.c
@@ -39,6 +39,18 @@
 /************* OBSESSIVE COMPULSIVE HANDLER FUNCTIONS *************/
 /******************************************************************/
 
+static void ocsp_complete_callback(struct wproc_result *wpres, void *data, int flags) {
+	service *svc = (service *)data;
+	if(wpres) {
+		if (wpres->early_timeout) {
+			double runtime = ((double)wpres->runtime.tv_sec) + ((double)wpres->runtime.tv_usec) / 1000000.0;
+			logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: OCSP command '%s' for service '%s' on host '%s' timed out after %.2f seconds\n",
+				  wpres->command, svc->description, svc->host_name,
+				  runtime);
+		}
+	}
+}
+
 
 /* handles service check results in an obsessive compulsive manner... */
 int obsessive_compulsive_service_check_processor(service *svc) {
@@ -90,7 +102,7 @@ int obsessive_compulsive_service_check_processor(service *svc) {
 	log_debug_info(DEBUGL_CHECKS, 2, "Processed obsessive compulsive service processor command line: %s\n", processed_command);
 
 	/* run the command through a worker */
-	wproc_run_service_job(WPJOB_OCSP, ocsp_timeout, svc, processed_command, &mac);
+	wproc_run_callback(processed_command, ocsp_timeout, ocsp_complete_callback, (void*)svc, &mac);
 
 	/* free memory */
 	clear_volatile_macros_r(&mac);
@@ -100,6 +112,16 @@ int obsessive_compulsive_service_check_processor(service *svc) {
 	}
 
 
+static void ochp_complete_callback(struct wproc_result *wpres, void *data, int flags) {
+	host *hst = (host *)data;
+	if(wpres) {
+		if (wpres->early_timeout) {
+			double runtime = ((double)wpres->runtime.tv_sec) + ((double)wpres->runtime.tv_usec) / 1000000.0;
+			logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: OCHP command '%s' for host '%s' timed out after %.2f seconds\n",
+				  wpres->command, hst->name, runtime);
+		}
+	}
+}
 
 /* handles host check results in an obsessive compulsive manner... */
 int obsessive_compulsive_host_check_processor(host *hst) {
@@ -145,7 +167,7 @@ int obsessive_compulsive_host_check_processor(host *hst) {
 	log_debug_info(DEBUGL_CHECKS, 2, "Processed obsessive compulsive host processor command line: %s\n", processed_command);
 
 	/* run the command through a worker */
-	wproc_run_host_job(WPJOB_OCHP, ochp_timeout, hst, processed_command, &mac);
+	wproc_run_callback(processed_command, ochp_timeout, ochp_complete_callback, hst, &mac);
 
 	/* free memory */
 	clear_volatile_macros_r(&mac);
@@ -161,6 +183,17 @@ int obsessive_compulsive_host_check_processor(host *hst) {
 /**************** SERVICE EVENT HANDLER FUNCTIONS *****************/
 /******************************************************************/
 
+static void evthandler_complete_with_msg(struct wproc_result *wpres, void *data, int flags) {
+	char *msg = (char *)data;
+	if(wpres) {
+		if (wpres->early_timeout) {
+			double runtime = ((double)wpres->runtime.tv_sec) + ((double)wpres->runtime.tv_usec) / 1000000.0;
+			logit(NSLOG_EVENT_HANDLER | NSLOG_RUNTIME_WARNING, TRUE, msg,
+					wpres->command, runtime);
+		}
+	}
+}
+
 
 /* handles changes in the state of a service */
 int handle_service_event(service *svc) {
@@ -212,7 +245,6 @@ int run_global_service_event_handler(nagios_macros *mac, service *svc) {
 	char *raw_logentry = NULL;
 	char *processed_logentry = NULL;
 	char *command_output = NULL;
-	int early_timeout = FALSE;
 	double exectime = 0.0;
 	int result = 0;
 #ifdef USE_EVENT_BROKER
@@ -269,7 +301,7 @@ int run_global_service_event_handler(nagios_macros *mac, service *svc) {
 	/* send event data to broker */
 	end_time.tv_sec = 0L;
 	end_time.tv_usec = 0L;
-	neb_result = broker_event_handler(NEBTYPE_EVENTHANDLER_START, NEBFLAG_NONE, NEBATTR_NONE, GLOBAL_SERVICE_EVENTHANDLER, (void *)svc, svc->current_state, svc->state_type, start_time, end_time, exectime, event_handler_timeout, early_timeout, result, global_service_event_handler, processed_command, NULL, NULL);
+	neb_result = broker_event_handler(NEBTYPE_EVENTHANDLER_START, NEBFLAG_NONE, NEBATTR_NONE, GLOBAL_SERVICE_EVENTHANDLER, (void *)svc, svc->current_state, svc->state_type, start_time, end_time, exectime, event_handler_timeout, FALSE, result, global_service_event_handler, processed_command, NULL, NULL);
 
 	/* neb module wants to override (or cancel) the event handler - perhaps it will run the eventhandler itself */
 	if(neb_result == NEBERROR_CALLBACKOVERRIDE) {
@@ -281,21 +313,15 @@ int run_global_service_event_handler(nagios_macros *mac, service *svc) {
 #endif
 
 	/* run the command through a worker */
-	/* XXX FIXME make base/workers.c handle the eventbroker stuff below */
-	result = wproc_run(WPJOB_GLOBAL_SVC_EVTHANDLER, processed_command, event_handler_timeout, mac);
-
-	/* check to see if the event handler timed out */
-	if(early_timeout == TRUE)
-		logit(NSLOG_EVENT_HANDLER | NSLOG_RUNTIME_WARNING, TRUE, "Warning: Global service event handler command '%s' timed out after %d seconds\n", processed_command, event_handler_timeout);
+	result = wproc_run_callback(processed_command, event_handler_timeout, evthandler_complete_with_msg,
+			"Warning: Global service event handler command '%s' timed out after %.2f seconds\n" , mac);
 
 #ifdef USE_EVENT_BROKER
 	/* get end time */
 	gettimeofday(&end_time, NULL);
-#endif
 
-#ifdef USE_EVENT_BROKER
 	/* send event data to broker */
-	broker_event_handler(NEBTYPE_EVENTHANDLER_END, NEBFLAG_NONE, NEBATTR_NONE, GLOBAL_SERVICE_EVENTHANDLER, (void *)svc, svc->current_state, svc->state_type, start_time, end_time, exectime, event_handler_timeout, early_timeout, result, global_service_event_handler, processed_command, command_output, NULL);
+	broker_event_handler(NEBTYPE_EVENTHANDLER_END, NEBFLAG_NONE, NEBATTR_NONE, GLOBAL_SERVICE_EVENTHANDLER, (void *)svc, svc->current_state, svc->state_type, start_time, end_time, exectime, event_handler_timeout, FALSE, result, global_service_event_handler, processed_command, command_output, NULL);
 #endif
 
 	/* free memory */
@@ -316,7 +342,6 @@ int run_service_event_handler(nagios_macros *mac, service *svc) {
 	char *raw_logentry = NULL;
 	char *processed_logentry = NULL;
 	char *command_output = NULL;
-	int early_timeout = FALSE;
 	double exectime = 0.0;
 	int result = 0;
 #ifdef USE_EVENT_BROKER
@@ -369,7 +394,7 @@ int run_service_event_handler(nagios_macros *mac, service *svc) {
 	/* send event data to broker */
 	end_time.tv_sec = 0L;
 	end_time.tv_usec = 0L;
-	neb_result = broker_event_handler(NEBTYPE_EVENTHANDLER_START, NEBFLAG_NONE, NEBATTR_NONE, SERVICE_EVENTHANDLER, (void *)svc, svc->current_state, svc->state_type, start_time, end_time, exectime, event_handler_timeout, early_timeout, result, svc->event_handler, processed_command, NULL, NULL);
+	neb_result = broker_event_handler(NEBTYPE_EVENTHANDLER_START, NEBFLAG_NONE, NEBATTR_NONE, SERVICE_EVENTHANDLER, (void *)svc, svc->current_state, svc->state_type, start_time, end_time, exectime, event_handler_timeout, FALSE, result, svc->event_handler, processed_command, NULL, NULL);
 
 	/* neb module wants to override (or cancel) the event handler - perhaps it will run the eventhandler itself */
 	if(neb_result == NEBERROR_CALLBACKOVERRIDE) {
@@ -381,21 +406,15 @@ int run_service_event_handler(nagios_macros *mac, service *svc) {
 #endif
 
 	/* run the command through a worker */
-	/* XXX FIXME make base/workers.c handle the eventbroker stuff below */
-	result = wproc_run(WPJOB_SVC_EVTHANDLER, processed_command, event_handler_timeout, mac);
-
-	/* check to see if the event handler timed out */
-	if(early_timeout == TRUE)
-		logit(NSLOG_EVENT_HANDLER | NSLOG_RUNTIME_WARNING, TRUE, "Warning: Service event handler command '%s' timed out after %d seconds\n", processed_command, event_handler_timeout);
+	result = wproc_run_callback(processed_command, event_handler_timeout, evthandler_complete_with_msg,
+			"Warning: Service event handler command '%s' timed out after %.2f seconds\n" , mac);
 
 #ifdef USE_EVENT_BROKER
 	/* get end time */
 	gettimeofday(&end_time, NULL);
-#endif
 
-#ifdef USE_EVENT_BROKER
 	/* send event data to broker */
-	broker_event_handler(NEBTYPE_EVENTHANDLER_END, NEBFLAG_NONE, NEBATTR_NONE, SERVICE_EVENTHANDLER, (void *)svc, svc->current_state, svc->state_type, start_time, end_time, exectime, event_handler_timeout, early_timeout, result, svc->event_handler, processed_command, command_output, NULL);
+	broker_event_handler(NEBTYPE_EVENTHANDLER_END, NEBFLAG_NONE, NEBATTR_NONE, SERVICE_EVENTHANDLER, (void *)svc, svc->current_state, svc->state_type, start_time, end_time, exectime, event_handler_timeout, FALSE, result, svc->event_handler, processed_command, command_output, NULL);
 #endif
 
 	/* free memory */
@@ -457,7 +476,6 @@ int run_global_host_event_handler(nagios_macros *mac, host *hst) {
 	char *raw_logentry = NULL;
 	char *processed_logentry = NULL;
 	char *command_output = NULL;
-	int early_timeout = FALSE;
 	double exectime = 0.0;
 	int result = 0;
 #ifdef USE_EVENT_BROKER
@@ -513,7 +531,7 @@ int run_global_host_event_handler(nagios_macros *mac, host *hst) {
 	/* send event data to broker */
 	end_time.tv_sec = 0L;
 	end_time.tv_usec = 0L;
-	neb_result = broker_event_handler(NEBTYPE_EVENTHANDLER_START, NEBFLAG_NONE, NEBATTR_NONE, GLOBAL_HOST_EVENTHANDLER, (void *)hst, hst->current_state, hst->state_type, start_time, end_time, exectime, event_handler_timeout, early_timeout, result, global_host_event_handler, processed_command, NULL, NULL);
+	neb_result = broker_event_handler(NEBTYPE_EVENTHANDLER_START, NEBFLAG_NONE, NEBATTR_NONE, GLOBAL_HOST_EVENTHANDLER, (void *)hst, hst->current_state, hst->state_type, start_time, end_time, exectime, event_handler_timeout, FALSE, result, global_host_event_handler, processed_command, NULL, NULL);
 
 	/* neb module wants to override (or cancel) the event handler - perhaps it will run the eventhandler itself */
 	if(neb_result == NEBERROR_CALLBACKOVERRIDE) {
@@ -525,21 +543,15 @@ int run_global_host_event_handler(nagios_macros *mac, host *hst) {
 #endif
 
 	/* run the command through a worker */
-	/* XXX FIXME make base/workers.c handle the eventbroker stuff below */
-	wproc_run(WPJOB_GLOBAL_HOST_EVTHANDLER, processed_command, event_handler_timeout, mac);
-
-	/* check for a timeout in the execution of the event handler command */
-	if(early_timeout == TRUE)
-		logit(NSLOG_EVENT_HANDLER | NSLOG_RUNTIME_WARNING, TRUE, "Warning: Global host event handler command '%s' timed out after %d seconds\n", processed_command, event_handler_timeout);
+	result = wproc_run_callback(processed_command, event_handler_timeout, evthandler_complete_with_msg,
+			"Warning: Global host event handler command '%s' timed out after %.2f seconds\n" , mac);
 
 #ifdef USE_EVENT_BROKER
 	/* get end time */
 	gettimeofday(&end_time, NULL);
-#endif
 
-#ifdef USE_EVENT_BROKER
 	/* send event data to broker */
-	broker_event_handler(NEBTYPE_EVENTHANDLER_END, NEBFLAG_NONE, NEBATTR_NONE, GLOBAL_HOST_EVENTHANDLER, (void *)hst, hst->current_state, hst->state_type, start_time, end_time, exectime, event_handler_timeout, early_timeout, result, global_host_event_handler, processed_command, command_output, NULL);
+	broker_event_handler(NEBTYPE_EVENTHANDLER_END, NEBFLAG_NONE, NEBATTR_NONE, GLOBAL_HOST_EVENTHANDLER, (void *)hst, hst->current_state, hst->state_type, start_time, end_time, exectime, event_handler_timeout, FALSE, result, global_host_event_handler, processed_command, command_output, NULL);
 #endif
 
 	/* free memory */
@@ -559,7 +571,6 @@ int run_host_event_handler(nagios_macros *mac, host *hst) {
 	char *raw_logentry = NULL;
 	char *processed_logentry = NULL;
 	char *command_output = NULL;
-	int early_timeout = FALSE;
 	double exectime = 0.0;
 	int result = 0;
 #ifdef USE_EVENT_BROKER
@@ -611,7 +622,7 @@ int run_host_event_handler(nagios_macros *mac, host *hst) {
 	/* send event data to broker */
 	end_time.tv_sec = 0L;
 	end_time.tv_usec = 0L;
-	neb_result = broker_event_handler(NEBTYPE_EVENTHANDLER_START, NEBFLAG_NONE, NEBATTR_NONE, HOST_EVENTHANDLER, (void *)hst, hst->current_state, hst->state_type, start_time, end_time, exectime, event_handler_timeout, early_timeout, result, hst->event_handler, processed_command, NULL, NULL);
+	neb_result = broker_event_handler(NEBTYPE_EVENTHANDLER_START, NEBFLAG_NONE, NEBATTR_NONE, HOST_EVENTHANDLER, (void *)hst, hst->current_state, hst->state_type, start_time, end_time, exectime, event_handler_timeout, FALSE, result, hst->event_handler, processed_command, NULL, NULL);
 
 	/* neb module wants to override (or cancel) the event handler - perhaps it will run the eventhandler itself */
 	if(neb_result == NEBERROR_CALLBACKOVERRIDE) {
@@ -623,20 +634,15 @@ int run_host_event_handler(nagios_macros *mac, host *hst) {
 #endif
 
 	/* run the command through a worker */
-	result = wproc_run(WPJOB_HOST_EVTHANDLER, processed_command, event_handler_timeout, mac);
-
-	/* check to see if the event handler timed out */
-	if(early_timeout == TRUE)
-		logit(NSLOG_EVENT_HANDLER | NSLOG_RUNTIME_WARNING, TRUE, "Warning: Host event handler command '%s' timed out after %d seconds\n", processed_command, event_handler_timeout);
+	result = wproc_run_callback(processed_command, event_handler_timeout, evthandler_complete_with_msg,
+			"Warning: Host event handler command '%s' timed out after %.2f seconds\n" , mac);
 
 #ifdef USE_EVENT_BROKER
 	/* get end time */
 	gettimeofday(&end_time, NULL);
-#endif
 
-#ifdef USE_EVENT_BROKER
 	/* send event data to broker */
-	broker_event_handler(NEBTYPE_EVENTHANDLER_END, NEBFLAG_NONE, NEBATTR_NONE, HOST_EVENTHANDLER, (void *)hst, hst->current_state, hst->state_type, start_time, end_time, exectime, event_handler_timeout, early_timeout, result, hst->event_handler, processed_command, command_output, NULL);
+	broker_event_handler(NEBTYPE_EVENTHANDLER_END, NEBFLAG_NONE, NEBATTR_NONE, HOST_EVENTHANDLER, (void *)hst, hst->current_state, hst->state_type, start_time, end_time, exectime, event_handler_timeout, FALSE, result, hst->event_handler, processed_command, command_output, NULL);
 #endif
 
 	/* free memory */
diff --git a/base/workers.c b/base/workers.c
index 881e434..a6db4f5 100644
--- a/base/workers.c
+++ b/base/workers.c
@@ -19,10 +19,10 @@ struct wproc_worker;
 
 struct wproc_job {
 	unsigned int id;
-	unsigned int type;
 	unsigned int timeout;
 	char *command;
-	void *arg;
+	void (*callback)(struct wproc_result *, void *, int);
+	void *data;
 	struct wproc_worker *wp;
 };
 
@@ -52,40 +52,9 @@ static struct wproc_list workers = {0, 0, NULL};
 static dkhash_table *specialized_workers;
 static struct wproc_list *to_remove = NULL;
 
-typedef struct wproc_callback_job {
-	void *data;
-	void (*callback)(struct wproc_result *, void *, int);
-} wproc_callback_job;
-
-typedef struct wproc_object_job {
-	char *contact_name;
-	char *host_name;
-	char *service_description;
-} wproc_object_job;
-
 unsigned int wproc_num_workers_online = 0, wproc_num_workers_desired = 0;
 unsigned int wproc_num_workers_spawned = 0;
 
-#define tv2float(tv) ((float)((tv)->tv_sec) + ((float)(tv)->tv_usec) / 1000000.0)
-
-static const char *wpjob_type_name(unsigned int type)
-{
-	switch (type) {
-	case WPJOB_CHECK: return "CHECK";
-	case WPJOB_NOTIFY: return "NOTIFY";
-	case WPJOB_OCSP: return "OCSP";
-	case WPJOB_OCHP: return "OCHP";
-	case WPJOB_GLOBAL_SVC_EVTHANDLER: return "GLOBAL SERVICE EVENTHANDLER";
-	case WPJOB_SVC_EVTHANDLER: return "SERVICE EVENTHANDLER";
-	case WPJOB_GLOBAL_HOST_EVTHANDLER: return "GLOBAL HOST EVENTHANDLER";
-	case WPJOB_HOST_EVTHANDLER: return "HOST EVENTHANDLER";
-	case WPJOB_CALLBACK: return "CALLBACK";
-	case WPJOB_HOST_PERFDATA: return "HOST PERFDATA";
-	case WPJOB_SVC_PERFDATA: return "SERVICE PERFDATA";
-	}
-	return "UNKNOWN";
-}
-
 static void wproc_logdump_buffer(int level, int show, const char *prefix, char *buf)
 {
 	char *ptr, *eol;
@@ -223,7 +192,7 @@ static struct wproc_worker *get_worker(const char *cmd)
 	return wp_list->wps[wp_list->idx++ % wp_list->len];
 }
 
-static struct wproc_job *create_job(int type, void *arg, time_t timeout, const char *cmd)
+static struct wproc_job *create_job(void (*callback)(struct wproc_result *, void *, int), void *data, time_t timeout, const char *cmd)
 {
 	struct wproc_job *job;
 	struct wproc_worker *wp;
@@ -240,8 +209,8 @@ static struct wproc_job *create_job(int type, void *arg, time_t timeout, const c
 
 	job->wp = wp;
 	job->id = get_job_id(wp);
-	job->type = type;
-	job->arg = arg;
+	job->callback = callback;
+	job->data = data;
 	job->timeout = timeout;
 	if (fanout_add(wp->jobs, job->id, job) < 0 || !(job->command = strdup(cmd))) {
 		free(job);
@@ -253,16 +222,10 @@ static struct wproc_job *create_job(int type, void *arg, time_t timeout, const c
 
 static void run_job_callback(struct wproc_job *job, struct wproc_result *wpres, int val)
 {
-	wproc_callback_job *cj;
-
-	if (!job || !job->arg)
+	if (!job || !job->callback)
 		return;
-	cj = (struct wproc_callback_job *)job->arg;
-
-	if (!cj->callback)
-		return;
-	cj->callback(wpres, cj->data, val);
-	cj->callback = NULL;
+	job->callback(wpres, job->data, val);
+	job->callback = NULL;
 }
 
 static void destroy_job(struct wproc_job *job)
@@ -270,31 +233,8 @@ static void destroy_job(struct wproc_job *job)
 	if (!job)
 		return;
 
-	switch (job->type) {
-	case WPJOB_CHECK:
-		free_check_result(job->arg);
-		free(job->arg);
-		break;
-	case WPJOB_NOTIFY:
-	case WPJOB_OCSP:
-	case WPJOB_OCHP:
-		free(job->arg);
-		break;
-
-	case WPJOB_GLOBAL_SVC_EVTHANDLER:
-	case WPJOB_SVC_EVTHANDLER:
-	case WPJOB_GLOBAL_HOST_EVTHANDLER:
-	case WPJOB_HOST_EVTHANDLER:
-		/* these require nothing special */
-		break;
-	case WPJOB_CALLBACK:
-		/* call with NULL result to make callback clean things up */
-		run_job_callback(job, NULL, 0);
-		break;
-	default:
-		logit(NSLOG_RUNTIME_WARNING, TRUE, "wproc: Unknown job type: %d\n", job->type);
-		break;
-	}
+	/* Run the callback once, but with wpres=NULL if the job is destroyed */
+	run_job_callback(job, NULL, 0);
 
 	my_free(job->command);
 	if (job->wp) {
@@ -433,41 +373,6 @@ static int str2timeval(char *str, struct timeval *tv)
 	return 0;
 }
 
-static int handle_worker_check(wproc_result *wpres, struct wproc_worker *wp, struct wproc_job *job)
-{
-	int result = ERROR;
-	check_result *cr = (check_result *)job->arg;
-
-	memcpy(&cr->rusage, &wpres->rusage, sizeof(wpres->rusage));
-	cr->start_time.tv_sec = wpres->start.tv_sec;
-	cr->start_time.tv_usec = wpres->start.tv_usec;
-	cr->finish_time.tv_sec = wpres->stop.tv_sec;
-	cr->finish_time.tv_usec = wpres->stop.tv_usec;
-	if (WIFEXITED(wpres->wait_status)) {
-		cr->return_code = WEXITSTATUS(wpres->wait_status);
-	} else {
-		cr->return_code = STATE_UNKNOWN;
-	}
-
-	if (wpres->outstd && *wpres->outstd) {
-		cr->output = strdup(wpres->outstd);
-	} else if (wpres->outerr) {
-		asprintf(&cr->output, "(No output on stdout) stderr: %s", wpres->outerr);
-	} else {
-		cr->output = NULL;
-	}
-
-	cr->early_timeout = wpres->early_timeout;
-	cr->exited_ok = wpres->exited_ok;
-	cr->engine = NULL;
-	cr->source = wp->name;
-
-	process_check_result(cr);
-	free_check_result(cr);
-
-	return result;
-}
-
 /*
  * parses a worker result. We do no strdup()'s here, so when
  * kvv is destroyed, all references to strings will become
@@ -493,7 +398,7 @@ static int parse_worker_result(wproc_result *wpres, struct kvvec *kvv)
 			wpres->job_id = atoi(value);
 			break;
 		case WPRES_type:
-			wpres->type = atoi(value);
+			/* FIXME: ignored */
 			break;
 		case WPRES_command:
 			wpres->command = value;
@@ -587,7 +492,6 @@ static void fo_reassign_wproc_job(void *job_)
 
 static int handle_worker_result(int sd, int events, void *arg)
 {
-	wproc_object_job *oj = NULL;
 	char *buf, *error_reason = NULL;
 	unsigned long size;
 	int ret;
@@ -650,12 +554,6 @@ static int handle_worker_result(int sd, int events, void *arg)
 				  wpres.job_id, wp->name);
 			continue;
 		}
-		if (wpres.type != job->type) {
-			logit(NSLOG_RUNTIME_WARNING, TRUE, "wproc: %s claims job %d is type %d, but we think it's type %d\n",
-				  wp->name, job->id, wpres.type, job->type);
-			break;
-		}
-		oj = (wproc_object_job *)job->arg;
 
 		/*
 		 * ETIME ("Timer expired") doesn't really happen
@@ -674,24 +572,11 @@ static int handle_worker_result(int sd, int events, void *arg)
 			         WCOREDUMP(wpres.wait_status) ? " (core dumped)" : "",
 			         tv_delta_f(&wpres.start, &wpres.stop));
 		}
-		else if (job->type != WPJOB_CHECK && WEXITSTATUS(wpres.wait_status) != 0) {
-			asprintf(&error_reason, "is a non-check helper but exited with return code %d",
-			         WEXITSTATUS(wpres.wait_status));
-		}
+
 		if (error_reason) {
-			logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc: %s job %d from worker %s %s",
-			      wpjob_type_name(job->type), job->id, wp->name, error_reason);
+			logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc: job %d from worker %s %s",
+			      job->id, wp->name, error_reason);
 			logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   command: %s\n", job->command);
-			if (job->type != WPJOB_CHECK && oj) {
-				logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   host=%s; service=%s; contact=%s\n",
-				      oj->host_name ? oj->host_name : "(none)",
-				      oj->service_description ? oj->service_description : "(none)",
-				      oj->contact_name ? oj->contact_name : "(none)");
-			} else if (oj) {
-				struct check_result *cr = (struct check_result *)job->arg;
-				logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   host=%s; service=%s;\n",
-				      cr->host_name, cr->service_description);
-			}
 			logit(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   early_timeout=%d; exited_ok=%d; wait_status=%d; error_code=%d;\n",
 			      wpres.early_timeout, wpres.exited_ok, wpres.wait_status, wpres.error_code);
 			wproc_logdump_buffer(NSLOG_RUNTIME_ERROR, TRUE, "wproc:   stderr", wpres.outerr);
@@ -699,74 +584,9 @@ static int handle_worker_result(int sd, int events, void *arg)
 		}
 		my_free(error_reason);
 
-		switch (job->type) {
-		case WPJOB_CHECK:
-			ret = handle_worker_check(&wpres, wp, job);
-			break;
-		case WPJOB_NOTIFY:
-			if (wpres.early_timeout) {
-				if (oj->service_description) {
-					logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Notifying contact '%s' of service '%s' on host '%s' by command '%s' timed out after %.2f seconds\n",
-						  oj->contact_name, oj->service_description,
-						  oj->host_name, job->command,
-						  tv2float(&wpres.runtime));
-				} else {
-					logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Notifying contact '%s' of host '%s' by command '%s' timed out after %.2f seconds\n",
-						  oj->contact_name, oj->host_name,
-						  job->command, tv2float(&wpres.runtime));
-				}
-			}
-			break;
-		case WPJOB_OCSP:
-			if (wpres.early_timeout) {
-				logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: OCSP command '%s' for service '%s' on host '%s' timed out after %.2f seconds\n",
-					  job->command, oj->service_description, oj->host_name,
-					  tv2float(&wpres.runtime));
-			}
-			break;
-		case WPJOB_OCHP:
-			if (wpres.early_timeout) {
-				logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: OCHP command '%s' for host '%s' timed out after %.2f seconds\n",
-					  job->command, oj->host_name, tv2float(&wpres.runtime));
-			}
-			break;
-		case WPJOB_GLOBAL_SVC_EVTHANDLER:
-			if (wpres.early_timeout) {
-				logit(NSLOG_EVENT_HANDLER | NSLOG_RUNTIME_WARNING, TRUE,
-					  "Warning: Global service event handler command '%s' timed out after %.2f seconds\n",
-					  job->command, tv2float(&wpres.runtime));
-			}
-			break;
-		case WPJOB_SVC_EVTHANDLER:
-			if (wpres.early_timeout) {
-				logit(NSLOG_EVENT_HANDLER | NSLOG_RUNTIME_WARNING, TRUE,
-					  "Warning: Service event handler command '%s' timed out after %.2f seconds\n",
-					  job->command, tv2float(&wpres.runtime));
-			}
-			break;
-		case WPJOB_GLOBAL_HOST_EVTHANDLER:
-			if (wpres.early_timeout) {
-				logit(NSLOG_EVENT_HANDLER | NSLOG_RUNTIME_WARNING, TRUE,
-					  "Warning: Global host event handler command '%s' timed out after %.2f seconds\n",
-					  job->command, tv2float(&wpres.runtime));
-			}
-			break;
-		case WPJOB_HOST_EVTHANDLER:
-			if (wpres.early_timeout) {
-				logit(NSLOG_EVENT_HANDLER | NSLOG_RUNTIME_WARNING, TRUE,
-					  "Warning: Host event handler command '%s' timed out after %.2f seconds\n",
-					  job->command, tv2float(&wpres.runtime));
-			}
-			break;
+		/* Run the callback with the wpres struct as argument */
+		run_job_callback(job, &wpres, 0);
 
-		case WPJOB_CALLBACK:
-			run_job_callback(job, &wpres, 0);
-			break;
-
-		default:
-			logit(NSLOG_RUNTIME_WARNING, TRUE, "Worker %d: Unknown jobtype: %d\n", wp->pid, job->type);
-			break;
-		}
 		destroy_job(job);
 	}
 
@@ -989,7 +809,7 @@ static int wproc_run_job(struct wproc_job *job, nagios_macros *mac)
 		return ERROR;
 
 	kvvec_addkv(&kvv, "job_id", (char *)mkstr("%d", job->id));
-	kvvec_addkv(&kvv, "type", (char *)mkstr("%d", job->type));
+	kvvec_addkv(&kvv, "type", "0"); /* Dummy field for compatibility */
 	kvvec_addkv(&kvv, "command", job->command);
 	kvvec_addkv(&kvv, "timeout", (char *)mkstr("%u", job->timeout));
 	kvvb = build_kvvec_buf(&kvv);
@@ -1010,94 +830,11 @@ static int wproc_run_job(struct wproc_job *job, nagios_macros *mac)
 	return result;
 }
 
-static wproc_object_job *create_object_job(char *cname, char *hname, char *sdesc)
-{
-	wproc_object_job *oj;
-
-	oj = calloc(1, sizeof(*oj));
-	if (oj) {
-		oj->host_name = hname;
-		if (cname)
-			oj->contact_name = cname;
-		if (sdesc)
-			oj->service_description = sdesc;
-	}
-
-	return oj;
-}
-
-int wproc_notify(char *cname, char *hname, char *sdesc, char *cmd, nagios_macros *mac)
-{
-	struct wproc_job *job;
-	wproc_object_job *oj;
-
-	if (!(oj = create_object_job(cname, hname, sdesc)))
-		return ERROR;
-
-	job = create_job(WPJOB_NOTIFY, oj, notification_timeout, cmd);
-
-	return wproc_run_job(job, mac);
-}
-
-int wproc_run_service_job(int jtype, int timeout, service *svc, char *cmd, nagios_macros *mac)
-{
-	struct wproc_job *job;
-	wproc_object_job *oj;
-
-	if (!(oj = create_object_job(NULL, svc->host_name, svc->description)))
-		return ERROR;
-
-	job = create_job(jtype, oj, timeout, cmd);
-
-	return wproc_run_job(job, mac);
-}
-
-int wproc_run_host_job(int jtype, int timeout, host *hst, char *cmd, nagios_macros *mac)
-{
-	struct wproc_job *job;
-	wproc_object_job *oj;
-
-	if (!(oj = create_object_job(NULL, hst->name, NULL)))
-		return ERROR;
-
-	job = create_job(jtype, oj, timeout, cmd);
-
-	return wproc_run_job(job, mac);
-}
-
-int wproc_run_check(check_result *cr, char *cmd, nagios_macros *mac)
-{
-	struct wproc_job *job;
-	int timeout;
-
-	if (cr->service_description)
-		timeout = service_check_timeout;
-	else
-		timeout = host_check_timeout;
-
-	job = create_job(WPJOB_CHECK, cr, timeout, cmd);
-	return wproc_run_job(job, mac);
-}
-
-int wproc_run(int jtype, char *cmd, int timeout, nagios_macros *mac)
-{
-	struct wproc_job *job;
-
-	job = create_job(jtype, NULL, timeout, cmd);
-	return wproc_run_job(job, mac);
-}
-
-int wproc_run_callback(char *cmd, int timeout,
+int wproc_run_callback(char *command_line, int timeout,
 		void (*cb)(struct wproc_result *, void *, int), void *data,
 		nagios_macros *mac)
 {
 	struct wproc_job *job;
-	struct wproc_callback_job *cj;
-	if (!(cj = calloc(1, sizeof(*cj))))
-		return ERROR;
-	cj->callback = cb;
-	cj->data = data;
-
-	job = create_job(WPJOB_CALLBACK, cj, timeout, cmd);
+	job = create_job(cb, data, timeout, command_line);
 	return wproc_run_job(job, mac);
 }
diff --git a/include/workers.h b/include/workers.h
index b11229b..cf28287 100644
--- a/include/workers.h
+++ b/include/workers.h
@@ -2,28 +2,15 @@
 #define INCLUDE_workers_h__
 #include "lib/libnagios.h"
 #include "lib/worker.h"
-#include "nagios.h" /* for check_result definition */
-
-/* different jobtypes. We add more as needed */
-#define WPJOB_CHECK   0
-#define WPJOB_NOTIFY  1
-#define WPJOB_OCSP    2
-#define WPJOB_OCHP    3
-#define WPJOB_GLOBAL_SVC_EVTHANDLER 4
-#define WPJOB_SVC_EVTHANDLER  5
-#define WPJOB_GLOBAL_HOST_EVTHANDLER 6
-#define WPJOB_HOST_EVTHANDLER 7
-#define WPJOB_CALLBACK 8
-#define WPJOB_HOST_PERFDATA 9
-#define WPJOB_SVC_PERFDATA 10
 
 #define WPROC_FORCE  (1 << 0)
 
 NAGIOS_BEGIN_DECL;
 
+/* FIXME: Needs to be somewhere better... */
 typedef struct wproc_result {
 	unsigned int job_id;
-	unsigned int type;
+	unsigned int type; /* not used */
 	time_t timeout;
 	struct timeval start;
 	struct timeval stop;
@@ -49,12 +36,7 @@ extern int wproc_can_spawn(struct load_control *lc);
 extern void free_worker_memory(int flags);
 extern int workers_alive(void);
 extern int init_workers(int desired_workers);
-extern int wproc_run_check(check_result *cr, char *cmd, nagios_macros *mac);
-extern int wproc_notify(char *cname, char *hname, char *sdesc, char *cmd, nagios_macros *mac);
-extern int wproc_run(int job_type, char *cmd, int timeout, nagios_macros *mac);
-extern int wproc_run_service_job(int jtype, int timeout, service *svc, char *cmd, nagios_macros *mac);
-extern int wproc_run_host_job(int jtype, int timeout, host *hst, char *cmd, nagios_macros *mac);
-extern int wproc_run_callback(char *cmt, int timeout, void (*cb)(struct wproc_result *, void *, int), void *data, nagios_macros *mac);
+extern int wproc_run_callback(char *command_line, int timeout, void (*cb)(struct wproc_result *, void *, int), void *data, nagios_macros *mac);
 
 NAGIOS_END_DECL;
 #endif
diff --git a/xdata/xpddefault.c b/xdata/xpddefault.c
index 62bfe56..94bddbf 100644
--- a/xdata/xpddefault.c
+++ b/xdata/xpddefault.c
@@ -41,6 +41,13 @@ static FILE    *service_perfdata_fp = NULL;
 static int     host_perfdata_fd = -1;
 static int     service_perfdata_fd = -1;
 
+/******************************************************************/
+/********************** CHECK COMPLETE CALLBACK *******************/
+/******************************************************************/
+
+static void perfdata_command_callback(struct wproc_result *wpres, void *data, int flags) {
+	/* Dummy method as callback. we don't care about the result for now */
+}
 
 /******************************************************************/
 /************** INITIALIZATION & CLEANUP FUNCTIONS ****************/
@@ -338,7 +345,7 @@ int xpddefault_run_service_performance_data_command(nagios_macros *mac, service
 	log_debug_info(DEBUGL_PERFDATA, 2, "Processed service performance data command line: %s\n", processed_command_line);
 
 	/* run the command */
-	wproc_run(WPJOB_SVC_PERFDATA, processed_command_line, perfdata_timeout, NULL);
+	wproc_run_callback(processed_command_line, perfdata_timeout, perfdata_command_callback, NULL, NULL);
 
 	/* free memory */
 	my_free(processed_command_line);
@@ -379,7 +386,7 @@ int xpddefault_run_host_performance_data_command(nagios_macros *mac, host *hst)
 	log_debug_info(DEBUGL_PERFDATA, 2, "Processed host performance data command line: %s\n", processed_command_line);
 
 	/* run the command */
-	wproc_run(WPJOB_HOST_PERFDATA, processed_command_line, perfdata_timeout, NULL);
+	wproc_run_callback(processed_command_line, perfdata_timeout, perfdata_command_callback, NULL, NULL);
 
 	/* free memory */
 	my_free(processed_command_line);
-- 
1.7.1

Re: [Nagios-devel] [PATCH] checks: Set check state to UNKNOWN when there's a timeout

From: Andreas E. <ae...@op...> - 2013-08-19 14:16:42

On 2013-08-19 14:59, Robin Sonefors wrote:
> When a check - host or service - fails to exit properly, it always
> becomes CRITICAL. When an active host check times out, it goes into
> UNKNOWN. When a service check times out, nothing is done to the state,
> which led to my system telling me that a check that timed out was an OK
> check.
>
> This thus sets the state to UNKNOWN when the check didn't exit in time,
> because that seems to make more sense and is analogous with what's done
> for host checks.
>
> Still, the whole CRITICAL vs UNKNOWN descibed above makes me a bit less
> confident in my fix than I'd like - does this need more work?
>

Well, it needs to honor "service_check_timeout_state", which is a global
variable. Apart from that, it might be nice to save the output, if
there is any such, or create the output in base/workers.c where we
handle all non-ok helper exit codes anyway. I think the check for jobs
of type WPJOB_CHECK already has their own case label, so the effort
shouldn't be huge.

Saving output will only be useful in very (very) rare cases though, so
concatenating it with the non-null plugin_output should work ok.

-- 
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

[Nagios-devel] [PATCH] checks: Set check state to UNKNOWN when there's a timeout

From: Robin S. <rob...@op...> - 2013-08-19 13:15:34

When a check - host or service - fails to exit properly, it always
becomes CRITICAL. When an active host check times out, it goes into
UNKNOWN. When a service check times out, nothing is done to the state,
which led to my system telling me that a check that timed out was an OK
check.

This thus sets the state to UNKNOWN when the check didn't exit in time,
because that seems to make more sense and is analogous with what's done
for host checks.

Still, the whole CRITICAL vs UNKNOWN descibed above makes me a bit less
confident in my fix than I'd like - does this need more work?

Signed-off-by: Robin Sonefors <rob...@op...>
---
 base/checks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/base/checks.c b/base/checks.c
index 76a9b7d..c601b9f 100644
--- a/base/checks.c
+++ b/base/checks.c
@@ -416,6 +416,7 @@ int handle_async_service_check_result(service *temp_service, check_result *queue
 	if(queued_check_result->early_timeout == TRUE) {
 		logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Check of service '%s' on host '%s' timed out after %.3fs!\n", temp_service->description, temp_service->host_name, temp_service->execution_time);
 		asprintf(&temp_service->plugin_output, "(Service check timed out after %.2lf seconds)\n", temp_service->execution_time);
+		temp_service->current_state = STATE_UNKNOWN;
 		}
 	/* if there was some error running the command, just skip it (this shouldn't be happening) */
 	else if(queued_check_result->exited_ok == FALSE) {
-- 
1.8.3.1

Re: [Nagios-devel] Patches for inclusion to Nagios 4

From: Andreas E. <ae...@op...> - 2013-08-08 15:52:34

On 2013-08-08 15:17, Ton Voon wrote:
>
> On 7 Aug 2013, at 08:51, Andreas Ericsson wrote:
>
>> On 2013-08-06 19:16, Ton Voon wrote:
>>> Hi!
>>>
>>> We've published a list of patches for Nagios 4:
>>> http://www.opsview.com/whats-new/blog/opsview-patches-nagios-4
>>>
>>> We'd be happy if you could review if these are acceptable for
>>> future inclusion or if anyone else finds them useful.
>>>
>>
>> I'd like to get patches with commit messages and proper author and
>> signed-off-by info. Since we're using git for Nagios now, it'd go a
>> long way in making sure everyone gets credit for the work they've
>> done.
>>
>> The patches also need to apply cleanly to the latest master.
>>
>> You may want to clone the official Nagios repo, apply your patches
>> on top of it and then send me a pull request for github or some
>> such.
>>
>> git clone git://git.code.sf.net/p/nagios/nagios nagios-core
>>
>> should get you the very latest. If you apply your patches on top
>> of 'master' and make sure to always do "git pull --rebase" when you
>> want to get the latest and greatest you'll quickly see which
>> patches either have been applied or which no longer *can* be
>> applied. Then you can create a separate repository on github or
>> some such and push the changes there.
>
> OK, we'll convert them into git changes based off master.
> However....
>
> I'd like an assurance that the changes will be merged before we
> promise to convert. Style changes are fine, won't take much time and
> will not require retesting, but if we need to refactor object changes
> or make larger changes to logic, this will require retesting on our
> side. I'd like a reassurance that the time invested will result in a
> merge upstream, otherwise we're just wasting time for all of us.
>

> For instance, your assistance in the "environment macros per-command"
> was greatly appreciated and we've coded to the design agreed in the
> email conversations, but it hasn't been merged yet.
>

It has, but it hasn't. The original patch no longer applies, due to
extensive remodeling of the worker code. I still have it hanging
around though, and I'm fairly inclined towards using it for services
and hosts as well, so that environment macros can be set on a service 
level as well as on a command level. I've also been investigating the
chances of doing this without resorting to setenv() (in essence,
building a one-off block of the load-time environment macros, and then
extending that whenever we hit an environment variable).

But yea, I've got that one already.

> So I'd like to reverse the question and ask "which changes are most
> likely to be accepted, based on the amount of changes required" and
> we'll work through that list in order.
>

Thread-safe calls
It's harmless, and I'd apply it straight away if I had a sensible
commit message for it, but I don't know why you need this so I can't
really write one. While we're on the topic; Please write commit
messages in imperative form and present tense, as if you're giving
orders to the code for how it should change. Also give a short
rationale for why it should change that way. The rules about not
commenting out code you want to get rid of still applies though. It
just looks terribly hackish when patches meant for upstream contains
things like that.

Slice services within hosts
So long as it's configurable from cgi.cfg and the default stays the
same as it is today I'll apply it immediately. It has no impact on
loadable modules or other headaches, so that's a nobrainer, really.

Check command by time period
I feel this is somewhat lacking in efficiency and flexibility, and
a much cleaner solution would be to add a filtering functionality to
NERD so that checkresults can be shipped off to a third-party addon
which can transform checkresults and plugin output as much as it
likes.
Failing that (which would be enormously cool but also a lot of work),
I think it would most likely be best off as a separate module, with
custom variables or separate configuration to support it. Supporting
patches to run events when a timeperiod becomes active or inactive
would still be welcome, obviously.

Escalation via notification levels
This is best off written as a module, using custom variables to 
configure it. If core support is required to block notifications to a
particular contact, then such patches will naturally be accepted. The
normal NEBCALLBACK_CANCEL return code signalling should work just fine
for that.

Synchronising state data
Pretty invasive for quite a small benefit, and with enormous
complexities to deal with to make this work properly. How do you
handle reading the synced file while checks are in-flight and
awaiting being returned to the mothership? They may have been
spawned as a result of a bad checkresult on one node, but if the
sync status returns them to OK and the in-flight result sets them
as bad again, would that mean starting over on the retry-attempts?
I see nothing in the patch that handles situations like that, so
I'm forced to believe you haven't considered it. Considering the
fact that full reloads now take less than a second for your
recommended 250 hosts per system, I'm inclined to drop this patch
in favour of the regular retention data file unless you can
persuade me otherwise.

Dependency failure states
The patch is a bit hackish in that it internally submits a passive
check to make sure everything gets updated. That's not entirely
accurate though, and will cause statistical errors.
Rewrite it to just update the states as necessary and make sure
notifications and whatnot are sent if they should be.
That will also fix the problem of bugging out performance data
handling routines that expect to always get the same kind of data
from each check every time.
It will have to be configurable though. I'm thinking
"service_dependency_exec_fail_state=(warning,critical,unknown)"
and "service_dependency_exec_fail_output=Dependency failure".
I'd be all for changing the default, but for now the exec_fail_state
thing will have to be -1, meaning "leave it as-is", and I'll make
it default to UNKNOWN after 4.0 is released.
The same should go for service parents, btw.

Passive notifications respecting renotification interval
I'm inclined to making this default. It's not documented anywhere
which of those options take precedence. If notification_interval
is 0, then every notification should be sent for is_volatile
services, and if it's anything else we should respect the interval
no matter how many notifications happen to arrive. Make those
changes and I'll apply the patch and call it a bugfix.

Reducing SQL queries part 1:
No. Just no. This is a perfect place where you can use an array sized
after num_objects.services *inside the module* and look up the latest
version of the processed commandline there. Free it when you update it
and you'll be sure to always have it handy. Anyone who doesn't use
NDOUtils will otherwise be penalized for its shortcomings.

Reducing SQL queries part 2:
As above, but num_objects.hosts

Conditional debugging:
I'm more inclined towards making the debug-logging function a C99-style
variadic macro (or a gcc style one), while falling back to using the
current structure of a lot of unnecessary calls on too old compilers.
All that iffing should really be kept at arms length from the coder,
or debug-logging won't be used. I'm thinking
   dbg(DEBUGL_CHECKS, 0, "** Running async check of foo ... ");

so we get it short and sweet while we're at it.

FIFO processing of passive checkresults
I have several objections to this patch.
First off; You're sorting on alphanumerical name only, instead of mtime
first and then name in case of a tiebreak. That's just plain dumb and
has to be amended if this is to be anywhere near useful.
Secondly; This doesn't scale. I spent the better part of last year
removing everything that touches the filesystem on a scheduled and
frequent basis for those precise reasons. Putting crap like that back in
is not something I'll do without a very good reason.
Thirdly; It builds on obsolete code which is designated for oblivion,
so the code would be (relatively) shortlived anyway.
Fourthly; *Nothing* can be built on top of such a patch that doesn't
absolutely reek of hackishness and bad design.
A much more useful patch would be adding support for external commands
to the query handler, which doesn't have the limitations of the named
pipe (and can give responses to your requests so you know if they were
accepted or not) and use a secondary program to ship in passive results
any old way you want to the clean, nice and responsive interface which
is meant to be extended for things exactly like this.

Tolerant time jumps
I'm unsure how useful this is. How often does a system clock go
backwards except for daylight savings time? I also worry slightly
about the consequences. OTOH, most of the worry comes as a result from
the very lax default for time_change_threshold (900 seconds). A time
jump of 899 seconds wouldn't trigger a rescheduling, but would wreak
serious havoc with the checks.

Listing related contact groups
The patch needs to be changed so that the (possibly very long) list
of contactgroups isn't computed until it's requested instead of
before we even start macro substitution. Otherwise we'll be
computing that macro for each and every host and service check we
execute, which stupid beyond belief. Particularly considering the
fact that object config can't change during runtime (yet), so the
list could just as well be cached if one would so desire.
For bonus points, also add CONTACTLIST, which fetches contacts from
both the object directly and the contactgroups assigned.

Environment macros per-command
I wish you'd stop talking BS on that site of yours. Removal of
environment macros has very little to do with the speedups in Nagios 4.
Those improvements are the result of good design and proper
implementation of good design.
Now to technical issues; I have the patch (as I've said) and I do
intend to push it as soon as I've had time to polish it and bring it
forward to the current "master". It's quite invasive though, and
I've been trying to figure out what it is about it that feels wrong.
Then I've also forgotten about it, so this reminder was pretty good.
Thanks.

Allowing semi-colons in commands
Well, it's a neat idea, but the patch is so incredibly ugly that I
just can't bring myself to accept it. If you did something to keep
state and allowed semicolons in strings or as escaped creatures
(with a backslash, pretty please; There's enough home-cooked
escaping in Nagios already), I'd be willing to accept it.

Tilde symbol in commands
Absolutely out of the question. This patch deliberately breaks a
library to exploit its caller's fallback, which happens to suit your
particular needs in this case. No. Nonono. You can just configure
your way around it by using

    command_line  /bin/sh -c '$USER1$/plugin --arg=lala'

for the command in question, or implement it separately but making
it configurable as "expand_tilde_in_commands=<bool>" in nagios.cfg and
doing the replacement manually. It's not that hard, really.

Fix max_concurrent_checks decrementing
Obviously correct, although the patch is incomplete. In case we fail to
launch a job, it will have no reference in the scheduling queue but will
be marked as "is_executing" which will cause it to not even get
freshness checked. I'll amend the patch with everything it needs and
apply. Good catch though.

Extra external commands
This looks clean, correct and useful. I'll need a commit message, an
author and a signed-off-by before I can accept it though.

Thanks

-- 
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

Re: [Nagios-devel] Patches for inclusion to Nagios 4

From: Ton V. <ton...@op...> - 2013-08-08 13:38:55

On 7 Aug 2013, at 08:51, Andreas Ericsson wrote:

> On 2013-08-06 19:16, Ton Voon wrote:
>> Hi!
>> 
>> We've published a list of patches for Nagios 4:
>> http://www.opsview.com/whats-new/blog/opsview-patches-nagios-4
>> 
>> We'd be happy if you could review if these are acceptable for future
>> inclusion or if anyone else finds them useful.
>> 
> 
> I'd like to get patches with commit messages and proper author and
> signed-off-by info. Since we're using git for Nagios now, it'd go
> a long way in making sure everyone gets credit for the work they've
> done.
> 
> The patches also need to apply cleanly to the latest master.
> 
> You may want to clone the official Nagios repo, apply your patches on
> top of it and then send me a pull request for github or some such.
> 
>   git clone git://git.code.sf.net/p/nagios/nagios nagios-core
> 
> should get you the very latest. If you apply your patches on top of
> 'master' and make sure to always do "git pull --rebase" when you want
> to get the latest and greatest you'll quickly see which patches either
> have been applied or which no longer *can* be applied. Then you can
> create a separate repository on github or some such and push the changes
> there.

OK, we'll convert them into git changes based off master. However....

I'd like an assurance that the changes will be merged before we promise to convert. Style changes are fine, won't take much time and will not require retesting, but if we need to refactor object changes or make larger changes to logic, this will require retesting on our side. I'd like a reassurance that the time invested will result in a merge upstream, otherwise we're just wasting time for all of us.

For instance, your assistance in the "environment macros per-command" was greatly appreciated and we've coded to the design agreed in the email conversations, but it hasn't been merged yet.

So I'd like to reverse the question and ask "which changes are most likely to be accepted, based on the amount of changes required" and we'll work through that list in order.

> 
> 
> On a first inspection though;
> * Don't comment out code. Bringing back dead code is what the VCS is
> for, and keeping it around is just plain dumb. If it shouldn't be in
> there, just remove it. In the same vein; Don't add commented-out code.
> 
> * Avoid C++ comments. I know they're supported in C99, which I'm rooting
> for as the least version supported, but it's against the current style.
> 
> * Don't mix spaces and tabs for indentation, unless it's continuation-
> indentation of a multi-line statement or condition. Stick to the style
> in the file you're editing, and *look* at the patch before you send it
> somewhere.
> 
> * Avoid comments saying things like "Opsview specific foobar" if you
> want to have the patches included. If you *don't* want those patches
> included, don't send them to me or point me to where they are. It takes
> up a lot of time to remove crap like that, and I have no interest in
> cleaning up after anyone else. I'm messy enough as it is on my own.
> 
> * Don't augment objects (such as hosts, services, commands) with new
> variables. Doing so means Nagios 4.1, and I can't take patches like
> that until Nagios 4 is at least released as stable. All objects have
> an 'id' field which means you can look up any extra data you want in
> O(1) time, provided you just initialize an array of size
> num_objects.$desired_object_type_in_plural before we enter the event
> loop.
> 
> * Make patches the most scalable you can. For the "check_time_period"
> thing, you'd be far better off adding code to detect timeperiod changes,
> notice which timeperiods are used to change commands and make a one-off
> swap for all the affected commands as the desired timeperiod comes
> either in or out of effect.
> 
> * Don't build on deprecated technology, such as external files for
> commands and/or check results

Re: [Nagios-devel] Patches for inclusion to Nagios 4

From: Andreas E. <ae...@op...> - 2013-08-07 07:51:49

On 2013-08-06 19:16, Ton Voon wrote:
> Hi!
>
> We've published a list of patches for Nagios 4:
> http://www.opsview.com/whats-new/blog/opsview-patches-nagios-4
>
> We'd be happy if you could review if these are acceptable for future
> inclusion or if anyone else finds them useful.
>

I'd like to get patches with commit messages and proper author and
signed-off-by info. Since we're using git for Nagios now, it'd go
a long way in making sure everyone gets credit for the work they've
done.

The patches also need to apply cleanly to the latest master.

You may want to clone the official Nagios repo, apply your patches on
top of it and then send me a pull request for github or some such.

   git clone git://git.code.sf.net/p/nagios/nagios nagios-core

should get you the very latest. If you apply your patches on top of
'master' and make sure to always do "git pull --rebase" when you want
to get the latest and greatest you'll quickly see which patches either
have been applied or which no longer *can* be applied. Then you can
create a separate repository on github or some such and push the changes
there.

On a first inspection though;
* Don't comment out code. Bringing back dead code is what the VCS is
for, and keeping it around is just plain dumb. If it shouldn't be in
there, just remove it. In the same vein; Don't add commented-out code.

* Avoid C++ comments. I know they're supported in C99, which I'm rooting
for as the least version supported, but it's against the current style.

* Don't mix spaces and tabs for indentation, unless it's continuation-
indentation of a multi-line statement or condition. Stick to the style
in the file you're editing, and *look* at the patch before you send it
somewhere.

* Avoid comments saying things like "Opsview specific foobar" if you
want to have the patches included. If you *don't* want those patches
included, don't send them to me or point me to where they are. It takes
up a lot of time to remove crap like that, and I have no interest in
cleaning up after anyone else. I'm messy enough as it is on my own.

* Don't augment objects (such as hosts, services, commands) with new
variables. Doing so means Nagios 4.1, and I can't take patches like
that until Nagios 4 is at least released as stable. All objects have
an 'id' field which means you can look up any extra data you want in
O(1) time, provided you just initialize an array of size
num_objects.$desired_object_type_in_plural before we enter the event
loop.

* Make patches the most scalable you can. For the "check_time_period"
thing, you'd be far better off adding code to detect timeperiod changes,
notice which timeperiods are used to change commands and make a one-off
swap for all the affected commands as the desired timeperiod comes
either in or out of effect.

* Don't build on deprecated technology, such as external files for
commands and/or check results.

-- 
Andreas Ericsson                   and...@op...
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

Re: [Nagios-devel] Patches for inclusion to Nagios 4

From: Daniel W. <dwi...@gm...> - 2013-08-07 02:45:02

Looks like some good additions.  Since I'm in the middle of doing some in-depth performance tuning/comparison between 3.2.3 and 4.0 I might to throw a couple of these on and see how things change.

Thanks!
Dan

On Aug 6, 2013, at 12:16 PM, Ton Voon <ton...@op...> wrote:

> Hi!
> 
> We've published a list of patches for Nagios 4: http://www.opsview.com/whats-new/blog/opsview-patches-nagios-4
> 
> We'd be happy if you could review if these are acceptable for future inclusion or if anyone else finds them useful.
> 
> Ton
> 
> 
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite!
> It's a free troubleshooting tool designed for production.
> Get down to code-level detail for bottlenecks, with <2% overhead. 
> Download for free and get started troubleshooting in minutes. 
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
> _______________________________________________
> Nagios-devel mailing list
> Nag...@li...
> https://lists.sourceforge.net/lists/listinfo/nagios-devel

[Nagios-devel] Patches for inclusion to Nagios 4

From: Ton V. <ton...@op...> - 2013-08-06 17:36:38

Hi!

We've published a list of patches for Nagios 4: http://www.opsview.com/whats-new/blog/opsview-patches-nagios-4

We'd be happy if you could review if these are acceptable for future inclusion or if anyone else finds them useful.

Ton

[Nagios-devel] [PATCH 3/3] Macros: All macros should be possible to urlencode

From: Max S. <max...@op...> - 2013-08-05 12:24:46

From: Max Sikström <msi...@op...>

Default behaviour is still that everything should be possible to urlencode. So
keep it that way.

When zeroing clean_option (=flags for which types of escaping the macro is
allowed to pass), it should actually be set to url encode, not zero.

Signed-off-by: Max Sikström <msi...@op...>
---
 common/macros.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/common/macros.c b/common/macros.c
index fa22233..e30634a 100644
--- a/common/macros.c
+++ b/common/macros.c
@@ -407,8 +407,9 @@ int grab_macro_value_r(nagios_macros *mac, char *macro_buffer, char **output, in
 	if(macro_buffer == NULL || free_macro == NULL)
 		return ERROR;
 
+	/* Everything (if not stated otherwise) should be possible to urlencode */
 	if(clean_options)
-		*clean_options = 0;
+		*clean_options = URL_ENCODE_MACRO_CHARS;
 
 	/*
 	 * We handle argv and user macros first, since those are by far
-- 
1.7.1

115 messages has been excluded from this view by a project administrator.

Flat | Threaded

<< < 1 2 3 4 .. 380 > >> (Page 2 of 380)

nagios-devel Mailing List for Nagios Core (Page 2)

Nagios network monitoring software is enterprise server monitoring

nagios-devel — Developers List