Suspected NRPE weirdness

Posted on Sunday, 10th August, 2008 in Life

Well, I just noticed a really weird thing in NRPE when you have command line arguments enabled.

Here’s a snippet from my nrpe.cfg:

dont_blame_nrpe=1
command[check_disk]=/usr/lib/nagios/plugins/check_disk -E -w $ARG1$ -c $ARG2$ -p $ARG3$

Now, if you check the free space for the root, it ain’t gonna show any inode percentage (but that isn’t what I’m talking about). But if you have to use bind mounts like I do (Tivoli needs a separate “domain”, that is, a separate mount point for each domain), you might wanna check the free space on the *real* device, rather than the free space on the bind mount (which is gonna show you the free space of the parent file system, in my case the root fs).

Let’s take a look at what I’m talking about. If you run check_disk locally like this:

# ./check_disk -w 20% -c 10% -p /apache/
DISK OK - free space: /apache 11090 MB (36% inode=36%);| /apache=19629MB;24575;27647;0;30719

This means everything is okay. Note that you have to pass the extra trailing slash to the --partition argument, as otherwise it would pick up the bind mount at /backup.

Now, if we do the above by means of NRPE, that’s gonna get you a different result. As I showed above, I have the check_disk command in my nrpe.cfg, and I also specifically enabled command arguments at compile time.

# ./check_nrpe -H nagios.home.barfoo.org -c check_disk -a 20% 5% /apache/
DISK CRITICAL: /apache/ not found

Now, why the hell isn’t it picking up the *original* mount point of the file system? Guess why … Because I added -E to the command. I had added it in the first place because check_disk didn’t use the original mount point but rather the bind mount in /backup. Remove the -E and it picks up the *original* mount point without any trouble *shrug*.
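
For reference, here is roughly what the nrpe.cfg snippet from above looks like with the -E dropped. This is only a sketch of the change described here; the plugin path and the argument order are the same as in my original snippet, and the comment line is just for illustration.

```
# nrpe.cfg: same check_disk command as before, but without -E
dont_blame_nrpe=1
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
```

The check_nrpe call itself stays the same; only the command definition on the NRPE side changes.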


Nagios 3 and hostgroup inheritance

Posted on Friday, 8th August, 2008 in Life

As I wrote some time ago, I was trying to utilize Nagios 3.x’s neat feature of “nested” hostgroups. Well, as it turned out, I had assumed it worked differently, basically like this:

define hostgroup {
        hostgroup_name      a-parent-hostgroup
        alias               Our toplevel parent hostgroup
}
 
define service {
       use                  generic-service
       check_command        check_dummy!0!
       service_description  SSH
       hostgroup_name       a-parent-hostgroup
}
 
define hostgroup {
        hostgroup_name      a-child-hostgroup
        hostgroup_members   a-parent-hostgroup
        alias               Our child hostgroup
}
 
define service {
       use                  generic-service
       check_command        check_dummy!0!
       service_description  LOAD
       hostgroup_name       a-child-hostgroup
}

As you can clearly see from the hostgroup_members line in the second hostgroup definition, I thought you define the relation between two hostgroups in the child hostgroup. The problem with that (as I said in the earlier posts) was that all the services defined for the child hostgroups are handed on upwards to the parent hostgroup(s).

But after talking to Tobi, I quickly found out that the relation is in fact defined within the parent hostgroup. So if you simply put hostgroup_members within the parent hostgroup and list all the child hostgroups that should inherit from it, you should be just fine.
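
As a rough sketch of that corrected layout, reusing the hostgroup names from the example above (untested in exactly this form, and the two service definitions stay as they were):

```
# The relation now lives in the parent: it pulls in the members of the child.
define hostgroup {
        hostgroup_name      a-parent-hostgroup
        alias               Our toplevel parent hostgroup
        hostgroup_members   a-child-hostgroup
}

# The child no longer mentions the parent at all.
define hostgroup {
        hostgroup_name      a-child-hostgroup
        alias               Our child hostgroup
}
```

With this, hosts in a-child-hostgroup also count as members of a-parent-hostgroup, so they get the SSH service from the parent plus their own LOAD service, while hosts that are only in the parent no longer pick up LOAD.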


Nagios 3.x and check_pcmeasure.pl

Posted on Thursday, 7th August, 2008 in Life

Recently we purchased a MessPC station for our server room, and my co-worker and I wanted it integrated into Nagios. Well, so far so good. The first thing I did was put both keywords into Google.

That quickly brought up the manufacturer’s page (sorry, it’s German only) about the device supporting Nagios by means of either SNMP or a specific plugin called pcmeasure. So I went ahead and tried both ways.

Using SNMP has the advantage that it’s quickly integrated into Nagios and doesn’t need a separate plugin to work. But it also has a huge disadvantage: check_snmp doesn’t support performance data, which is quite handy if you want to do graphing from Nagios’ results.

Next I tried the pcmeasure plugin. At first it worked great (that is, from the plain command line), but when I integrated it into Nagios, I got “Plugin did not exit properly”.

Today, after I had the plugin commented out for about two weeks, I finally had time to look at the issue again. At first I thought simply using utils.pm’s error values would be sufficient for ePN to quit yapping, but apparently it had *real* problems with the pod2usage used within.

So I basically rewrote the plugin (well, not really; it’s still the same - but without the pod2usage and working in Nagios 3.0.3).