The usual IT babble
Posts tagged Nagios
Nagios: SNMP OIDs for IBM’s RSA II adapter
Apr 1st
Well, after some poking around I finally found some OIDs for the RSAs (only through these two links: check_rsa_fan and check_rsa_temp).
For Nagios, I skipped the fans, since the fan speed is only reported as a percentage. So I only added this:
define hostgroup{
hostgroup_name rsa-snmp
alias Remote Supervisor Adapter (allowing SNMP connections)
}
define service{
use generic-perfdata
check_command check_rsa_snmpv1_public!.1.3.6.1.4.1.2.3.51.1.2.1.2.1.1!45!60!°C!Temperature CPU0!
hostgroup_name rsa-snmp
service_description TEMP CPU0
}
define service{
use generic-perfdata
check_command check_rsa_snmpv1_public!.1.3.6.1.4.1.2.3.51.1.2.1.2.2.1!45!60!°C!Temperature CPU1!
hostgroup_name rsa-snmp
service_description TEMP CPU1
}
define service{
use generic-perfdata
check_command check_rsa_snmpv1_public!.1.3.6.1.4.1.2.3.51.1.2.1.5.1.0!29!35!°C!Temperature Ambient!
hostgroup_name rsa-snmp
service_description TEMP AMBIENT
}
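Copying these blocks by hand gets old once you add more sensors (the list below has four CPU slots). Here is a minimal Python sketch that renders the same `define service` blocks from a small table. The argument order passed to check_rsa_snmpv1_public (OID, warning, critical, unit, label) is copied from the definitions above. What the command does with those arguments depends on its own command definition, which isn’t shown here. `SENSORS`, `TEMPLATE` and `render_services` are hypothetical names.

```python
# Hypothetical sensor table: (service_description, OID, warning, critical, label).
# OIDs and thresholds are copied from the service definitions above.
SENSORS = [
    ("TEMP CPU0", ".1.3.6.1.4.1.2.3.51.1.2.1.2.1.1", 45, 60, "Temperature CPU0"),
    ("TEMP CPU1", ".1.3.6.1.4.1.2.3.51.1.2.1.2.2.1", 45, 60, "Temperature CPU1"),
    ("TEMP AMBIENT", ".1.3.6.1.4.1.2.3.51.1.2.1.5.1.0", 29, 35, "Temperature Ambient"),
]

# Doubled braces are literal braces in str.format().
TEMPLATE = """define service{{
    use                 generic-perfdata
    check_command       check_rsa_snmpv1_public!{oid}!{warn}!{crit}!°C!{label}!
    hostgroup_name      rsa-snmp
    service_description {desc}
}}"""


def render_services(sensors):
    """Render one Nagios service definition per sensor, separated by blank lines."""
    return "\n\n".join(
        TEMPLATE.format(desc=desc, oid=oid, warn=warn, crit=crit, label=label)
        for desc, oid, warn, crit, label in sensors
    )


if __name__ == "__main__":
    print(render_services(SENSORS))
```

Write the output to an object file and verify the configuration with `nagios -v` before reloading.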
Oh, and if anyone else is as curious as I am, here’s the list of OIDs, courtesy of Gerhard Gschlad and Leonardo Calamai.
For the fans:
Fan1: .1.3.6.1.4.1.2.3.51.1.2.3.1.0
Fan2: .1.3.6.1.4.1.2.3.51.1.2.3.2.0
Fan3: .1.3.6.1.4.1.2.3.51.1.2.3.3.0
Fan4: .1.3.6.1.4.1.2.3.51.1.2.3.4.0
Fan5: .1.3.6.1.4.1.2.3.51.1.2.3.5.0
Fan6: .1.3.6.1.4.1.2.3.51.1.2.3.6.0
Fan7: .1.3.6.1.4.1.2.3.51.1.2.3.7.0
Fan8: .1.3.6.1.4.1.2.3.51.1.2.3.8.0
Fan9: .1.3.6.1.4.1.2.3.51.1.2.3.9.0
Fan10: .1.3.6.1.4.1.2.3.51.1.2.3.10.0
Fan11: .1.3.6.1.4.1.2.3.51.1.2.3.11.0
Fan12: .1.3.6.1.4.1.2.3.51.1.2.3.12.0
And for the temperatures:
CPU1: .1.3.6.1.4.1.2.3.51.1.2.1.2.1.1
CPU2: .1.3.6.1.4.1.2.3.51.1.2.1.2.2.1
CPU3: .1.3.6.1.4.1.2.3.51.1.2.1.2.3.1
CPU4: .1.3.6.1.4.1.2.3.51.1.2.1.2.4.1
Ambient: .1.3.6.1.4.1.2.3.51.1.2.1.5.1.0
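The list follows a regular pattern. Under the common prefix .1.3.6.1.4.1.2.3.51.1.2, fan N is at .3.N.0 and the temperature of CPU N is at .1.2.N.1. The service definitions above call the first two temperature sensors CPU0 and CPU1, but the list numbers them CPU1 and CPU2. This minimal Python sketch derives the same OIDs from that pattern. The pattern is only inferred from this list, so the helpers reject numbers outside the listed ranges (fans 1 to 12, CPUs 1 to 4). `fan_oid`, `cpu_temp_oid` and `RSA_OIDS` are hypothetical names.

```python
# Common prefix of all RSA II sensor OIDs in the list above.
RSA_BASE = ".1.3.6.1.4.1.2.3.51.1.2"


def fan_oid(n: int) -> str:
    """OID for fan n (1-12); the RSA reports fan speed as a percentage."""
    if not 1 <= n <= 12:
        raise ValueError(f"fan {n} is not in the published list (1-12)")
    return f"{RSA_BASE}.3.{n}.0"


def cpu_temp_oid(n: int) -> str:
    """OID for the temperature of CPU n (1-4, numbered as in the list above)."""
    if not 1 <= n <= 4:
        raise ValueError(f"CPU {n} is not in the published list (1-4)")
    return f"{RSA_BASE}.1.2.{n}.1"


AMBIENT_TEMP_OID = f"{RSA_BASE}.1.5.1.0"

# Name -> OID table equivalent to the list above.
RSA_OIDS = {f"Fan{n}": fan_oid(n) for n in range(1, 13)}
RSA_OIDS.update({f"CPU{n}": cpu_temp_oid(n) for n in range(1, 5)})
RSA_OIDS["Ambient"] = AMBIENT_TEMP_OID

if __name__ == "__main__":
    for name, oid in RSA_OIDS.items():
        print(f"{name}: {oid}")
```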
I just found a proper list of OIDs for the IBM RSA adapter. That’s rather nice, since I was really looking for the VRM failure OID and the OIDs for other warning/critical events.
Nagios: check_snmp again
Feb 27th
Well, today I had to rack my brain again over the way check_snmp handles WARNING and CRITICAL events. From my point of view, check_snmp is just plain dumb sometimes.
As you know, all the other plugins accept WARNING and CRITICAL thresholds as upper limits: if the returned value is above the threshold, the check goes into the WARNING/CRITICAL state. But check_snmp doesn’t play that way.
It only accepts ranges, and those ranges describe the values that will NOT result in warning or critical events. That’s kinda stupid, since you gotta think twice about the thresholds:
define service {
use generic-service
host_name ibm-bc1-mgmt
service_description Chassis Cooling - Bay 1
check_command check_snmpv1_public!.1.3.6.1.4.1.2.3.51.2.2.3.20.0!\
1900:8000!1900:0,10000:8000!\
RPM!Chassis Cooling - Bay 1
action_url /pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
notes View PNP RRD graph
}
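Put differently, a range names the *good* values. The threshold range format in the Nagios plugin development guidelines spells this out. `start:end` alerts when the value lies outside start..end, and the endpoints count as inside. A bare `N` means `0:N`, `N:` leaves the top open, and `~` stands for negative infinity. A leading `@` flips the test so the alert fires *inside* the range. The Python sketch below encodes those rules. It doesn’t handle the comma-separated form in the critical threshold above, and the check_snmp version you run may deviate from the guidelines. Take it as an illustration of the semantics, not a copy of check_snmp’s parser. `parse_range`, `alerts` and `status` are hypothetical helper names.

```python
import math


def parse_range(spec: str):
    """Parse a Nagios threshold range into (low, high, alert_inside)."""
    alert_inside = spec.startswith("@")  # '@' inverts the test
    if alert_inside:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
    else:
        low_s, high_s = "0", spec  # a bare "N" means "0:N"
    low = -math.inf if low_s == "~" else float(low_s or 0)
    high = math.inf if high_s == "" else float(high_s)  # "N:" has no upper bound
    return low, high, alert_inside


def alerts(value: float, spec: str) -> bool:
    """True if value should raise an alert for this range (endpoints inclusive)."""
    low, high, alert_inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if alert_inside else not in_range


def status(value: float, warn: str, crit: str) -> str:
    """Return the plugin state; a critical match takes precedence over a warning."""
    if alerts(value, crit):
        return "CRITICAL"
    if alerts(value, warn):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Fan speed in RPM; the critical range 1000:10000 is a made-up example.
    for rpm in (2500, 1500, 500):
        print(rpm, status(rpm, "1900:8000", "1000:10000"))
```

Take the warning range 1900:8000 from the definition above and a made-up critical range of 1000:10000. With those, 2500 RPM is OK, 1500 RPM is WARNING and 500 RPM is CRITICAL.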
All in all, another lesson learned.
Nagios: NSClient++ in a clustered Environment
Feb 26th
Well, most of you already know that I’m a Nagios fanatic. I like to monitor as many aspects as I possibly can. So, yesterday I started figuring out ways to monitor our different cluster groups (which house a whole bunch of file shares, think more than 20,000).
My first attempts failed horribly: I brought down a complete cluster group, which caused major annoyance. So today I went at it a bit smarter.
I cloned two VMs off my Windows Server 2003 Enterprise R2 template and created a new test cluster.
I then tried it again on the test cluster, with the same result: the resource is created successfully, but once I try to bring it online, it fails and the whole cluster group moves to the other node (and keeps cycling between the cluster nodes endlessly).
Next, I figured something had to be wrong with the command I was using, the one given in the NSClient++ wiki. So I tried the command on the command line, and as soon as I hit <TAB> (oooold bash habit), it completed the path but put quotes around it… Don’t ask me.
Without the quotes, no joy at all. Once you put quotes around the path, everything is hunky-dory and the resource comes online without the slightest trouble!
Hint to self: when creating an NSClient++ cluster resource (or, for that matter, any application resource whose command needs switches), use a quoted command line along these lines:
"Q:\_nsclient\nsclient.exe" /test
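If you script the creation of such resources, build the command line so that the executable path is always quoted. Here is a tiny Python sketch. `cluster_command_line` is a hypothetical helper, and how you then hand the string to the cluster is up to you.

```python
def cluster_command_line(executable: str, *switches: str) -> str:
    """Build a command line with the executable path always in double quotes.

    Quoting even a path without spaces is what made the NSClient++
    cluster resource come online in the case described above.
    """
    path = executable.strip('"')  # avoid double-quoting an already quoted path
    return " ".join([f'"{path}"', *switches])


if __name__ == "__main__":
    print(cluster_command_line(r"Q:\_nsclient\nsclient.exe", "/test"))
```

Python’s own `subprocess.list2cmdline` only adds quotes around arguments that contain spaces or tabs, so it would leave `Q:\_nsclient\nsclient.exe` bare. That is why the helper quotes explicitly.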
Nagios: Integrating Cisco switches
Feb 13th
Well, as I wrote recently, we received a new BladeCenter a few weeks back. Now, as we slowly take it into service, I was interested in monitoring the utilization of the backplanes as well as the CPU utilization of the Cisco Catalyst 3012 network switches.
The first mistake I made was trusting Cisco’s guide on how to get the utilization from the device via SNMP. It lists some OIDs, which I tried with snmpwalk, and I did get a result:
snmpwalk -v1 -c public -O n 10.0.0.35 .1.3.6.1.4.1.9.5.1.1.8
.1.3.6.1.4.1.9.5.1.1.8.0 = INTEGER: 0
But when I tried retrieving the SNMP data with the check_snmp plugin, I got some flaky results:
/usr/lib/nagios/plugins/check_snmp -H 10.0.0.35 -C public \
.1.3.6.1.4.1.9.5.1.1.8
SNMP problem - No data received from host
CMD: /usr/bin/snmpget -t 1 -r 5 -m '' -v 1 [authpriv] 10.0.0.35:161
Those of you who read the excerpts carefully will notice the difference between the OID snmpwalk returned and the OID I passed to check_snmp.
The point being, the OIDs Cisco gives in their Design tech notes are either old or just not accurate at all. After appending .0 to each OID given by Cisco, check_snmp is all hunky-dory and integrated into Nagios.
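The .0 matters because of how the two tools fetch data. snmpwalk returns every instance below the OID you hand it, so it found .1.3.6.1.4.1.9.5.1.1.8.0 on its own. check_snmp fetches exactly the OID you give it with snmpget (see the CMD line above), and a scalar object’s value lives at instance .0. The minimal Python sketch below appends that suffix and builds a check_snmp argument list. `-H`, `-C` and `-o` are check_snmp’s host, community and OID options. The plugin path is the one from the example above, and `scalar_instance` and `check_snmp_argv` are hypothetical names.

```python
PLUGIN = "/usr/lib/nagios/plugins/check_snmp"  # path from the example above


def scalar_instance(oid: str) -> str:
    """Return the instance OID of a scalar object by appending '.0'.

    Assumes an OID whose last component is already 0 is an instance OID.
    """
    oid = oid.rstrip(".")
    return oid if oid.split(".")[-1] == "0" else oid + ".0"


def check_snmp_argv(host: str, community: str, oid: str) -> list:
    """Build the argument list for a single-OID check_snmp call."""
    return [PLUGIN, "-H", host, "-C", community, "-o", scalar_instance(oid)]


if __name__ == "__main__":
    print(" ".join(check_snmp_argv("10.0.0.35", "public", ".1.3.6.1.4.1.9.5.1.1.8")))
```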
As usual, the Nagios definitions are further down, for those interested.
Monitoring the IBM BladeCenter chassis with Nagios
Feb 10th
Today I worked out the details of what we want to monitor on our BladeCenter. The most interesting ones (for us, that is) are these:
- Fan speeds for Chassis Cooling/Power Module Cooling Bay(s)
- Temperature
- Power Domain utilization
It wasn’t *that* hard to implement. The only troubles I ran into were these: (1) IBM did a real shitty job with the MIBs. If you look closely at mmblade.mib, you’re gonna notice that not a single OID is specified for the events. (2) As the MIBs weren’t documented anywhere, I had to look them up via snmpwalk (which I had never used before). So, as a reminder to myself, here’s how it’s done:
snmpwalk -v1 -c public -O n 10.0.0.35 .1.3.6.1.4.1.2.3.51.2.2
This gets you a long list (5154 lines, to be exact). Luckily, the management module’s web interface and SSH interface are rather verbose, so all you need to do is compare those values with what you are looking for.
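At 5154 lines, eyeballing gets tedious. This minimal Python sketch handles the compare step. It parses `snmpwalk -O n` output (numeric OID, `=`, optional type, value) into a dictionary. It then lists the OIDs whose value contains a string you’ve seen in the web or SSH interface. It assumes net-snmp’s usual one-line format and simply skips continuation lines of multi-line values. `parse_snmpwalk` and `find_value` are hypothetical names. In the sample, the first line is the Cisco output quoted earlier; the second is a sysName.0 line with a made-up value.

```python
import re

# One line of `snmpwalk -O n` output, e.g. ".1.3.6.1.4.1.9.5.1.1.8.0 = INTEGER: 0"
LINE_RE = re.compile(r"^(?P<oid>\.?[0-9.]+) = (?:(?P<type>[A-Za-z0-9-]+): )?(?P<value>.*)$")


def parse_snmpwalk(text: str) -> dict:
    """Map each OID to (type, value); lines that don't match are skipped."""
    walk = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # e.g. continuation lines of multi-line values
        walk[m.group("oid")] = (m.group("type"), m.group("value").strip('"'))
    return walk


def find_value(walk: dict, needle: str) -> list:
    """Return the OIDs whose value contains needle (e.g. a reading seen in the web UI)."""
    return [oid for oid, (_type, value) in walk.items() if needle in value]


if __name__ == "__main__":
    sample = (
        ".1.3.6.1.4.1.9.5.1.1.8.0 = INTEGER: 0\n"
        '.1.3.6.1.2.1.1.5.0 = STRING: "ibm-bc1-mgmt"\n'
    )
    walk = parse_snmpwalk(sample)
    print(find_value(walk, "bc1"))
```

In practice you would save the output of the snmpwalk command above to a file, feed it to `parse_snmpwalk`, and search for a reading you see in the management module’s interface.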
So, for myself (and anyone interested), read ahead for the list of checks we are currently running on the management module.