The usual IT babble
Posts tagged Nagios
Monitoring Brocade FC switches with SNMP/Nagios
Dec 21st
I looked into the mess a bit more, and as it turns out, the weird crap I was talking about only happens if a port has a LossofSynchronization, LossofSignal or LinkFailures value with the base of ten (e.g. 10, 101 or 10.000).
Additionally, the OIDs for those three failure elements seem to depend on the firmware version: with 6.3.x they show up under different OIDs. So I may need to introduce another command-line switch that selects the firmware version and, depending on that, the OIDs.
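A rough sketch of what such a switch could look like in Python. The option name `--firmware` and the table layout are my own assumptions, and the OID strings are deliberate placeholders, since the actual OIDs are not listed here:

```python
import argparse

# Placeholder OID tables. The "<...>" strings are NOT real OIDs; fill them in
# from the MIB that matches the firmware running on your switch.
OIDS_BY_FIRMWARE = {
    "default": {
        "LinkFailures": "<LinkFailures OID, pre-6.3.x firmware>",
        "LossofSynchronization": "<LossofSynchronization OID, pre-6.3.x firmware>",
        "LossofSignal": "<LossofSignal OID, pre-6.3.x firmware>",
    },
    "6.3": {
        "LinkFailures": "<LinkFailures OID, 6.3.x firmware>",
        "LossofSynchronization": "<LossofSynchronization OID, 6.3.x firmware>",
        "LossofSignal": "<LossofSignal OID, 6.3.x firmware>",
    },
}


def build_parser():
    """Build a parser with a hypothetical --firmware switch."""
    parser = argparse.ArgumentParser(description="Brocade port check (sketch)")
    parser.add_argument(
        "--firmware",
        choices=sorted(OIDS_BY_FIRMWARE),
        default="default",
        help="firmware family, selects which OID table is used",
    )
    return parser


def oids_for(argv):
    """Return the OID table matching the firmware given on the command line."""
    args = build_parser().parse_args(argv)
    return OIDS_BY_FIRMWARE[args.firmware]
```

Calling `oids_for(["--firmware", "6.3"])` returns the 6.3.x table, and leaving the switch out falls back to the default table.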
Despite the problems I just described, I ended up using the plugin to watch our SAN infrastructure. I even wrote a simple pnp4nagios template, so all the data shows up in a single graph rather than one graph per data source.
Monitoring Brocade FC switches with Nagios
Nov 23rd
I spent the last four days looking for ways to monitor a Brocade Fibre Channel switch (in my case IBM 2145 B32/F40). The first thing I came up with was SNMP. As it was already configured for the previous monitoring with Munin, getting information should be quite easy. After searching Google for a bit, I found a script that already worked for me.
The only trouble I had with that script is that it crams every single port into one result. As I wanted something that a) could watch a single port and b) return performance data, I went ahead and used the script as the basis for a rewrite. But after a short while, I grew antsy and started writing a script from scratch, using the OIDs I got from that script and a Cacti template.
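To illustrate the two goals (one port per check, plus performance data), here is a minimal Python sketch of the output side only. It is not the plugin itself: the function name, the threshold handling and the counter values are made up for the example, and the SNMP query is left out entirely. It only follows the usual Nagios plugin conventions (exit codes 0 to 3, performance data after the `|`).

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_port(port, counters, warn, crit):
    """Build the status line for a single port.

    counters maps a counter name to its current value, e.g. as read via SNMP
    (the SNMP part is not shown here). warn and crit are hypothetical
    thresholds applied to the highest counter value.
    """
    worst = max(counters.values())
    if worst >= crit:
        state = CRITICAL
    elif worst >= warn:
        state = WARNING
    else:
        state = OK

    items = sorted(counters.items())
    text = ", ".join(f"{name}={value}" for name, value in items)
    # Perfdata format: label=value[UOM];warn;crit ("c" marks a counter).
    perfdata = " ".join(f"{name}={value}c;{warn};{crit}" for name, value in items)
    return state, f"{STATE_NAMES[state]} - port {port}: {text} | {perfdata}"
```

In a real plugin you would print the returned line and exit with the returned state, so Nagios picks up the status and pnp4nagios gets one data source per counter.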
Configuring nagios-plugins-zypper
Nov 12th
Since I’m running check_zypper via nrpe (which in turn runs as nobody), I need to set up sudo. In order for the plugin to work, we need to add the following line to /etc/sudoers (by means of visudo):
nobody ALL = NOPASSWD: /usr/bin/zypper sl, /usr/bin/zypper --non-interactive --no-gpg-checks --terse list-updates
(Keep in mind this needs to be a single line …)
Praxisbuch Nagios by Tobias Scherbaum
Jul 25th
Tobi recently finished writing yet another book, which he also talked about in a blog post. Shortly after, I asked him a rather curious question: what exactly is the plant or animal on the cover of the book? He was kind enough to send a voucher copy of the book my way.
He actually mentions it in the credits at the beginning of the book. Turns out it is an animal, a sea pen or sea feather (I’m guessing at Pennatula aculeata).
Now as for the content of the book itself, I do have to admit that I haven’t read the whole book. I just picked a few topics (SNMP-Traps with Nagios, notifications) which I did find rather well written. My (soon ex-) trainee, Michel, however already bugged Tobias about some errors in the book itself, or rather some changes which happened after 3.0.3 (that’s the Nagios version the book is based on).
All in all, I guess I can congratulate Tobias on yet another well-written book!
Nagios: Service Check Timed Out
Apr 3rd
Since I got the pleasure of watching some Windows boxen with Nagios, I took the Windows Update plugin from Michal Jankowski and implemented it. It took me some time to set up nsclient++ correctly so it just works, but up till now the check plugin sometimes reported the usual “Service Check Timed Out”.
Usually I ended up increasing the cscript timeout or the nsclient++ socket timeout, but the error still kept showing up. Since I rely heavily on my surveillance tools, I need as few false positives as possible. So I ended up chasing down this error today, and in hindsight I have to say the cause was quite simple.
In my case, it was neither cscript (that timeout is set to 300 seconds), nor nsclient++ (socket timeout is set to 300 seconds too), nor the nrpe plugin itself (that has 300 seconds as well).
As it turns out, Nagios has an additional setting controlling these things, called service_check_timeout, which defaults to 60 seconds. Sadly the plugin, or rather Windows, needs longer than those 60 seconds to figure out whether or not it needs updating, so Nagios kills the plugin and returns a CRITICAL message.
Increasing the value of service_check_timeout should hopefully fix that.
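Since the shortest timeout in the chain is the one that wins, a small Python sketch like the following can help spot it. The helper names are mine, and it assumes the main config uses plain `key=value` lines, with 60 seconds as the fallback mentioned above when the setting is absent:

```python
def read_service_check_timeout(cfg_text, default=60):
    """Return service_check_timeout from nagios.cfg-style text.

    Assumes plain key=value lines; comment lines start with '#'.
    Falls back to the 60-second default if the key is not set.
    """
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "service_check_timeout":
            return int(value.strip())
    return default


def shortest_timeout(timeouts):
    """Return (name, seconds) of the smallest timeout in the chain."""
    name = min(timeouts, key=timeouts.get)
    return name, timeouts[name]
```

Feeding it the values from this post (300 seconds each for cscript, nsclient++ and nrpe, plus whatever `read_service_check_timeout` finds) points straight at `service_check_timeout` as the limiting factor.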

