===== check_generic in 30 minutes - step by step tutorial =====
//check_generic// could be the Swiss Army knife in your monitoring toolbox, but you have to make an effort to discover its real capabilities and strengths. Let's have a short session and see what we can do with //check_generic//.
=== 1. Always test the stuff as 'nagios' user ===
First, do what you should always do when exploring plugins: log in as the 'nagios' user. Many errors occur because people test as root and then wonder why the plugins behave differently under the UID 'nagios' in production.
# su - nagios
$ _
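If you put your tests into a script, a small guard can make sure it is not run as root by accident. This is only a minimal sketch: the helper name ''require_user'' is made up for this tutorial and is not part of //check_generic// or Nagios.

```shell
# Hypothetical helper (not part of check_generic):
# succeed only if the current user is the expected one.
require_user() {
    expected="$1"
    current="$(id -un)"
    if [ "$current" != "$expected" ]; then
        echo "running as '$current', expected '$expected'" >&2
        return 1
    fi
    return 0
}

# Typical use at the top of a test script:
# require_user nagios || exit 1
```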
Secondly, we want a brief overview, so we just call the plugin without any options:
$ ./check_generic
check_generic error: no commandline specified
check_generic -e -o|u|w|c [-f false_state] [-n name] [-t timeout] [-r level]
check_generic [-h | --help]
check_generic [-V | --version]
Good start. ;-) It complains about a missing command. Time to fix that...\\
=== 2. check_generic --execute "command" --critical "perl expr" ===
OK, let's begin with the two parts you need for every call of //check_generic//:
- ''-e/--execute'' to select the command to be run and
- ''-c/--critical'' to define the condition which makes the plugin's state critical.
It can also be ''-w/--warning'' or ''-u/--unknown'' instead, but we start small and grow later on.\\
=== 3. Nagiostats to monitor Nagios itself ===
Now we're looking for something to monitor. (Normally you should know this before you begin writing or configuring a plugin ;-)).
What about monitoring Nagios itself? There's a small program called [[http://nagios.sourceforge.net/docs/3_0/nagiostats.html|nagiostats]] which is part of every Nagios installation. Now let's see what we can do with it.\\
If you start ''nagiostats'' on a running Nagios system, it prints lots of figures which describe the number of checks and the performance of the whole Nagios system.\\ We want to concentrate on the performance, which is described by the latency. Nagios schedules each service check for a certain time, but usually the check is executed a bit later than scheduled. The difference between the scheduled time and the actual execution time is the //Service Check Latency//.\\
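To make the definition concrete, here is a tiny worked example in shell arithmetic. The helper ''latency_ms'' and the two timestamps are invented for illustration; Nagios calculates the latency itself, you never have to do this by hand.

```shell
# Hypothetical helper: latency = actual start time - scheduled time.
# $1: scheduled time in ms, $2: actual start time in ms
latency_ms() {
    echo $(( $2 - $1 ))
}

# A check scheduled at 1000000 ms that really started at 1000533 ms
# has a latency of 533 ms.
latency_ms 1000000 1000533
```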
The next step for our check is to extract this figure from the nagiostats output. We could do this with a small Unix command line:
$ /usr/local/nagios/bin/nagiostats | grep "Active Service Latency:" | awk '{print $8}'
But there is a better way: we can use the MRTG output option.
$ /usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT
It returns the average latency in milliseconds. So now try it:
$ ./check_generic -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT"
Sh... it's still complaining, with something like \\ ''Sorry Dave. No evaluation expression specified''.\\
Okidok - now comes the trick:
=== 4. Perl expression to evaluate the command output ===
Any Perl expression can be used to evaluate the command output.
* You can use ">100" or "<50" for a numerical comparison.
* If you want to check a string, just use "eq abc".
* Regular expressions are allowed: "=~/perl-regex/"
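If you want to try such expressions without the plugin, the following sketch imitates the idea with a Perl one-liner: the command output goes on the left-hand side, your expression on the right. The helper ''matches'' is hypothetical, and this is an assumption about the principle, not the actual code inside //check_generic//.

```shell
# Hypothetical helper: evaluate "<output> <expression>" in Perl.
# $1: command output, $2: expression such as '>60000' or 'eq abc'
# Returns 0 if the expression is true, non-zero otherwise.
matches() {
    perl -e 'my ($out, $expr) = @ARGV; exit(eval("\$out $expr") ? 0 : 1);' -- "$1" "$2"
}

if matches 533 '>60000'; then
    echo "match"
else
    echo "no match"
fi
```

Because the expression is handed to Perl's ''eval'', only use expressions you have written yourself.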
For our example we begin with 1 minute latency, which is 60000 milliseconds:
$ ./check_generic -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT" -c ">60000"
By the way: in my opinion this threshold notation is much easier and much less confusing than the original Nagios threshold notation with -c "2:5"...
But no more time to lose, let's see what our plugin is doing:
$ ./check_generic -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT" -c ">60000"
CHANGEME OK - result:533 match:none
What does it mean?
Our plugin has done a simple Perl evaluation: "533>60000" -> false \\
But see the details of the result ''CHANGEME OK - result:533 match:none'':
* CHANGEME - you can define a name here with the -n option
* OK - there is no critical state, everything is fine
* result:533 - the current service check latency is 533 ms
* match:none - there is no match against any (here: the critical) threshold
=== 5. Congratulations, your first check_generic monitoring is running ===
Now let's enhance it a little bit. First of all we also want a warning threshold. In production you would just add
-w ">30000". But we want to see something happen, so we try the following command line with a much lower warning threshold of ">500" (we now also have a name for our check!):
$ ./check_generic -n nagios_service_latency -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT" -c ">60000" -w ">500"
nagios_service_latency WARNING - result:616 match:>500 severities:warning
Wow, there's something new:
* The plugin has a name: nagios_service_latency (due to the -n option)
* it has the state WARNING
* it matches against the rule ">500" (see match:>500)
* severities:warning means that the warning rule matched here. More than one rule can match at the same time.
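The state names correspond to the standard Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The following sketch shows how a result and two "greater than" limits could be turned into a state. The helper ''state_for'' is invented for this tutorial, and letting critical win over warning is an assumption, not a statement about how //check_generic// orders its rules.

```shell
# Hypothetical helper: print the state and its Nagios exit code.
# $1: result, $2: warning limit, $3: critical limit (all integers)
state_for() {
    if [ "$1" -gt "$3" ]; then
        echo "CRITICAL 2"
    elif [ "$1" -gt "$2" ]; then
        echo "WARNING 1"
    else
        echo "OK 0"
    fi
}

# The values from the run above: result 616, warning >500, critical >60000
state_for 616 500 60000
```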
=== 6. You can now add this check to your Nagios config ===
That's it. It took some time, but tell me: was it really difficult? ;-)
Finally, some hints for 'configuring' your //check_generic// settings:
* Log in as 'nagios'. Just play with the plugin until your config fits.
* If you are running a more complicated command, first test this command outside //check_generic// until it works.
* Play with the thresholds to provoke warning and critical events, just to see that everything is working.
* Enjoy (and have a look at the [[projects:check_generic:examples:linux]] page).