===== Linux examples =====
Note: all of these examples were written for and on a Linux box. They may behave differently on your system, and you may have to adapt the command or the evaluation expression. \\
If you want to run them on, say, a Solaris server, you will have to do some research of your own ;-)\\
But that is one of the main goals of this example collection: it should awaken your interest in creating your own commands.
===== Network =====
I know that these values can be monitored by several SNMP-driven plugins. But this approach is very simple and needs no special configuration. You can combine all network checks in one [[http://www.my-plugin.de/check_multi|check_multi]] call.
=== Network packets incoming ===
check_generic -n "net_pkt_in" -e "IF=`awk '\$2 == "00000000" {print \$1}' /proc/net/route`; grep \$IF /proc/net/dev | awk -F: '{print \$2}' | awk '{print \$2}'" -w '>1000' -c '>2000' -y delta -p "pkt_in"
=== Network packets outgoing ===
check_generic -n "net_pkt_out" -e "IF=`awk '\$2 == "00000000" {print \$1}' /proc/net/route`; grep \$IF /proc/net/dev | awk -F: '{print \$2}' | awk '{print \$10}'" -w '>1000' -c '>2000' -y delta -p "pkt_out"
=== Network bytes incoming ===
check_generic -n "net_bytes_in" -e "IF=`awk '\$2 == "00000000" {print \$1}' /proc/net/route`; grep \$IF /proc/net/dev | awk -F: '{print \$2}' | awk '{print \$1}'" -w '>1000000' -c '>2000000' -y delta -p "bytes_in"
=== Network bytes outgoing ===
check_generic -n "net_bytes_out" -e "IF=`awk '\$2 == "00000000" {print \$1}' /proc/net/route`; grep \$IF /proc/net/dev| awk -F: '{print \$2}' | awk '{print \$9}'" -w '>1000000' -c '>2000000' -y delta -p "bytes_out"
=== Network errors incoming ===
check_generic -n "net_errs_in" -e "IF=`awk '\$2 == "00000000" {print \$1}' /proc/net/route`; grep \$IF /proc/net/dev | awk -F: '{print \$2}' | awk '{print \$3}'" -w '>5' -c '>10' -y delta -p "errs_in"
=== Network errors outgoing ===
check_generic -n "net_errs_out" -e "IF=`awk '\$2 == "00000000" {print \$1}' /proc/net/route`; grep \$IF /proc/net/dev | awk -F: '{print \$2}' | awk '{print \$11}'" -w '>5' -c '>10' -y delta -p "errs_out"
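The network checks above all follow the same two steps: find the interface that carries the default route (destination 00000000 in /proc/net/route), then pick one column from that interface's line in /proc/net/dev. The following is a minimal sketch of those steps as shell functions. The names `default_iface` and `net_counter` are hypothetical helpers, not part of check_generic, and the optional file arguments are only there so the functions can be tried against a saved copy of the files.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of check_generic.

# Print the interface that carries the default route.
# $1: route table file (assumed default: /proc/net/route)
default_iface() {
    awk '$2 == "00000000" { print $1; exit }' "${1:-/proc/net/route}"
}

# Print one counter of one interface.
# $1: interface name
# $2: column after the colon
#     (1=rx bytes, 2=rx packets, 3=rx errs, 9=tx bytes, 10=tx packets, 11=tx errs)
# $3: statistics file (assumed default: /proc/net/dev)
net_counter() {
    awk -v ifc="$1" -v col="$2" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)              # strip leading blanks
            n = index(line, ":")                  # header lines have no colon
            if (n == 0 || substr(line, 1, n - 1) != ifc) next
            split(substr(line, n + 1), f, " ")    # counters after the colon
            print f[col]
        }' "${3:-/proc/net/dev}"
}
```

For example, `net_counter "$(default_iface)" 2` prints the incoming packet counter of the default interface. Comparing the interface name exactly, instead of using grep, avoids matching eth0 against a line for eth01.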
=== Parse ifconfig output directly ===
Another approach was kindly provided by coffy from the German Nagios forum.\\ He parses the output of ifconfig directly to avoid permission problems with the /proc/net files. On the other hand, this approach requires knowing which interface to check.\\ Example: check network errors
check_generic -n Net_TX_Errors -e "ifconfig eth0 | grep 'TX packets' | cut -d: -f3 | awk '{print \$1}'" -w ">5" -c ">10"
===== Memory / Swap =====
=== Memory free ===
check_generic -n "proc_meminfo_memfree" -e "grep -i memfree /proc/meminfo | awk '{print \$2}'" -w '<5000' -c '<2000' -p "free_KB"
=== Memory dirty ===
check_generic -n "proc_meminfo_dirty" -e "grep -i dirty /proc/meminfo | awk '{print \$2}'" -w '>50000' -c '>100000' -p "dirty_KB"
=== Swap in use ===
check_generic -n "proc_meminfo_swapinuse" -e "awk '/^SwapTotal:/ {t=\$2} /^SwapFree:/ {f=\$2} END {print t-f}' /proc/meminfo" -w '>50000' -c '>100000' -p "swap_KB"
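The swap space actually in use is the difference between the SwapTotal and SwapFree lines of /proc/meminfo. A minimal sketch of that calculation as a standalone function follows. The name `swap_used_kb` is a hypothetical helper, and the optional file argument is an assumption added so that it can be tried on a saved copy.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of check_generic.

# Print the swap space in use, in kB.
# $1: meminfo file (assumed default: /proc/meminfo)
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}
```

The printed value can be given to `-e` in the same way as the grep pipelines above.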
===== Process environment =====
=== Interrupt number ===
check_generic -n "proc_stat_intr" -e "grep -i intr /proc/stat | awk '{print \$2}'" -w '>500' -c '>1000' -y delta -p "intr"
=== Context changes ===
check_generic -n "proc_stat_context" -e "grep -i ctxt /proc/stat | awk '{print \$2}'" -w '>1000' -c '>2000' -y delta -p "ctxt"
=== Blocked processes ===
Processes can be blocked, e.g. while waiting for an I/O operation to finish.
check_generic -n "proc_stat_blocked" -e "grep -i procs_blocked /proc/stat | awk '{print \$2}'" -w '>3' -c '>5' -p "procs"
=== Forks per second ===
check_generic -n forks -e "vmstat -f | awk '{print \$1}'" -c ">20" -w ">10" -y delta -p forks
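Counters such as interrupts, context switches and forks only ever grow, so the interesting figure is the change between two readings rather than the absolute value. The sketch below illustrates that idea only. It is not how check_generic implements `-y delta`, and `rate_per_second` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not check_generic's delta logic.

# Print the integer rate per second between two counter readings.
# $1: previous reading, $2: current reading, $3: seconds between them
rate_per_second() {
    if [ "$3" -le 0 ] || [ "$2" -lt "$1" ]; then
        echo 0      # no elapsed time, or the counter was reset (e.g. by a reboot)
        return
    fi
    echo $(( ($2 - $1) / $3 ))
}
```

To try it by hand, take two readings of `vmstat -f | awk '{print $1}'` ten seconds apart and pass them as the first two arguments, with 10 as the third.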
===== Miscellaneous =====
=== RTC clock ok ===
check_generic -n "proc_driver_rtc" -e "grep -i batt_status /proc/driver/rtc | awk '{print \$3}'" -c '!~/okay/'
=== Smartctl disk /dev/hda ===
check_generic -n smartctl_hda -e "smartctl -H /dev/hda | grep 'test result' | awk '{print \$6}'" -c '!~/PASSED/'
=== ACPI remaining battery capacity (last full capacity in % of design capacity) ===
check_generic -n ACPI_battery_capacity -e "DC=`grep 'design capacity:' /proc/acpi/battery/BAT0/info | awk '{print \$3}'`; LC=`grep 'last full capacity:' /proc/acpi/battery/BAT0/info | awk '{print \$4}'`; echo \"\$LC/(\$DC/100)\" | bc" -c "<50" -w "<70"
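The arithmetic in the battery example is the last full capacity expressed as a percentage of the design capacity. A minimal sketch of the same calculation in a single awk call follows. The name `battery_health_pct` is a hypothetical helper, and the optional file argument is an assumption added so that it can be tried on a saved copy of the info file.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of check_generic.

# Print last full capacity as an integer percentage of design capacity.
# $1: battery info file (assumed default: /proc/acpi/battery/BAT0/info)
battery_health_pct() {
    awk -F: '
        /^design capacity:/    { split($2, d, " "); design = d[1] }
        /^last full capacity:/ { split($2, l, " "); full = l[1] }
        END { if (design > 0) printf "%d\n", full * 100 / design }
    ' "${1:-/proc/acpi/battery/BAT0/info}"
}
```

Multiplying before dividing keeps a little more precision than the integer division in bc, so the two results can differ by a percentage point.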
===== Nagios =====
=== Service checks number ===
check_generic -n nagios_services_number -e "/usr/local/nagios/bin/nagiostats -m -d NUMSERVICES" -w ">1800" -c ">2000" -p "NUMSERVICES"
=== Service checks per minute ===
check_generic -n nagios_services_checks -e "/usr/local/nagios/bin/nagiostats -m -d NUMACTSVCCHECKS1M" -w ">300" -c ">500" -p "NUMACTSVCCHECKS1M"
=== Service check latency ===
check_generic -n nagios_services_latency -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT" -w ">30000" -c ">60000" -p "AVGACTSVCLAT"
=== Host checks number ===
check_generic -n nagios_hosts_number -e "/usr/local/nagios/bin/nagiostats -m -d NUMHOSTS" -w ">400" -c ">500" -p "NUMHOSTS"
=== Host checks per minute ===
check_generic -n nagios_hosts_checks -e "/usr/local/nagios/bin/nagiostats -m -d NUMHSTACTCHK1M" -w ">100" -c ">200" -p "NUMHSTACTCHK1M"
=== Host check latency ===
check_generic -n nagios_hosts_latency -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTHSTLAT" -w ">30000" -c ">60000" -p "AVGACTHSTLAT"