mars/contrib/Nagios/mars.rules

# Config file for mars_check.sh
#
# This file is part of MARS project: http://schoebel.github.io/mars/
#
# Copyright (C) 2015 Thomas Schoebel-Theuer
# Copyright (C) 2015 1&1 Internet AG
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
# notice and this notice are preserved. This file is offered as-is,
# without any warranty.
#
# For each variable $Var (as documented in "mars_check.sh --help"), the following relatives are always defined:
#
# $LastVar the old value from the previous run of mars_check.sh (whenever that was; recommended call interval: 5 minutes)
# $DeltaLastVar the difference between $Var and $LastVar
# $RateLastVar the $DeltaLastVar normalized to the elapsed time (unit: per minute)
#
# $MediumVar the old value from a medium-term run of mars_check.sh ($window_medium, default 3600s)
# $DeltaMediumVar the difference between $Var and $MediumVar
# $RateMediumVar the $DeltaMediumVar normalized to the elapsed time (unit: per minute)
#
# $LongtermVar the old value from a longterm run of mars_check.sh ($window_longterm, default 24h)
# $DeltaLongtermVar the difference between $Var and $LongtermVar
# $RateLongtermVar the $DeltaLongtermVar normalized to the elapsed time (unit: per minute)
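#
# Worked example (hypothetical numbers; assumes the difference is computed as
# $Var - $LastVar, which is not spelled out above):
#   SyncRest = 400, LastSyncRest = 1000, elapsed time = 5 minutes
#   DeltaLastSyncRest = 400 - 1000 = -600
#   RateLastSyncRest  = -600 / 5   = -120 per minute
# Under that assumption, a negative delta would mean the backlog is shrinking.
#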
# The first number in each line is the priority class.
# Lower number = higher priority = takes precedence over higher class numbers
# Exception: 0 means that the check will appear unconditionally (class is irrelevant)
#
# Hint: checks can be simply disabled by commenting them out
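#
# Sketch of the rule layout, as inferred from the checks in this file (not an official grammar):
#   <class> <Variable> <operator> <threshold> [&& ...] <SEVERITY>: <message text>
# For example, the SpacePercent warning would presumably be disabled like this:
#   #7 SpacePercent >= 30 WARNING: Used space on /mars/ is $SpacePercent %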
#####################################################################################
# List of global checks
1 ModuleLoaded <= 0 CRITICAL: mars module is not loaded
2 Responsive <= 0 CRITICAL: mars_light thread is not responsive / possibly hanging
5 SpaceRest <= 4 CRITICAL: only $SpaceRest GiB left on /mars/
6 SpacePercent >= 70 CRITICAL: Used space on /mars/ is $SpacePercent %
7 SpacePercent >= 30 WARNING: Used space on /mars/ is $SpacePercent %
#####################################################################################
# List of local checks = per resource. The resource name can be substituted via $res
# all hosts
10 AliveAge >= 300 CRITICAL: resource $res: primary host ${Designated[$res]} has been unreachable for $AliveAge seconds
11 Alive <= 0 WARNING: resource $res: primary host ${Designated[$res]} is not reachable
12 Emergency >= 1 CRITICAL: resource $res is in emergency mode, too little space on /mars/
13 SplitBrain >= 1 CRITICAL: split brain on $res detected
# only secondaries
30 Sync <= 0 WARNING: resource $res sync is switched off
31 Fetch <= 0 WARNING: resource $res fetch is switched off
32 Replay <= 0 WARNING: resource $res replay is switched off
40 SyncRest >= 999999 WARNING: resource $res SyncRest=${SyncRest[$res]} is too large
41 FetchRest >= 999999 WARNING: resource $res FetchRest=${FetchRest[$res]} is too large
42 ReplayRest >= 999999 WARNING: resource $res ReplayRest=${ReplayRest[$res]} is too large
50 DeltaLastSyncRest <= 99999 && SyncRest >= 1 WARNING: resource $res SyncRest=${SyncRest[$res]} sync has stopped
51 DeltaLastFetchRest <= 99999 && FetchRest >= 1 WARNING: resource $res FetchRest=${FetchRest[$res]} fetch has stopped
52 DeltaLastReplayRest <= 99999 && ReplayRest >= 1 WARNING: resource $res ReplayRest=${ReplayRest[$res]} replay has stopped
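#
# Untested sketch: since the Medium and Longterm relatives are defined for every
# variable, a rule could in principle look at a longer window. The class number 60,
# the threshold and the message text are hypothetical, and reading ">= 0" as
# "not shrinking" assumes the delta is computed as $Var - $MediumVar:
#60 DeltaMediumSyncRest >= 0 && SyncRest >= 1 WARNING: resource $res SyncRest=${SyncRest[$res]} did not shrink within the medium window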