# Nagios plugins for Ceph

A collection of nagios plugins to monitor a [Ceph][] cluster.

## Authentication

Ceph is normally configured to use [cephx] to authenticate its clients.

To run `check_ceph_health` or the other plugins as user `nagios`, you have to create a dedicated keyring:

    root# ceph auth get-or-create client.nagios mon 'allow r' > ceph.client.nagios.keyring

And use this keyring with the plugin:

    nagios$ ./check_ceph_health --id nagios --keyring ceph.client.nagios.keyring

## check_ceph_health

The `check_ceph_health` nagios plugin monitors the ceph cluster and reports its health. The result can be filtered to only look at certain [health checks](https://docs.ceph.com/en/latest/rados/operations/health-checks/).

### Usage

    usage: check_ceph_health [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-n NAME] [-i ID]
                             [-k KEYRING] [-w WHITELIST] [-d]

    'ceph health' nagios plugin.

    optional arguments:
      -h, --help            show this help message and exit
      -e EXE, --exe EXE     ceph executable [/usr/bin/ceph]
      -c CONF, --conf CONF  alternative ceph conf file
      -m MONADDRESS, --monaddress MONADDRESS
                            ceph monitor address[:port]
      -i ID, --id ID        ceph client id
      -n NAME, --name NAME  ceph client name
      -k KEYRING, --keyring KEYRING
                            ceph client keyring file
      --check CHECK         regexp of which check(s) to check (luminous+)
                            Can be inverted, e.g.
                            '^((?!PG_DEGRADED|OBJECT_MISPLACED).)*$'
      -w, --whitelist REGEXP
                            whitelist regexp for ceph health warnings
      -d, --detail          exec 'ceph health detail'
      -V, --version         show version and exit

### Example

    nagios$ ./check_ceph_health --name client.nagios --keyring ceph.client.nagios.keyring
    HEALTH WARNING: 1 pgs degraded; 1 pgs recovering; 1 pgs stuck unclean; recovery 4448/28924462 degraded (0.015%); 2/9857830 unfound (0.000%);
    nagios$ echo $?
    1
    nagios$

    nagios$ ./check_ceph_health --id nagios --whitelist 'requests.are.blocked(\s)*32.sec'

    nagios$ ./check_ceph_health --id nagios
    WARNING: MON_CLOCK_SKEW( clock skew detected on mon.a )
    OBJECT_MISPLACED( 1937172/695961284 objects misplaced (0.278%) )
    PG_DEGRADED( Degraded data redundancy: 98/695961284 objects degraded (0.000%), 1 pg degraded )

    nagios$ ./check_ceph_health --id nagios --check 'PG_DEGRADED|OBJECT_MISPLACED'
    WARNING: OBJECT_MISPLACED( 1937172/695961284 objects misplaced (0.278%) )
    PG_DEGRADED( Degraded data redundancy: 98/695961284 objects degraded (0.000%), 1 pg degraded )

    nagios$ ./check_ceph_health --id nagios --check '^((?!PG_DEGRADED|OBJECT_MISPLACED).)*$'
    WARNING: MON_CLOCK_SKEW( clock skew detected on mon.a )

## check_ceph_mon

The `check_ceph_mon` nagios plugin monitors an individual mon daemon, reporting its status.

Possible results include OK (up) and WARN (missing).

### Usage

    usage: check_ceph_mon [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-i ID]
                          [-k KEYRING] [-V] [-I MONID]

    'ceph quorum_status' nagios plugin.

    optional arguments:
      -h, --help            show this help message and exit
      -e EXE, --exe EXE     ceph executable [/usr/bin/ceph]
      -c CONF, --conf CONF  alternative ceph conf file
      -m MONADDRESS, --monaddress MONADDRESS
                            ceph monitor to use for queries (address[:port])
      -i ID, --id ID        ceph client id
      -k KEYRING, --keyring KEYRING
                            ceph client keyring file
      -V, --version         show version and exit
      -I MONID, --monid MONID
                            mon ID to be checked for availability

### Example

    nagios$ ./check_ceph_mon -I node1
    MON OK

    nagios$ ./check_ceph_mon --monid node2
    MON WARN: no mon 'node2' found in quorum

## check_ceph_osd

The `check_ceph_osd` nagios plugin monitors an individual osd daemon or host, reporting its status.

Possible results include OK (up) and WARN (down or missing).

### Usage

    usage: check_ceph_osd [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-i ID]
                          [-k KEYRING] [-V] -H HOST [-I OSDID] [-o]

    'ceph osd' nagios plugin.

    optional arguments:
      -h, --help            show this help message and exit
      -e EXE, --exe EXE     ceph executable [/usr/bin/ceph]
      -c CONF, --conf CONF  alternative ceph conf file
      -m MONADDRESS, --monaddress MONADDRESS
                            ceph monitor address[:port]
      -i ID, --id ID        ceph client id
      -k KEYRING, --keyring KEYRING
                            ceph client keyring file
      -V, --version         show version and exit
      -H HOST, --host HOST  osd host
      -I OSDID, --osdid OSDID
                            osd id
      -o, --out             check osds that are set OUT

### Example

    nagios$ ./check_ceph_osd -H 172.17.0.2 -I 0
    OSD OK

    nagios$ ./check_ceph_osd -H 172.17.0.2 -I 0
    OSD WARN: OSD.0 is down at 172.17.0.2

    nagios$ ./check_ceph_osd -H 172.17.0.2 -I 100
    OSD WARN: no OSD.100 found at host 172.17.0.2

    nagios$ ./check_ceph_osd -H 172.17.0.2
    OSD WARN: Down OSD on 172.17.0.2: osd.0

## check_ceph_rgw

The `check_ceph_rgw` nagios plugin monitors a ceph rados gateway, reporting its status and bucket usage.

Possible results include OK (up) and WARN (down or missing).

### Usage

    usage: check_ceph_rgw [-h] [-d] [-B] [-e EXE] [-c CONF] [-i ID] [-V]

    'radosgw-admin bucket stats' nagios plugin.

    optional arguments:
      -h, --help            show this help message and exit
      -d, --detail          output perf data for all buckets
      -B, --byte            output perf data in Byte instead of KB
      -e EXE, --exe EXE     radosgw-admin executable [/usr/bin/radosgw-admin]
      -c CONF, --conf CONF  alternative ceph conf file
      -i ID, --id ID        ceph client id
      -n NAME, --name NAME  ceph client name
      -V, --version         show version and exit

### Example

    nagios$ ./check_ceph_rgw
    RGW OK: 4 buckets, 102276 KB total | /=102276KB

    nagios$ ./check_ceph_rgw --detail --byte
    RGW OK: 4 buckets, 102276 KB total | /=104730624B bucket-test1=151552B bucket-test0=12288B bucket-test2=104566784B bucket-test=0B

## check_ceph_rgw_api

The `check_ceph_rgw_api` nagios plugin monitors a ceph rados gateway, reporting its status and bucket usage.

##### Difference with `check_ceph_rgw`:

`check_ceph_rgw` connects to the cluster, whereas `check_ceph_rgw_api` connects to radosgw directly via the [admin api](http://docs.ceph.com/docs/master/radosgw/adminops/). You can check each radosgw instance individually, a single endpoint behind a proxy/balancer, or both.

#### Possible results

- OK - bucket info received from radosgw;
- WARNING - connected, but wrong admin entry or usage caps;
- UNKNOWN - can't connect to the proxy/balancer or to radosgw directly;

#### Requirements

1. Install the [requests-aws](//github.com/tax/python-requests-aws) python library:

   ```
   pip install requests-aws
   ```

2. Configure the admin entry point (default is 'admin'):

   ```
   rgw admin entry = "admin"
   ```

3. Enable the admin API (enabled by default):

   ```
   rgw enable apis = "s3, admin"
   ```

4. Add the capability `buckets=read` to the user that performs the checks; see the [Admin Guide](http://docs.ceph.com/docs/master/radosgw/admin/#add-remove-admin-capabilities) for more details.

### Usage

    usage: check_ceph_rgw_api [-h] -H HOST [-k] [-e ADMIN_ENTRY] -a ACCESS_KEY -s
                              SECRET_KEY [-d] [-b] [-v]

    'radosgw api bucket stats' nagios plugin.

    optional arguments:
      -h, --help            show this help message and exit
      -H HOST, --host HOST  Server URL for the radosgw api (example:
                            http://objects.dreamhost.com/)
      -k, --insecure        Allow insecure server connections when using SSL
      -e ADMIN_ENTRY, --admin_entry ADMIN_ENTRY
                            The entry point for an admin request URL [default is
                            'admin']
      -a ACCESS_KEY, --access_key ACCESS_KEY
                            S3 access key
      -s SECRET_KEY, --secret_key SECRET_KEY
                            S3 secret key
      -d, --detail          output perf data for all buckets
      -b, --byte            output perf data in Byte instead of KB
      -v, --version         show version and exit

### Example

    nagios$ ./check_ceph_rgw_api -H https://objects.dreamhost.com/ -a JXUABTZZYHAFLCMF9VYV -s jjP8RDD0R156atS6ACSy2vNdJLdEPM0TJQ5jD1pw
    RGW OK: 1 buckets, 7696 KB total | /=7696KB

    nagios$ ./check_ceph_rgw_api -H objects.dreamhost.com -a JXUABTZZYHAFLCMF9VYV -s jjP8RDD0R156atS6ACSy2vNdJLdEPM0TJQ5jD1pw --detail --byte
    RGW OK: 1 buckets, 7696 KB total | /=7880704B k0ste=7880704B

## check_ceph_df

The `check_ceph_df` nagios plugin monitors a ceph cluster, reporting its RAW capacity usage as a percentage, or the usage of a specific pool.

Possible results include OK, WARN and CRITICAL.

### Usage

    usage: check_ceph_df [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-i ID] [-n NAME]
                         [-k KEYRING] [-d] [-W WARN] [-C CRITICAL] [-V]

    'ceph df' nagios plugin.

    optional arguments:
      -h, --help            show this help message and exit
      -e EXE, --exe EXE     ceph executable [/usr/bin/ceph]
      -c CONF, --conf CONF  alternative ceph conf file
      -m MONADDRESS, --monaddress MONADDRESS
                            ceph monitor address[:port]
      -i ID, --id ID        ceph client id
      -n NAME, --name NAME  ceph client name
      -k KEYRING, --keyring KEYRING
                            ceph client keyring file
      -p POOL, --pool POOL  ceph pool name
      -d, --detail          show pool details on warn and critical
      -W WARN, --warn WARN  warn above this percent RAW USED
      -C CRITICAL, --critical CRITICAL
                            critical alert above this percent RAW USED
      -V, --version         show version and exit

### Example

    nagios$ ./check_ceph_df -i nagios -k /etc/ceph/ceph.client.nagios.keyring -W 29.12 -C 30.22 -d
    RAW usage 28.36%

    nagios$ ./check_ceph_df -i nagios -k /etc/ceph/ceph.client.nagios.keyring -W 26.14 -C 30
    WARNING: global RAW usage of 28.36% is above 26.14% (783G of 1093G free)

    nagios$ ./check_ceph_df -i nagios -k /etc/ceph/ceph.client.nagios.keyring -W 60 -C 70 -p hdd
    CRITICAL: Pool 'hdd' usage of 71.71% is above 70.0% (9703G used)

    nagios$ ./check_ceph_df -i nagios -k /etc/ceph/ceph.client.nagios.keyring -W 60 -C 70 -p nvme
    CRITICAL: Pool 'nvme' usage of 76.08% is above 70.0% (223G used)

    nagios$ ./check_ceph_df -i nagios -k /etc/ceph/ceph.client.nagios.keyring -W 26.14 -C 30 -d
    WARNING: global RAW usage of 28.36% is above 26.14% (783G of 1093G free)
     POOLS:
        NAME                ID     USED       %USED     MAX AVAIL     OBJECTS
        rbd                 0      96137M      8.59          348G       24441
        cephfs_data         1      61785M      5.52          348G       99940
        cephfs_metadata     2      40380k         0          348G        8037
        libvirt-pool        3         145         0          348G           2

## check_ceph_mds

The `check_ceph_mds` nagios plugin monitors an individual mds daemon, reporting its status.

Possible results include OK, WARN (laggy) and ERROR (not found).

### Usage

    usage: check_ceph_mds [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-i ID]
                          [-k KEYRING] [-V] -n NAME -f FILESYSTEM

    'ceph mds stat' nagios plugin.

    optional arguments:
      -h, --help            show this help message and exit
      -e EXE, --exe EXE     ceph executable [/usr/bin/ceph]
      -c CONF, --conf CONF  alternative ceph conf file
      -m MONADDRESS, --monaddress MONADDRESS
                            ceph monitor to use for queries (address[:port])
      -i ID, --id ID        ceph client id
      -k KEYRING, --keyring KEYRING
                            ceph client keyring file
      -V, --version         show version and exit
      -n NAME, --name NAME  mds daemon name
      -f FILESYSTEM, --filesystem FILESYSTEM
                            mds filesystem name

### Example

    nagios$ ./check_ceph_mds -f cephfs -n ceph-mds-1
    MDS OK: MDS 'ceph-mds-1' is up:active

    nagios$ ./check_ceph_mds -f cephfs -n ceph-mds-2
    MDS OK: MDS 'ceph-mds-2' is up:standby

    nagios$ ./check_ceph_mds -f cephfs -n ceph-mds-1
    MDS WARN: MDS 'ceph-mds-1' is up:active (laggy or crashed)

    nagios$ ./check_ceph_mds -f cephfs -n ceph-mds-3
    MDS ERROR: MDS 'ceph-mds-3' is not found (offline?)

## check_ceph_mgr

The `check_ceph_mgr` nagios plugin monitors the mgr.

### Usage

    usage: check_ceph_mgr [-h] [-e EXE] [-c CONF] [-m MONADDRESS] [-i ID]
                          [-n NAME] [-k KEYRING] [-V]

    'ceph mgr dump' nagios plugin.

    optional arguments:
      -h, --help            show this help message and exit
      -e EXE, --exe EXE     ceph executable [/usr/bin/ceph]
      -c CONF, --conf CONF  alternative ceph conf file
      -m MONADDRESS, --monaddress MONADDRESS
                            ceph monitor to use for queries (address[:port])
      -i ID, --id ID        ceph client id
      -n NAME, --name NAME  ceph client name
      -k KEYRING, --keyring KEYRING
                            ceph client keyring file
      -V, --version         show version and exit

### Example

    nagios$ ./check_ceph_mgr
    MGR OK: active: zhdk0013, standbys: zhdk0009, zhdk0025

## check_ceph_osd_db

The `check_ceph_osd_db` nagios plugin checks the percentage usage of the BlueStore DB for an OSD and reports it as critical if it is above the threshold.

[ceph]: http://www.ceph.com
[cephx]: http://ceph.com/docs/master/rados/operations/authentication/
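
The threshold behaviour described for `check_ceph_df` and `check_ceph_osd_db` could be sketched roughly as follows. This is a minimal illustration and not the plugins' actual source: the function name `classify_usage` and its `warn` / `critical` parameters are assumptions made for this example, and the numeric codes follow the general Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
"""Illustrative threshold check; a sketch, not the plugins' real code."""

# Conventional Nagios plugin exit codes.
STATUS_OK = 0
STATUS_WARNING = 1
STATUS_CRITICAL = 2
STATUS_UNKNOWN = 3


def classify_usage(percent_used, warn=None, critical=None):
    """Map a usage percentage to an (exit_code, message) pair.

    `warn` and `critical` are hypothetical threshold parameters; passing
    None disables the corresponding level.
    """
    # A missing or negative value cannot be judged, so report UNKNOWN.
    if percent_used is None or percent_used < 0:
        return STATUS_UNKNOWN, "UNKNOWN: invalid usage value %r" % (percent_used,)
    # Test the critical threshold first so it takes precedence over warn.
    if critical is not None and percent_used > critical:
        return (STATUS_CRITICAL,
                "CRITICAL: usage of %.2f%% is above %s%%" % (percent_used, critical))
    if warn is not None and percent_used > warn:
        return (STATUS_WARNING,
                "WARNING: usage of %.2f%% is above %s%%" % (percent_used, warn))
    return STATUS_OK, "OK: usage %.2f%%" % percent_used
```

A wrapper script would print the returned message and pass the code to `sys.exit()`, which is how Nagios picks up the state (compare the `echo $?` output in the `check_ceph_health` example).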