# check PostgreSQL server

## Introduction

**check_psqlserver** is a plugin that executes different checks on a PostgreSQL server instance. The kind of check is selected with the parameter `-m METHOD`.

## Requirements

* psql (CLI tool)
* The icinga user needs to be able to connect to the database server (see Installation).

## Syntax

`$ check_psqlserver [-i|-u|-m METHOD]`

```txt
./check_psqlserver -h
______________________________________________________________________

CHECK_PSQLSERVER :: v0.7

(c) Institute for Medical Education - University of Bern
Licence: GNU GPL 3
______________________________________________________________________

USAGE:
  check_psqlserver [OPTIONS] -m METHOD

OPTIONS:
  -h   this help
  -i   install monitoring user (must be executed as root)
  -u   uninstall monitoring user (must be executed as root)

PARAMETERS:
  -m   method; valid methods are:
       activity      Count running processes and queries
       conflicts     Count of detected conflicts
       dbrows        Count of database row actions
       diskblock     Count of diskblocks physically read or coming from cache
       problems      Count of problems and troublemakers
       replication   Replication status and lag time
       transactions  Count of transactions over all databases

EXAMPLES:
  check_psqlserver -i
  check_psqlserver -m activity
```

## Installation

To give the plugin access to the database, use the parameter `-i` (for "install"). This command must be run as root.

```txt
./check_psqlserver -i
```

This creates a database user "icingamonitor" with a random 64-character password. The credentials are stored in the file `/etc/icingaclient/.psql.conf`. It is a shell script that is sourced by check_psqlserver.

```shell
cat /etc/icingaclient/.psql.conf
#
# generated on Thu Jun 8 03:44:06 CEST 2023
#
export PGUSER=icingamonitor
export PGPASSWORD=87B9jUcRp38DgkLDL3uHzC3V8YjgX6KZMxIxqpWieTWWIBgNoFYt8yvK9Y8RmdL0
export PGHOST=localhost
# set default database because a user db won't be created
export PGDATABASE=postgres
```

To test the connection, run `./check_psqlserver -m activity`.
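If you want to understand or reproduce the config file by hand, the following is a minimal bash sketch that writes a file in the format shown above. It is an assumption-based illustration, not the code behind `check_psqlserver -i`. It does **not** create the database user. The function name `write_psql_conf` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: this is NOT the code of check_psqlserver -i.
# It writes a config file in the format shown above and does not create
# the database user itself.
# write_psql_conf is a hypothetical helper; the target path is a parameter
# so the function can be tried out on a temporary file.
write_psql_conf() {
  local conf="$1" password
  # 64 random alphanumeric characters, like the password in the example above
  password=$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 64)
  (
    umask 077 # the file contains a password: owner-only permissions for new files
    cat > "$conf" <<EOF
#
# generated on $(date)
#
export PGUSER=icingamonitor
export PGPASSWORD=$password
export PGHOST=localhost
# set default database because a user db won't be created
export PGDATABASE=postgres
EOF
  )
}
```

Because the file only contains `export` statements, it can be loaded with `. FILE`. This is the same way the plugin sources `/etc/icingaclient/.psql.conf`.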
If the config file was written but the connection fails, look for pg_hba.conf (e.g. /var/lib/pgsql/data/pg_hba.conf or /etc/postgresql/13/main/pg_hba.conf). If local authentication for IPv4 and IPv6 is set to "ident"...

```txt
host    all    all    127.0.0.1/32    ident
```

... try setting it to "md5" and restart the PostgreSQL service.

## Checks

The checks run on the server and summarize data from the statistics views for all databases; see <https://www.postgresql.org/docs/current/monitoring-stats.html>.

If you need to troubleshoot and want to see which of your databases causes the trouble, you can execute the statement given in the hint.

### activity

Shows the count of running processes and the sum per process state. Possible states in pg_stat_activity are:

* active: The backend is executing a query.
* idle: The backend is waiting for a new client command.
* idle in transaction: The backend is in a transaction, but is not currently executing a query.
* idle in transaction (aborted): This state is similar to idle in transaction, except one of the statements in the transaction caused an error.
* fastpath function call: The backend is executing a fast-path function.
* disabled: This state is reported if track_activities is disabled in this backend.

The check summarizes:

* total - the total count of all processes
* active - processes with state "active"
* idle - processes with state "idle", "idle in transaction" and "idle in transaction (aborted)"
* fastpath - processes with state "fastpath function call"
* other - count of PostgreSQL processes with no value in the state column (such as background processes)

The state of the check is always "OK". To find the troublemaker when the number of processes is high, run `select * from pg_stat_activity` to see the queries and the database names.

Example output:

```txt
./check_psqlserver -m activity
OK: Pgsql activity :: Running total: 33 ... active: 2 idle: 25 fastpath: 0 other: 6
Hint: Query for details: select * from pg_stat_activity.
 |running-total=33;; running-active=2;; running-idle=25;; running-fastpath=0;; running-other=6;;
```

### conflicts

Shows the number of detected conflicts from pg_stat_database_conflicts. The values are counters, so the check calculates a delta per minute to detect newly occurred conflicts.

The columns in pg_stat_database_conflicts are:

* confl_tablespace bigint - Number of queries in this database that have been canceled due to dropped tablespaces
* confl_lock bigint - Number of queries in this database that have been canceled due to lock timeouts
* confl_snapshot bigint - Number of queries in this database that have been canceled due to old snapshots
* confl_bufferpin bigint - Number of queries in this database that have been canceled due to pinned buffers
* confl_deadlock bigint - Number of queries in this database that have been canceled due to deadlocks

The check sums up the conflicts of all databases. It switches to "critical" if one of the deltas per minute is not 0.

Example output:

```txt
./check_psqlserver -m conflicts
OK: Pgsql conflicts :: ... OK, nothing was found
confl_tablespace : 0 ... delta = 0 per min
confl_lock : 0 ... delta = 0 per min
confl_snapshot : 0 ... delta = 0 per min
confl_bufferpin : 0 ... delta = 0 per min
confl_deadlock : 0 ... delta = 0 per min
Hint: Query for details per database: select * from pg_stat_database_conflicts.
 |confltablespace=0;; confllock=0;; conflsnapshot=0;; conflbufferpin=0;; confldeadlock=0;;
```

### dbrows

Count of database row actions. The following columns are read from pg_stat_database and summed up over all databases:

* tup_returned bigint - Number of live rows fetched by sequential scans and index entries returned by index scans in this database
* tup_fetched bigint - Number of live rows fetched by index scans in this database
* tup_inserted bigint - Number of rows inserted by queries in this database
* tup_updated bigint - Number of rows updated by queries in this database
* tup_deleted bigint - Number of rows deleted by queries in this database

The values are counters, so the check calculates a delta per second to show the current changes.
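The counter-to-rate calculation described above can be sketched in bash as follows. This is a sketch under stated assumptions, not the plugin's own implementation. The helper names `counter_rate` and `rate_from_state` are hypothetical. Storing the previous sample in a state file and returning 0 after a counter reset are design choices of this sketch. The unit parameter covers both the "per sec" checks (`1`) and the "per min" checks (`60`).

```shell
#!/usr/bin/env bash
# Sketch of a counter-to-rate calculation; hypothetical helpers, not the
# plugin's own code.

# counter_rate PREV CUR ELAPSED_SECONDS UNIT_SECONDS
# Prints the change of a counter per UNIT_SECONDS (1 = per sec, 60 = per min),
# using integer arithmetic (the result is truncated).
counter_rate() {
  local prev=$1 cur=$2 elapsed=$3 unit=$4
  # No time has passed, or the counter went backwards (e.g. statistics were
  # reset or the server restarted): report 0 instead of a bogus rate.
  if [ "$elapsed" -le 0 ] || [ "$cur" -lt "$prev" ]; then
    echo 0
    return
  fi
  echo $(( (cur - prev) * unit / elapsed ))
}

# rate_from_state STATEFILE CUR UNIT_SECONDS
# Keeps "value timestamp" of the last run in STATEFILE and prints the rate
# since then; the very first run has nothing to compare with and prints 0.
rate_from_state() {
  local statefile=$1 cur=$2 unit=$3 now prev=0 prev_ts=0
  now=$(date +%s)
  if [ -r "$statefile" ]; then
    read -r prev prev_ts < "$statefile"
  fi
  echo "$cur $now" > "$statefile"
  if [ "${prev_ts:-0}" -eq 0 ]; then
    echo 0
    return
  fi
  counter_rate "${prev:-0}" "$cur" $(( now - prev_ts )) "$unit"
}
```

For example, `counter_rate 1000 4000 60 1` prints `50`: 3000 new rows in 60 seconds is 50 per second. Bash arithmetic is 64-bit, so counters as large as the `tup_returned` value shown below fit.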
The state of the check is always "OK".

Example output:

```txt
./check_psqlserver -m dbrows
OK: Pgsql dbrows :: Count of database row actions
tup_returned : 744761392127 ... delta = 0 per sec
tup_fetched : 21597149760 ... delta = 0 per sec
tup_inserted : 42683720 ... delta = 0 per sec
tup_updated : 1282902 ... delta = 0 per sec
tup_deleted : 1878883 ... delta = 0 per sec
Hint: Query for details per database: select * from pg_stat_database.
 |tupreturned=0;; tupfetched=0;; tupinserted=0;; tupupdated=0;; tupdeleted=0;;
```

### diskblock

Count of disk blocks physically read or coming from the cache. The following columns are read from pg_stat_database and summed up over all databases:

* blks_read bigint - Number of disk blocks read in this database
* blks_hit bigint - Number of times disk blocks were found already in the buffer cache, so that a read was not necessary (this only includes hits in the PostgreSQL buffer cache, not the operating system's file system cache)

The values are counters, so the check calculates a delta per second to show the current changes. The state of the check is always "OK".

Example output:

```txt
./check_psqlserver -m diskblock
OK: Pgsql diskblock :: Count of diskblocks physically read or coming from cache (from pg_stat_database)
blks_read : 20391240674 ... delta = 439 per sec
blks_hit : 103015908887 ... delta = 6549 per sec
Hint: Query for details per database: select * from pg_stat_database.
 |blksread=439;; blkshit=6549;;
```

### problems

Problems and troublemakers. The following columns are read from pg_stat_database and summed up over all databases:

* conflicts bigint - Number of queries canceled due to conflicts with recovery in this database. (Conflicts occur only on standby servers; see pg_stat_database_conflicts for details.)
* deadlocks bigint - Number of deadlocks detected in this database
* checksum_failures bigint - Number of data page checksum failures detected in this database (or on a shared object), or NULL if data checksums are not enabled.
* temp_files bigint - Number of temporary files created by queries in this database. All temporary files are counted, regardless of why the temporary file was created (e.g., sorting or hashing), and regardless of the log_temp_files setting.
* temp_bytes bigint - Total amount of data written to temporary files by queries in this database. All temporary files are counted, regardless of why the temporary file was created, and regardless of the log_temp_files setting.

The values are counters, so the check calculates a delta per minute to show the current changes. The check switches to "critical" if at least one new problem is detected in the delta values.

Example output:

```txt
./check_psqlserver -m problems
OK: Pgsql problems :: Problems and troublemakers (from pg_stat_database) ... OK, nothing was found
conflicts : 0 ... delta = 0 per min
deadlocks : 0 ... delta = 0 per min
checksumfailures : 0 ... delta = 0 per min
temp_files : 5617739 ... delta = 0 per min
temp_bytes : 2896790521548 ... delta = 0 per min
Hint: Query for details per database: select * from pg_stat_database.
 |conflicts=0;; deadlocks=0;; checksumfailures=0;; tempfiles=0;; tempbytes=0;;
```

### replication

Replication status. The check shows the defined replications and their status. Additionally it fetches the maximum write, flush and replay lag of all replications.

The check switches to "warning" if ...

* one of the replications is not in the state "streaming"
* the maximum lag is larger than 1 sec (experimental).

Example output:

```txt
./check_psqlserver -m replication
OK: Pgsql replication :: status
OK: all replications have the state 'streaming'.
OK: maximum lag is 0.2077 sec (below 1 sec).

 application_name :  client_addr   :   state   :    write_lag    :    flush_lag    :  replay_lag   :    max_lag    : sync_state
------------------+----------------+-----------+-----------------+-----------------+---------------+---------------+------------
 psqlbackup       : 192.168.10.21  : streaming : 00:00:00.000673 : 00:00:00.003771 : 00:00:00.2077 : 00:00:00.2077 : async
(1 row)

Hint: Query for details per replication: select * from pg_stat_replication.
 |total=1;; state-streaming=1;; state-other=0;; max-lag=0.2077;;
```

### transactions

Count of transactions over all databases. The following columns are read from pg_stat_database and summed up over all databases:

* xact_commit bigint - Number of transactions in this database that have been committed
* xact_rollback bigint - Number of transactions in this database that have been rolled back

The values are counters, so the check calculates a delta per second to show the current speed. The state of the check is always "OK".

Example output:

```txt
./check_psqlserver -m transactions
OK: Pgsql transactions :: Count of transactions over all databases
commit : 380856524 ... delta = 0 per sec
rollback : 13173 ... delta = 0 per sec
Hint: Query for details per database: select * from pg_stat_database.
 |commit=0;; rollback=0;;
```

## Run a query on the command line

As root or as the icingaclient user, you can read the configuration of the database monitoring user (created with the parameter `-i`). In a terminal, source the config file, then run a query using psql. Example:

```txt
. /etc/icingaclient/.psql.conf
psql -c 'select * from pg_stat_activity'
```
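The steps above can also be wrapped in small bash functions. The following is a minimal sketch. It assumes the config path created by `-i`. The function names and the variable `PSQL_CONF` are hypothetical. The query runs in a subshell, so the credentials are not left in your interactive shell. As an example query, it mirrors the idea of the replication check's 1-second lag comparison. This is not necessarily how the plugin computes it.

```shell
#!/usr/bin/env bash
# Sketch only; helper names and PSQL_CONF are hypothetical.
PSQL_CONF=${PSQL_CONF:-/etc/icingaclient/.psql.conf}

# Run one query with the monitoring credentials. The subshell keeps
# PGUSER, PGPASSWORD etc. out of the calling shell's environment.
# -A = unaligned output, -t = tuples only (no header/footer).
pg_query() {
  ( . "$PSQL_CONF" && psql -At -c "$1" )
}

# Largest of write_lag, flush_lag and replay_lag over all replications,
# in seconds (0 if there is no replication or no lag is reported).
max_replication_lag() {
  pg_query "select coalesce(max(extract(epoch from greatest(write_lag, flush_lag, replay_lag))), 0) from pg_stat_replication"
}

# Succeeds (exit status 0) if LAG is larger than MAX; both in seconds,
# fractions allowed. awk is used because bash arithmetic is integer-only.
lag_exceeds() {
  awk -v lag="$1" -v max="$2" 'BEGIN { exit !((lag + 0) > (max + 0)) }'
}
```

Usage example: `lag=$(max_replication_lag) && lag_exceeds "$lag" 1 && echo "replication lag of ${lag}s is above 1 sec"`.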