diff --git a/check_psqlserver b/check_psqlserver
index 16e823ffb6f82db8f80e06603a38f68cc4c57dd5..59af0f07dfc55be5018722b0156dfe291741ebf2 100755
--- a/check_psqlserver
+++ b/check_psqlserver
@@ -18,12 +18,13 @@
 # 2023-06-08 v0.4 <axel.hahn@unibe.ch> get summary for cronflicts and problems
 # 2023-06-09 v0.5 <axel.hahn@unibe.ch> deltaunit can be set as parameter
 # 2023-06-13 v0.6 <axel.hahn@unibe.ch> no output on activity; update replication check
+# 2023-06-16 v0.7 <axel.hahn@unibe.ch> update help text
 # ======================================================================
 
 . $(dirname $0)/inc_pluginfunctions
 
 self_APPNAME=$( basename $0 | tr [:lower:] [:upper:] )
-self_APPVERSION=0.6
+self_APPVERSION=0.7
 
 # --- other vars...
 cfgfile=/etc/icingaclient/.psql.conf
@@ -126,12 +127,12 @@ OPTIONS:
 
 PARAMETERS:
   -m method; valid methods are:
-    activity      running processes and queries
-    conflicts     Detected conflicts from pg_stat_database_conflicts
+    activity      Count of running processes and queries
+    conflicts     Count of detected conflicts
     dbrows        Count of database row actions
     diskblock     Count of diskblocks physically read or coming from cache
-    problems      Problems and troublemakers
-    replication   Replication status (table output only)
+    problems      Count of problems and troublemakers
+    replication   Replication status and lag time
     transactions  Count of transactions over all databases
 
 EXAMPLES:
@@ -392,7 +393,7 @@ case "${sMode}" in
         ;;
 
     *)
-        echo ERRROR: [${sMode}] is an INVALID mode
+        echo "ERROR: [${sMode}] is an INVALID mode"
         _usage
         ph.abort
 
diff --git a/docs/20_Checks/_index.md b/docs/20_Checks/_index.md
index 8187155a706a938f9866cf222c13cb9fc8637788..0b318f19f2ead0664bfaaa1428dff82844205fc6 100644
--- a/docs/20_Checks/_index.md
+++ b/docs/20_Checks/_index.md
@@ -32,6 +32,7 @@ There is one include script used by all checks:
 * check_proc_mem
 * check_proc_ressources
 * check_proc_zombie
+* [check_psqlserver](check_psqlserver.md)
 * [check_reboot_required](check_reboot_required.md)
 * check_sensuplugins
 * check_smartstatus
diff --git a/docs/20_Checks/check_psqlserver.md b/docs/20_Checks/check_psqlserver.md
index b343d528d9b47903684e2ebd876a4ff6a3afda2c..50f468d4aab5a3a45da60d221d42182463c5852a 100644
--- a/docs/20_Checks/check_psqlserver.md
+++ b/docs/20_Checks/check_psqlserver.md
@@ -3,21 +3,22 @@
 ## Introduction
 
 **check_psqlserver** is a plugin execute different checks on a postgreSql server instance.
-The kind of check is defined by a paameter `-m METHOD`.
+The kind of check is defined by a parameter `-m METHOD`.
 
 ## Requirements
 
-The icinga user needs to connect to the database server.
+* psql (CLI tool)
+* The icinga user needs to be able to connect to the database server (see Installation).
 
 ## Syntax
 
 `$ check_psqlserver [-i|-u|-m METHOD]`
 
 ```txt
-./check_psqlserver
+./check_psqlserver -h
 ______________________________________________________________________
 
-CHECK_PSQLSERVER :: v0.6
+CHECK_PSQLSERVER :: v0.7
 (c) Institute for Medical Education - University of Bern
 Licence: GNU GPL 3
 
@@ -33,12 +34,12 @@ OPTIONS:
 
 PARAMETERS:
   -m method; valid methods are:
-    activity      running processes and queries
-    conflicts     Detected conflicts from pg_stat_database_conflicts
+    activity      Count of running processes and queries
+    conflicts     Count of detected conflicts
     dbrows        Count of database row actions
     diskblock     Count of diskblocks physically read or coming from cache
-    problems      Problems and troublemakers
-    replication   Replication status (table output only)
+    problems      Count of problems and troublemakers
+    replication   Replication status and lag time
     transactions  Count of transactions over all databases
 
 EXAMPLES:
@@ -72,6 +73,17 @@ export PGHOST=localhost
 export PGDATABASE=postgres
 ```
 
+To test the connection, run `./check_psqlserver -m activity`.
+
+If the config was written but the connection fails, look for pg_hba.conf (/var/lib/pgsql/data/pg_hba.conf or /etc/postgresql/13/main/pg_hba.conf).
+If local authentication for IPv4 and IPv6 is set to "ident"
+
+```txt
+host all all 127.0.0.1/32 ident
+```
+
+... try setting it to "md5" and restart the PostgreSQL service.
+
 ## Checks
 
 The checks are done on the server and summarize data from statistic tables for all databases.
@@ -82,7 +94,29 @@ If you need to troubleshot and want to see which of your databases causes the tr
 
 ### activity
 
-Show running processes and queries
+Show the count of running processes and the sum per process state.
+Possible states in pg_stat_activity are:
+
+* active: The backend is executing a query.
+* idle: The backend is waiting for a new client command.
+* idle in transaction: The backend is in a transaction, but is not currently executing a query.
+* idle in transaction (aborted): This state is similar to idle in transaction, except one of the statements in the transaction caused an error.
+* fastpath function call: The backend is executing a fast-path function.
+* disabled: This state is reported if track_activities is disabled in this backend.
+
+The check summarizes:
+
+* total - the total count of all processes
+* active - processes with state "active"
+* idle - processes with state "idle", "idle in transaction" and "idle in transaction (aborted)"
+* fastpath - processes with state "fastpath function call"
+* other - count of PostgreSQL background processes with no value in the state column
+
+The state of the check is always "OK".
+
+If the number of processes is high, run `select * from pg_stat_activity` to see the queries and database names and to find the troublemaker.
+
+Example output:
 
 ```txt
 ./check_psqlserver -m activity
@@ -96,7 +130,21 @@ select * from pg_stat_activity.
 
 ### conflicts
 
-Detected conflicts from pg_stat_database_conflicts
+Show the number of detected conflicts from pg_stat_database_conflicts. The values are counters, so the check calculates a delta per minute to find newly occurred conflicts.
+
+The columns in pg_stat_database_conflicts are:
+
+* confl_tablespace bigint - Number of queries in this database that have been canceled due to dropped tablespaces
+* confl_lock bigint - Number of queries in this database that have been canceled due to lock timeouts
+* confl_snapshot bigint - Number of queries in this database that have been canceled due to old snapshots
+* confl_bufferpin bigint - Number of queries in this database that have been canceled due to pinned buffers
+* confl_deadlock bigint - Number of queries in this database that have been canceled due to deadlocks
+
+The check sums up the conflicts of all databases.
+
+The check switches to "critical" if one of the per-minute delta values is not 0.
+
+Example output:
 
 ```txt
 ./check_psqlserver -m conflicts
@@ -113,10 +161,22 @@ select * from pg_stat_database_conflicts.
 |confltablespace=0;; confllock=0;; conflsnapshot=0;; conflbufferpin=0;; confldeadlock=0;;
 ```
 
-
 ### dbrows
 
-Count of database row actions
+Count of database row actions.
+
+From pg_stat_database we read the following columns and sum them up over all databases.
+
+* tup_fetched bigint - Number of live rows fetched by index scans in this database
+* tup_inserted bigint - Number of rows inserted by queries in this database
+* tup_updated bigint - Number of rows updated by queries in this database
+* tup_deleted bigint - Number of rows deleted by queries in this database
+
+The values are counters, so the check calculates the change per second to show the current activity.
+
+The state of the check is always "OK".
+
+Example output:
 
 ```txt
 ./check_psqlserver -m dbrows
@@ -137,6 +197,17 @@ select * from pg_stat_database.
 
 Count of diskblocks physically read or coming from cache
 
+From pg_stat_database we read the following columns and sum them up over all databases.
+
+* blks_read bigint - Number of disk blocks read in this database
+* blks_hit bigint - Number of times disk blocks were found already in the buffer cache, so that a read was not necessary (this only includes hits in the PostgreSQL buffer cache, not the operating system's file system cache)
+
+The values are counters, so the check calculates the change per second to show the current activity.
+
+The state of the check is always "OK".
+
+Example output:
+
 ```txt
 ./check_psqlserver -m diskblock
 OK: Pgsql diskblock :: Count of diskblocks physically read or coming from cache (from pg_stat_database)
@@ -153,6 +224,20 @@ select * from pg_stat_database.
 
 Problems and troublemakers
 
+From pg_stat_database we read the following columns and sum them up over all databases.
+
+* conflicts bigint - Number of queries canceled due to conflicts with recovery in this database. (Conflicts occur only on standby servers; see pg_stat_database_conflicts for details.)
+* deadlocks bigint - Number of deadlocks detected in this database
+* checksum_failures bigint - Number of data page checksum failures detected in this database (or on a shared object), or NULL if data checksums are not enabled.
+* temp_files bigint - Number of temporary files created by queries in this database. All temporary files are counted, regardless of why the temporary file was created (e.g., sorting or hashing), and regardless of the log_temp_files setting.
+* temp_bytes bigint - Total amount of data written to temporary files by queries in this database. All temporary files are counted, regardless of why the temporary file was created, and regardless of the log_temp_files setting.
+
+The values are counters, so the check calculates the change per minute to find new problems.
+
+The state of the check switches to "critical" if at least one problem was detected in the delta values.
+
+Example output:
+
 ```txt
 ./check_psqlserver -m problems
 OK: Pgsql problems :: Problems and troublemakers (from pg_stat_database) ...
OK, nothing was found
@@ -173,10 +258,15 @@ select * from pg_stat_database.
 
 Replication status.
 It shows the defined replication and their status.
-It switches to state warning if one of the replications is not "streaming".
 
 Aditionally it fetches the maximum lag of write, flush and replay of all replications.
-The state switches to warning if it is larger 1 sec (just experimental).
+
+The state of the check switches to "warning" if ...
+
+* one of the replications is not "streaming"
+* the maximum lag is larger than 1 sec (just experimental).
+
+Example output:
 
 ```txt
 ./check_psqlserver -m replication
@@ -197,6 +287,17 @@ select * from pg_stat_replication.
 
 Count of transactions over all databases
 
+From pg_stat_database we fetch these columns and sum them up over all databases:
+
+* xact_commit bigint - Number of transactions in this database that have been committed
+* xact_rollback bigint - Number of transactions in this database that have been rolled back
+
+The values are counters, so the check calculates the change per second to show the current speed.
+
+The state of the check is always "OK".
+
+Example output:
+
 ```txt
 ./check_psqlserver -m transactions
 OK: Pgsql transactions :: Count of transactions over all databases
@@ -208,3 +309,16 @@ select * from pg_stat_database.
 
 |commit=0;; rollback=0;;
 ```
+
+## Run a query on the command line
+
+As root or as the icingaclient user, you can read the configuration of the database monitoring user (created with parameter `-i`).
+
+In a terminal, source the created config file and then run a query with psql.
+
+Example:
+
+```txt
+. /etc/icingaclient/.psql.conf
+psql -c 'select * from pg_stat_activity'
+```
\ No newline at end of file
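Several of the documented checks rely on two mechanisms. First, counters from pg_stat_database and pg_stat_database_conflicts are turned into a rate per second or per minute. Second, the activity check groups the values of the `state` column. The bash sketch below illustrates both under stated assumptions. The functions `calc_delta_rate` and `summarize_states` are hypothetical and are not taken from check_psqlserver or inc_pluginfunctions. The diff does not show how the plugin stores the previous counter value between runs, and the handling of counter resets and zero intervals is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: a sketch of the calculations described in the docs,
# NOT the code of check_psqlserver or inc_pluginfunctions.

# calc_delta_rate OLD NEW OLD_TS NEW_TS UNIT
#   OLD, NEW       two samples of an ever-growing counter (e.g. xact_commit)
#   OLD_TS, NEW_TS Unix timestamps of the samples (e.g. from: date +%s)
#   UNIT           1 for "per sec", 60 for "per min"
# Prints the integer rate (rounded down).
calc_delta_rate() {
    local old=$1 new=$2 old_ts=$3 new_ts=$4 unit=$5
    local elapsed=$(( new_ts - old_ts ))

    # Assumption: with no elapsed time there is no meaningful rate yet.
    if [ "$elapsed" -le 0 ]; then
        echo 0
        return
    fi
    # Assumption: a decreasing counter was reset (e.g. statistics reset),
    # so the delta for this interval is reported as 0.
    if [ "$new" -lt "$old" ]; then
        echo 0
        return
    fi
    echo $(( (new - old) * unit / elapsed ))
}

# summarize_states
# Reads one pg_stat_activity "state" value per line from stdin and prints
# the groups listed for the activity check. Empty lines (NULL state) count
# as "other"; states not listed there (e.g. "disabled") only count in total.
summarize_states() {
    local total=0 active=0 idle=0 fastpath=0 other=0 state
    while IFS= read -r state; do
        total=$(( total + 1 ))
        case "$state" in
            "active")
                active=$(( active + 1 )) ;;
            "idle" | "idle in transaction" | "idle in transaction (aborted)")
                idle=$(( idle + 1 )) ;;
            "fastpath function call")
                fastpath=$(( fastpath + 1 )) ;;
            "")
                other=$(( other + 1 )) ;;
        esac
    done
    echo "total=$total active=$active idle=$idle fastpath=$fastpath other=$other"
}
```

Example: with a previous value of 100 at Unix time 1000 and a current value of 160 at 1060, `calc_delta_rate 100 160 1000 1060 60` prints 60 (per minute), and with a unit of 1 it prints 1 (per second). The input for `summarize_states` could come from `psql -At -c 'select state from pg_stat_activity'`, which prints NULL states as empty lines.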