From d0616279019bfab3dbcf67796b2ca2c7a7f43a20 Mon Sep 17 00:00:00 2001
From: "Hahn Axel (hahn)" <axel.hahn@unibe.ch>
Date: Fri, 7 Jun 2024 13:03:39 +0200
Subject: [PATCH] add docs for check_smartstatus

---
 docs/20_Checks/_index.md            |   2 +-
 docs/20_Checks/check_smartstatus.md | 151 ++++++++++++++++++++++++++++
 2 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100644 docs/20_Checks/check_smartstatus.md

diff --git a/docs/20_Checks/_index.md b/docs/20_Checks/_index.md
index 0a06f3a..f8a7774 100644
--- a/docs/20_Checks/_index.md
+++ b/docs/20_Checks/_index.md
@@ -47,7 +47,7 @@ There is one include script used by all checks:
 * [check_rearbackup](check_rearbackup.md)
 * [check_reboot_required](check_reboot_required.md)
 * [check_requirements](check_requirements.md)
-* check_smartstatus
+* [check_smartstatus](check_smartstatus.md)
 * [check_snmp_data](check_snmp_data.md)
 * check_snmp_printer
 * check_snmp_switch
diff --git a/docs/20_Checks/check_smartstatus.md b/docs/20_Checks/check_smartstatus.md
new file mode 100644
index 0000000..7131c83
--- /dev/null
+++ b/docs/20_Checks/check_smartstatus.md
@@ -0,0 +1,151 @@
+# Check_smartstatus
+
+## Introduction
+
+**check_smartstatus** is a plugin that runs a smartctl check to verify the status of all local hard disks and SSDs.
+
+It works on physical machines only.
+
+## Requirements
+
+* `smartctl`
+
+The icinga user needs sudo permissions on the smartctl binary.
+
+```txt
+icingaclient ALL=(ALL) NOPASSWD: /sbin/smartctl
+```
+
+## Syntax
+
+```txt
+______________________________________________________________________
+
+CHECK_SMARTSTATUS
+v1.6
+
+(c) Institute for Medical Education - University of Bern
+Licence: GNU GPL 3
+
+https://os-docs.iml.unibe.ch/icinga-checks/Checks/check_smartstatus.html
+______________________________________________________________________
+
+Show status of local S.M.A.R.T. devices.
+
+SYNTAX:
+  check_smartstatus [-h] [-l] [devices]
+
+OPTIONS:
+
+    -h|--help   show this help.
+    -l|--list   list devices only.
+
+PARAMETERS:
+
+EXAMPLES
+
+  check_smartstatus
+    Scan all local disks
+
+  check_smartstatus -l
+    List all local disks without scanning them.
+
+```
+
+### Parameters
+
+(none)
+
+## Examples
+
+For testing purposes, list the devices without scanning them:
+
+```txt
+./check_smartstatus -l
+Devices to scan:
+- /dev/nvme0 -d nvme # /dev/nvme0, NVMe device
+```
+
+Without parameters, `check_smartstatus` loops over all devices found and performs a SMART scan on each. You get a status line with a summary, followed by an output section for each disk.
+
+This is the output for a single SSD:
+
+```txt
+OK: SMART check on 1 Disks - 0 errors - /dev/nvme0: PASSED
+SMART/Health Information (NVMe Log 0x02)
+----------------------------------------------------------------------
+
+/dev/nvme0
+
+sudo smartctl -Ha /dev/nvme0
+smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.9.2-1-MANJARO] (local build)
+Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
+
+=== START OF INFORMATION SECTION ===
+Model Number: SKHynix_HFS001TEJ9X162N
+Serial Number: AJC9N469110209D22
+Firmware Version: 51730A10
+PCI Vendor/Subsystem ID: 0x1c5c
+IEEE OUI Identifier: 0xace42e
+Controller ID: 0
+NVMe Version: 1.4
+Number of Namespaces: 1
+Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
+Namespace 1 Formatted LBA Size: 512
+Namespace 1 IEEE EUI-64: ace42e 0035db84db
+Local Time is: Fri Jun 7 12:59:02 2024 CEST
+Firmware Updates (0x16): 3 Slots, no Reset required
+Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
+Optional NVM Commands (0x00df): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
+Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
+Maximum Data Transfer Size: 64 Pages
+Warning Comp. Temp. Threshold: 86 Celsius
+Critical Comp. Temp. Threshold: 87 Celsius
+
+Supported Power States
+St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
+ 0 +     7.50W       -        -    0  0  0  0        5     305
+ 1 +   3.9000W       -        -    1  1  1  1       30     330
+ 2 +   1.5000W       -        -    2  2  2  2      100     400
+ 3 -   0.0500W       -        -    3  3  3  3      500    1500
+ 4 -   0.0050W       -        -    4  4  4  4     1000    9000
+
+Supported LBA Sizes (NSID 0x1)
+Id Fmt  Data  Metadt  Rel_Perf
+ 0 +     512       0         0
+
+=== START OF SMART DATA SECTION ===
+SMART overall-health self-assessment test result: PASSED
+
+SMART/Health Information (NVMe Log 0x02)
+Critical Warning: 0x00
+Temperature: 43 Celsius
+Available Spare: 100%
+Available Spare Threshold: 10%
+Percentage Used: 0%
+Data Units Read: 6,589,009 [3.37 TB]
+Data Units Written: 3,879,914 [1.98 TB]
+Host Read Commands: 39,241,205
+Host Write Commands: 72,717,841
+Controller Busy Time: 2,112
+Power Cycles: 176
+Power On Hours: 642
+Unsafe Shutdowns: 21
+Media and Data Integrity Errors: 0
+Error Information Log Entries: 0
+Warning Comp. Temperature Time: 0
+Critical Comp. Temperature Time: 0
+Temperature Sensor 1: 40 Celsius
+Temperature Sensor 2: 37 Celsius
+
+Error Information (NVMe Log 0x01, 16 of 256 entries)
+No Errors Logged
+
+Self-test Log (NVMe Log 0x06)
+Self-test status: No self-test in progress
+No Self-tests Logged
+
+/dev/nvme0 - rc=0
+
+PASSED SMART/Health Information (NVMe Log 0x02)
+```
-- 
GitLab