SMART stands for Self-Monitoring, Analysis and Reporting Technology. It is a
monitoring system for computer hard disks to detect and report on various
indicators of reliability, in the hope of anticipating failures. In effect,
SMART can be used to monitor the health of the hard drive.
Fundamentally, hard drives can suffer one of two classes of failures:
- Predictable failures: some failure modes, especially mechanical wear and aging, develop gradually over time. A monitoring device can detect these, much as a temperature gauge on the dashboard of an automobile can warn the driver that the engine has started to overheat before serious damage occurs.
- Unpredictable failures: other failures occur suddenly and without warning, such as an electronic component failing.
Mechanical failures, which are usually predictable, account for 60 percent of drive failures. The purpose of S.M.A.R.T. is to warn a user or system administrator of impending drive failure while time remains to take preventative action, such as copying the data to a replacement device. Approximately 30 percent of failures can be predicted by S.M.A.R.T.
Work at Google on over 100,000 drives has shown that S.M.A.R.T. status as a whole has little overall predictive value, but that certain sub-categories of information S.M.A.R.T. implementations might track do correlate with actual failure rates. Specifically, following the first scan error, drives are 39 times more likely to fail within 60 days than drives with no such errors. First errors in reallocations, offline reallocations, and probational counts are also strongly correlated with higher failure probabilities. (Source: Wikipedia, SMART)
The most basic information that SMART provides is the SMART status. It provides only two values, "threshold not exceeded" or "threshold exceeded". Often these are represented as "drive OK" or "drive fail" respectively. A "threshold exceeded" value is intended to indicate that there is a relatively high probability that the drive will not be able to honor its specification in the future: that is, it's "about to fail". The predicted failure may be catastrophic or may be something as subtle as inability to write to certain sectors or slower performance than the manufacturer's minimum.
The SMART status does not necessarily indicate the drive's reliability now or in the past. If the drive has already failed catastrophically, the SMART status may be inaccessible. If the drive was experiencing problems in the past, but now the sensors indicate that the problems no longer exist, the SMART status may indicate the drive is OK, depending on the manufacturer's programming.
Let's take a look at what SMART can do through the "atactl" binary in OpenBSD.
You can look at the attributes of our example drive by using the "readattr" argument, as seen below. These values cover most of the common functions of the hard drive, including retry amounts and failure counts. This drive has not reported any errors: every value in the listing is above its threshold.
    root@machine: /sbin/atactl /dev/wd0c readattr
    Attributes table revision: 16
    ID   Attribute name                    Threshold  Value
    3    Spin Up Time                      63         176
    4    Start/Stop Count                  0          253
    5    Reallocated Sector Count          63         253
    6    Unknown                           100        253
    7    Seek Error Rate                   0          253
    8    Seek Time Performance             187        240
    9    Power-on Hours Count              0          235
    10   Spin Retry Count                  157        253
    11   Unknown                           223        253
    12   Device Power Cycle Count          0          253
    192  Power-off Retract Count           0          253
    193  Load Cycle Count                  0          253
    194  Temperature                       0          25
    195  Unknown                           0          253
    196  Reallocation Event Count          0          253
    197  Current Pending Sector Count      0          253
    198  Off-line Scan Uncorrectable Sect  0          253
    199  Ultra DMA CRC Error Count         0          199
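The listing above can also be checked mechanically rather than by eye. As a rough sketch, the shell function below (the name failing_attrs is made up for this example) reads "readattr" output on standard input and prints any attribute whose value has dropped to or below its threshold. It assumes the four-column layout shown above and treats a threshold of 0 as informational only, which is the usual SMART convention rather than something taken from the atactl documentation.

```shell
#!/bin/sh
# failing_attrs (hypothetical helper): read "atactl <dev> readattr" output
# on stdin and print each attribute whose value is at or below its threshold.
failing_attrs() {
    awk '
        # Data rows start with a numeric ID. Attribute names may contain
        # spaces, so threshold and value are taken from the last two fields.
        $1 ~ /^[0-9]+$/ && NF >= 4 {
            thr = $(NF - 1) + 0
            val = $NF + 0
            # Assumption: a threshold of 0 means the attribute never trips.
            if (thr > 0 && val <= thr) {
                name = $2
                for (i = 3; i <= NF - 2; i++)
                    name = name " " $i
                printf "%s %s: value %d <= threshold %d\n", $1, name, val, thr
            }
        }
    '
}

# Usage on a live system (device name as used in this article):
#   /sbin/atactl /dev/wd0c readattr | failing_attrs
```

For the example drive the function prints nothing, since every value sits above its threshold.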
You can use the "atactl" binary to monitor the SMART values of your own hard drive too. In this example we are going to use it to check the primary hard drive of an OpenBSD system and notify us by email of any errors.
You can use one or both of the following options. The "smartenable" argument turns on SMART on the drive, and the "smartstatus" argument asks the drive whether a threshold has been exceeded. An email is sent to root only if a problem is reported, which keeps the spam down.
Option 1: Run Once - The following can be run once when the system boots to enable SMART on the primary hard drive. Put these lines into your /etc/rc.local:
    ## SMART hard drive boot check
    if [ -x /sbin/atactl ]; then
        echo -n ' smartenable'; /sbin/atactl /dev/wd0c smartenable
    fi
Option 2: Run Periodically through Cron - By running the SMART check through cron you get a heads up of any serious problems with the drive. The following will run the command every morning at 5:30am. You will only receive an email to root if there is a problem with the drive.
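    #minute (0-59)
    #|   hour (0-23)
    #|   |    day of the month (1-31)
    #|   |    |   month of the year (1-12 or Jan-Dec)
    #|   |    |   |   day of the week (0-6 with 0=Sun or Sun-Sat)
    #|   |    |   |   |   commands
    #|   |    |   |   |   |
    #### SMART Hard Drive Status
    # Only stdout is discarded. Cron mails whatever output is left, so this
    # assumes atactl writes its warning to stderr; with "2>&1" added as well,
    # no mail could ever be sent.
    30   5    *   *   *   /sbin/atactl /dev/wd0c smartstatus > /dev/null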
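If the machine has more than one disk, the same check can be wrapped in a small script and called from cron instead. The following is only a sketch under stated assumptions: it assumes "atactl ... smartstatus" exits non-zero when a threshold is exceeded (worth confirming against atactl(8) on your release), and the function name check_drives and the ATACTL override variable are made up for this example.

```shell
#!/bin/sh
# ATACTL is a hypothetical override so the tool path can be swapped out.
ATACTL=${ATACTL:-/sbin/atactl}

# check_drives (hypothetical helper): run "smartstatus" on every device
# given as an argument. Prints one line per device whose check fails and
# returns 1 if any did. Healthy drives produce no output at all.
check_drives() {
    rc=0
    for dev in "$@"; do
        # Capture stdout and stderr so the tool's own message is reported.
        # Assumption: a non-zero exit status signals a problem.
        if ! msg=$("$ATACTL" "$dev" smartstatus 2>&1); then
            echo "$dev: $msg"
            rc=1
        fi
    done
    return $rc
}

# Usage from cron, with no redirection needed (/dev/wd1c is only an
# example of a second drive):
#   check_drives /dev/wd0c /dev/wd1c
```

Because nothing is printed for a healthy drive, cron has nothing to mail on a normal day, which matches the "only on errors" behaviour described above.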
Questions, comments, or suggestions? Contact Calomel.org