"S.M.A.R.T" is for "Self-Monitoring, Analysis and Reporting Technology".
The S.M.A.R.T. protocol stores disk sensor values and event counters in a special area of the disk drive. Based on thresholds that the vendor defines for each value, the disk can evaluate its own health and predict malfunction.
A detailed overview of the reported values can be found on Wikipedia.
Visit the smartmontools SourceForge page for further information.
With smartd, checks can be daemonized to automatically generate logs and send warning emails.
The daemon is configured in /etc/smartd.conf.
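As a rough illustration of the configuration file, a single-disk entry might look like the fragment below. This is only a sketch: the mail address is a placeholder and the self-test schedule is an assumption, not the setting used on this cluster.

```
# /etc/smartd.conf (sketch, values are assumptions)
# -d ata : treat the device as ATA, as in the smartctl calls on this page
# -a     : monitor all SMART attributes and log changes
# -m     : send warning mail to this address (placeholder)
# -s     : short self-test daily at 02:00, long self-test Saturdays at 03:00
/dev/sda -d ata -a -m root@localhost -s (S/../.././02|L/../../6/03)
```

After editing the file, smartd has to be restarted (or sent a HUP signal) to re-read it.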
smartctl -d ata -a /dev/sda
Prints the current values and thresholds of the hard disk at /dev/sda.
S.M.A.R.T. seems to be disabled in the BIOS on most rack nodes in the Prototypenhalle.
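Whether SMART is switched on for a given drive can be read from the identity section that smartctl prints. The helper below is a minimal sketch: the function name `smart_state` is made up for this page, and it assumes the drive reports a line of the form "SMART support is: Enabled" or "SMART support is: Disabled".

```shell
# Reads `smartctl -i` output on stdin and prints enabled, disabled or unknown.
# Assumption: the state appears on a line starting with "SMART support is:".
smart_state() {
    awk '
        /^SMART support is:/ && /Enabled/  { state = "enabled" }
        /^SMART support is:/ && /Disabled/ { state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Usage on a node (not executed here):
#   /usr/sbin/smartctl -i -d ata /dev/sda | smart_state
```

If the result is "disabled", `smartctl -s on -d ata /dev/sda` asks the drive to enable SMART. Whether that works when the BIOS has switched it off probably depends on the board.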
Disk drives store data in sectors of 512 bytes. In addition to these 512 bytes of user data, each sector carries ECC (Error Checking and Correction) data, typically 40 to 60 bytes long. This ECC data permits the detection and correction of common types of errors that occur when the user data is read by the disk head. An Uncorrectable Sector is a disk drive sector whose user data is NOT consistent with its ECC data. This means that the disk drive is unable to 'reconstruct' the correct data for that sector. A command to READ an Uncorrectable Sector returns an error (and NO data).
Uncorrectable sectors are a problem. They might indicate that the disk is defective and needs replacement, but they can also occur when nothing is wrong with the disk drive. For example, if the power fails while the disk is writing, the user data may be written correctly while the ECC data is not updated. The two are then inconsistent, and the sector is Uncorrectable (UNC).
There are two types of uncorrectable sectors: offline and pending. Offline uncorrectable sectors are those found during SMART self-test read scans. Pending sectors are those which occurred during normal operation of the computer. Each pending sector generates a kernel error in /var/log/syslog, since the OS tried to read something from the disk and the disk failed. You can see these counters with:
smartctl -a -d ata /dev/sda
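The full smartctl report is long, so the two counters can be picked out of the attribute table. A minimal sketch follows. It assumes the usual ten-column attribute layout, with the attribute name in column 2 and the raw value in column 10, and the function name `unc_counters` is invented for illustration.

```shell
# Reads `smartctl -a` output on stdin and prints "name raw_value" for the
# pending and offline-uncorrectable counters.
# Assumption: attribute name is field 2 and the raw value is field 10.
unc_counters() {
    awk '$2 == "Current_Pending_Sector" || $2 == "Offline_Uncorrectable" {
        print $2, $10
    }'
}

# Usage on a node (not executed here):
#   /usr/sbin/smartctl -a -d ata /dev/sda | unc_counters
```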
Find Uncorrectable Sectors in the Cluster
A smartd instance runs on each node and, eventually, sends out an email if an uncorrectable sector is detected.
Manually, one can check the status using the following command line:
dsh -o "-x" -M -F 20 -a "/usr/sbin/smartctl -a /dev/sda -d ata | grep Pending" | tee pend.txt
sort < pend.txt -k 11 -n -r | head -30
A variation of the previous command does the same for Offline Uncorrectable sectors (in this case it shows none):
dsh -o "-x" -M -F 20 -a "/usr/sbin/smartctl -a /dev/sda -d ata | grep Offline_Uncorrectable" | tee unc.txt
sort < unc.txt -k 11 -n -r | head -30
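Most nodes will report a count of zero, so it can help to drop those lines before sorting. The sketch below assumes, as the sort key 11 above suggests, that `dsh -M` prefixes every line with "nodename: " so the raw counter ends up in field 11. The function name `nonzero_nodes` is invented for illustration.

```shell
# Reads pend.txt or unc.txt style lines on stdin and prints "count node"
# for every node with a non-zero raw counter, highest count first.
# Assumption: field 1 is "nodename:" and field 11 is the raw counter.
nonzero_nodes() {
    awk '$11 + 0 > 0 { sub(/:$/, "", $1); print $11, $1 }' | sort -k 1,1 -n -r
}

# Usage (not executed here):
#   nonzero_nodes < pend.txt
```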
How should uncorrectable sectors be handled?
Proper testing and possible repair require wiping out the data on the disk and recloning the node.
If there is nothing wrong with the disk drive, then simply writing to the UNC sector will result in consistent data there, and the sector will be readable again. Of course this wipes out the data stored at that sector! Here is one procedure to follow:
(1) Shut down the node, remove the disk drive, and write the date and the Uncorrectable/Pending sector counts on the drive with a permanent marker.
(2) Put a NEW disk drive into the node and reclone the node.
(3) Run the IBM/Hitachi Drive Fitness Test (DFT) on the old disk. Use the most aggressive and thorough sector repair options (these will destroy old data).
(4) If the drive is identified by DFT as defective, then a Technical Return Code will be generated. You can save the DFT results to a file, print the report and attach it to the disk drive, then put the drive into the pile of parts to return to Pyramid.
(5) If DFT identifies the drive as OK, write the date and ‘DFT OK’ on the drive with a marker, and put it onto the pile of good parts for future drive replacements.
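For the simpler case mentioned before the procedure, where a single UNC sector is rewritten in place, the write could be done with dd. The helper below only prints the command and does not execute it, because running it destroys the data in that sector. It is a sketch under assumptions: the function name `print_rewrite_cmd` is invented, the sector size is taken to be 512 bytes as described above, `oflag=direct` needs GNU dd, and the LBA has to come from elsewhere (for example the LBA_of_first_error column of `smartctl -l selftest`).

```shell
# Prints (does NOT run) a dd command that would overwrite one 512-byte
# sector with zeros. DESTRUCTIVE if executed: that sector's data is lost.
# Arguments: $1 = device, $2 = LBA of the uncorrectable sector.
print_rewrite_cmd() {
    dev=$1
    lba=$2
    case $lba in
        ''|*[!0-9]*)
            echo "LBA must be a non-negative integer" >&2
            return 1
            ;;
    esac
    echo "dd if=/dev/zero of=$dev bs=512 count=1 seek=$lba oflag=direct"
}

# Usage (prints the command for review; /dev/sdX and the LBA are placeholders):
#   print_rewrite_cmd /dev/sdX 12345
```

Printing the command first leaves a chance to double-check the device name and LBA before anything is written.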
It would be useful to set up a machine whose main purpose is to run DFT on bad drives. This should have a cheap printer attached to print out the test reports, and an external drive bay to make it easy to swap drives.
- Perhaps it is possible to set up a PXE-boot DOS image that automatically runs DFT. That might be very useful, especially if a report could automatically be saved from the DOS environment. Then when problems occur one could simply reboot the node into this PXE image.
- Perhaps one can boot the node into a PXE DOS boot image with DFT, and then run it remotely via the IPMI Serial-Over-LAN connection. This would permit several disks to be repaired simultaneously, without ever touching the node or removing the disk drive.