Abstract
Deliberately damage sectors in the RAID area of a member disk to test the controller's error recovery function.
Initial setup
Controller Name ARC-1261
Firmware Version V1.43 2007-4-17
BOOT ROM Version V1.43 2007-4-17
Serial Number Y712CAAYAR600144
Unit Serial #
Main Processor 800MHz IOP341
CPU ICache Size 32KBytes
CPU DCache Size 32KBytes/Write Back
CPU SCache Size 512KBytes/Write Back
System Memory 2048MB/533MHz/ECC
Raid Set Name RAID
Member Disks 4
Total Raw Capacity 3000.6GB
Free Raw Capacity 0.6GB
Min Member Disk Size 750.2GB
Raid Set Power State Operating
Raid Set State Normal
Volume Set Name SYSTEM
Raid Set Name RAID
Volume Capacity 1500.0GB
SCSI Ch/Id/Lun 0/0/0
Raid Level Raid 6
Stripe Size 16KBytes
Block Size 512Bytes
Member Disks 4
Cache Mode Write Back
Tagged Queuing Enabled
Volume State Normal
IDE Channels
Channel / Usage / Capacity / Model
Ch01 RAID 750.2GB Hitachi HUA721075KLA330
Ch02 RAID 750.2GB Hitachi HUA721075KLA330
Ch03 RAID 750.2GB Hitachi HUA721075KLA330
Ch04 RAID 750.2GB Hitachi HUA721075KLA330
IDE Channel 3
Model Name Hitachi HUA721075KLA330
Serial Number GTE200P8G1ZMRE
Firmware Rev. GK8OA70M
Disk Capacity 750.2GB
Current SATA Mode SATA300+NCQ(Depth32)
Supported SATA Mode SATA300+NCQ(Depth32)
Device State NORMAL
Timeout Count 0
Media Error Count 0
SMART Read Error Rate 100(16)
SMART Spinup Time 109(24)
SMART Reallocation Count 100(5)
SMART Seek Error Rate 100(67)
SMART Spinup Retries 100(60)
SMART Calibration Retries N.A.(N.A.)
Test procedure
HDD damage
- shut down the server and move the drive to a node with an ordinary SATA controller (direct access to the drive is required) that has the modified hdparm and smartmontools installed
- execute do_bad.sh
#!/bin/bash
# configure the script here THINK TWICE
# allowed 201GB - 209GB
LBA_LIST="1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 21000 22000 23000 24000 25000 26000 27000 28000 29000 30000 31000 32000 33000 34000 35000 36000 37000 38000 39000 40000"
LBA_FIRST=0
LBA_LAST=50000
BYTE_SIZE=512
HARD_DRIVE=/dev/sdb
OUT_PRE_FILE=pre_arica_test_1.data
OUT_POST_FILE=post_arica_test_1.data
OUT_POST_ARECA=post_areca_restore_1.data
# DO NOT TOUCH THE LINE BELOW
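FILE_SIZE=$((LBA_LAST - LBA_FIRST))
echo
echo "Calculated size: $FILE_SIZE"
echo
echo "Create $OUT_PRE_FILE"
echo
# Dump the test range before the damage
dd if="$HARD_DRIVE" of="$OUT_PRE_FILE" bs="$BYTE_SIZE" skip="$LBA_FIRST" count="$FILE_SIZE"
echo
echo "Corrupt data"
echo
# Damage every sector listed in LBA_LIST
for i in $LBA_LIST; do
    echo
    make_bad_sector "$HARD_DRIVE" "$i"
done
# Dump the same range after the damage (expected to abort at the first bad sector)
dd if="$HARD_DRIVE" of="$OUT_POST_FILE" bs="$BYTE_SIZE" skip="$LBA_FIRST" count="$FILE_SIZE"
# Enable this line only for the final check, after the controller has repaired the drive
#dd if="$HARD_DRIVE" of="$OUT_POST_ARECA" bs="$BYTE_SIZE" skip="$LBA_FIRST" count="$FILE_SIZE"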
output:
Calculated size: 50000
Create pre_arica_test_1.data
50000+0 records in
50000+0 records out
25600000 bytes (26 MB) copied, 0.319796 seconds, 80.1 MB/s
Corrupt data
/dev/sdb: readback test LBA=1000
/dev/sdb: success
/dev/sdb: writing LBA=1000
/dev/sdb: readback test LBA=1000 (this should fail!)
/dev/sdb: readback failed
/dev/sdb: readback test LBA=2000
/dev/sdb: success
/dev/sdb: writing LBA=2000
/dev/sdb: readback test LBA=2000 (this should fail!)
/dev/sdb: readback failed
...
/dev/sdb: readback test LBA=39000
/dev/sdb: success
/dev/sdb: writing LBA=39000
/dev/sdb: readback test LBA=39000 (this should fail!)
/dev/sdb: readback failed
/dev/sdb: readback test LBA=40000
/dev/sdb: success
/dev/sdb: writing LBA=40000
/dev/sdb: readback test LBA=40000 (this should fail!)
/dev/sdb: readback failed
dd: reading '/dev/sdb': Input/output error
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 368.993 seconds, 1.4 kB/s
ATTENTION: dd aborted after 1000 sectors, i.e. at LBA 1000, the first damaged sector, as expected!
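As a cross-check of the damage step, each LBA can also be probed on its own, so that the list of unreadable sectors can be compared with LBA_LIST. The following is a minimal sketch and not part of the original procedure: probe_lbas is a hypothetical helper name, it assumes 512-byte sectors as in do_bad.sh, and it uses plain buffered dd reads, so results on a real drive may be influenced by the page cache.

```shell
#!/bin/bash
# Hypothetical helper: print every LBA from the list that cannot be read.
# Usage: probe_lbas DEVICE LBA...
probe_lbas() {
    local dev=$1 lba
    shift
    for lba in "$@"; do
        # Read exactly one 512-byte sector at the given LBA and discard it.
        if ! dd if="$dev" of=/dev/null bs=512 skip="$lba" count=1 2>/dev/null; then
            echo "$lba"
        fi
    done
}

# The same 40 LBAs as LBA_LIST in do_bad.sh: 1000, 2000, ..., 40000.
LBA_LIST=$(seq 1000 1000 40000)

# Example call, commented out because it needs the real drive:
# probe_lbas /dev/sdb $LBA_LIST
```

On the test node, after the damage step, every LBA in the list would be expected to show up in the output; after the controller has repaired the drive, the output should be empty.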
HDD restore
- move the drive back into the storage server
- run a volume check:
./cli64-1.72 vsf check vol=1
result:
Volume Set Name SYSTEM
Raid Set Name RAID
Volume Capacity 1500.0GB
SCSI Ch/Id/Lun 0/0/0
Raid Level Raid 6
Stripe Size 16KBytes
Block Size 512Bytes
Member Disks 4
Cache Mode Write Back
Tagged Queuing Enabled
Volume State Checking
Progress 0.5%
Errors Found 40
The check can be aborted at this point, since all 40 damaged sectors have already been found.
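The error count reported by the check should equal the number of sectors damaged by do_bad.sh. A small sketch of that comparison follows; it assumes the status lines have been saved as text in the form shown above, and the variable names are illustrative only.

```shell
#!/bin/bash
# Status lines copied from the check result above (assumed text format).
status='Volume State Checking
Progress 0.5%
Errors Found 40'

# The LBAs damaged by do_bad.sh: 1000, 2000, ..., 40000.
lbas=( $(seq 1000 1000 40000) )
expected=${#lbas[@]}

# Extract the number after "Errors Found".
found=$(printf '%s\n' "$status" | awk '/^Errors Found/ {print $3}')

if [ "$found" -eq "$expected" ]; then
    echo "all $found damaged sectors detected"
else
    echo "mismatch: $found found, $expected expected"
fi
```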
Final check
- install the drive in the test node again
- in do_bad.sh, comment out the first two dd commands and the make_bad_sector loop, and enable the last dd
- run do_bad.sh and compare the new dump with the dump taken before the damage
cmp --verbose pre_arica_test_1.data post_areca_restore_1.data
no output => same data!
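Because "no output" is easy to overlook, the comparison can also be made to report its result explicitly by using the exit status of cmp. This is only a sketch; compare_dumps is a hypothetical helper name, and the file arguments are whatever do_bad.sh wrote.

```shell
#!/bin/bash
# Hypothetical helper: compare two dumps and state the result explicitly.
# Usage: compare_dumps PRE_FILE POST_FILE
compare_dumps() {
    local pre=$1 post=$2
    if cmp -s "$pre" "$post"; then
        echo "identical"
    else
        echo "different"
        # List up to the first ten differing bytes (offset and octal values).
        cmp -l "$pre" "$post" | head -n 10
        return 1
    fi
}

# Example call with the variables from do_bad.sh:
# compare_dumps "$OUT_PRE_FILE" "$OUT_POST_ARECA"
```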