
Areca test 1

Abstract

Deliberately damage sectors inside the RAID area to test the controller's error-recovery (volume check) function.

Initial setup

Raid Subsystem Information

Controller Name        ARC-1261
Firmware Version       V1.43 2007-4-17
BOOT ROM Version       V1.43 2007-4-17
Serial Number          Y712CAAYAR600144
Unit Serial #
Main Processor         800MHz IOP341
CPU ICache Size        32KBytes
CPU DCache Size        32KBytes/Write Back
CPU SCache Size        512KBytes/Write Back
System Memory          2048MB/533MHz/ECC

Raid Set Information

Raid Set Name          RAID
Member Disks           4
Total Raw Capacity     3000.6GB
Free Raw Capacity      0.6GB
Min Member Disk Size   750.2GB
Raid Set Power State   Operating
Raid Set State         Normal

Volume Set Information

Volume Set Name        SYSTEM
Raid Set Name          RAID
Volume Capacity        1500.0GB
SCSI Ch/Id/Lun         0/0/0
Raid Level             Raid 6
Stripe Size            16KBytes
Block Size             512Bytes
Member Disks           4
Cache Mode             Write Back
Tagged Queuing         Enabled
Volume State           Normal
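
For reference, the 1500.0GB volume capacity follows from the RAID 6 layout: with 4 member disks and dual parity, the usable space is (4 - 2) x 750.2GB ≈ 1500GB.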

IDE Channels

Channel   Usage   Capacity   Model
Ch01      RAID    750.2GB    Hitachi HUA721075KLA330
Ch02      RAID    750.2GB    Hitachi HUA721075KLA330
Ch03      RAID    750.2GB    Hitachi HUA721075KLA330
Ch04      RAID    750.2GB    Hitachi HUA721075KLA330

IDE Drive Information (HDD for the damage test)

IDE Channel                3
Model Name                 Hitachi HUA721075KLA330
Serial Number              GTE200P8G1ZMRE
Firmware Rev.              GK8OA70M
Disk Capacity              750.2GB
Current SATA Mode          SATA300+NCQ(Depth32)
Supported SATA Mode        SATA300+NCQ(Depth32)
Device State               NORMAL
Timeout Count              0
Media Error Count          0
SMART Read Error Rate      100(16)
SMART Spinup Time          109(24)
SMART Reallocation Count   100(5)
SMART Seek Error Rate      100(67)
SMART Spinup Retries       100(60)
SMART Calibration Retries  N.A.(N.A.)
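
The SMART figures appear to use the usual value(threshold) notation, i.e. the current normalized attribute value followed by the vendor's failure threshold in parentheses; all values here sit comfortably above their thresholds.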

Test procedure

HDD damage

  1. Power down the server and move the drive to a node with a plain SATA controller (direct access to the drive is needed) that has the modified hdparm and smartmontools installed.
  2. Execute do_bad.sh:

 #!/bin/bash
 # configure the script here: THINK TWICE
 # allowed 201GB - 209GB
 # LBAs (512-byte sectors) to corrupt:
 LBA_LIST="1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 21000 22000 23000 24000 25000 26000 27000 28000 29000 30000 31000 32000 33000 34000 35000 36000 37000 38000 39000 40000"
 LBA_FIRST=0
 LBA_LAST=50000
 BYTE_SIZE=512
 HARD_DRIVE=/dev/sdb
 OUT_PRE_FILE=pre_arica_test_1.data
 OUT_POST_FILE=post_arica_test_1.data
 OUT_POST_ARECA=post_areca_restore_1.data
 # DO NOT TOUCH THE LINE BELOW
 # size of the test region in sectors
 let FILE_SIZE=$LBA_LAST-$LBA_FIRST
 echo
 echo Calculated size: $FILE_SIZE
 echo
 echo Create $OUT_PRE_FILE
 echo
 # snapshot the test region before damaging it
 dd if=$HARD_DRIVE of=$OUT_PRE_FILE bs=$BYTE_SIZE skip=$LBA_FIRST count=$FILE_SIZE
 echo
 echo Corrupt data
 echo
 # make each listed sector unreadable
 for i in $LBA_LIST; do
     echo
     make_bad_sector $HARD_DRIVE $i
 done
 # read the region back; this dd is expected to fail at the first bad sector
 dd if=$HARD_DRIVE of=$OUT_POST_FILE bs=$BYTE_SIZE skip=$LBA_FIRST count=$FILE_SIZE
 # enable for the final check, after the Areca volume check has run:
 #dd if=$HARD_DRIVE of=$OUT_POST_ARECA bs=$BYTE_SIZE skip=$LBA_FIRST count=$FILE_SIZE
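
Note: the script must run as root, and HARD_DRIVE must point at the removed test disk (here /dev/sdb), not at a system disk, since make_bad_sector makes each listed sector unreadable on purpose.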
output:

Calculated size: 50000

Create pre_arica_test_1.data

50000+0 records in

50000+0 records out

25600000 bytes (26 MB) copied, 0.319796 seconds, 80.1 MB/s

Corrupt data

/dev/sdb: readback test LBA=1000

/dev/sdb: success

/dev/sdb: writing LBA=1000

/dev/sdb: readback test LBA=1000 (this should fail!)

/dev/sdb: readback failed

/dev/sdb: readback test LBA=2000

/dev/sdb: success

/dev/sdb: writing LBA=2000

/dev/sdb: readback test LBA=2000 (this should fail!)

/dev/sdb: readback failed

...

/dev/sdb: readback test LBA=39000

/dev/sdb: success

/dev/sdb: writing LBA=39000

/dev/sdb: readback test LBA=39000 (this should fail!)

/dev/sdb: readback failed

/dev/sdb: readback test LBA=40000

/dev/sdb: success

/dev/sdb: writing LBA=40000

/dev/sdb: readback test LBA=40000 (this should fail!)

/dev/sdb: readback failed

dd: reading '/dev/sdb': Input/output error

1000+0 records in

1000+0 records out

512000 bytes (512 kB) copied, 368.993 seconds, 1.4 kB/s

ATTENTION: dd broke at LBA 1000, as expected!
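
To spot-check a single damaged sector before moving the drive back, a plain read of one of the listed LBAs should already fail with an I/O error. A minimal sketch, assuming the drive is still attached as /dev/sdb:

 # read one 512-byte sector at LBA 1000; expected: Input/output error
 dd if=/dev/sdb of=/dev/null bs=512 skip=1000 count=1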

HDD restore

  1. Move the drive back into the storage server.
  2. Run a volume check from the Areca CLI:

 ./cli64-1.72 vsf check vol=1
result:

Volume Set Information

Volume Set Name        SYSTEM
Raid Set Name          RAID
Volume Capacity        1500.0GB
SCSI Ch/Id/Lun         0/0/0
Raid Level             Raid 6
Stripe Size            16KBytes
Block Size             512Bytes
Member Disks           4
Cache Mode             Write Back
Tagged Queuing         Enabled
Volume State           Checking
Progress               0.5%
Errors Found           40
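
The check runs in the background on the controller, so its progress can be followed from the CLI. A minimal sketch, assuming this cli64 build also accepts a vsf info vol=1 subcommand printing the fields shown above (an assumption, not verified in this test):

 # poll until the volume leaves the Checking state (assumes `vsf info vol=1` as above)
 while ./cli64-1.72 vsf info vol=1 | grep -q Checking; do
     ./cli64-1.72 vsf info vol=1 | grep -E 'Progress|Errors Found'
     sleep 60
 done
 echo "volume check finished"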

The check can be aborted at this point: all 40 injected bad sectors have already been found.

Final check

  • Mount the HDD in the test node again.
  • In do_bad.sh, disable the first two dd commands and the make_bad_sector loop, and enable the last dd (the one writing $OUT_POST_ARECA).
  • Compare the dd outputs:

 cmp --verbose pre_arica_test_1.data post_areca_restore_1.data

no output => same data!
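
For an unattended pass/fail, the same comparison can key off cmp's exit status (0 when the files are identical). A minimal sketch, using the file names from do_bad.sh:

 # silent compare; cmp -s sets the exit status only
 if cmp -s pre_arica_test_1.data post_areca_restore_1.data; then
     echo "PASS: restored region matches the pre-damage snapshot"
 else
     echo "FAIL: restored region differs from the pre-damage snapshot"
 fi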