
FirstTest LV storage

The following items need to be checked for compute nodes (please add your name to the tests you have performed; if you need more space, please add footnotes).

The system design should be as simple as possible and all the components should be standard/commodity ones.:
State items present in box (just the single items for our records, details will be queried later) [CA]
2 x Intel Xeon E5345
4 x 4 GB ATP AP12K72D4BGE6S (Head: AP12K72G4BJE6S) Difference?
1 x Areca 1280ML + 2 GB Cache
5 x Fans
1 x Supermicro X7VDL-E Motherboard Rev 1.21
1 x Supermicro IPMI SIM1U+ Rev 3.00
16 x Hitachi Ultrastar HUA721075KLA330 (Head: 4 x )
2 x Intel e1000 NIC (Head: none)

All nodes should be able to run the same Linux operating system version.:
[X] Debian etch able to run? [CA]
All components (compute nodes, storage nodes, head nodes) should be installed on slide rails that permit easy removal and internal access. The slide rails must be rated for the necessary weight.:
[X] Slide rails available? [CA]
[X] Easy operation? [CA]
The amount of power and cooling available for this cluster is 220W per HU on average.:
[X] Wattage of box less than 220W per HU
How much? 2x17 W (off), 2x250W (peak, boot), 190W+210W (idle), 273W+268W (full-load) [CA]
What is the power factor?:
[X] Cos phi > 0.9 [CA]
Measured: -0.95 (start-up, idle), -0.96 (full-load) [CA]
It should be possible to shut down and power up the entire cluster remotely. Hardware monitoring (IPMI v2.0 or better) should include the monitoring of the basic health (temperatures, voltages, fans) of all types of nodes and Serial over LAN (SoL) features for remote console access.:
[X] Hardware monitoring via IPMI possible? [CA]
[X] IPMI temperatures (CPU1, CPU2, System) [CA]
[X] IPMI voltages (CPU1, CPU2, 3.3V, 5V, 12V, -12V, 1.5V, 5VSB, VBAT) [CA]
[X] IPMI fan speeds (Fan1, Fan2, Fan3, Fan5, Fan6) [CA]
[ ] SoL working?
[ ] KVM working?
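A minimal sketch of how the readings above can be dumped with ipmitool (assuming ipmitool is installed and the BMC is reachable in-band via the OpenIPMI driver; SoL itself is tested interactively with ipmitool's sol activate command):

<verbatim>
#!/usr/bin/env python
# Sketch: dump IPMI temperature, voltage and fan readings via ipmitool.
# Assumes in-band access through the OpenIPMI driver; for out-of-band
# access prepend e.g. ["-I", "lanplus", "-H", bmc, "-U", user, "-P", pw].
import subprocess

def ipmi_sensor_table():
    p = subprocess.Popen(["ipmitool", "sensor"], stdout=subprocess.PIPE)
    out = p.communicate()[0].decode()
    readings = {}
    for line in out.splitlines():
        # ipmitool sensor prints pipe-separated columns: name | value | unit | status | ...
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 3 and fields[1] != "na":
            readings[fields[0]] = (fields[1], fields[2])
    return readings

if __name__ == "__main__":
    for name, (value, unit) in sorted(ipmi_sensor_table().items()):
        print("%-20s %10s %s" % (name, value, unit))
</verbatim>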
When the nodes are provided with sufficient volumes of cooling air at 23 degrees Celsius and operated at full load (CPU and disk) the MTBF (Mean Time Between Failure) of each system power supply will be at least 100,000 hours.:
[X] MTBF of power supply > 100,000 hours?
RAID systems should be RAID-6.:
[X] RAID6 is possible [CA]
Hard disks should come with a 5-year manufacturer warranty and should be designed for 24/7 operation in RAID arrays. The error rate (non-recoverable read errors per bits read) should be less than $2\times 10^{-15}$.:
[X] Fulfilled [CA]
File servers should provide at least 15 MB/sec of NFS read/write bandwidth per TB of net storage. Thus a file server with more than 7TB of usable storage space must have multiple channel-bonded Gb/s ethernet connections to the core switch.:
[ ] We have about 10 TB of usable storage thus 150 MB/s NFS read/write must be possible.
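A rough way to check the 150 MB/s figure is to time a large streaming write and read against an NFS mount of the box; the mount point and file size below are assumptions, and the authoritative numbers should come from the iozone/bonnie++ runs listed further down:

<verbatim>
#!/usr/bin/env python
# Sketch: crude streaming write/read timing against an NFS mount.
# MOUNT and SIZE_MB are assumptions; use a file larger than the client's
# RAM (or read from a second client) so the read is not served from cache.
import os, time

MOUNT = "/mnt/nfs-test"            # assumed NFS mount of the file server
SIZE_MB = 4096
CHUNK = b"\0" * (1024 * 1024)      # 1 MB buffer
path = os.path.join(MOUNT, "throughput.dat")

t0 = time.time()
f = open(path, "wb")
for _ in range(SIZE_MB):
    f.write(CHUNK)
f.flush()
os.fsync(f.fileno())
f.close()
print("write: %.1f MB/s" % (SIZE_MB / (time.time() - t0)))

t0 = time.time()
f = open(path, "rb")
while f.read(1024 * 1024):
    pass
f.close()
print("read:  %.1f MB/s" % (SIZE_MB / (time.time() - t0)))

os.unlink(path)
</verbatim>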
For performance reasons we require a minimum of 4 CPU cores and 8 GB memory per 7 TB net storage on general purpose Linux systems.:
[X] System comes with 8 cores and 16 GB of memory [CA]
RAID controller cards' cache memory should be maxed out.:
[X] 2 GB module present [CA]
File servers should have redundant hot-swap power supplies.:
[ ] System still works on single PSU?
RAID battery backup units (BBU) should be present to preserve RAID write caches if power fails.:
[X] NO BBU package (according to contract) [CA]
At least one of the following notifications must be present to indicate a failed power supply: visual, IPMI, SNMP, other kind of network notification. Audible beep or audible alarm indicators alone are not sufficient!:
[ ] Fulfilled, which methods are available?
_____________________
_____________________
_____________________
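A quick non-audible check while one PSU is unplugged: look at the IPMI power-supply sensors and the system event log (assumes ipmitool and an accessible BMC; whether the SIM1U+ actually exposes power-supply sensors still needs to be confirmed):

<verbatim>
#!/usr/bin/env python
# Sketch: look for power-supply state via IPMI sensors and recent SEL
# entries. Run while one PSU is unplugged and compare with normal output.
import subprocess

def run(args):
    p = subprocess.Popen(args, stdout=subprocess.PIPE)
    return p.communicate()[0].decode()

print("--- power supply sensors ---")
print(run(["ipmitool", "sdr", "type", "Power Supply"]))

print("--- system event log (look for PSU failure events) ---")
print(run(["ipmitool", "sel", "list"]))
</verbatim>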
File servers should provide hands-off IPMI v2.0 management similar to the requirements given for the compute nodes.:
[ ] Fulfilled
IPMI network connections must not be tunneled through the on-board network.:
[X] Fulfilled [CA]
All file server disks (including system disks if separate from data disks) should be hot-swappable from the front or top of the system without removing the system from the rack or from service.:
[X] Fulfilled [CA]
LEDs on the disk carriers or an equivalent front panel display should clearly indicate disk status, including power on or power off, or disk failure or attention needed.:
[X] Fulfilled [CA]
RAID controllers with built-in hardware ethernet ports for a web interface (optionally connected to the management network provided by AEI) for configuration and supervision are desirable. Also a Command Line Interface (CLI) is considered desirable for scripting and automation purposes.:
[X] Fulfilled [CA]
A high bandwidth internal bus connection, PCI express, is required for the RAID controller cards.:
[X] Fulfilled [CA]
During a READ operation, if the RAID controller finds an unreadable (uncorrectable) disk sector, then it will immediately reconstruct the missing data for that sector using redundant data from the rest of the array, and WRITE that data to the unreadable (uncorrectable) sector to force sector reallocation if needed by the disk.:
[X] Fulfilled (According to Areca) [CA]
The RAID controller will perform a continuous or regular (at least daily) background scan of all disk sectors to identify and repair any unreadable sectors as described above.:
[X] Fulfilled (Manually via CLI) [CA]
The RAID controller will perform continuous or regular scans of redundant RAID data (parity) to verify and maintain data consistency. This must identify and repair any silent corruption on the disks (data sectors that are readable but whose values have changed for unknown reasons).:
[X] Fulfilled (According to Areca)
Internal SATA cables should be professional and neat. All SATA cable requirements, such as length, must be met. Internal cooling and airflow must not be impeded by cabling.:
[X] Fulfilled (Head node: too much strain on cables!) [CA]
Access to S.M.A.R.T. status and data of individual disks is required.:
[ ] Fulfilled
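One way to probe this behind the Areca controller is smartctl's Areca pass-through; whether the installed smartmontools version supports it, and the actual /dev/sg device, still need to be checked:

<verbatim>
#!/usr/bin/env python
# Sketch: probe S.M.A.R.T. health of disks behind the Areca controller.
# /dev/sg0 and the drive count are assumptions and must be adapted;
# requires a smartmontools build with "-d areca,N" pass-through support.
import subprocess

SG_DEVICE = "/dev/sg0"   # SCSI generic device of the Areca controller (assumed)
NUM_DISKS = 16

for n in range(1, NUM_DISKS + 1):
    cmd = ["smartctl", "-H", "-d", "areca,%d" % n, SG_DEVICE]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out = p.communicate()[0].decode()
    health = [line for line in out.splitlines()
              if "overall-health" in line or "Health Status" in line]
    print("disk %2d: %s" % (n, health[0].strip() if health else "no S.M.A.R.T. data"))
</verbatim>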
Channel Bonding/Link aggregation must be supported over multiple Gb/s ethernet ports.:
[ ] Fulfilled
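Once the bonding driver is set up, the kernel reports the aggregate state under /proc/net/bonding; a quick check (the interface name bond0 is an assumption):

<verbatim>
#!/usr/bin/env python
# Sketch: report bonding mode and slave link states from the kernel's
# /proc interface. Assumes the bonding driver is loaded and bond0 exists.
BOND = "/proc/net/bonding/bond0"

try:
    text = open(BOND).read()
except IOError:
    raise SystemExit("no bonding interface found at %s" % BOND)

for line in text.splitlines():
    if line.startswith(("Bonding Mode", "Slave Interface", "MII Status", "Speed")):
        print(line)
</verbatim>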
On each fileserver, the system must support a single partition and a single filesystem such as XFS over all available net RAID disk space accessible by a recent Linux kernel.:
[X] Fulfilled [CA]
Each node should have an IPMI v2.0 management card (BMC) installed; the ethernet connection can be shared with the on-board ethernet network connection for remote access, or the Vendor can provide a separate oversubscribed low-performance management network for this purpose.:
[X] IPMI present and working? [CA]
Nodes must be able to perform full power-off, reset, and wakeup via IPMI under all power cycling conditions (e.g. even if external power has been switched off and back on before wakeup). Wake-on-LAN is considered a bonus.:
[ ] Box is running, power off via IPMI works?
[ ] Box is running, reset (power cycle) via IPMI works?
[ ] Box is powered off via IPMI, restart via IPMI possible?
[ ] Box is powered off via OS, restart via IPMI possible?
[ ] Box is powered off via power switch, restart via IPMI possible?
[ ] Box is running, power cable is removed (> 1 minute), restart via IPMI possible?
[ ] Box is powered off via OS, power cable removed (> 1 minute), restart via IPMI possible?
[ ] Box is powered off via IPMI, restart via WoL possible?
[ ] Box is powered off via OS, restart via WoL possible?
[ ] Box is powered off via power switch, restart via WoL possible?
[ ] Box is running, power cable is removed (> 1 minute), restart via WoL possible?
[ ] Box is powered off via OS, power cable removed (> 1 minute), restart via WoL possible?
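The matrix above can be driven from a small script; a sketch using out-of-band ipmitool plus a hand-rolled Wake-on-LAN magic packet (BMC address, credentials and MAC below are placeholders):

<verbatim>
#!/usr/bin/env python
# Sketch: remote power control via ipmitool (out of band) plus a
# Wake-on-LAN magic packet. Host, credentials and MAC are placeholders.
import socket, subprocess

BMC  = "192.168.1.100"        # IPMI address (placeholder)
USER = "ADMIN"                # IPMI credentials (placeholders)
PASS = "secret"
MAC  = "00:30:48:00:00:00"    # MAC of the NIC to wake (placeholder)

def ipmi(*args):
    cmd = ["ipmitool", "-I", "lanplus", "-H", BMC, "-U", USER, "-P", PASS] + list(args)
    subprocess.call(cmd)

def wake_on_lan(mac):
    # Magic packet: 6 x 0xff followed by the MAC repeated 16 times.
    raw = bytes(bytearray.fromhex(mac.replace(":", "")))
    packet = b"\xff" * 6 + raw * 16
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.sendto(packet, ("<broadcast>", 9))
    s.close()

# Example run for one row of the matrix:
ipmi("chassis", "power", "status")
ipmi("chassis", "power", "off")
# ... pull the power cable here if the test case requires it ...
ipmi("chassis", "power", "on")    # or: wake_on_lan(MAC)
</verbatim>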
Storage nodes must be bootable via the network (PXE) to allow remote, hands-off installation.:
[X] PXE is working [CA]
Floppy and CD drives are not needed:
[ ] Floppy is missing? NO [CA]
[X] CD is missing [CA]
Vendor should provide a hands-off, network-based method for BIOS and IPMI upgrades and configuration. :
[ ] BIOS can be flashed in hands-off manner?
[ ] BIOS values (CMOS) can be set in hands-off manner?
[ ] IPMI can be flashed in hands-off manner?
[ ] IPMI values can be set in hands-off manner?
Nodes should be delivered with BIOS settings as per AEI specifications. Vendor must have an automated system to set BIOS values.:
[X] Fulfilled? [CA]
Sound components, mice and keyboards are not required. The nodes are required to work without any mouse and/or keyboard connected.:
[X] Audio is missing [CA]
[X] Mouse missing [CA]
[X] Keyboard missing [CA]
[X] System works with missing audio, mouse and/or keyboard [CA]
Mainboard sensors for all vital components (fans, temperatures, voltages) must be present and must be queryable using at least one of lm_sensors or ipmitool.:
[ ] IPMI values readable:
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
[ ] lm_sensors working?
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
______________________
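For the lm_sensors half of the tables above (the IPMI half can be filled from the same ipmitool sensor dump shown earlier), a minimal sketch assuming the lm-sensors package is installed and sensors-detect has been run:

<verbatim>
#!/usr/bin/env python
# Sketch: print whatever the lm_sensors 'sensors' command reports.
# Assumes lm-sensors is installed and sensors-detect has been run so
# the right kernel modules are loaded.
import subprocess

p = subprocess.Popen(["sensors"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
if p.returncode != 0:
    raise SystemExit("sensors failed: %s" % err.decode().strip())
print(out.decode())
</verbatim>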
Major/minor revision numbers of major components and embedded software (BIOS/Firmware) must be identical across all nodes. (List of revision numbers, firmware version):
Supermicro SC-836 (Chassis)
Supermicro X7VDL-E Rev 1.21 (Motherboard)
Supermicro SIM1U+ Rev 3.00 (IPMI)
4 x 4 GB ATP AP12K72D4BGE6S (Head: AP12K72G4BJE6S) (Memory)
2 x Intel Xeon E5345 (CPU)
1 x Areca 1280ML + 2 GB Cache (RAID). Internally 1261ML
16 x Hitachi Ultrastar HUA721075KLA330 (Head: 4 x )
2 x Intel 631xESB/632xESB NIC (Head: none)
The operating system on the cluster will be a 64-bit version of Linux, with a recent 2.6 kernel. Therefore, it is required that all hardware works under this OS.:
[X] Kernel 2.6.20 and 2.6.23.1 are working [CA]
The cluster must work with any major Linux distribution coming with a recent 2.6 kernel.:
[X] Debian etch is working [CA]
All compute nodes, head nodes, storage nodes and switches should be placed into 19-inch racks provided by AEI. The maximum depth of any installed equipment must not exceed 750mm.:
[X] Fulfilled [CA]
The air flow should be from the front to the back, i.e. cool air from the front should be taken in by the fans and the hot air blown out in the rear. This should be valid for the whole rack meaning the cooling air flow within the rack is horizontal.:
[X] Fulfilled [CA]
Each node will be clearly labeled with the node number and ethernet MAC address of the network card and of the IPMI interfaces. Three labels (node number, MAC ethernet and IPMI addresses) will appear on the front _and_ the same three labels will appear at the rear of the chassis. The characters in labels should be as large as permitted by space on the chassis. Labels must be permanent and not peel or discolor after time. The exact naming scheme will be discussed and finalized after the order has been placed.:
[ ] Node number readable from front? Only when pulled out ~5cm [CA]
[X] Node number readable from back? [CA]
[ ] MAC of eth0 is readable from front? Only when pulled out ~5cm [CA]
[ ] MAC of eth0 is readable from back? NO [CA]
[ ] MAC of eth1 is readable from front? Only when pulled out ~5cm [CA]
[ ] MAC of eth1 is readable from back? NO [CA]
[ ] MAC of eth2 is readable from front? NO [CA]
[ ] MAC of eth3 is readable from back? NO [CA]
[ ] MAC of eth4 is readable from front? NO [CA]
[ ] MAC of eth4 is readable from back? NO [CA]
[ ] MAC of IPMI is readable from front? Only when pulled out ~5cm [CA]
[ ] MAC of IPMI is readable from back? NO [CA]
[ ] MAC of RAID is readable from front? NO [CA]
[ ] MAC of RAID is readable from back? NO [CA]
The Vendor will supply on a CD or on a floppy disk an ASCII text file containing a list of ethernet and IPMI MAC addresses and node names in a 4-column format.:
[ ] Fulfilled - NO [CA]
I had to type in the numbers myself.
(Contacted Vendor about this, should be fixed for future.)
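Until the Vendor supplies the file, the entries can at least be collected on each node; a sketch that prints one line (node name, eth0 MAC, eth1 MAC, IPMI MAC), assuming the on-board NICs show up as eth0/eth1 — the exact column layout still has to match the agreed format:

<verbatim>
#!/usr/bin/env python
# Sketch: emit one line of the 4-column node/MAC list for the current node.
# Assumes eth0/eth1 are the on-board NICs and ipmitool can query the BMC;
# adapt interface names and columns to the agreed file format.
import socket, subprocess

def nic_mac(iface):
    return open("/sys/class/net/%s/address" % iface).read().strip()

def ipmi_mac():
    p = subprocess.Popen(["ipmitool", "lan", "print"], stdout=subprocess.PIPE)
    for line in p.communicate()[0].decode().splitlines():
        if line.strip().startswith("MAC Address"):
            return line.split(":", 1)[1].strip()
    return "unknown"

print("%s %s %s %s" % (socket.gethostname(), nic_mac("eth0"), nic_mac("eth1"), ipmi_mac()))
</verbatim>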
The Vendor will run a burn-in on each node before delivery, at least 24 hours long, using a memory/disk/CPU/network-card/video-card exercise program of their choice. The Vendor will supply documentation of the burn-in tests, including specification of what programs and parameters have been used, and the name of a contact person for queries regarding these tests.:
NOT FULFILLED [CA]
(Contacted Vendor about this, should be fixed for future)
iozone and bonnie++ benchmark results agree with supplied results?:
[ ] Fulfilled