Tasks to do:
Initial work
(HP)
Manual work
- Blank disk of node, wipe by:
dd if=/dev/zero of=/dev/sda; sync
- Put MAC address into DHCP table on server
- PXE boot
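Sketch for the DHCP step above. The notes do not say which DHCP server is used; this assumes ISC dhcpd syntax, and the function name, host name, MAC and IP are placeholders:

```shell
# Print a host stanza for one node (assumption: ISC dhcpd config syntax).
# Arguments: node name, MAC address, fixed IP address.
dhcp_host_entry() {
    name=$1
    mac=$2
    ip=$3
    printf 'host %s {\n  hardware ethernet %s;\n  fixed-address %s;\n}\n' \
        "$name" "$mac" "$ip"
}
```

E.g. dhcp_host_entry s1234 00:11:22:33:44:55 10.0.0.34 prints a stanza that could be pasted into the server's config.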
Automatic from here (check list):
- Flash BIOS (DOS) -> DONE
- Flash Firmware (DOS) -> DONE
- Set up BIOS/IPMI (DOS/Linux) -> DONE
- Clone/Install operating system (FAI) -> in progress
- Reboot -> DONE
Slave tests
Sensors
(HP)
remotely get
- temperatures? -> DONE
- fan speeds? -> DONE
- voltages? -> DONE
- SMART values?
by using
IPMI
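Possible way to flag bad sensor readings. This is a sketch that assumes the pipe-separated layout printed by "ipmitool sensor" (name | reading | unit | status | thresholds...); the function name is made up here, and discrete sensors are skipped because their status column holds a hex state rather than ok/nc/cr/nr:

```shell
# Read "ipmitool sensor"-style lines on stdin and print every threshold
# sensor whose status column is neither "ok" nor "na".
sensors_not_ok() {
    awk -F'|' '
        {
            name = $1; unit = $3; status = $4
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", unit)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (unit == "discrete") next          # hex state, not a threshold status
            if (status != "ok" && status != "na" && status != "")
                print name ": " status
        }
    '
}
```

Usage on a node (or remotely with the usual -H/-U options): ipmitool sensor | sensors_not_ok. SMART values are not covered by this sketch.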
Warnings
(all)
Are there any warnings in
- dmesg
- /var/log/{messages|syslog}
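A rough filter for the two checks above. The keyword list (warn, error, fail) is only a guess at what counts as a warning and will need tuning per machine; the function name is made up:

```shell
# Print lines that look like warnings or errors.
# With no arguments it reads stdin, otherwise it reads the named files.
log_warnings() {
    grep -i -E 'warn|error|fail' "$@"
}
```

Usage: dmesg | log_warnings, or log_warnings /var/log/syslog. The exit status is non-zero when nothing matches, which is the good case here.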
Partitions
- correct partition tables, inode number customization -> DONE (config in FAI)
Clean?
- are /boot, /lib/modules clean? -> DONE
- does startx work? Which window managers? dwm -> DONE
- not standard way to boot -> DONE
- gcc/ddd/gdb/prof/gprof/valgrind/g++ + other vital tools on the head nodes??? -> DONE
Network
- networking runs wire-speed, full-duplex (netperf/netpipe)
- (VLAN?) wait for the core switch
- correct identity for machine (hostname, IP) -> DONE
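Before running netperf/netpipe it may be worth checking what the link negotiated. This sketch uses ethtool, which is not mentioned in the list above, and assumes its usual "Speed: 1000Mb/s" / "Duplex: Full" output lines; the function name and interface name are placeholders:

```shell
# Read "ethtool <iface>" output on stdin; succeed only if the link is
# full-duplex and at least the wanted speed (in Mb/s, first argument).
link_ok() {
    awk -v want="$1" '
        /^[ \t]*Speed:/  { speed = $2; sub(/Mb\/s/, "", speed) }
        /^[ \t]*Duplex:/ { duplex = $2 }
        END { exit !(speed + 0 >= want + 0 && duplex == "Full") }
    '
}
```

Usage: ethtool eth0 | link_ok 1000 || echo "link problem". This only checks negotiation; the actual throughput still needs the netperf/netpipe run.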
Power
(MS)
- shutdown -hf now (maybe only shutdown -h) -> DONE
- After shutdown, power cycle (disconnect, reconnect cable) box needs to stay off -> DONE
- shutdown -rf now reboots -> DONE
- unplug box, plug back in, box should stay off -> DONE
- remotely power on machines (IPMI, etherwake)? Under any of given conditions (except reboot) -> DONE
- cut UPS power for less than 60s, nodes should stay on -> DONE
- cut UPS power for more than 60s, nodes should shut down -> DONE
- What about a UPS that is not fully charged?
Time
- no files with dates in the future -> DONE
cd /; touch temp.dat; find / -xdev -cnewer temp.dat
- no files with "early dates" (e.g. 1980) -> DONE
find / -xdev -type f -printf "%TY %p\n"| grep "^19[0-8][0-9] "
ntpq -p
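The two find checks above could be wrapped so they take a directory and do not leave a temp.dat behind. A sketch, assuming GNU find and mktemp; function names and the cutoff year are made up. It compares modification times (-newer) where the one-liner above uses -cnewer:

```shell
# List regular files under $1 whose modification time is later than "now".
# A fresh temp file serves as the reference timestamp.
find_future_files() {
    ref=$(mktemp) || return 1
    find "$1" -xdev -type f -newer "$ref"
    rm -f "$ref"
}

# List regular files under $1 last modified before year $2 (default 1990),
# printed as "YEAR PATH".
find_early_files() {
    find "$1" -xdev -type f -printf '%TY %p\n' |
        awk -v cutoff="${2:-1990}" '$1 + 0 < cutoff + 0'
}
```

Usage: find_future_files / and find_early_files / 1990. Files being written while the check runs can show up as "future" files, so run it on a quiet node.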
Benchmarks/Tests
- benchmarks run at full speed?
- disk speeds (guesstimate > 50 MB/s buffered disk reads and > 800 MB/s cached reads) -> DONE
hdparm -tT /dev/sda
- big file support (>2 GB) -> DONE
- /proc/meminfo should show full memory
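The memory check could be scripted along these lines. A sketch only; the function names and the threshold are placeholders. MemTotal is always a little below the installed RAM because the kernel reserves some, so the expected value should be set slightly under the nominal size:

```shell
# Print MemTotal in kB from a meminfo-style file (default /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed if MemTotal is at least $1 kB. Optional $2: file to read.
check_memory() {
    expected_kb=$1
    actual_kb=$(mem_total_kb "${2:-/proc/meminfo}")
    [ "$actual_kb" -ge "$expected_kb" ]
}
```

Usage: check_memory 16000000 || echo "memory missing on $(hostname)".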
Automount
- Automounting works? E.g. /net/s1234/data
- cd to automounted directory?
cd /net/s1234/data
- Copy to/from automounted partitions (permissions for users/root correct - what is correct?)
- low NFS time-out values
rsh/ssh
- Can root on master rsh to any node? -> DONE
- Can an ordinary user rsh to any node? -> DONE
- Node to node rsh should work as well (hosts.allow/hosts.deny) -> DONE
- rsh uptime on any nodes? -> DONE
master $ rsh s1234 uptime
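To run the uptime check over many nodes, something like the following might do. A sketch; check_nodes and the REMOTE_SHELL variable are made-up names, and the node names are placeholders:

```shell
# Run "uptime" on each named node and report ok/FAILED per node.
# REMOTE_SHELL selects the remote command (default: rsh; ssh also works).
# Returns non-zero if any node failed.
check_nodes() {
    remote=${REMOTE_SHELL:-rsh}
    failed=0
    for node in "$@"; do
        if "$remote" "$node" uptime >/dev/null 2>&1; then
            echo "$node ok"
        else
            echo "$node FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: master $ check_nodes s1234 s1235, or REMOTE_SHELL=ssh check_nodes s1234. Run it once as root and once as an ordinary user to cover both items above.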
Misc
- does /root/cloned-date (svn-tagnumber) exist? Maybe better in /etc? -> Puts date in /etc/cloned-date -> DONE
- does recloning preserve data? -> DONE if it is covered by fai softupdate
- does email work on the nodes (outgoing)? (X) -> DONE
- garbage bag test: computer starts to overheat and should shut down cleanly -> DONE