We have got a Sun Fire X4500, a dual-Opteron storage server from Sun Microsystems with 48 SATA 500 GB drives.
Installation
It weighs about 80 kg, so taking out the hard disks and power supplies before installation is a good idea!
A good starting point for X4500 installation issues is available at our colleagues' site at UWM:
here and
here
Starting the Server
As soon as you plug the power into the two supplies, the server makes noise and starts some self-tests, which take about one minute. After that the BIOS, GRUB and finally Solaris boot. The VGA port's output is redirected, which is why you will just see a blinking cursor after the GRUB menu. Accessing the server via the SP/remote console is the better alternative anyway.
SP
You can access the server's BMC, or Service Processor/ILOM as Sun calls it, in several ways.
ILOM administration guide
- SSH/command line: You have to connect the NET MGT port. Check the server's BIOS settings to activate DHCP for the SP and to get the SP's MAC address. Then you can access the SP via
ssh root@192.168.0.112 (or whatever the SP's IP is; the default password is changeme)
Type "ESC" and "(" to leave the remote console.
- IPMI and SNMP interfaces
- Web GUI on the SP's IP
- Serial connection (SER MGT port) - there is an adapter for RJ45 to serial port
- Connecting a monitor to the box won't work, because the output is redirected to the remote console (see above)
Now we want to log in to Solaris:
start /SYS (switches on the Thumper, if not done yet)
start /SP/console (starts the remote console; log in to Solaris)
During the first boot of Solaris you have to answer some questions in the installation menu to specify language, clock settings, etc.
The SP CLI can do many more useful things, e.g.
show /SYS/HD/HDD35
set /SYS/LOCATE value=off
But all of this can also be done with tools like ipmitool.
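For example (a sketch, assuming the SP's IP and the default credentials from above):
# query chassis/power state over the LAN interface
ipmitool -I lanplus -H 192.168.0.112 -U root -P changeme chassis status
# switch the locate LED off, like the ILOM command above
ipmitool -I lanplus -H 192.168.0.112 -U root -P changeme chassis identify 0
# read the sensor data repository (fans, temperatures, voltages)
ipmitool -I lanplus -H 192.168.0.112 -U root -P changeme sdr list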
Configuration
Install patches via pca
- Download pca here
- wget can be found under /usr/sfw/bin
- If you want more output, edit pca and delete the -q option for wget
- Run the tool; it will show you how many patches are missing. A typical invocation is sketched below.
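A sketch of a typical pca session (see pca's help for the full option list):
# list all missing patches
./pca -l missing
# download and install them in one go
./pca -i missing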
Install software via blastwave
As root run
pkgadd -d http://www.blastwave.org/pkg_get.pkg
and answer all questions with either all or yes.
Add a suitable mirror, e.g.
url=http://ftp.uni-erlangen.de/pub/mirrors/blastwave.org/stable
to
/opt/csw/etc/pkg-get.conf
If you are feeling lazy and don't want to answer many questions during installation, please run
cp -p /var/pkg-get/admin-fullauto /var/pkg-get/admin
Now you can start installing new software, e.g.
/opt/csw/bin/pkg-get -i wget rsync emacs vim
(don't forget to set your PATH accordingly)
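For example, add the CSW and Sun freeware directories to /etc/profile or your ~/.profile:
PATH=/opt/csw/bin:/usr/sfw/bin:$PATH
export PATH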
Mirror System disk
After a basic installation the system disk is no longer mirrored. To set the mirror up, use the following script, but make sure you edit the variables first!
#!/bin/bash
# bash is needed here: the Bourne shell's builtin echo does not understand -e
## $Id: SunFireX4500.txt,v 1.4 2009/02/06 14:22:57 HenningFehrmann Exp $
##
## mirror_system_disk.sh
##
## Made by Carsten Aulbert
## Login
##
## Started on Fri Mar 21 13:01:16 2008 Carsten Aulbert
## Last update Fri Mar 21 13:01:16 2008 Carsten Aulbert
##
# system & mirror disk
SYSTEMDISK=c6t0d0
MIRRORDISK=c6t4d0
CYLSTART=5611
ROOTSLICE=s0
VARSLICE=s1
SWAPSLICE=s3
LOGFILE=/tmp/mirror.out
echo "Starting mirror process" > $LOGFILE
# show current layout
echo -e "p\np\nq\nq\n" | format -d $SYSTEMDISK | tee -a $LOGFILE
echo -e "p\np\nq\nq\n" | format -d $MIRRORDISK | tee -a $LOGFILE
# add seventh partition (no magic performed)
echo -e "p\n7\n\n\n$CYLSTART\n32130b\np\nlabel\ny\nq\nq\n" | format -d $SYSTEMDISK | tee -a $LOGFILE
echo -e "p\n7\n\n\n$CYLSTART\n32130b\np\nlabel\ny\nq\nq\n" | format -d $MIRRORDISK | tee -a $LOGFILE
# show the vtoc and do a fmthard dry run (-i only displays, writes nothing)
prtvtoc /dev/dsk/${SYSTEMDISK}s2 | tee -a $LOGFILE
prtvtoc /dev/dsk/${SYSTEMDISK}s2 | fmthard -i -s - /dev/rdsk/${MIRRORDISK}s2 | tee -a $LOGFILE
# do it
prtvtoc /dev/dsk/${SYSTEMDISK}s2 | fmthard -s - /dev/rdsk/${MIRRORDISK}s2 | tee -a $LOGFILE
# init metadb
metadb -a -f ${SYSTEMDISK}s7 ${MIRRORDISK}s7
metadb | tee -a $LOGFILE
# work on swap
swap -l | tee -a $LOGFILE
swap -d /dev/dsk/${SYSTEMDISK}${SWAPSLICE} | tee -a $LOGFILE
swap -l | tee -a $LOGFILE
metainit d20 1 1 ${SYSTEMDISK}${SWAPSLICE} | tee -a $LOGFILE
metainit d21 1 1 ${MIRRORDISK}${SWAPSLICE} | tee -a $LOGFILE
metastat | tee -a $LOGFILE
metainit d2 -m d20 d21 | tee -a $LOGFILE
metastat | tee -a $LOGFILE
swap -a /dev/md/dsk/d2 | tee -a $LOGFILE
swap -l | tee -a $LOGFILE
# now /
metainit -f d10 1 1 ${SYSTEMDISK}${ROOTSLICE} | tee -a $LOGFILE
metainit -f d11 1 1 ${MIRRORDISK}${ROOTSLICE} | tee -a $LOGFILE
metainit d1 -m d10 | tee -a $LOGFILE
metaroot d1 | tee -a $LOGFILE
cat /etc/vfstab | tee -a $LOGFILE
# and /var
metainit -f d30 1 1 ${SYSTEMDISK}${VARSLICE} | tee -a $LOGFILE
metainit -f d31 1 1 ${MIRRORDISK}${VARSLICE} | tee -a $LOGFILE
metainit d3 -m d30 | tee -a $LOGFILE
metattach d3 d31 | tee -a $LOGFILE
# user's TODO: manual follow-up steps remain, see the sketch below this script
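The script deliberately leaves some manual steps. A sketch of the usual follow-up, assuming the disks and slices from the variables above (the installgrub stage paths are the Solaris 10 x86 defaults):
# check /etc/vfstab: / must be /dev/md/dsk/d1, point /var at d3 and swap at d2
# make the mirror disk bootable
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c6t4d0s0
# after a reboot, attach the second root submirror and watch the resync
metattach d1 d11
metastat -c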
Create zpool
Right now this layout is suboptimal; once we get rid of the system disks, we will optimize it further:
#!/bin/bash
# bash is needed for the c{...} brace expansions below
## $Id: SunFireX4500.txt,v 1.4 2009/02/06 14:22:57 HenningFehrmann Exp $
##
## zpool-setup.sh
##
## Made by Carsten Aulbert
## Login
##
## Started on Fri Mar 21 13:55:18 2008 Carsten Aulbert
## Last update Fri Mar 21 13:55:18 2008 Carsten Aulbert
##
# Assuming system disks are c6t0d0 and c6t4d0
ZPOOLNAME=atlashome
# block 1
zpool create -f $ZPOOLNAME raidz2 c{0,1,5,7,8}t0d0 c{6,0,1,5,7,8}t1d0
# block 2
zpool add -f $ZPOOLNAME raidz2 c{6,0,1,7,8}t2d0 c{6,0,1,5,7,8}t3d0
# block 3
zpool add -f $ZPOOLNAME raidz2 c{0,1,5,7,8}t4d0 c{6,0,1,5,7,8}t5d0
# block 4
zpool add -f $ZPOOLNAME raidz2 c{6,0,1,5,8}t6d0 c{6,0,1,5,7,8}t7d0
# two hot spares
zpool add -f $ZPOOLNAME spare c5t2d0 c7t6d0
# create a small ZFS file system with a reservation; this helps to keep ZFS working once the pool fills up
zfs create -o reservation=10M $ZPOOLNAME/badtimes
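Afterwards it is worth checking the pool layout and capacity with the standard commands:
zpool status atlashome
zpool list atlashome
zfs list -r atlashome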
Exchange faulty disk
Evidence
Every once in a while we get a message like this from the Solaris fault manager:
# fmdump
TIME UUID SUNW-MSG-ID
Apr 07 08:07:22.4844 658924e5-75a4-c9cf-ff7e-c84b73bb8a6c DISK-8000-0X
This can also be found via IPMI:
# ipmitool sel list
8f00 | 04/07/2008 | 10:07:24 | Drive Slot #0x7a | Drive Fault | Asserted
How to identify that disk
Again, fmdump can help:
# fmdump -v -u 658924e5-75a4-c9cf-ff7e-c84b73bb8a6c
TIME UUID SUNW-MSG-ID
Apr 07 08:07:22.4844 658924e5-75a4-c9cf-ff7e-c84b73bb8a6c DISK-8000-0X
100% fault.io.disk.predictive-failure
Problem in: hc://:product-id=Sun-Fire-X4500:chassis-id=0746AMT037:server-id=s01:serial=KRVN67ZBHUDGEF:part=HITACHI-HDS7250SASUN500G-0737KUDGEF:revision=K2AOAJ0A/motherboard=0/hostbridge=0/pcibus=0/pcidev=2/pcifn=0/pcibus=2/pcidev=1/pcifn=0/sata-port=2/disk=0
Affects: hc://:serial=KRVN67ZBHUDGEF/component=sata1/2
FRU: hc:///component=HD_ID_32
Location: -
Running cfgadm -v and grepping for the sata1/2 label yields:
# cfgadm -v | grep sata1/2
sata1/2::dsk/c1t2d0 connected configured ok Mod: HITACHI HDS7250SASUN500G 0737KUDGEF FRev: K2AOAJ0A SN: KRVN67ZBHUDGEF
So we need to replace c1t2d0, but please double-check the serial number!
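This lookup can also be scripted. A sketch, assuming the Affects: line always carries the sataX/Y attachment point as in the output above:
# extract the sataX/Y attachment point from the fault report ...
APID=`fmdump -v -u 658924e5-75a4-c9cf-ff7e-c84b73bb8a6c | sed -n 's|.*component=\(sata[0-9]*/[0-9]*\).*|\1|p'`
# ... and look up the matching disk
cfgadm -v | grep "$APID"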
Replacing the device
Please make sure not to miss a single step!
- Check if it's a system or data disk, e.g. by running "zpool status". The next steps assume you are handling a disk from your zpool.
# zpool status
pool: atlashome
state: ONLINE
scrub: scrub completed with 0 errors on Tue Apr 8 11:29:09 2008
config:
NAME STATE READ WRITE CKSUM
atlashome ONLINE 0 0 0
raidz2 ONLINE 0 0 0
c0t0d0 ONLINE 0 0 0
c1t0d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
c7t0d0 ONLINE 0 0 0
c8t0d0 ONLINE 0 0 0
c6t1d0 ONLINE 0 0 0
c0t1d0 ONLINE 0 0 0
c1t1d0 ONLINE 0 0 0
c5t1d0 ONLINE 0 0 0
c7t1d0 ONLINE 0 0 0
c8t1d0 ONLINE 0 0 0
raidz2 ONLINE 0 0 0
c6t2d0 ONLINE 0 0 0
c0t2d0 ONLINE 0 0 0
c1t2d0 ONLINE 0 0 0
c7t2d0 ONLINE 0 0 0
c8t2d0 ONLINE 0 0 0
c6t3d0 ONLINE 0 0 0
c0t3d0 ONLINE 0 0 0
c1t3d0 ONLINE 0 0 0
c5t3d0 ONLINE 0 0 0
c7t3d0 ONLINE 0 0 0
c8t3d0 ONLINE 0 0 0
raidz2 ONLINE 0 0 0
c0t4d0 ONLINE 0 0 0
c1t4d0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
c7t4d0 ONLINE 0 0 0
c8t4d0 ONLINE 0 0 0
c6t5d0 ONLINE 0 0 0
c0t5d0 ONLINE 0 0 0
c1t5d0 ONLINE 0 0 0
c5t5d0 ONLINE 0 0 0
c7t5d0 ONLINE 0 0 0
c8t5d0 ONLINE 0 0 0
raidz2 ONLINE 0 0 0
c6t6d0 ONLINE 0 0 0
c0t6d0 ONLINE 0 0 0
c1t6d0 ONLINE 0 0 0
c5t6d0 ONLINE 0 0 0
c8t6d0 ONLINE 0 0 0
c6t7d0 ONLINE 0 0 0
c0t7d0 ONLINE 0 0 0
c1t7d0 ONLINE 0 0 0
c5t7d0 ONLINE 0 0 0
c7t7d0 ONLINE 0 0 0
c8t7d0 ONLINE 0 0 0
spares
c5t2d0 AVAIL
c7t6d0 AVAIL
So this disk belongs to the second raidz2 sub-pool.
- Take the faulty disk offline
zpool offline atlashome c1t2d0
- Check with "zpool status" that the zpool is now degraded
- Unconfigure that device
cfgadm -c unconfigure sata1/2
- The drive's LED should now be flashing and you can exchange the drive
- After exchanging the parts, fill out the Global Part Return Tag (PRT); help will be added soon
- Notify the system that the new disk is available again
cfgadm -c configure sata1/2
- Tell the zpool that the new disk is there
zpool online atlashome c1t2d0
zpool replace atlashome c1t2d0
- (yes, that's right: replace the disk with itself)
- Wait until the resilvering process is done (zpool status will tell you more)
- Everything should be fine now; a scrub will tell you more (a watch loop is sketched below)
zpool scrub atlashome
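To keep an eye on the resilver or scrub progress, a simple loop is enough (a sketch):
while true; do
  zpool status atlashome | grep 'in progress'
  sleep 60
done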
Setup from the ground up with Solaris 10u6
- Install the core stuff from DVD
- If you want more software from the DVD, mount it with
mount -F hsfs /dev/dsk/c4t0d0p0 /mnt
(assuming c4t0d0p0 is the drive; I needed to run ls /dev/dsk/c4* first). The software packages are then under /mnt/Solaris_10/Product
- install these packages
pkgadd -d . SUNWxsvc SUNWsshcu SUNWsshdr SUNWsshdu SUNWsshr SUNWsshu SUNWdoc SUNWxwrtl SUNWtoo SUNWxwice SUNWxwplt
- The package
SUNWhd
would also be nice; however, it was not found on our 10u6 DVD
- install the ssh configuration and host keys under
/etc/ssh
(usually a copy from another machine)
- Run
svcadm enable ssh; svcadm restart ssh
to enable the SSH server
- use
export TERM=vt100
to enable easy editing with vi
- edit
/etc/nsswitch.conf
to have the line hosts: files dns
-
/etc/resolv.conf
should read
domain atlas.local
nameserver 10.20.30.2
- follow the instructions on Blastwave to get access to the blastwave repo
- install a base set of stuff:
pkgutil -i CSWless CSWforemost CSWemacs
Software
git
Required sources and packages (available from einstein-dl: 130.75.116.202):
- curl-7.19.3-sol10-x86-local
- libiconv-1.9.2-sol10-x86-local
- libintl-3.4.0-sol10-x86-local
Installation
Install the required packages first (a sketch follows), then use git.
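The -local files are Sunfreeware-style package datastreams and install with pkgadd. A sketch, assuming they have been fetched into the current directory (install the libraries first, since curl depends on them):
pkgadd -d libiconv-1.9.2-sol10-x86-local
pkgadd -d libintl-3.4.0-sol10-x86-local
pkgadd -d curl-7.19.3-sol10-x86-local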