Sun Fire X4500

We have a Sun Fire X4500 ("Thumper") from Sun Microsystems, a dual-Opteron storage server with 48 SATA drives of 500 GB each.

Installation

It weighs about 80 kg, so taking out the hard disks and power supplies before mounting it is a good idea!

A good starting point for X4500 installation issues is available at our colleagues' site at UWM here and here

Starting the Server

As soon as you connect power to the two power supplies, the server gets noisy and runs some self-tests, which take about one minute. After that the BIOS, GRUB and finally Solaris boot. The VGA port's output is redirected, which is why you will only see a blinking cursor after the GRUB menu. In any case, accessing the server via the SP remote console is the better alternative.

SP

You can access the server's BMC, which Sun calls the Service Processor (SP) or ILOM, in several ways (see the ILOM administration guide):
  • SSH/command line: Connect the NET MGT port to your network. In the server's BIOS settings you can activate DHCP for the SP and look up the SP's MAC address. Then you can access the SP via
  ssh root@192.168.0.112 (or whatever the SP's IP address is; the default password is changeme)
Type "ESC" and "(" to leave the remote console.
  • IPMI and SNMP interfaces
  • Web GUI on the SP's IP address
  • Serial connection (SER MGT port); there is an adapter from RJ45 to a serial port
  • Connecting a monitor to the box won't work, because the output is redirected to the remote console (see above)

Now we want to log in to Solaris:
  start /SYS (switch on the Thumper, if not done yet)
  start /SP/console (start the remote console - log in to Solaris)

During the first boot of Solaris you have to answer some questions in the installation menu to specify language, clock, etc.

The SP CLI can do many more useful things, e.g.
  show /SYS/HD/HDD35
  set /SYS/LOCATE/ value=off

Most of this can also be done with tools such as ipmitool.
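As a rough illustration of the ipmitool route, the sketch below wraps a few calls that approximately correspond to the SP CLI commands above. It assumes the SP address from the ssh example, the root user, and that the SP accepts IPMI 2.0 (lanplus) connections. The wrapper name sp_ipmi and the DRY_RUN switch are made up for this example. By default it only prints the commands instead of running them.

```sh
#!/bin/sh
# Assumptions: SP address as in the ssh example, user root,
# IPMI-over-LAN (lanplus) enabled on the SP.
SP_HOST=${SP_HOST:-192.168.0.112}
SP_USER=${SP_USER:-root}
DRY_RUN=${DRY_RUN:-1}   # 1: only print the commands, 0: really run them

# Hypothetical helper: run (or just print) one ipmitool command against the SP.
sp_ipmi() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "ipmitool -I lanplus -H $SP_HOST -U $SP_USER $*"
    else
        ipmitool -I lanplus -H "$SP_HOST" -U "$SP_USER" "$@"
    fi
}

sp_ipmi chassis power status   # is the host powered on?
sp_ipmi chassis power on       # roughly "start /SYS"
sp_ipmi chassis identify 0     # switch the identify/locate LED off
sp_ipmi sel list               # hardware event log
sp_ipmi sol activate           # serial-over-LAN console, roughly "start /SP/console"
```

With DRY_RUN=0 ipmitool asks for the SP password on each call. Whether the identify LED is the same as /SYS/LOCATE on this ILOM version is an assumption worth checking on the box.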

Configuration

Install patches via pca

  • Download pca here
  • wget can be found under /usr/sfw/bin
  • if you want more output, edit the pca script and delete the -q option it passes to wget
  • run the tool, it will show you how many patches are missing

Install software via blastwave

As root run
 pkgadd -d http://www.blastwave.org/pkg_get.pkg
and answer all questions with either "all" or "yes".

Add a suitable mirror, e.g.
 url=http://ftp.uni-erlangen.de/pub/mirrors/blastwave.org/stable
to
 /opt/csw/etc/pkg-get.conf

If you are feeling lazy and don't want to answer many questions during installation, please run
 cp -p /var/pkg-get/admin-fullauto /var/pkg-get/admin

Now you can start installing new software, e.g.
 /opt/csw/bin/pkg-get -i wget rsync emacs vim

(don't forget to set your PATH accordingly)
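A minimal sketch for a Bourne-style login shell, assuming the default locations mentioned on this page (/opt/csw/bin for Blastwave, /usr/sfw/bin for wget). The lines could go into ~/.profile, for example:

```sh
# Prepend the Blastwave and Sun freeware directories to the search path
PATH=/opt/csw/bin:/usr/sfw/bin:$PATH
export PATH
```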

Mirror System disk

After a basic installation the system disk is no longer mirrored. To mirror it again, use the following script, but make sure you edit the variables first!

 #!/usr/bin/bash
 # bash instead of /bin/sh because the script relies on "echo -e"
 ## $Id: SunFireX4500.txt,v 1.4 2009/02/06 14:22:57 HenningFehrmann Exp $
 ##
 ## mirror_system_disk.sh
 ## 
 ## Made by Carsten Aulbert
 ## Login   
 ## 
 ## Started on  Fri Mar 21 13:01:16 2008 Carsten Aulbert
 ## Last update Fri Mar 21 13:01:16 2008 Carsten Aulbert
 ##
 
 # system & mirror disk
 SYSTEMDISK=c6t0d0
 MIRRORDISK=c6t4d0
 
 CYLSTART=5611
 
 ROOTSLICE=s0
 VARSLICE=s1
 SWAPSLICE=s3
 LOGFILE=/tmp/mirror.out
 
 echo "Starting mirror process" > $LOGFILE
 # show current layout
 echo -e "p\np\nq\nq\n" | format -d $SYSTEMDISK | tee -a $LOGFILE
 echo -e "p\np\nq\nq\n" | format -d $MIRRORDISK | tee -a $LOGFILE
 
 # add seventh partition (no magic performed)
 echo -e "p\n7\n\n\n$CYLSTART\n32130b\np\nlabel\ny\nq\nq\n" | format -d $SYSTEMDISK | tee -a $LOGFILE
 echo -e "p\n7\n\n\n$CYLSTART\n32130b\np\nlabel\ny\nq\nq\n" | format -d $MIRRORDISK | tee -a $LOGFILE
 
 # more output
 prtvtoc /dev/dsk/${SYSTEMDISK}s2 | tee -a $LOGFILE
 prtvtoc /dev/dsk/${SYSTEMDISK}s2 | fmthard -i -s - /dev/rdsk/${MIRRORDISK}s2 | tee -a $LOGFILE
 
 # do it
 prtvtoc /dev/dsk/${SYSTEMDISK}s2 | fmthard -s - /dev/rdsk/${MIRRORDISK}s2 | tee -a $LOGFILE
 
 # init metadb
 metadb -a -f ${SYSTEMDISK}s7 ${MIRRORDISK}s7
 metadb | tee -a $LOGFILE
 
 # work on swap
 swap -l | tee -a $LOGFILE
 swap -d /dev/dsk/${SYSTEMDISK}${SWAPSLICE} | tee -a $LOGFILE
 swap -l | tee -a $LOGFILE
 
 metainit d20 1 1 ${SYSTEMDISK}${SWAPSLICE} | tee -a $LOGFILE
 metainit d21 1 1 ${MIRRORDISK}${SWAPSLICE} | tee -a $LOGFILE
 metastat                                   | tee -a $LOGFILE
 metainit d2 -m d20 d21                     | tee -a $LOGFILE
 metastat                                   | tee -a $LOGFILE
 
 swap -a /dev/md/dsk/d2 | tee -a $LOGFILE
 swap -l | tee -a $LOGFILE
 
 # now /
 metainit -f d10 1 1 ${SYSTEMDISK}${ROOTSLICE} | tee -a $LOGFILE
 metainit -f d11 1 1 ${MIRRORDISK}${ROOTSLICE} | tee -a $LOGFILE
 metainit d1 -m d10 | tee -a $LOGFILE
 metaroot d1 | tee -a $LOGFILE
 cat /etc/vfstab | tee -a $LOGFILE
 
 # and /var
 metainit -f d30 1 1 ${SYSTEMDISK}${VARSLICE} | tee -a $LOGFILE
 metainit -f d31 1 1 ${MIRRORDISK}${VARSLICE} | tee -a $LOGFILE
 metainit d3 -m d30 | tee -a $LOGFILE
 metattach d3 d31 | tee -a $LOGFILE
 
 
 # user's TODO
 cat <

Create zpool

Right now this layout is suboptimal; once we get rid of the system disks, we will optimize it further:

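 #!/usr/bin/bash
 # bash instead of /bin/sh because the brace expansion below (c{0,1,...}t0d0) is not a Bourne shell feature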
 ## $Id: SunFireX4500.txt,v 1.4 2009/02/06 14:22:57 HenningFehrmann Exp $
 ##
 ## zpool-setup.sh
 ## 
 ## Made by Carsten Aulbert
 ## Login   
 ## 
 ## Started on  Fri Mar 21 13:55:18 2008 Carsten Aulbert
 ## Last update Fri Mar 21 13:55:18 2008 Carsten Aulbert
 ##
 
 # Assuming system disks are c6t0d0 and c6t4d0
 
 ZPOOLNAME=atlashome
 
 # block 1
 zpool create -f $ZPOOLNAME raidz2 c{0,1,5,7,8}t0d0 c{6,0,1,5,7,8}t1d0
 
 # block 2
 zpool add -f $ZPOOLNAME raidz2 c{6,0,1,7,8}t2d0 c{6,0,1,5,7,8}t3d0
 
 # block 3
 zpool add -f $ZPOOLNAME raidz2 c{0,1,5,7,8}t4d0 c{6,0,1,5,7,8}t5d0
 
 # block 4
 zpool add -f $ZPOOLNAME raidz2 c{6,0,1,5,8}t6d0 c{6,0,1,5,7,8}t7d0
 
 # two hot spares
 zpool add -f $ZPOOLNAME spare c5t2d0 c7t6d0
 
 
 # create a small zfs file system with reservation, this will help to keep ZFS working once it's filled up
 zfs create -o reservation=10M $ZPOOLNAME/badtimes
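To double-check that a layout like the one above accounts for every slot, the following sketch classifies all 48 disks without touching any device. It assumes the controller numbers (c0, c1, c5, c6, c7, c8) and targets (t0 to t7) used in the script. The function name classify_disks is made up for this example.

```sh
#!/bin/sh
# Classify all 48 slots the same way zpool-setup.sh uses them.
SYSTEM_DISKS="c6t0d0 c6t4d0"   # system disk and its mirror
SPARE_DISKS="c5t2d0 c7t6d0"    # hot spares

classify_disks() {
    data=0; spare=0; system=0
    for t in 0 1 2 3 4 5 6 7; do
        for c in 0 1 5 6 7 8; do
            disk="c${c}t${t}d0"
            # system disks and spares are counted separately and skipped
            case " $SYSTEM_DISKS " in
                *" $disk "*) system=`expr $system + 1`; continue ;;
            esac
            case " $SPARE_DISKS " in
                *" $disk "*) spare=`expr $spare + 1`; continue ;;
            esac
            data=`expr $data + 1`
        done
    done
    total=`expr $data + $spare + $system`
    echo "data=$data spare=$spare system=$system total=$total"
}

classify_disks
```

With the disk names from the script this prints data=44 spare=2 system=2 total=48, which matches four raidz2 groups of 11 disks each.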

Exchange faulty disk

Evidence

Every once in a while we get a message like this from the fault manager:
 # fmdump
 TIME                 UUID                                 SUNW-MSG-ID
 Apr 07 08:07:22.4844 658924e5-75a4-c9cf-ff7e-c84b73bb8a6c DISK-8000-0X

This can also be found via IPMI:
 # ipmitool sel list
 8f00 | 04/07/2008 | 10:07:24 | Drive Slot #0x7a | Drive Fault | Asserted

How to identify that disk

Again, fmdump can help:
 # fmdump -v -u 658924e5-75a4-c9cf-ff7e-c84b73bb8a6c
 TIME                 UUID                                 SUNW-MSG-ID
 Apr 07 08:07:22.4844 658924e5-75a4-c9cf-ff7e-c84b73bb8a6c DISK-8000-0X
 100%  fault.io.disk.predictive-failure
 Problem in:  hc://:product-id=Sun-Fire-X4500:chassis-id=0746AMT037:server-id=s01:serial=KRVN67ZBHUDGEF:part=HITACHI-HDS7250SASUN500G-0737KUDGEF:revision=K2AOAJ0A/motherboard=0/hostbridge=0/pcibus=0/pcidev=2/pcifn=0/pcibus=2/pcidev=1/pcifn=0/sata-port=2/disk=0
 Affects: hc://:serial=KRVN67ZBHUDGEF/component=sata1/2
 FRU: hc:///component=HD_ID_32
 Location: -

Running cfgadm -v and grepping for the sata1/2 label yields:
 # cfgadm -v | grep sata1/2
 sata1/2::dsk/c1t2d0            connected    configured   ok         Mod: HITACHI HDS7250SASUN500G 0737KUDGEF FRev: K2AOAJ0A SN: KRVN67ZBHUDGEF

So, we need to replace c1t2d0, but please double check the serial number!
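The lookup and the serial number check can be sketched with plain sed. The example below works on sample lines copied from the output above, so it can be tried without a faulty disk. The function names are made up for this example. On a real system one would pipe fmdump -v -u with the UUID and cfgadm -v into them instead.

```sh
#!/bin/sh
# Sample lines copied from the fmdump and cfgadm output above.
FMDUMP_LINE=' Affects: hc://:serial=KRVN67ZBHUDGEF/component=sata1/2'
CFGADM_LINE='sata1/2::dsk/c1t2d0            connected    configured   ok         Mod: HITACHI HDS7250SASUN500G 0737KUDGEF FRev: K2AOAJ0A SN: KRVN67ZBHUDGEF'

# Print the attachment point named in the "Affects:" line of fmdump -v.
affected_port() {
    sed -n 's|^ *Affects:.*component=\([^ ]*\).*|\1|p'
}

# Print the serial number named in the "Affects:" line of fmdump -v.
affected_serial() {
    sed -n 's|^ *Affects:.*serial=\([^/]*\)/.*|\1|p'
}

# Print "port device serial" for every disk line of cfgadm -v.
port_device_serial() {
    sed -n 's|^ *\([^: ]*\)::dsk/\([^ ]*\) .*SN: \([^ ]*\).*|\1 \2 \3|p'
}

port=`echo "$FMDUMP_LINE" | affected_port`
serial=`echo "$FMDUMP_LINE" | affected_serial`

# Keep only the cfgadm entry for the affected port and compare serials.
entry=`echo "$CFGADM_LINE" | port_device_serial | grep "^$port "`
echo "$entry"
case "$entry" in
    *" $serial") echo "serial numbers match" ;;
    *)           echo "serial numbers differ, check again" ;;
esac
```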

Replacing the device

Please make sure not to miss a single step!

  • Check if it's a system or data disk, e.g. by running "zpool status". The next steps assume you are handling a disk from your zpool.
 # zpool status
  pool: atlashome
 state: ONLINE
 scrub: scrub completed with 0 errors on Tue Apr  8 11:29:09 2008
 config:
        NAME        STATE     READ WRITE CKSUM
        atlashome   ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c8t4d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c8t6d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0
            c8t7d0  ONLINE       0     0     0
        spares
          c5t2d0    AVAIL
          c7t6d0    AVAIL
So this disk belongs to the second raidz2 sub pool.
  • Bring device offline
 zpool offline atlashome c1t2d0
  • Check with "zpool status" that the zpool is now degraded
  • Unconfigure that device
 cfgadm -c unconfigure sata1/2
  • the light should now be flashing and you can exchange the drive
  • after exchanging the parts, fill out the Global Part Return Tag (PRT), help will be added soon
  • notify system that new disk is available again
 cfgadm -c configure sata1/2
  • tell zpool new disk is there
 zpool online atlashome c1t2d0
 zpool replace atlashome c1t2d0
  • (yes that's right, replace it with itself)
  • Wait until resilvering process is done (zpool status will tell you more)
  • Everything should be fine now; a scrub will tell you more
 zpool scrub atlashome
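The command sequence above can be collected in a small two-phase helper so that no step gets lost around the physical swap. This is only a sketch: the pool, disk and attachment point default to the example values from this page, the names run, before_swap and after_swap are made up, and by default (DRY_RUN=1) the commands are only printed.

```sh
#!/bin/sh
# Assumptions: pool, disk and attachment point as in the example above.
POOL=${POOL:-atlashome}
DISK=${DISK:-c1t2d0}
PORT=${PORT:-sata1/2}
DRY_RUN=${DRY_RUN:-1}   # 1: only print the commands, 0: really run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Phase 1: run before pulling the drive.
before_swap() {
    run zpool offline "$POOL" "$DISK" &&
    run cfgadm -c unconfigure "$PORT"
}

# Phase 2: run after the new drive is in the slot.
after_swap() {
    run cfgadm -c configure "$PORT" &&
    run zpool online "$POOL" "$DISK" &&
    run zpool replace "$POOL" "$DISK"
}

case "${1:-}" in
    before) before_swap ;;
    after)  after_swap ;;
    *)      echo "usage: $0 before|after" ;;
esac
```

Saved under a hypothetical name such as replace_disk.sh, it would be called once with "before" and once with "after". Checking the degraded state, watching the resilver with zpool status and starting the final scrub remain manual steps.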

Setup from the ground up with Solaris 10u6

  • Install the core stuff from DVD
  • if you want more software from the DVD, mount it with mount -F hsfs /dev/dsk/c4t0d0p0 /mnt (assuming c4t0d0p0 is the drive; I had to run ls /dev/dsk/c4* first to find it). The software packages are then under /mnt/Solaris_10/Product
  • install these packages from that directory with pkgadd -d . SUNWxsvc SUNWsshcu SUNWsshdr SUNWsshdu SUNWsshr SUNWsshu SUNWdoc SUNWxwrtl SUNWtoo SUNWxwice SUNWxwplt
  • The package SUNWhd would also be nice, however, it was not found on our 10u6 DVD
  • install the ssh configuration and host keys under /etc/ssh (usually a copy of another machine)
  • svcadm enable ssh; svcadm restart ssh to enable ssh-server
  • use export TERM=vt100 to enable easy editing with vi
  • edit /etc/nsswitch.conf to have the line hosts: files dns
  • /etc/resolv.conf should read
domain atlas.local
nameserver 10.20.30.2
  • follow the instructions on Blastwave to get access to the blastwave repo
  • install a base set of stuff: pkgutil -i CSWless CSWforemost CSWemacs

software

git

required sources

required packages

  • curl-7.19.3-sol10-x86-local einstein-dl: 130.75.116.202
  • libiconv-1.9.2-sol10-x86-local
  • libintl-3.4.0-sol10-x86-local

installation

  • dependencies are not resolved during installation; you have to resolve them yourself.
  • the sunfreeware packages are already on s00 in /export/packages
  • all packages from the installation CD and from sunfreeware can be installed via
    pkgadd -d  package
  • the blastwave packages you install via
    pkg-get install package
    pkg-get -a lists all available packages.
  • set the library path
     export LD_LIBRARY_PATH=/opt/csw/lib:/usr/local/lib:$LD_LIBRARY_PATH

use git

Topic revision: 06 Feb 2009, HenningFehrmann