HSM upgrade
Current status
2014-02-01T12:33Z: samfsdump/final backup stopped due to too many non-archived files. Rushing to archive those as fast as possible.
2014-02-01T21:09Z: Still about 7 million files (10TB) to be archived to tape. Hopefully done within 12hrs.
2014-02-02T06:20Z: Archiving finished overnight and the backup then took 45 minutes. Phase 2 can finally start: reinstalling the central server and upgrading the array firmware.
2014-02-02T08:50Z: Oracle 6780 array upgrade done, DDN SFA10000 update underway
2014-02-02T11:09Z: DDN SFA10000 update done, a few warnings persist, contacted DDN.
2014-02-02T17:11Z: Main server reinstalled, SAM/QFS installed, next step configuring it properly
2014-02-03T10:05Z: Main server can mount all file systems just fine, but the QFS clients cannot. Investigating together with Oracle.
2014-02-03T19:10Z: Some progress has been made and we have a promising route to reestablish services soon.
2014-02-04T13:10Z: We are currently reinstalling all QFS clients to bring them into the same state. Please bear with us a little longer; we should be back soon.
2014-02-04T15:00Z: We are back in action. The exceptions so far are titan1 and titan2, which still have problems with their new configuration.
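As a rough cross-check of the 2014-02-01T21:09Z estimate, the figures from the log (about 7 million files, 10TB, 12 hours) can be turned into the sustained rates the archiver would have needed. This is only a back-of-the-envelope sketch: it assumes decimal terabytes and, purely for illustration, that all 8 tape drives mentioned under Plans/Details were writing in parallel, which the log does not state.

```python
# Figures taken from the status log entry of 2014-02-01T21:09Z.
files = 7_000_000            # files still waiting to be archived
data_bytes = 10 * 10**12     # 10 TB, ASSUMING decimal terabytes
window_s = 12 * 3600         # the hoped-for 12 hour window, in seconds

# ASSUMPTION: all 8 tape drives of the library are used for archiving.
drives = 8

files_per_s = files / window_s
mb_per_s = data_bytes / window_s / 10**6
mb_per_s_per_drive = mb_per_s / drives
avg_file_mb = data_bytes / files / 10**6

print(f"files per second:    {files_per_s:.0f}")
print(f"total MB/s:          {mb_per_s:.0f}")
print(f"MB/s per drive:      {mb_per_s_per_drive:.1f}")
print(f"average file size:   {avg_file_mb:.2f} MB")
```

Under these assumptions the target works out to roughly 160 files per second and about 230 MB/s in total, with an average file size below 1.5 MB.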
Plans/Details
Scheduled start time: 2014-01-30T09:00Z
Our hierarchical storage management (HSM) will be upgraded in the days following 2014-01-30. We will try to keep the downtime as short as possible, but we expect a minimum downtime of 2 days.
Please ensure that you
- stop your Condor jobs or put them on hold
- do not have screen sessions running
- are logged out of the head nodes
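For the first item, a minimal sketch of how putting your own jobs on hold could be scripted is shown here. It relies on the standard HTCondor commands condor_hold and condor_release, called with a user name so that they act on all of that user's jobs. The user name "your.username" is a placeholder, and the helper functions are hypothetical names made up for this example. By default the script only prints the commands (dry run) and does not touch the queue.

```python
import shutil
import subprocess


def hold_command(user):
    """Build the command that puts all queued jobs of `user` on hold."""
    return ["condor_hold", user]


def release_command(user):
    """Build the command that releases the held jobs after the downtime."""
    return ["condor_release", user]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return None
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found in PATH")
    return subprocess.run(cmd, check=True)


# PLACEHOLDER: replace with your own cluster user name.
user = "your.username"

run(hold_command(user))      # before the downtime (dry run: prints only)
run(release_command(user))   # once the HSM is back (dry run: prints only)
```

Calling run(..., dry_run=False) on a submit host would actually execute the command; running condor_q afterwards is a simple way to confirm that the jobs are shown as held.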
During the work you may log in to the head nodes and store files under /local/user/. However, you will not be able to perform many tasks.
Our HSM consists of 6 computers (QFS clients) which act as NFS servers to the cluster, a central computer ("meta data server" (MDS), SAM/QFS server) and a tape library. The MDS acts as the central organizer between disk arrays (about 750 disks), the tape library (8 drives and more than 2000 tapes) and the client requests (your jobs).
During the upgrade we will
- put Condor jobs on hold (for any user affected, see below): done
- remove NFS mounts for all users on the HSM: done
- shut down QFS clients: done
- perform full file system dumps (20 file systems in total): done
- perform full backup of meta data server: done
- install Solaris 11 on all QFS servers: done
- perform firmware upgrades on disk arrays (Oracle 6780): done
- perform firmware upgrades on disk arrays (DDN SFA10000): done
- install Solaris 11 on SAMFS server: done
- install SAM/QFS on SAMFS server: done
- install QFS on all QFS clients: done
- ensure all file systems are mountable and distributable: in progress
- bring system up again
Users affected
The following users are affected by this work:
accadia fdonovan matthew.edwards
adam.mullavey fehrmann maxime.fays
afina.neunzert forte max.isi
ajith francesco.direnzo mbebronn
alessandra.corsi francesco.piergiovanni mcoughlin
alexander.cole frank.ohme mdetert
alexander.mellus frederick.coburn michael.puerrer
alexander.urban fritz.miot michele
alex.nielsen gaborg millerd
almir.alemic gabor.szeifert min-a.cho
anamaria gabriela.gonzalez muhammed.saleem
anderson gabriela.hernandez mwas
andrew.miller gabriela.serna namgyu.kim
andrew.rodger gabriel.islas nathaniel.indik
andrew.williamson gareth.pickford nce
andri.gretarsson geodc nicole.darman
anirban.ain gharry none
anthony.lefeld giancarlocella oriella.torre
antonio.perreca gimazz paleac
anuradha.samajdar giovanni.rabuffo patricia.porter
ashikuzzaman.idrisy gmartini patricia.schmidt
ashish.mahabal graef patrick.meyers
ashley.disbrow grant.meadors paul.hopkins
asperanz greenley paul.lasky
astroeer grifonator pbrem
atbraack guillermo.valdes pehrens
avecchio halston.lim peng.geng
ballen hannah.middleton pfreire
bangalore.sathyaprakash harald.pfeiffer praffai
bastiaan.swinkels haris.k prathamesh.dalvi
bbehnke hbeggenstein qi.chu
belinda.cheeseboro hcmarroc quitzow
bema hjkim rajesh.nayak
benacquista hoff ramesh
benjamin.aylott hpletsch rana.adhikari
benno.puetz hunter.gabbard re
bernard.hall igor.andreoni reatough
bgarcia igorbilenko reedessick
bhubbert irene rhondale.tso
bianca.danilet irina.ene ripeschke
bose isantiago robert.coyne
boyang jackson.henry rolland
brandi.dunnington jaclyn.sanders ryan.darragh
branson.stephens jacob.peoples ryan.goetz
brevilo jade.powell ryan.lynch
bsomhegy jaehyun.lee ryan.magee
byuan james.bell rynge
carl.brannen james.cowley saeed.mirshekari
carl-johan.haster jason.tye salemi
carsten jayanti.prasad salvatore.vitale
cbiwer jeong-su.ha samantha.usman
chandramishra jeroen.meidam sanjit.mitra
charlton jhcscargill satya
chase.kernan jiafrate scaudill
chericoni.domizia jialun.luo scottmsul
chmahr jing.ming sebastian
chmess jlogue serena.vinciguerra
chohs joey.key sfischet
christian john.le sfranco
christopher.berry jonathan.bayless s.gwynne.crowder
chunglee.kim jonathan.hanks shaltev
ckim josephb shaon
claudia.lazzaro joseph.bowers shi
claudio.casentini joshua.kerrigan shinkee.chung
clio jotradov shivaraj.kandhasamy
colin.clark jslutsky simon
connor.skeehan juan.bustillo simon.stevenson
cristian.maureira justing sinead.walsh
cristiano.palomba justin.tervala slawomir.gras
daniel.duddleston justin.wagner smorriss
daniele.trifiro juve soenke.schuster
daniel.evans kalina.nedkova surabhi.sachdev
dantonio karla.guardado swetha.bhagwat
david.groden katherine.grover sydney.chamberlin
david.kelley kawies szabolcs.marka
david.morate keiko tania
david.stiles kendall.ackley tdent
dbrown kent teresa.symons
deborah.good kg.arun thomas
deborah.hamm kgrover thomas.adams
dietz kloew thomas.downes
dkeitel koutarou.kyutoku tito
dkeppel kremin tom.wantock
dmeacher laleh.sadeghian tsidery
dmsima laszlo.gondan vaibhav
drago laura.spitler vansuch
dtalukder lauro.salazar vedovato
edaw leroy veronica.lockett-ruiz
eddy lesteves vicere
egoetz lex vihan.pandey
eharstad lrodriguez vincent.roma
eheaton lucas violet.poole
einstein lucas.giolas wademc
einstein.temp lucas.johns walter
einstein.work lwade weigang.liu
emacayeal magathos william.tritch
emaros manca wpozzo
eric marcel.kehl xian.chen
eric.lebigot marc.normandin xiangyu.guo
evan.anders marco.tompitak xiaoge.wang
evan.foley marek.szczepanczyk xilong.fan
evan.keane maria.tringali yatish
fabian.magana-sandoval marion yingsheng.ji
fabio.ricci marissa.walker yuanhao.zhang
fabrizia matthew zijing.yang
fan.zhang matthew.cowart
Misc
(Friday/Saturday) We encountered problems with two file systems. Instead of taking the risk of losing up to 20TB of data, we stopped the dump and are trying to force as many of these files to tape as fast as possible.
-- CarstenAulbert - 28 Jan 2014