Hardware
MGE Galaxy 6000
ups.mfr: MGE UPS SYSTEMS
ups.model: Galaxy Single UPS
ups.serial: 49EH29101
ups.date: 2008/03/24
ups.firmware: GAU305S300IC303
ups.firmware.aux: Network Management Card/Transverse/GB
Servers
Server 1: Pyramid: 4 x Intel(R) Xeon(R) CPU X3220 @ 2.40GHz with MemTotal: 4053584 kB
Server 2: Pyramid: 4 x Intel(R) Xeon(R) CPU X3220 @ 2.40GHz with MemTotal: 8184300 kB
Software
NUT
Abstract
The primary goal of the Network UPS Tools (NUT) project is to provide reliable monitoring of UPS hardware and ensure safe shutdowns of the systems which are connected.
How it works
NUT has a client-server architecture. Its operation depends on:
- a UPS driver running in the background
- the UPS daemon (upsd) connected to that driver
- clients connecting to the daemon, for example upsc, upsmon, etc.
upsmon in particular starts a notification program, for example upssched, which runs the scripts that do the real work, e.g. shutting down the machine.
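As an illustration of the client side, the sketch below decodes the ups.status value that a client such as upsc reads from the daemon. OL, OB and LB are standard NUT status flags; the helper name describe_status is hypothetical and exists only for this example.

```shell
#!/bin/sh
# describe_status "FLAGS": translate NUT ups.status flags into words.
# describe_status is a hypothetical helper name, not part of NUT.
describe_status() {
    desc=""
    # Intentionally unquoted: ups.status is a space-separated flag list.
    for flag in $1; do
        case "$flag" in
            OL) desc="$desc online" ;;
            OB) desc="$desc on-battery" ;;
            LB) desc="$desc low-battery" ;;
            *)  desc="$desc other($flag)" ;;
        esac
    done
    # Strip the leading space added by the loop.
    echo "${desc# }"
}

# Typical use against the daemon described on this page
# (requires a running upsd, so it is left commented out):
#   describe_status "$(upsc galaxy@172.25.2.200 ups.status)"
```

Keeping the decoding in a function separate from the upsc call makes it possible to try it out without a UPS attached.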
Installation
We use a self-built package from the svn repository (currently r1378), and
apt-get install nut
should work.
Server side configuration files
/etc/nut/ups.conf
[galaxy]
driver=netxml-ups
port = http://172.25.2.1
desc = "GALAXY 6000"
pollinterval = 30
/etc/nut/upsd.conf
LISTEN 172.25.2.200 3493
LISTEN 10.90.90.200 3494
/etc/nut/upsd.users
[admin]
password=admin
actions=SET
instcmds=ALL
[monuser]
password=monuser
actions=SET
upsmon master
Client side configuration files
/etc/nut/upsmon.conf
MONITOR galaxy@172.25.2.200 1 monuser monuser master
MINSUPPLIES 1
SHUTDOWNCMD "/sbin/shutdown -h now"
NOTIFYCMD "/sbin/upssched"
POLLFREQ 15
POLLFREQALERT 15
HOSTSYNC 15
DEADTIME 30
POWERDOWNFLAG /etc/killpower
NOTIFYFLAG ONLINE SYSLOG+EXEC+WALL
NOTIFYFLAG ONBATT SYSLOG+EXEC+WALL
RBWARNTIME 43200
NOCOMMWARNTIME 300
FINALDELAY 5
/etc/nut/upssched.conf
CMDSCRIPT /etc/nut/down.sh
PIPEFN /var/run/upssched.pipe
LOCKFN /var/run/upssched.lock
AT ONLINE galaxy@172.25.2.200 CANCEL-TIMER onbatt
AT ONLINE galaxy@172.25.2.200 EXECUTE online
AT ONBATT galaxy@172.25.2.200 START-TIMER onbatt 60
/etc/nut/down.sh
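#!/bin/sh
# Called by upssched with the timer/event name as $1.
case "$1" in
    onbatt)
        logger -s "The UPS is on battery... THE END."
        /sbin/shutdown -h now
        ;;
    online)
        logger -s "The UPS is online, check battery.charge"
        /etc/nut/upsonline.sh
        ;;
    *)
        # logger -s writes to stderr, so redirect stderr to keep a copy.
        logger -s "Something else: $1" 2>> /tmp/galaxy.msg
        ;;
esac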
/etc/nut/upsonline.sh
#!/bin/bash
UPS=galaxy
HOST=172.25.2.200
LIMIT_LEVEL=75
# Extract the numeric charge level, dropping the label and padding.
BATT_LEVEL=$(upsc "$UPS@$HOST" | grep '^battery\.charge:' | sed 's/^battery\.charge: *//')
if [ "$BATT_LEVEL" -lt "$LIMIT_LEVEL" ]; then
    logger -s "shut down node... ups online but battery.charge: $BATT_LEVEL"
    shutdown -h now
else
    logger -s "battery.charge: $BATT_LEVEL"
    exit
fi
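The script above trusts that upsc returns a usable number. A minimal sketch of a more defensive read is shown here, assuming the upsc output is piped in on stdin; the function name read_charge is hypothetical and not part of NUT.

```shell
#!/bin/sh
# read_charge: extract the integer part of battery.charge from upsc
# output on stdin. Prints the value and returns 0, or returns 1 when
# no numeric battery.charge line is present. Hypothetical helper.
read_charge() {
    # Anchoring on "battery.charge:" avoids matching battery.charge.low.
    value=$(sed -n 's/^battery\.charge:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | head -n 1)
    [ -n "$value" ] || return 1
    echo "$value"
}

# Possible use in a script like upsonline.sh (requires a running upsd,
# so it is left commented out):
#   BATT_LEVEL=$(upsc galaxy@172.25.2.200 | read_charge) || {
#       logger -s "cannot read battery.charge"
#       exit 1
#   }
```

With this check an unreachable daemon leads to an explicit error instead of a failed integer comparison.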
/etc/nut/upschk.sh
#!/bin/bash
UPS=galaxy
HOST=172.25.2.200
SLEEP=60
LIMIT_LEVEL=25
BATT_LEVEL=$(upsc "$UPS@$HOST" | grep '^battery\.charge:' | sed 's/^battery\.charge: *//')
if [ "$BATT_LEVEL" -lt 99 ]; then
    logger -s "shut down upsmon... execute /etc/init.d/upsmond stop"
    /etc/init.d/upsmond stop
else
    logger -s "battery.charge: $BATT_LEVEL"
    exit
fi
# Poll until the battery has recharged or has dropped to the limit.
while true; do
    clear
    BATT_LEVEL=$(upsc "$UPS@$HOST" | grep '^battery\.charge:' | sed 's/^battery\.charge: *//')
    logger -s "Read battery charge level $BATT_LEVEL% from $UPS@$HOST"
    if [ "$BATT_LEVEL" -gt 98 ]; then
        logger -s "...level is at least 99%, start upsmond and return to normal operation."
        /etc/init.d/upsmond start
        exit
    fi
    if [ "$BATT_LEVEL" -gt "$LIMIT_LEVEL" ]; then
        logger -s "...level > $LIMIT_LEVEL%, that's O.K., sleep for $SLEEP seconds."
    else
        logger -s "...level <= $LIMIT_LEVEL%, that's BAD, perform shutdown now!"
        shutdown -h now
    fi
    sleep "$SLEEP"
done
Programs
upsdrvctl - UPS driver controller
upsc - example lightweight UPS client
upsd - UPS information server
upslog - UPS status logger
upsmon - UPS monitor and shutdown controller
upscmd - UPS administration program for instant commands
upsrw - UPS variable administration tool
upssched - Timer helper for scheduling events from upsmon
Web interfaces
There is a built-in Java-based web interface, available at
http://172.25.2.1
A NUT-based HTML interface can be found at
https://n0.aei.uni-hannover.de/ups
Configuration
IP scheme
MGE Galaxy 6000: 172.25.2.1
Server: 172.25.2.100 / 10.20.40.100, hostname nutdemon1
Server: 172.25.2.200 / 10.20.40.200, hostname nutdemon2
Shutdown strategy
We use a time- and battery-dependent scheme to turn off the cluster when an on-battery state occurs. The flow is:
0. onbatt
1. 1 min. later, shut down all compute nodes
2. if battery.charge is lower than 50 %, shut down all storage nodes
3. if battery.charge is lower than 25 %, shut down all head nodes
4. if the battery is low, all remaining nodes go down
An online state is followed by a script that checks the current battery.charge, to be on the safe side in case successive power losses occur.
We do not use automatic power-on. If the cluster is down, it stays down until turned on by hand.
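The staged flow above can be sketched as a single decision function. This is only an illustration under stated assumptions: the function name shutdown_targets is hypothetical, the UPS state is passed in as ups.status flags (OB for on battery, LB for low battery), each threshold is evaluated independently, and the output names node classes rather than actually shutting anything down.

```shell
#!/bin/sh
# shutdown_targets STATUS CHARGE SECONDS_ON_BATTERY
# Prints the node classes that should be shut down at this point of the
# flow. Hypothetical helper; it only decides, it does not shut down.
shutdown_targets() {
    status=$1
    charge=$2
    elapsed=$3
    targets=""
    case " $status " in
        *" OB "*) ;;                     # step 0: on battery, keep going
        *) echo ""; return 0 ;;          # not on battery: nothing to do
    esac
    case " $status " in
        *" LB "*) echo "all"; return 0 ;;  # step 4: battery low
    esac
    if [ "$elapsed" -ge 60 ]; then       # step 1: one minute on battery
        targets="compute"
    fi
    if [ "$charge" -lt 50 ]; then        # step 2: below 50 %
        targets="$targets storage"
    fi
    if [ "$charge" -lt 25 ]; then        # step 3: below 25 %
        targets="$targets head"
    fi
    # Strip a leading space left when "compute" was not added first.
    echo "${targets# }"
}
```

For example, `shutdown_targets "OB" 40 120` prints `compute storage`, while any status without OB prints an empty line.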
NUT configuration
Currently we use only one server: nutdemon2
Solaris notes
For the Sun machines running Solaris you need to:
- install upsmon-i386
- unpack usr_local_ups_etc.tar.bz2 to /usr/local/ups/etc
- copy upsmond to /etc/init.d/upsmond
- add it to the runlevels:
ln /etc/init.d/upsmond /etc/rc2.d/S99upsmond
ln /etc/init.d/upsmond /etc/rc3.d/S99upsmond
External links
http://www.networkupstools.org/faq/