UPS (20 Apr 2008, Shaltev)

Hardware

MGE Galaxy 6000

ups.mfr: MGE UPS SYSTEMS

ups.model: Galaxy Single UPS

ups.serial: 49EH29101

ups.date: 2008/03/24

ups.firmware: GAU305S300IC303

ups.firmware.aux: Network Management Card/Transverse/GB

Servers

Server 1: Pyramid: 4 x Intel(R) Xeon(R) CPU X3220 @ 2.40GHz with MemTotal: 4053584 kB

Server 2: Pyramid: 4 x Intel(R) Xeon(R) CPU X3220 @ 2.40GHz with MemTotal: 8184300 kB

Software

NUT

Abstract

The primary goal of the Network UPS Tools (NUT) project is to provide reliable monitoring of UPS hardware and ensure safe shutdowns of the systems which are connected.

How it works

NUT has a client-server architecture. It relies on three components:

  • a UPS driver running in the background
  • the UPS daemon (upsd), which connects to that driver
  • clients that connect to the daemon, for example upsc and upsmon

upsmon in particular starts a notification program, for example upssched, which runs the scripts that do the real work, e.g. shutting down the machine.
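
Clients such as upsc print each UPS variable as a "name: value" line, in the same form as the entries under Hardware. As a minimal sketch of how a script can pick one variable out of that output, the helper below reads such lines on standard input. The name ups_value is hypothetical and not part of NUT; the sample values are copied from this page.

```shell
#!/bin/sh
# ups_value NAME: print the value of variable NAME from "name: value"
# lines on stdin (the format printed by upsc).
# Hypothetical helper for illustration, not part of NUT.
ups_value() {
    # Match the exact "NAME: " prefix so that ups.firmware does not
    # also match ups.firmware.aux; print the rest of the line and stop.
    awk -v key="$1: " 'index($0, key) == 1 { print substr($0, length(key) + 1); exit }'
}

# Sample input taken from the Hardware section of this page.
sample='ups.mfr: MGE UPS SYSTEMS
ups.model: Galaxy Single UPS
ups.firmware: GAU305S300IC303
ups.firmware.aux: Network Management Card/Transverse/GB'

printf '%s\n' "$sample" | ups_value ups.firmware
```

In live use the input would come from the daemon instead of the sample text, e.g. by piping the output of upsc galaxy@172.25.2.200 into the helper.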

Installation

We use a self-built package from the svn repository. Currently it is r1378, and

 apt-get install nut

should work.

Server side configuration files

/etc/nut/ups.conf

 [galaxy]
 driver=netxml-ups
 port = http://172.25.2.1
 desc = "GALAXY 6000"
 pollinterval = 30

/etc/nut/upsd.conf

 LISTEN 172.25.2.200 3493
 LISTEN 10.90.90.200 3494

/etc/nut/upsd.users

 [admin]
 password=admin
 actions=SET
 instcmds=ALL
 [monuser]
 password=monuser
 actions=SET
 upsmon master

Client side configuration files

/etc/nut/upsmon.conf

 MONITOR galaxy@172.25.2.200 1 monuser monuser master
 MINSUPPLIES 1
 SHUTDOWNCMD "/sbin/shutdown -h now"
 NOTIFYCMD "/sbin/upssched"
 POLLFREQ 15
 POLLFREQALERT 15
 HOSTSYNC 15
 DEADTIME 30
 POWERDOWNFLAG /etc/killpower
 NOTIFYFLAG ONLINE SYSLOG+EXEC+WALL
 NOTIFYFLAG ONBATT SYSLOG+EXEC+WALL
 RBWARNTIME 43200
 NOCOMMWARNTIME 300
 FINALDELAY 5

/etc/nut/upssched.conf

 CMDSCRIPT /etc/nut/down.sh
 PIPEFN /var/run/upssched.pipe
 LOCKFN /var/run/upssched.lock
 AT ONLINE galaxy@172.25.2.200 CANCEL-TIMER onbatt
 AT ONLINE galaxy@172.25.2.200 EXECUTE online
 AT ONBATT galaxy@172.25.2.200 START-TIMER onbatt 60

/etc/nut/down.sh

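 #!/bin/sh
 # Called by upssched with the timer or event name as $1.
 case "$1" in
   onbatt)
     # The onbatt timer expired: still on battery, so shut down.
     logger -s "The UPS is on battery... THE END."
     /sbin/shutdown -h now
     ;;
   online)
     # Power is back: check whether the battery is charged enough.
     logger -s "The UPS is online, check battery.charge"
     /etc/nut/upsonline.sh
     ;;
   *)
     # logger -s echoes to stderr, so redirect stderr to keep a copy.
     logger -s "Something else: $1" 2>> /tmp/galaxy.msg
     ;;
 esac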

/etc/nut/upsonline.sh

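 #!/bin/bash
 # Run after power returns: shut down anyway if the battery is too empty
 # to survive another outage.
 UPS=galaxy
 HOST=172.25.2.200
 LIMIT_LEVEL=75
 # Extract the number from the "battery.charge: NN" line printed by upsc.
 BATT_LEVEL=$(upsc "$UPS@$HOST" | grep '^battery\.charge:' | sed 's/^battery\.charge: *//')
 if [ "$BATT_LEVEL" -lt "$LIMIT_LEVEL" ]; then
   logger -s "shut down node... ups online but battery.charge: $BATT_LEVEL"
   shutdown -h now
 else
   logger -s "battery.charge: $BATT_LEVEL"
   exit
 fi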
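
The comparison in upsonline.sh assumes that battery.charge is always a plain integer. As a hedged sketch, the function below makes that assumption explicit: it drops a fractional part, rejects anything that is not a number, and returns a distinct exit code so the caller can decide how to treat an unreadable value. The name charge_below and the exit-code convention are assumptions for illustration, not part of the scripts on this page.

```shell
#!/bin/sh
# charge_below CHARGE LIMIT   (hypothetical helper)
#   exit 0: CHARGE is a number below LIMIT
#   exit 1: CHARGE is a number at or above LIMIT
#   exit 2: CHARGE is empty or not a number (e.g. upsc failed)
charge_below() {
    charge=$(printf '%s' "$1" | tr -d ' ')   # strip spaces around the value
    charge=${charge%%.*}                       # "87.5" -> "87"
    case $charge in
        ''|*[!0-9]*) return 2 ;;               # unreadable value
    esac
    [ "$charge" -lt "$2" ]
}

charge_below " 60" 75 && echo "below limit"
```

Whether an unreadable value (exit code 2) should trigger a shutdown or only a log entry is a policy decision that the sketch deliberately leaves to the caller.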

/etc/nut/upschk.sh

 #!/bin/bash
 # Stop upsmond while the battery recharges, then poll the charge level:
 # restart upsmond once it is back at 99%, shut down if it falls to the limit.
 UPS=galaxy
 HOST=172.25.2.200
 SLEEP=60
 LIMIT_LEVEL=25
 # Print the number from the "battery.charge: NN" line reported by upsc.
 read_charge() {
   upsc "$UPS@$HOST" | grep '^battery\.charge:' | sed 's/^battery\.charge: *//'
 }
 BATT_LEVEL=$(read_charge)
 if [ "$BATT_LEVEL" -lt 99 ]; then
   logger -s "shut down upsmon... execute /etc/init.d/upsmond stop"
   /etc/init.d/upsmond stop
 else
   logger -s "battery.charge: $BATT_LEVEL"
   exit
 fi
 while true; do
   clear
   BATT_LEVEL=$(read_charge)
   logger -s "Read battery charge level $BATT_LEVEL% from $UPS@$HOST"
   if [ "$BATT_LEVEL" -gt 98 ]; then
     logger -s "...level is at least 99%, start upsmond and return to normal operation."
     /etc/init.d/upsmond start
     exit
   fi
   if [ "$BATT_LEVEL" -gt "$LIMIT_LEVEL" ]; then
     logger -s "...level > $LIMIT_LEVEL%, that's O.K., sleep for $SLEEP seconds."
   else
     logger -s "...level <= $LIMIT_LEVEL%, that's BAD, perform shutdown now!"
     shutdown -h now
   fi
   sleep "$SLEEP"
 done

Programs

upsdrvctl - UPS driver controller
upsc - example lightweight UPS client
upsd - UPS information server
upslog - UPS status logger
upsmon - UPS monitor and shutdown controller
upscmd - UPS administration program for instant commands
upsrw - UPS variable administration tool
upssched - Timer helper for scheduling events from upsmon

WEB interfaces

There is a built-in Java-based web interface, available at http://172.25.2.1

A NUT-based HTML interface can be found at https://n0.aei.uni-hannover.de/ups

Configuration

IP scheme

MGE Galaxy 6000: 172.25.2.1

Server: 172.25.2.100 / 10.20.40.100, hostname nutdemon1

Server: 172.25.2.200 / 10.20.40.200, hostname nutdemon2

Shutdown strategy

We use a time- and battery-dependent scheme to turn off the cluster when the UPS goes on battery. The flow is:

0. The UPS goes on battery (onbatt)

1. One minute later, shut down all compute nodes

2. If battery.charge drops below 50%, shut down all storage nodes

3. If battery.charge drops below 25%, shut down all head nodes

4. If the battery is low, all remaining nodes go down

After the UPS returns to the online state, a script checks the current battery.charge, to be on the safe side in case another power loss follows.

We do not use automatic power on. If the cluster is down, it stays down until turned on by hand.
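
The shutdown flow described in this section can be written as a single decision function. The sketch below is an illustration only, not the scripts running on the cluster: the function name must_shut_down, the role names, and the argument order are assumptions, while the 60 second delay and the 50% and 25% thresholds come from the steps above. It is meant to be evaluated only while the UPS is on battery.

```shell
#!/bin/sh
# must_shut_down ROLE SECONDS_ON_BATTERY CHARGE LOW_BATTERY
#   ROLE is compute, storage or head (hypothetical role names);
#   LOW_BATTERY is 1 when the UPS reports a low battery, else 0.
# Returns 0 if a node of that role should shut down now, 1 otherwise.
must_shut_down() {
    role=$1 seconds=$2 charge=$3 low=$4
    [ "$low" -eq 1 ] && return 0              # step 4: everything goes down
    case $role in
        compute) [ "$seconds" -ge 60 ] ;;     # step 1: one minute on battery
        storage) [ "$charge" -lt 50 ] ;;      # step 2: below 50%
        head)    [ "$charge" -lt 25 ] ;;      # step 3: below 25%
        *)       return 1 ;;                  # unknown role: no decision here
    esac
}

# Two minutes on battery at 80% charge: only compute nodes go down.
for r in compute storage head; do
    if must_shut_down "$r" 120 80 0; then
        echo "$r: shut down"
    else
        echo "$r: keep running"
    fi
done
```

Keeping the thresholds in one function makes it easy to check the boundary cases (exactly 60 seconds, exactly 50% or 25%) before relying on them in a real outage.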

NUT configuration

Currently we use only one server: nutdemon2

Solaris notes

For the Sun machines running Solaris you need to install

upsmon-i386

unpack

usr_local_ups_etc.tar.bz2

to /usr/local/ups/etc

copy

upsmond

to /etc/init.d/upsmond

and add it to runlevels 2 and 3

 ln /etc/init.d/upsmond /etc/rc2.d/S99upsmond
 ln /etc/init.d/upsmond /etc/rc3.d/S99upsmond
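
The copy and link steps above can be rehearsed in a scratch directory before touching /etc. The sketch below is a hedged illustration: the function name install_upsmond and its optional ROOT argument are assumptions, and the chmod 755 is added on the assumption that the init script must be executable. Installing upsmon-i386 and unpacking the tarball are not covered.

```shell
#!/bin/sh
# install_upsmond SCRIPT [ROOT]   (hypothetical helper)
#   Copy the upsmond init script into ROOT/etc/init.d and hard-link it into
#   runlevels 2 and 3. ROOT defaults to "" (the real system); pointing it at
#   a scratch directory allows a rehearsal without touching /etc.
install_upsmond() {
    script=$1 root=${2:-}
    mkdir -p "$root/etc/init.d" "$root/etc/rc2.d" "$root/etc/rc3.d" || return 1
    cp "$script" "$root/etc/init.d/upsmond" || return 1
    chmod 755 "$root/etc/init.d/upsmond" || return 1   # assumption: must be executable
    for level in 2 3; do
        ln -f "$root/etc/init.d/upsmond" "$root/etc/rc$level.d/S99upsmond" || return 1
    done
}

# Rehearsal in a scratch directory with a stand-in script.
stage=$(mktemp -d)
printf '#!/bin/sh\necho upsmond stub\n' > "$stage/upsmond"
install_upsmond "$stage/upsmond" "$stage/root" &&
    ls "$stage/root/etc/rc2.d" "$stage/root/etc/rc3.d"
```

On the real machine the function would be called with only the path to the upsmond script, as root.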

External links

http://www.networkupstools.org/faq/
Topic revision: r1 - 20 Apr 2008, Shaltev