This is also a to-do list of items that still need to be implemented. An SMS is sent out to a list of numbers whenever an email is sent to sms@postfixserver.atlas.local. The subject line is the text that will be sent. Please ensure that the warning message is shorter than 160 characters.
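As an illustration, a monitoring script could build such a mail roughly as follows. This is only a sketch using Python's standard email library: the function name build_sms_mail and the default sender address are hypothetical, and only the recipient address and the 160-character limit come from the note above.

```python
from email.message import EmailMessage

SMS_GATEWAY = "sms@postfixserver.atlas.local"  # address from the note above
MAX_SMS_LEN = 160  # the text must stay below this many characters


def build_sms_mail(text, sender="monitor@localhost"):
    """Return an EmailMessage whose subject line carries the SMS text.

    The sender default is a placeholder, not a real account.
    """
    if "\n" in text or "\r" in text:
        raise ValueError("SMS text must be a single line")
    if len(text) >= MAX_SMS_LEN:
        raise ValueError(
            f"SMS text has {len(text)} characters, must be below {MAX_SMS_LEN}"
        )
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = SMS_GATEWAY
    msg["Subject"] = text
    return msg
```

The resulting message could then be handed to a mail server, for example with smtplib.SMTP(host).send_message(msg); which host to use is not specified here.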
Conditions
- LCP water inlet temperature too high: Avg. T_in > 16C
- Rack switched off
- Smoke sensor alarm
- UPS switching to battery or back to mains
- battery level crosses threshold
SMS send program
- make sure another SMS is not sent before a certain "dead" time has elapsed
- which telephone numbers to send to?
- do we need different notification layers?
Exact error messages
To enable a dead time for the SMS program, the incoming messages need to be standardized. We will use the bar | as a delimiter; everything after it will be ignored. The dead time means that any mail arriving with the same message within the dead time will simply be discarded. This should prevent us from being flooded with SMS messages.
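The dead-time rule could be sketched as below. This is not the actual SMS program: the names message_key and DeadTimeFilter are hypothetical, the default dead time of one hour is an assumption, and the sketch assumes that the text after the bar is ignored only when comparing messages.

```python
import time


def message_key(subject):
    """Return the part of the subject before the first '|', stripped."""
    return subject.split("|", 1)[0].strip()


class DeadTimeFilter:
    """Discard messages whose key was already sent within its dead time."""

    def __init__(self, dead_times, default=3600.0, clock=time.monotonic):
        self.dead_times = dead_times  # key -> dead time in seconds
        self.default = default        # assumed fallback of 1 h
        self.clock = clock            # injectable so the logic can be tested
        self.last_sent = {}           # key -> time of the last SMS sent

    def should_send(self, subject):
        key = message_key(subject)
        now = self.clock()
        last = self.last_sent.get(key)
        dead_time = self.dead_times.get(key, self.default)
        if last is not None and now - last < dead_time:
            return False  # still inside the dead time: drop the mail
        self.last_sent[key] = now
        return True
```

A dead time of 0 lets every message through. The state lives in memory only, so if the program were started once per incoming mail, last_sent would have to be stored on disk instead.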
LCP water temp too high:
Ex: "Water temp to LCPs too high | 18C" - dead time: 1h?
LCP water temp too low:
Ex: "Water temp to LCPs too low | 8C" - dead time: 1h?
A rack is completely switched off:
Ex: "Rack 14 is switched off" - dead time: 24h?
Smoke sensor:
Ex: "Smoke sensor alert in rack 05" - dead time: 1h?
UPS on battery (only the switch should emit a message):
Ex: "UPS switched to battery mode" - dead time: 0m?
UPS battery below thresholds (75%, 50%, 25%):
Ex: "UPS battery below 75%" - dead time: 6h?
Ex: "UPS battery below 50%" - dead time: 3h?
Ex: "UPS battery below 25%" - dead time: 1h?
UPS on mains (only the switch should emit a message):
Ex: "UPS switched to mains" - dead time: 0m?
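Since only the switch itself and the crossing of a threshold should emit a message, the UPS check has to compare two consecutive readings rather than look at the current state alone. A minimal sketch, assuming the UPS is polled periodically and each poll yields an (on_battery, battery_percent) pair; the function name ups_messages and this input format are hypothetical:

```python
THRESHOLDS = (75, 50, 25)  # percent, as listed above


def ups_messages(prev, curr):
    """Return the messages triggered by the change from prev to curr.

    prev and curr are (on_battery, battery_percent) tuples from two
    consecutive polls.
    """
    prev_on_battery, prev_level = prev
    on_battery, level = curr
    messages = []
    # Only the transition emits a message, not the steady state.
    if on_battery != prev_on_battery:
        if on_battery:
            messages.append("UPS switched to battery mode")
        else:
            messages.append("UPS switched to mains")
    # A threshold fires only when the level drops from at-or-above to below it.
    for threshold in THRESHOLDS:
        if prev_level >= threshold > level:
            messages.append(f"UPS battery below {threshold}%")
    return messages
```

A drop from 80% to 40% between two polls reports both the 75% and the 50% threshold; whether that should collapse into a single SMS is an open design question.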
Node MCE error:
Ex:
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 3
TSC 15c3d0c575579
ADDR 857c340
Node NIC problem:
Ex:
Time: hpet clocksource has been installed.
e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
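The two log excerpts above are too long and too variable to be used directly as an SMS subject, so they would have to be condensed into a standardized one-line message first. The sketch below shows one possible way to do that. The subject wording, the function name summarize and the node name in the usage example are all hypothetical; only the matched log signatures come from the excerpts.

```python
import re

# Signatures taken from the two log excerpts above.
MCE_RE = re.compile(r"CPU (\d+) BANK (\d+)")
NIC_RE = re.compile(r"(\w+): (\w+): \w+: Detected Tx Unit Hang")


def summarize(node, log_text):
    """Return a one-line subject for a known error signature, else None.

    The variable details go after the '|' delimiter.
    """
    if "HARDWARE ERROR" in log_text:
        m = MCE_RE.search(log_text)
        detail = f" | CPU {m.group(1)} BANK {m.group(2)}" if m else ""
        return f"MCE hardware error on {node}{detail}"
    m = NIC_RE.search(log_text)
    if m:
        return f"NIC Tx hang on {node} | {m.group(1)} {m.group(2)}"
    return None
```

For example, summarize("node042", "e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang") gives "NIC Tx hang on node042 | e1000 eth1".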