ClusterMonitoring

Description

We need to monitor various values in the cluster frequently. See the ticket at https://n0.aei.uni-hannover.de/tracking/issues/show/6 for details. The OIDs in the table below omit their common first part, which is 1.3.6.1.4.1.
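Since the table lists only the part of each OID after the common prefix, a polling script has to prepend that prefix before querying. A minimal Python sketch of that step follows; the function name `full_oid` is hypothetical and not part of any existing tooling on the cluster.

```python
ENTERPRISE_PREFIX = "1.3.6.1.4.1"  # common first part of every OID, as stated above


def full_oid(suffix: str) -> str:
    """Return the complete numeric OID for a table entry such as '.705.1.8.1.0'.

    Entries with or without a leading dot are both accepted.
    """
    return ENTERPRISE_PREFIX + "." + suffix.strip().lstrip(".")


# Example: the UPS battery temperature entry from the table
print(full_oid(".705.1.8.1.0"))
```

Entries that use shorthand such as ranges or slashes would need to be expanded by hand before being passed to this helper.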

| where | what | importance | how | IP | OID | Message |
| file server | disk failure | SMS | SNMP | 172.28.11.x | .18928.1.1.5.1.1.5.0 | Degraded |
| | power supply failure | SMS | IPMI | 172.28.1.x | sdr | ok means ok |
| | fan failure | mail, SMS | IPMI | 172.28.1.x | sdr | fan speed >0 |
| Sun | disk failure | SMS | SNMP | 172.28.2.x | .42.2.70.101.1.1.9.1.1.124 (+4) | 01 ok, 00 missing, 09 not configured |
| | disk failure | SMS | SNMP | 172.28.2.x | .42.2.70.101.1.1.9.1.1.1 | 0 ok, 1 one of the disks is unavailable |
| | power supply failure | SMS | SNMP | 172.28.2.x | .42.2.70.101.1.1.2.1.3.112, 113, 116, 117 | 2 is ok |
| | fan failure | mail | SNMP | 172.28.2.x | .42.2.70.101.1.1.2.1.3.29, 38, 47, 56, 65 | 7 is ok |
| EFX | power supply failure | SMS | SNMP | 172.25.20.1 | .26390.1.1.2.2.2 | PSUGroup |
| | fan failure | SMS | SNMP | 172.125.1.1 | .26390.1.1.2.2.3 | operational |
| ATLAS.Internal.TRX-100 | fan failure | SMS | SNMP | 10.25.0.x | .7244.2.101.1.1.106.1.3/4(1) | ?? |
| | temperature | mail | SNMP | 10.25.0.x | 26390.9.1.101.1.1.106.1.2.1 | °C |
| Node | fan failure | mail | SNMP | 172.26.x.y | .2021.50.4.1.2.7.47.98.105.110.47.115.104.4-6 | >1000 |
| | temperature | mail | SNMP | 172.26.x.y | .2021.50.4.1.2.7.47.98.105.110.47.115.104.2-3 | <40 |
| | timestamp | log | SNMP | 172.26.x.y | .2021.50.4.1.2.7.47.98.105.110.47.115.104.1 | < 3600 |
| | home mount | mail | SNMP | 172.26.x.y | .2021.50.4.1.2.7.47.98.105.110.47.115.104.7 | ^0$ |
| | rsh ability | mail | SNMP | 172.26.x.y | .2021.50.4.1.2.7.47.98.105.110.47.115.104.8 | ^0$ |
| | automount | log | SNMP | 172.26.x.y | .2021.2.1.5 | 1 |
| | diskspace | log | SNMP | 172.26.x.y | .2021.9.1.9 | <91 |
| | swap | log | SNMP | 172.26.x.y | .2021.2021.4.100.0 | 0 |
| ATLAS.Internal.UPS | On battery status | SMS | SNMP | 172.25.2.1 | .705.1.7.3 | # for status see also http://130.75.117.102/ups |
| | battery temperature | SMS | SNMP | 172.25.2.1 | .705.1.8.1.0 | # °C |
| | ambient temperature | SMS | SNMP | 172.25.2.1 | .705.1.5.7.0 | # 0.1°C |
| | battery status | mail | SNMP | 172.25.2.1 | .705.1.5.2.0 | |
| Racks | (LCP) air temperature | mail | SNMP | 172.25.1.x | .2606.4.2.4.5.2.1.5.13/15/17 | # °C |
| | LCP water temperature | mail | SNMP | 172.25.1.x | .2606.4.2.4.5.2.1.5.23 | # °C |
| | smoke | SMS | SNMP | 172.25.1.x | .2606.4.2.3.7.2.1.3.1 | 4 is no smoke, 5 is smoke |
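
Several Message entries above are written as simple rules (">1000", "<40", "< 3600", "^0$", or a plain value such as "1"), and the importance column names the notification channels. The following Python sketch shows one way such entries could be evaluated. It assumes that a rule describes the healthy state, which the table does not say explicitly and which does not hold for every row (for example "Degraded"). The names `value_ok` and `channels` and the sample readings are hypothetical.

```python
import re
from typing import List


def value_ok(rule: str, value: str) -> bool:
    """Check a polled value against a rule in the style of the Message column.

    Assumption: the rule describes the healthy state.
    Supported forms: '>N', '<N' (optional space), a regex starting with '^',
    or a plain expected value.
    """
    rule = rule.strip()
    comparison = re.fullmatch(r"([<>])\s*(\d+(?:\.\d+)?)", rule)
    if comparison:
        op, limit = comparison.group(1), float(comparison.group(2))
        try:
            number = float(value)
        except ValueError:
            return False  # a non-numeric reading cannot satisfy a numeric rule
        return number > limit if op == ">" else number < limit
    if rule.startswith("^"):
        return re.search(rule, value) is not None
    return value.strip() == rule


def channels(importance: str) -> List[str]:
    """Split an importance cell such as 'mail, SMS' into separate channels."""
    return [c.strip() for c in importance.split(",") if c.strip()]


# Hypothetical sample readings: (what, importance, rule, polled value)
samples = [
    ("Node fan failure", "mail", ">1000", "850"),
    ("Node temperature", "mail", "<40", "35"),
    ("Node home mount", "mail", "^0$", "0"),
    ("Node diskspace", "log", "<91", "95"),
]

for what, importance, rule, value in samples:
    if not value_ok(rule, value):
        print(f"{what}: value {value} violates '{rule}', "
              f"notify via {', '.join(channels(importance))}")
```

Rows whose Message is free text ("ok means ok", "4 is no smoke, 5 is smoke") or only a unit would need their own rule before they could be checked this way.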

DocumentationForm

Title Cluster Monitoring
Description This page lists various cluster problems to be monitored.
Tags cluster monitoring
Category Admin
Topic revision: r48 - 10 Feb 2012, ArthurVarkentin