Detailed list of metrics we want to monitor

Compute nodes (61)

  • CPU: user / nice / system / wait (4)
  • disk:
    • available/free capacity per locally defined file system (both space and inodes) (16)
    • IO times/reads/writes/busy per local physical device (12)
  • GPU: load/mem usage (2)
  • system: one-minute load average, number of processes (2)
  • memory: used/free/buffer (3)
  • swap: used/free (if applicable) (2)
  • network:
    • received/sent (packets or bytes) per logical device (4)
    • errors per physical device (8)
  • local condor usage (resources used/free) (8)
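
The per-group counts in parentheses should add up to the total given in the "Compute nodes" heading. As a minimal sketch for keeping the list and the total in sync, the counts can be written down as a small catalogue and summed. The dictionary keys and the function name are illustrative assumptions and are not taken from any monitoring tool; only the numbers come from the list above.

```python
# Metric counts per group for compute nodes, copied from the list above.
# The key names are hypothetical labels chosen for this sketch.
COMPUTE_NODE_METRICS = {
    "cpu": 4,                # user / nice / system / wait
    "disk_filesystem": 16,   # space and inodes per local file system
    "disk_io": 12,           # IO times/reads/writes/busy per physical device
    "gpu": 2,                # load, memory usage
    "system": 2,             # one-minute load, number of processes
    "memory": 3,             # used / free / buffer
    "swap": 2,               # used / free (if applicable)
    "network_traffic": 4,    # received/sent per logical device
    "network_errors": 8,     # errors per physical device
    "condor": 8,             # local condor resources used/free
}


def total_metrics(groups):
    """Return the sum of all per-group metric counts."""
    return sum(groups.values())


if __name__ == "__main__":
    # Print each group with its count, followed by the overall total.
    for name, count in COMPUTE_NODE_METRICS.items():
        print(f"{name}: {count}")
    print(f"total: {total_metrics(COMPUTE_NODE_METRICS)}")
```

Summing the groups gives 61, which matches the figure in the heading.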

-- CarstenAulbert - 11 Sep 2017
Topic revision: r1 - 11 Sep 2017, CarstenAulbert