Planned downtimes
This page summarizes planned and ongoing work within the Atlas cluster, along with a few details. Usually, we will issue
condor_off -peaceful
at least 12h in advance, which means that no new jobs will be scheduled to that node and currently running jobs have another 12h to finish. After that, however, we usually need to kill any jobs still running on the machines so that we can start our work.
To cross-check which nodes may have been affected, note that we use a simple naming scheme that encodes the location of a node within the cluster room. For example, if you see the IP address
10.10.7.18
in your log files, this refers to the node named
a0718
which is the server in rack 7 at height unit 18.
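The mapping described above can be sketched in a few lines of Python, for example to translate IP addresses found in log files into node names. This is only an illustration based on the single example given here: it assumes that the third octet of the address is the rack number, the fourth octet is the height unit, and that node names always consist of "a" followed by two zero-padded two-digit numbers. The function names ip_to_node and node_location are hypothetical and not part of any cluster tooling.

```python
import ipaddress
import re


def ip_to_node(ip):
    """Translate a cluster IP address into a node name.

    Assumption: third octet = rack, fourth octet = height unit,
    as in the example 10.10.7.18 -> a0718.
    """
    octets = ipaddress.IPv4Address(ip).packed  # raises ValueError on bad input
    rack, unit = octets[2], octets[3]
    return f"a{rack:02d}{unit:02d}"


def node_location(name):
    """Return (rack, height_unit) for a node name such as 'a0718'.

    Assumption: names are always 'a' plus two 2-digit numbers.
    """
    match = re.fullmatch(r"a(\d{2})(\d{2})", name)
    if match is None:
        raise ValueError(f"unexpected node name: {name!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    node = ip_to_node("10.10.7.18")
    print(node, node_location(node))
```

With the rack number in hand, you can look up the corresponding date in the table below to see whether a job on that node may have been affected.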
| date | racks/nodes | status/notes |
| 2019-07-18/19 | racks 5&6 | done |
| 2019-07-25 | racks 1&2 | done |
| 2019-07-26 | racks 3&4 | done |
| 2019-08-02 | racks 7&8 | done |
| 2019-08-05 | rack 9 | done |
| 2019-08-06 | rack 10&11 | done |
| 2019-08-07 | rack 12 (caution: special servers in rack 12) | done |
| 2019-08-08 | rack 17 | done |
| 2019-08-09 | rack 18 (caution: special servers in rack 18) | done |
| 2019-08-13 | rack 19 | done |
| 2019-08-15 | rack 20 | done |
| 2019-08-16 | rack 21 | done |
| 2019-08-19 | rack 22 (caution: special servers in rack 22) | done |
| 2019-08-19 | rack 23 | done |
| 2019-08-23 | rack 24 | done |
| 2019-08-23 | rack 27 | done |
| 2019-08-23 | rack 28 | done |
-- SvenRenas - 24 Aug 2019