How to use dsh and password-less ssh between the nodes It's very easy to get password-less ssh working between the nodes; you just need a proper ssh keypair...
GridKaHOWTO Follow the whole process with the same user and the same browser! If you use Firefox, please make sure to use ESR version 60 or 68 a...
Planned downtimes This page summarizes planned/ongoing work within the Atlas cluster along with a few details. Usually, we will issue condor_off -peaceful at lea...
This is the public information on the ATLAS cluster operated by the Max Planck Institute for Gravitational Physics (also named Albert Einstein Institute, AEI) si...
Trying to get iPXE working as the default method for netinstalls (based on http://ipxe.org/howto/chainloading and https://doc.rogerwhittaker.org.uk/ipxe installati...
Rebuilding Debian's kernel (loosely following https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s-common-official) pbuilder environment I...
First steps with spack Please note that all of this was tested on an extremely minimally installed server. I.e. just installing something like doxygen can take a very...
Create a hybrid USB Image The goal is to create an image file which can be copied onto a USB stick and booted via both legacy BIOS and UEFI. This document ...
Simple ZFS ZVol testing creating baseline Create simple test data set in RAM: mkdir p /dev/shm/data for i in $(seq w 30); do dd if=/dev/null base64)" nosalt...
Detailed list of metrics we want to monitor Compute nodes (61) * CPU: user / nice / system / wait (4) * disk: * space available/free per locally defi...
Monitoring for Jessie and Beyond What do we want/need to monitor (metrics/checks) A non-exhaustive list of metrics and checks we need/would like to have, e.g....
Webserver serving user content If you need a webserver to serve content from your $HOME to the world, please create the directory ~/WWW on Atlas if it does not ex...
How to add a new host (salt era) This example will use einstein12 as a sample machine, which was previously known as ra15. Before you begin, you need to have ssh agent...
Cluster upgrade to Debian 8/Jessie We plan to use this page for keeping a record of where we are with respect to our full cluster upgrade to Debian Jessie. Curr...
FAI Jessie set up 1 base install via old fai jessie 1 base minimal config via salt 1 echo 'deb http://repo.atlas.local/reprepro fai contrib' /etc/apt/...
How to disable KM1/2 and use KM4 manually In order to disable KM1/2 and temporarily run with KM4 only, the following steps are needed (please monitor that each ch...
HSM file system check stats (last update: 2016-07-26T18:42Z) Planned steps (starting at 2016-07-26T11:00Z): 1 Issuing condor_hold to all jobs on all submit hos...
How to migrate from gitmaster to gitlab This document explains how to migrate away from gitmaster.atlas.aei.uni-hannover.de to the new gitlab.aei.uni-hannover.de:...
Aptly For the new LDG repo set, we are trying to use aptly as a potential successor to reprepro. The goals are: * support various Debian and Ubuntu releases ...
General questions Are dual-port 10GBASE-T connectors possible? Likely, will be clarified. Are 40 Gbit/s (QSFP) possible? Not clear yet. What is the meaning of a q...
Checking drive ordering between SL3000 and SAM Following Doc Id 1006246.1 to verify drive ordering matches between SL3000 and SAM/Solaris preparation 1 Shut d...
llldd * TCP connection from CIT (or sites) to special receiver machine (possibly need root access for John Zweizig) possibly SL6???? * from there UDP multic...
Condor Accounting Groups on Atlas In May 2015, LIGO introduced mandatory accounting groups for jobs running on the LIGO data grid (LDG). As Atlas is part of the L...
What is ATLAS? ATLAS is a general purpose compute cluster, located at the Albert Einstein Institute for Gravitational Physics in Hannover, Germany, on the campus ...
HTCondor configuration updates in 2015 (1) Using cgroups to softly enforce memory and core limits Reasoning In the past, we either relied on users' jobs to obey...
Benchmarking distributed file systems stupid fast tests first, all using small compute nodes and use iozone r 32 s $((2**24 2**25)) i 0 i 1 i 2 i 8 O I S...
Cheat sheet for rebooting E@H machines in Hannover If the E@H machines need to be rebooted (e.g. kernel upgrade) here's the proper ordering: Isolated machines Th...
SQLDump tests for Einstein@home On einstein db1 the following was found: # no compression /usr/bin/mysqldump opt master data=2 EinsteinAtHome mbuffer /dev/nu...
This is just an unsorted, unfiltered list of current tasks and services all over the AEI (and beyond) which could be counted as SYSOP related. It is neither compl...
Common guidelines for cluster usage This document describes common pitfalls and guidelines when using a large computing cluster. Some of the details are specifi...
Configuration Management (primer/summary/brainstormer) What's out there? These are not really meant for configuration mgmt (alone) and have their strengths somew...
Distributed/clustered file systems This page should summarize which scenarios such file systems could cover within Atlas and what we expect from them. Properties s...
HSM Upgrade July 2014 This is the proposed plan; small changes may be needed. Move from x4270 to x4 2l Moving to new metadata server should result in more file s...
Planned steps for Atlas Update to Wheezy Steps to perform: * get new head nodes up and running * reinstall all old nodes a0101...a3842, gpu001...gpu0XX ...
Directory hierarchy for LSC files Storage structure for S4/S5/S6 data (past) In the past we used paths like these H/H1/RDS/C03/L1/H H1_RDS_C03_L1 822092472 60.gw...
Short overview on different configuration possibilities for FC arrays Our DDN FC array currently houses 600 disks (400 3 TB drives and 200 2 TB drives). DDN only ...
Benchmarking bcache Evaluating if bcache can/should be used on our compute nodes. Benchmarking was performed with iozone (revision 3.397) with the command line io...