How to use dsh and password-less ssh between the nodes It's very easy to get password-less ssh working between the nodes; you just need a proper ssh keypair...
GridKaHOWTO Follow the whole process with the same user and the same browser! If you use Firefox, please make sure to use ESR version 60 or 68 a...
Planned downtimes This page summarizes planned/ongoing work within the Atlas cluster along with a few details. Usually, we will issue condor_off -peaceful at lea...
This is the public information on the ATLAS cluster operated by the Max Planck Institute for Gravitational Physics (also named Albert Einstein Institute, AEI) si...
Trying to get iPXE working as the default method for netinstalls (based on http://ipxe.org/howto/chainloading and https://doc.rogerwhittaker.org.uk/ipxe installati...
Rebuilding Debian's kernel (loosely following https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s-common-official) pbuilder environment I...
First steps with spack Please note that all of this was tested on an extremely minimally installed server. I.e. just installing something like doxygen can take a very...
Create a hybrid USB Image The goal is to create an image file which can be copied onto a USB stick and booted both via legacy BIOS as well as UEFI. This document ...
Simple ZFS ZVol testing creating baseline Create simple test data set in RAM: mkdir -p /dev/shm/data for i in $(seq -w 30); do dd if=/dev/null base64)" nosalt...
Detailed list of metrics we want to monitor Compute nodes (61) * CPU: user / nice / system / wait (4) * disk: * space available/free per locally defi...
Monitoring for Jessie and Beyond What do we want/need to monitor (metrics/checks) A non-exhaustive list of metrics and checks we need/would like to have, e.g....
Webserver serving user content If you need a webserver to serve content from your $HOME to the world, please create the directory ~/WWW on Atlas if it does not ex...
How to add a new host (salt era) This example will use einstein12 as a sample machine which before was known as ra15. Before you begin, you need to have ssh agent...
Cluster upgrade to Debian 8/Jessie We plan to use this page for keeping a record of where we are with respect to our full cluster upgrade to Debian Jessie. Curr...
FAI Jessie set up 1 base install via old fai jessie 1 base minimal config via salt 1 echo 'deb http://repo.atlas.local/reprepro fai contrib' /etc/apt/...
How to disable KM1/2 and use KM4 manually In order to disable KM1/2 and temporarily run with KM4 only, the following steps are needed (please monitor that each ch...
HSM file system check stats (last update: 2016-07-26T18:42Z) Planned steps (starting at 2016-07-26T11:00Z): 1 Issuing condor_hold to all jobs on all submit hos...
How to migrate from gitmaster to gitlab This document explains how to migrate away from gitmaster.atlas.aei.uni-hannover.de to the new gitlab.aei.uni-hannover.de:...
Aptly For the new LDG repo set, we are trying to use aptly as a potential successor to reprepro. The goals are: * support various Debian and Ubuntu releases ...
general questions are dual port 10GBASE-T connectors possible? Likely, will be clarified. are 40 Gbit/s (QSFP+) possible? Not clear yet. what is the meaning of a q...
Checking drive ordering between SL3000 and SAM Following Doc Id 1006246.1 to verify drive ordering matches between SL3000 and SAM/Solaris preparation 1 Shut d...
llldd * TCP connection from CIT (or sites) to special receiver machine (possibly need root access for John Zweizig) possibly SL6???? * from there UDP multic...
Condor Accounting Groups on Atlas In May 2015, LIGO introduced mandatory accounting groups for jobs running on the LIGO data grid (LDG). As Atlas is part of the L...
What is ATLAS? ATLAS is a general purpose compute cluster, located in the Albert Einstein Institute for Gravitational Physics, in Hannover, Germany, on the campus ...
HTCondor configuration updates in 2015 (1) Using cgroups to softly enforce memory and core limits Reasoning In the past, we either relied on users' jobs to obey...
Benchmarking distributed file systems stupid fast tests first, all using small compute nodes and use iozone r 32 s $((2**24 2**25)) i 0 i 1 i 2 i 8 O I S...
Cheat sheet for rebooting E@H machines in Hannover If the E@H machines need to be rebooted (e.g. kernel upgrade) here's the proper ordering: Isolated machines Th...
SQLDump tests for Einstein@home On einstein db1 the following was found: # no compression /usr/bin/mysqldump opt master data=2 EinsteinAtHome mbuffer /dev/nu...
This is just an unsorted, unfiltered list of current tasks and services all over the AEI (and beyond) which could be counted as SYSOP related. It is neither compl...
Common guidelines for cluster usage This document describes common pitfalls and guidelines when using a large computing cluster. Some of the details are specifi...
Configuration Management (primer/summary/brainstormer) What's out there? These are not really meant for configuration mgmt (alone) and have their strengths somew...
Distributed/clustered file systems This page should summarize what scenarios such file systems could fulfill within Atlas and what we expect from it. Properties s...
HSM Upgrade July 2014 This is the proposed plan, small changes may be needed Move from x4270 to x4 2l Moving to new meta data server should result in more file s...
Planned steps for Atlas Update to Wheezy Steps to perform: * get new head nodes up and running * reinstall all old nodes a0101...a3842, gpu001...gpu0XX ...
Directory hierarchy for LSC files Storage structure for S4/S5/S6 data (past) In the past we used paths like these H/H1/RDS/C03/L1/H H1_RDS_C03_L1 822092472 60.gw...
Short overview on different configuration possibilities for FC arrays Our DDN FC array currently houses 600 disks (400 3 TB drives and 200 2 TB drives). DDN only ...
Benchmarking bcache Evaluating if bcache can/should be used on our compute nodes. Benchmarking was performed with iozone (revision 3.397) with the command line io...
basic configuration the default baud rate is 9600 first steps Use the serial console with a baud rate of 9600 and do system view interface Route Aggregation 1...
Create Service Data File for 6780 In case a hard drive is about to fail (or has already failed), Oracle support needs one special file collection, to create this,...
Comparing efficiency and size of compressors With samfs and archive dumps being rather large, we needed a good compression scheme for these. We use a 101GB (1033...
HSM upgrade Current status 2014-02-01T12:33Z: samfsdump/final backup stopped due to too many non-archived files. Rushing to archive those as fast as possible. 20...
First steps with Solaris 11 I'm using the old x4440 machine, booting off a Solaris 11 CD based on sol-11_1-text-x86.iso. You must exit the grub countdown with ESC...
Einstein@Home machine head count Please check this list to make migration and finding/cabling machines easier (racks used for these machines should be 79 (water c...
Building an atlas Kernel This assumes that a stable linux kernel source tree already exists on a machine (bob, /srv/kernel/linux stable), cloned via git clone git...
Rack layout which racks contained what and when Basic information * rack rows are numbered 1 to 10 * Water cooled racks are numbered 1 to 102 * open ...
New networking set up for Atlas From 2008 till 2013 we used a flat networking structure, i.e. all computers on the data network were connected "directly" to the c...
For Users * General Introduction for Users * Useful Items * How ATLAS stores files * ErrorMessages and how to fix them (not updated) General Document...
Category:Network Category:WovenSystems This Page is about our ATLAS.CoreSwitch. You might also want to read the TRX page. Configuration access serial not wor...
ATLAS Web Preferences The following settings are web preferences of the ATLAS web. These preferences overwrite the site level preferences in . and , and c...
Using cgroups to push backfill into the background apt-get update apt-get -y install boinc-client libboinc-app7 cgroup-bin rsync -a n0669:/etc/default/boinc-clien...
LINDY COMPower Switch LITE 8 Main.AlexPost 11 May 2009: This document describes the use of the LINDY 8port Power Strip 32453. The easiest way to access the po...
Create benchmark image with Debian live build apt get install live build lb config archives live.debian.net mkdir /srv/live default cd /srv/live default lb ...
Work planned for cluster shutdown on 2013-01-15 shutdown plan The following services will be shut down 1 all compute nodes possibly with the exception of "r...
Einstein@home RAID setup testing Over time the einstein abp1 download server (mostly BRP project) went through various updates to increase download throughput (ap...
Central benchmark page All our benchmark results should be linked from this page to help rediscovering already performed benchmarks. Ideally, a summary page will ...
1.: We were told that parts requiring maintenance have a warranty period of 2 years. Parts not requiring maintenance have a warranty period of 5 years. Dur...
AddNewUser How to add a new user All this is now done by atlas_adduser.pl from our git repo! The remaining stuff here is done for the fictitious user foo BAR, an...
ZFS Send/Receive Performance Testing Since we want to backup and move users' home file systems regularly between Thumpers we want to have as much speed as poss...
Testing different zpool layouts on Thumper Local testing with iozone Essentially the tests were made with a single iozone run per file system, zfs compression wa...
Iozone on Areca server Command line iozone -a -g 16G -O -n 32K -y 32 -q 32 -i 0 -i 2 Disk layout The disk layout for the four tests was in RAID1 or RAID10 mode o...
Myricom Benchmarking We received two Myricom NICs for evaluation 03:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC (10G-PCIE-8A) These netperf ...
Measured performance with this command line bonnie++ -s 32768 -d /path/to/dir -u 1000 The results are # data server locally store03,32G,76714,99,358710,47,160723,3...
Testing different file systems on Atlas compute nodes We are using the attached ffsb profiles for testing, these are just a first shot at the problem, but might p...
Atlas Boinc Condor Scheduling As Condor's fetchwork does not seem to work with dynamic slots, we are working on our own "scheduling" system for BOINC Initial tho...
Atlas basic usage guide First things first Be nice to others, others should be nice to you as well :) Please read this aloud: I will be nice to other users, and ...
Move from Debian Lenny to Debian Squeeze Changes * updated packages from upstream Debian * Condor 7.6 with dynamic slots on most execute machines (exceptio...
Small tour of Atlas Atlas is a computer cluster situated in the basement of a university building near the Max Planck institute in Hannover. Since the ceiling is ...
Atlas Compute Node 2008 These Supermicro based machines were bought from Pyramid in 2008. Typical host names n0001 n1680 Spec table Chassis SC811T 300B (1...
Atlas Compute nodes Compute Node 2008 In 2008 we bought 1680 Supermicro based machines from Pyramid, getting a total of 6720 2.4GHz compute cores, 13TB R...
For Squeeze, everything needs to be performed in a chroot, i.e. run cowbuilder --update --basepath /var/cache/pbuilder/base.squeeze.amd64.cow/ # prepare environment apt ...
Shutdown priorities The following list puts priorities on computers, equipment and other items of interest. Computers in racks, which will stay powered up, should...
Where shall I put my (Condor) log files? As always, the correct answer is: It depends. You can put your log files in your home, e.g. assuming your user name is MY...
Windows Cisco VPN Client How to connect to the 10.117.0.0 network of Max Planck Institute with VPN The Cisco VPN Client with the pcf file called AEI 10NET.pcf can ...
Repair an XFS filesystem An XFS file system can become corrupted due to a power cut, a kernel bug or something else. To repair it, please use a recent version of xf...
Virtualbox and USB on gutsy Getting USB support working in virtualbox under Ubuntu gutsy 0) for general important info for virtualbox on gutsy, see FAQ: [31 ...
How to verify a S/MIME signed email (X.509) * save the email into a file * check that the certificate authority for this sender is known in your system, e.g...
Video capture box How to use the video capture box we got from Golm: Some good documentation on how to use VLC for capturing multiple devices and creating a mosaic wi...
Ssh password less Three steps to password-less ssh (without ssh-agent) ... 1 Use ssh-keygen to generate the key pair ssh-keygen -t rsa ! do NOT use a passphrase, ju...
Suse Cisco VPN Installing Cisco VPN Client in Suse If you want to connect your Suse machine to a VPN network, it's very simple. Now we install a VPN client and ma...
In order to convert existing SVN repositories to git there are a few simple steps to take. First of all you'll need the following tools (if not already available)...
* install netsnmp pkg-get -i netsnmp * create a configuration file /opt/csw/share/snmp/snmpd.conf rocommunity public disk /atlashome 5% load 12 6 3 syslocati...
How to mount your Atlas home on your computer Using sshfs it is quite simple but could be a little bit on the slower side of life. Prerequisites: 1. Make sure...
How to send automatic SMS messages (via host postfix) Create a temporary file which looks like this: To: 49123456789 This is the SMS text and then move the fil...
How to read failure information on Solaris and how to "repair" old faults. Using fmdump one can look at the past error logs, while fmadm faulty will display fault...
Supersede sendmail with postfix on Solaris 10 1. update blastwave packages: pkgutil -u 1. install postfix: pkgutil -i postfix 1. disable sendmail: scvadm...
Restart GEOSegDB if service does not work anymore * log into geosegdb either as root/suser/ldbd (x509 certs) * assume ldbd identity * run db2start * a...
Some things to be aware of when running Octave scripts on ATLAS: Octave path searching When an Octave script calls a non built in function, Octave will look thro...
How to export/import tapes from SL3000 Export tapes Explanatory task: tapes from the secondary copy of a file system should be exported * Log into metadata se...
How to recover a file you accidentally deleted? If your $HOME is on a ZFS file system 1 you can easily retrieve files you deleted from an automatic snapshot creat...
This basic tutorial describes the correct usage of the Redmine bugtracker. After the login you will be redirected to your own page which can be customized by usig...
How to rescue data from a broken disk If a disk is "only" throwing errors, but is not entirely dead yet, dd might help, but can cause a lot of grief. This recipe ...
How to protect a web page with .htaccess If you happen to need to protect a web page but not only for LSC usage, you need to perform the following steps: * Cre...
Collect evidence data for an Oracle case (HSM) Based on past experience we should perform the following before restarting samd or even rebooting the machine: ...
using iptables A gateway with two network cards connects a LAN with a WAN. This document describes how to forward ports of nodes in the LAN to ports of the gatewa...
Running compiled Matlab under Condor Problems Multiple users on the same node When running compiled Matlab codes under Condor, you need to be aware that the def...
Netboot This is a simple description of how to boot over a network using a kernel on the remote server. Server side configuration To provide net boot capabilities, yo...
What is LVM ? Note: This may answer the question, why editing the /etc config files does nothing. If you don't have a backup, you can re create the equivalent of ...
Dangers when sourcing the Matlab runtime environment Problem When sourcing the Matlab runtime environment (currently /opt/matlab/2008a/MCR/MCRSetup_R2008a_glnxa6...
Problems with X11 when connecting with MacOS X to headnodes If you experience problems running X programs, e.g. Matlab (even in command line mode), you might be h...
Local ssh configuration You can create shortcuts as well as special settings for hosts you want to connect to in the file $HOME/.ssh/config. The following lines c...
Using ligo_data_find (successor to LSCdataFind) The command line arguments are basically the same as before with LSCdataFind. One new feature is the -P or --no-pro...
We have a PXE-bootable live system to examine a node without touching the system on the hard drive. It is basically a self-made chroot environment. usage * st...
Jumpstart Solaris How To clone our Solaris Sun boxes: Create flash archive flarcreate -n "s01 flash" -c -R / -x /atlashome /atlashome/carsten/s01.flar Our conf...
Keep your X.509 certificate alive after logging out of the cluster On some occasions one needs a valid grid proxy to access data, query the file database using LS...
HOWTO: Guideline to repair an offline computer Evaluate current status (remotely) 1. Try to log in via data network 1. Try to log in via mgmt network 1....
HOWTO Update ILOM Firmware cf. ILOM Howto Update ILOM Firmware * Log in to the ILOM CLI and type "version" to check * Download new Firmware versions http://w...
Main.HenningFehrmann 24 Jul 2008 abstract This page contains detected symptoms and the corresponding hardware problems. It is based on experiences. See the list ...
ATLAS Hardware Resources And Photo Gallery This is about the hardware resources that ATLAS is based on and related photos Computer node There are 1680 Computer n...
Why do I get an openssl related problem when logging into a machine? If you see something like this: GSSAPI Error: GSS Major Status: Authentication Failed GSS Min...
How to use gridftp In general the remote site should be running a gsiftp server which will accept your user credentials (X.509 grid certificate). The full syntax ...
How to monitor the current usage of LSC clusters or Atlas? There are two nice web pages summarizing this information. First of all there is watchtower which is a...
What is fdisk? fdisk is a command line partition manager available for a lot of platforms. Usage general fdisk -l Lists the partition table fdisk /dev/...
How to fix a Solaris boot archive After an update, the boot archive might be invalid/damaged. To fix this, boot into the failsafe mode (usually seco...
How to create a pulling foswiki set up with reprepro Assuming you are already in the location where the repository shall live, perform these magic steps: mkdir co...
This explains how to get FreeDOS working with TCP/IP and ssh. The Problem Sometimes, it is necessary to flash the BIOS, the IPMI card or to set the BIOS. For som...
RFC: Classes list for FAI/Lenny The Etch installation scheme with our FAI server used one class for each type of node (NODE_COMPUTE, STORAGE, ...) but this is som...
Getting VDT up and running on Debian Squeeze Problem description As of 2010-03-12 there is a new openssl version in squeeze which is somewhat incompatible with t...
Debootstrap It is sufficient to read the man page. To debootstrap from local deb mirror: debootstrap distribution dest. directory http://192.168.0.1:9999/deb...
This documentation will briefly summarize how to build packages. At the time of writing this, we focused on packages for Debian etch and lenny in i386 and amd64 v...
This is a short explanation of how to create a Debian source package. Assume you want to build package x.y.tar.gz. 1. Rename that one to package x.y.orig.tar.gz. 2....
Brief Condor dagman HowTo This small article will not go into any depth about what DAGs are and will also not explain the terminology of Condor's dagman, for this plea...
Conserver Conserver is a server/client program that makes it possible to pool console connections. The upstream page is at http://www.conserver.com/, in Debian the packages are...
How to create a Debian package? Tutorials: * Debian packaging tutorial by Lars Wirzenius * Debian packaging tutorial * Debian New Maintainer Guide Bad st...
Condor, BOINC and dynamic slots As Condor's dynamic slots don't yet mix with "fetch work", we build the system around a dedicated, cron based system running on a ...
ClusterMonitoring Description We need to frequently monitor various values in the cluster. See Ticket here: https://n0.aei.uni-hannover.de/tracking/issues/show/...
Compiling tempo2 on Ubuntu AMD64 The pgplot packages shipped with Ubuntu are not consistently compiled using -fPIC, which results in an unusable cpgplot wrapper l...
Clean the local scratch space Sometimes you want to get rid of all temporary stuff you left on any compute node (e.g. you just finished a big search, all results...
Cleaning your $HOME Quick steps for those who have no time to spare 1. Locate directory which you don't need right now but which produced a lot of small Condo...
How to create a build farm. Installation/Preparation Install the necessary base packages sudo aptitude install pbuilder cowdancer reprepro rebuildd * pbuild...
How to build Matlab Before Installation You need: matlab installation cd ( or iso image ), pbuilder, installation key, license file to activate the product, ping...
updating the DB If you want to add names to the db (e.g. /etc/bind9/atlas.local.db), please make sure to follow these steps: (1) Increase serial number by using t...
Packaging on Bob This page documents how to use git-buildpackage to build packages locally and on the server. Quick recipe (only builds on the server) * insta...
Badblocks on XFS Essentially the same as for ext2/3 as shown on the smartmontools homepage. Please follow this example: smartctl reports bad sector # smartctl ...
Using Badblocks for testing hdds on the nodes one normally uses $ badblocks -v -p10 /dev/sdb which checks the disk. if it makes 10 clean runs it will exit. but if...
How to backup (and restore) a SVN repository Backup First create a hotcopy somewhere where enough free space is available: mkdir /tmp/hotcopy svnadmin hotcopy ...
Automatic environment setup Introduction After the upgrade to Debian Lenny, we finally offer the possibility to automatically set up your environment. This is bei...
Automating telnet Here is a quick and dirty bash script, showing how you could automate some configuration tasks to be done in a menu based telnet session. E.g. to d...
Apache Limit access to certain directories: Add AllowOverride AuthConfig in your config section in httpd.conf or where you define your servers. In the limit...
Building the GSTLAL Dependency Package for Lenny Preparation phase Start in a fresh location by issuing the following commands: mkdir -p gstlal/gst deps 1.0/pack...
Rack numbering We only count the cooled rack space with numbers, the open racks are "counted" with letters. Please look at the following diagram: Compute Nodes T...