AddNewUser How to add a new user All this is now done by atlas_adduser.pl from our git repo! The remaining stuff here is done for the fictitious user foo BAR, an...
Atlas Boinc Condor Scheduling As Condor's fetchwork does not seem to work with dynamic slots, we are working on our own "scheduling" system for BOINC Initial tho...
Measured performance with this command line: bonnie -s 32768 -d /path/to/dir -u 1000 The results are # data server locally store03,32G,76714,99,358710,47,160723,3...
Category:Kernel Category:Network Finally, the Channelbonding documentation has been rewritten. I hope its new structure helps to answer the questions! Some illustr...
Condor The Condor High Throughput Computing System is a software framework for managing workload on a cluster of computers. It is a kind of batch system that di...
With 1340 nodes it might be wise to split services among many boxes so that half of the cluster is not left waiting for a single server to serve data through its...
Distributed/clustered file systems This page should summarize which scenarios such file systems could cover within Atlas and what we expect from them. Properties s...
Category:fai FAI is short for "Fully Automatic Installation". Installation List of important files * /etc/fai/make-fai-nfsroot.conf add location of the Image...
what happens before the FAI installation * client * server * provides * * admin action * NICs do a DHCP request (BIOS default ) DHCP server IP addre...
Faimond faimond catches installation messages sent by the clients on port 4711. The clients use netcat to deliver the current status of the installation tasks like...
FirstTest LV storage The following items need to be checked for compute nodes (please add your name to the test you have performed). If you need much more space p...
For Users * General Introduction for Users * Useful Items * How ATLAS stores files * ErrorMessages and how to fix them (not updated) General Document...
Problems with h2 h2 is the machine which was up first, so users like it and feel at home there; h1 came later and takes a bit of the load, but the main load i...
HSM file system check stats (last update: 2016-07-26T18:42Z) Planned steps (starting at 2016-07-26T11:00Z): 1 Issuing condor_hold to all jobs on all submit hos...
HSM upgrade Current status 2014-02-01T12:33Z: samfsdump/final backup stopped due to too many non-archived files. Rushing to archive those as fast as possible. 20...
IP Scheme The goal is to create a scheme which eases daily life. IP mapping should be more or less straightforward and extensible. For servers look at: * indi...
Atlas basic usage guide First things first Be nice to others, others should be nice to you as well :) Please read this aloud: I will be nice to other users, and ...
jumpstart A jumpstart server provides the information and configuration used to install other nodes. It serves tftp requests to provide a kernel and installatio...
Jumpstart Solaris How to clone our Solaris Sun boxes: Create flash archive flarcreate -n "s01 flash" -c -R / -x /atlashome /atlashome/carsten/s01.flar Our conf...
An introduction to Channelbonding can be found here. Description Why? Round robin is the only way to get more than the speed of a single interface for a single T...
We have a PXE-bootable live system to examine a node without touching the system on the hard drive. It is basically a self-made chroot environment. usage * st...
Local scratch on nodes local partitions Storing data locally on the nodes is possible everywhere. Please remember you are free to log into any node manually (rsh...
Logcheck mail locations, related scripts and other mail locations on postfix server Log mail location on postfixserver logadmin account 1. Normal logcheck mail...
Netboot This is a simple description of how to boot over a network using a kernel on the remote server. Server side configuration To provide net boot capabilities, yo...
NodeTests Tasks to do: Initial work (HP) Manual work * Blank disk of node, wipe by: dd if=/dev/zero of=/dev/sda; sync * Put MAC address into DHCP table o...
How to rescue data from a broken disk If a disk is "only" throwing errors, but is not entirely dead yet, dd might help, but can cause a lot of grief. This recipe ...
Softupdate Softupdate runs through the fai installation and performs all the changes which have been made after the installation process. On the client side f...
Category:Network Transmission Control Protocol (TCP) Visit Wikipedia on TCP. Congestion control Is used to optimize the sender's behaviour to the current netwo...
For Squeeze, everything needs to be performed in a chroot, i.e. run cowbuilder --update --basepath /var/cache/pbuilder/base.squeeze.amd64.cow/ # prepare environment apt ...
Where shall I put my (Condor) log files? As always, the correct answer is: It depends. You can put your log files in your home, e.g. assuming your user name is MY...
Sun Fire X4500 The Sun Fire X4500 servers (nicknamed "Thumper") are the heart of our $HOME. All user data reside on these boxes. To ensure that there are as few ...
Performance problems with X4500 using ZFS with NFSv3 When moving a large number of small files onto our X4500 boxes we found very bad performance numbers: Job ...