Condor, BOINC and dynamic slots

As Condor's dynamic slots don't yet mix with "fetch work", we build the system around a dedicated, cron-based script running on a submit host. The setup itself is quite easy once you get the idea.

Prerequisites

You need a BOINC client recent enough to understand all XML options used below (6.10.x should be sufficient). On each execute node, the target directory (/local/boinc) must already exist with proper permissions.
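
Both prerequisites can be checked on an execute node before rolling anything out. The following is a minimal sketch, not part of the original setup: the helper name version_ge is made up for illustration, it relies on GNU sort -V, and it assumes that boinc --version prints the version number as the first word of its output.

```shell
#!/bin/bash
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU sort -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_VERSION=6.10.0       # "6.10.x should be sufficient"
BOINC_BASE=/local/boinc  # target directory used on this page

if command -v boinc >/dev/null 2>&1; then
    # Assumption: the first word of the output is the version number.
    installed=$(boinc --version | awk '{print $1; exit}')
    if version_ge "${installed}" "${MIN_VERSION}"; then
        echo "BOINC client ${installed} is recent enough"
    else
        echo "BOINC client ${installed} is older than ${MIN_VERSION}" >&2
    fi
else
    echo "boinc not found in PATH" >&2
fi

if [ -d "${BOINC_BASE}" ]; then
    # Show owner and mode so the permissions can be checked by eye.
    ls -ld "${BOINC_BASE}"
else
    echo "${BOINC_BASE} is missing, create it before submitting jobs" >&2
fi
```

The script only reports; it does not create or change anything, since the right owner and mode depend on which account the Condor jobs run under.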

Wrapper script

Place this wrapper script at /local/boinc/boinc_starter.sh (and don't forget to make it executable):

#!/bin/bash

set -e

PATH=/bin:/usr/bin

##################################################################################
# start boinc from a schedd machine to fill the cluster (even with dynamic slots)
##################################################################################

# get slot from local machine ad:
SLOT=$(awk -F"[\"@]" '/^Name/ {print $2}' ${_CONDOR_MACHINE_AD})

# some useful variables
BOINCDIR=/local/boinc/${SLOT}
# UID is a read-only variable in bash, so use different names here
BOINC_USER=boinc
BOINC_GROUP=boinc

# create environment

# slot directory
if [ ! -d "${BOINCDIR}" ]; then
    mkdir -p "${BOINCDIR}"
fi

# account file (must live in the directory passed to boinc via --dir)
if [ ! -f "${BOINCDIR}/account_einstein.phys.uwm.edu.xml" ]; then
    cat > "${BOINCDIR}/account_einstein.phys.uwm.edu.xml" <<EOF
<account>
    <master_url>http://einstein.phys.uwm.edu/</master_url>
    <authenticator>INSERT_YOUR_ACCOUNT_KEY_HERE</authenticator>
</account>
EOF
fi

# config file
if [ ! -f "${BOINCDIR}/cc_config.xml" ]; then
    cat > "${BOINCDIR}/cc_config.xml" <<EOF
<cc_config>
  <log_flags>
  </log_flags>
  <options>
    <report_results_immediately>1</report_results_immediately>
    <dont_contact_ref_site>1</dont_contact_ref_site>
    <ncpus>1</ncpus>
    <allow_multiple_clients>1</allow_multiple_clients>
    <allow_remote_gui_rpc>0</allow_remote_gui_rpc>
    <proxy_info>
      <no_proxy></no_proxy>
    </proxy_info>
  </options>
</cc_config>
EOF
fi

# start the beast
exec /usr/bin/nice -n +19 /usr/bin/boinc --dir "${BOINCDIR}" \
     --update_prefs http://einstein.phys.uwm.edu/ --no_gui_rpc \
     >> "${BOINCDIR}/boinc.out" 2>> "${BOINCDIR}/boinc.err"
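
The slot lookup at the top of the wrapper is the part that makes it work with dynamic slots, so it is worth trying in isolation. This is a small sketch that runs the same awk call against a made-up machine ad; the host name and the function name slot_from_ad are invented for illustration, and a real machine ad contains many more attributes.

```shell
#!/bin/bash
# Extract the slot name from a machine ad file (same awk call as the wrapper).
slot_from_ad() {
    awk -F"[\"@]" '/^Name/ {print $2}' "$1"
}

# Made-up machine ad for illustration only.
ad=$(mktemp)
cat > "${ad}" <<'EOF'
Machine = "node042.example.org"
Name = "slot1_3@node042.example.org"
Cpus = 1
EOF

# The field separator splits on '"' and '@', so field 2 is the slot name.
SLOT=$(slot_from_ad "${ad}")
echo "slot:      ${SLOT}"
echo "directory: /local/boinc/${SLOT}"

rm -f "${ad}"
```

Because every dynamic slot reports its own name, each BOINC client ends up with a separate data directory below /local/boinc, which fits the allow_multiple_clients option set in cc_config.xml.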

On the submit machine

Create a crontab entry for the proper user that starts a script which does the following:
  1. Count the idle jobs of this user, e.g. condor_q -constraint "JobStatus =?= 1" -format "%s\n" JobStatus carsten | wc -l
  2. Exit if this number has reached or exceeds a predefined threshold.
  3. Otherwise, submit more jobs into the pool to reach this threshold, e.g. by writing the following to the standard input of condor_submit:
Executable     = /local/boinc/boinc_starter.sh
Error   = /dev/null
Output  = /dev/null
Log = /dev/null
Universe = vanilla
request_memory=400
request_cpus=1
request_disk=50
nice_user = true
on_exit_remove = true
Queue NUMBER
where NUMBER is the difference between threshold and current number of idle jobs.

Sample script:
#!/bin/bash

set -e

# set up environment
PATH=/bin:/usr/bin
source /opt/condor/condor.sh

# max number of idle jobs in queue
IDLE_THRESHOLD=200

# current number of idle jobs in queue
IDLE_CUR=$(condor_q -constraint "JobStatus =?= 1" -format "%s\n" JobStatus carsten|wc -l)

# do we need to act?
DIFF=$(($IDLE_THRESHOLD - $IDLE_CUR))
if [[ $DIFF -le 0 ]]; then
  exit 0
fi

(
cat <<EOF
Executable     = /local/boinc/boinc_starter.sh
Error   = /dev/null
Output  = /dev/null
Log = /dev/null
Universe = vanilla
request_memory=400
request_cpus=1
request_disk=50
nice_user = true
on_exit_remove = true
Queue $DIFF
EOF
) | condor_submit
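
The crontab entry itself is not spelled out on this page. A minimal sketch, assuming the sample script is saved as /usr/local/bin/boinc_refill.sh (a hypothetical path) and that a 10-minute interval is acceptable (an arbitrary choice, not a recommendation from this page):

```shell
#!/bin/bash
# Hypothetical location of the sample script above; adjust to your setup.
REFILL_SCRIPT=/usr/local/bin/boinc_refill.sh

# Run every 10 minutes; the interval is an assumption.
CRON_LINE="*/10 * * * * ${REFILL_SCRIPT} >/dev/null 2>&1"

# Print the entry for review.
echo "${CRON_LINE}"

# To install it for the current user while keeping existing entries, uncomment:
# ( crontab -l 2>/dev/null; echo "${CRON_LINE}" ) | crontab -
```

Run this as the user whose jobs are counted by condor_q, so that the idle-job count and the submitted jobs refer to the same account.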

-- CarstenAulbert - 01 Jul 2011

DocumentationForm

Title Boinc jobs with Condor and dynamic slots
Description How to circumvent problems with fetch work and dynamic slots
Tags condor, dynamic slots, fetchwork
Category Admin
Topic revision: r2 - 10 Feb 2012, ArthurVarkentin