Running the Pulsar Timing Code on Xeon Phi (Aug 2015)

Intel's Xeon Phi: http://www.intel.de/content/www/de/de/processors/xeon/xeon-phi-detail.html

Status plot of cards in xeonphi02: https://www.atlas.aei.uni-hannover.de/~fehrmann/MicLoad/micload.png

xeonphi02: mic0, mic1, mic2, ..., mic7

Note: The directory /local/user/$USER/ on xeonphi02 is available on mic0 at /home/$USER/

Optional: Modify your ~/.bash_profile to include:

if [[ `hostname -s` = xeonphi* ]]; then
    echo "Recognized xeonphi host! Sourcing intel compiler."
    source /opt/intel/2015/intel.sh
    source /opt/intel/2015/composer_xe_2015.0.090/bin/compilervars.sh intel64
    source /opt/intel/2015/impi_5.0.1/bin64/mpivars.sh
    export DAPL_DBG_TYPE=0
    export I_MPI_MIC=1
fi



Check out and build the code, then run the J1035 test:

git clone git@gitmaster.atlas.aei.uni-hannover.de:gamma-ray-project/fgrptiming.git fgrptiming_mic

cd fgrptiming_mic

git checkout xeonphi

ssh xeonphi02

./build.sh --linux-mic

cd testing/J1035

./run_J1035-6720_mics.sh

To plot the results on your local machine, mount Atlas via sshfs.
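For example, something along these lines should work (USERNAME and the mount point here are placeholders; use your own Atlas account and preferred local directory):

# Hypothetical mount point; adjust host and username to your setup.
mkdir -p ~/sshfs/atlasB
sshfs USERNAME@atlas8.atlas.local:/home/USERNAME ~/sshfs/atlasB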

Then:
python triplot_iamcmc.py \
    --infile /Users/holger/sshfs/atlasB/FermiLAT/fgrptiming_mic/testing/J1035/PSR_J1035-6720_v19.dat \
    --outfile /Users/holger/sshfs/atlasB/FermiLAT/fgrptiming_mic/testing/J1035/tripl_PSR_J1035-6720_v19.dat \
    --parfile /Users/holger/sshfs/atlasB/FermiLAT/fgrptiming_mic/testing/J1035/PSR_J1035-6720_v19.dat.par \
    --phaseplots


Xeon Phi Computing Examples (Jan 2015)

Here are two example programs showing how to perform parallel computing on the Xeon Phi (MIC).

Example of offloading to the Xeon Phi using OpenMP: test-openmp.c

Example using Intel MPI to parallelise the computing: test-mpi.c

First, log in to atlas8 and source the compiler and MPI environment:

source /opt/intel/2015/intel.sh
source /opt/intel/2015/impi_latest/intel64/bin/mpivars.sh
source /opt/intel/2015/composer_xe_2015/bin/compilervars.sh intel64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64

To compile the OpenMP example (remove the -openmp flag to compare with a normal serial run):
icc -openmp test-openmp.c -o test-openmp

This code simply "offloads" the expensive for-loop to the Xeon Phi using shared memory, and OpenMP handles the parallelisation. Any pointers and arrays which need to be passed to the Xeon Phi must be transferred explicitly, e.g. with in(pointer : length(array_length)) in the offload pragma. Any variables which are iterated over in the loop should be declared "private" to prevent threads from interfering with one another.
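As a minimal sketch of this pattern (the array and variable names here are hypothetical, not those of the actual test-openmp.c), the following offloads a reduction loop to the first MIC card and can be compiled with icc -openmp as above:

/* Offload sketch: sums the squares of an array on the Xeon Phi. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1000000;
    double *a = (double *)malloc(n * sizeof(double));
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        a[i] = (double)i / n;

    /* Offload the expensive loop to the first MIC card. The in() clause
       copies the array over; sum is copied in and back out; the loop
       index i is private to each OpenMP thread. */
    #pragma offload target(mic:0) in(a : length(n)) inout(sum)
    #pragma omp parallel for private(i) reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += a[i] * a[i];

    printf("sum = %f\n", sum);
    free(a);
    return 0;
}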

The MPI example can be run in a few different modes: only on the head node, only on the Xeon Phi, or "symmetrically" on both at once. Using the Xeon Phi like this requires passwordless SSH access to the MIC (ask Henning!).

First, compile separately for the head node and for the Xeon Phi:

# Head node only:
mpiicc -lmpi test-mpi.c -o test-mpi.host

# Xeon Phi version:
mpiicc -mmic -lmpi test-mpi.c -o test-mpi.mic

# The mic version must be copied to the Xeon Phi, e.g.:
cp test-mpi.mic /local/user/fermi/

You can then run them as follows (-n X starts X MPI processes on that host):

# Head node only:
mpirun -n 6 ./test-mpi.host

# Xeon Phi only:
mpirun -host fermi@192.168.1.1 -n 240 /home/fermi/test-mpi.mic

# Both symmetrically:
mpirun -host atlas8.atlas.local -n 6 ./test-mpi.host : -host fermi@192.168.1.1 -n 240 /home/fermi/test-mpi.mic

Using MPI means that the code runs separately in each MPI process (each process keeps its own copy of all variables in memory), and the processes only communicate at the points where you tell them to (in this case only via MPI_Reduce(), which gathers every process's partial result and sums them). This method is a bit more cumbersome, but it gives you a huge amount of control over exactly what is computed on each node.
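As a minimal sketch of this pattern (hypothetical, not the actual test-mpi.c; compile with mpiicc as above), each process below sums over its own slice of the iteration space, and MPI_Reduce() adds the partial results together on rank 0:

/* MPI reduction sketch: each rank computes a partial sum. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    double partial = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    /* Each rank handles every size-th iteration, starting at its rank. */
    for (i = rank; i < 1000000; i += size)
        partial += 1.0 / (1.0 + (double)i);

    /* Gather every rank's partial sum on rank 0 and add them up. */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f (from %d processes)\n", total, size);

    MPI_Finalize();
    return 0;
}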