Setting up Xeon Phi (MIC) on hosts
required kernel
Only a very limited number of kernels are supported. Intel usually
supports the kernels of some RHEL and SLES versions, and the MIC driver
only compiles on these particular kernels. At the time of writing this
documentation, this is kernel version 3.10.
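A quick check before building the driver can save some time. The following is only a sketch: the function name kernel_is_supported is made up for this page, and it assumes that "supported" simply means a release string from the 3.10 series.

```shell
#!/bin/sh
# Return 0 if the given kernel release belongs to the 3.10 series
# named in the text, 1 otherwise. Defaults to the running kernel.
# (Hypothetical helper; the 3.10 pattern is an assumption.)
kernel_is_supported() {
    release="${1:-$(uname -r)}"
    case "$release" in
        3.10|3.10.*|3.10-*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Called without an argument, it tests the running kernel, e.g. kernel_is_supported || echo "kernel $(uname -r) is probably not supported".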
required software
Download the MIC bundle from
https://software.intel.com/de-de/mic-developer/tools-and-downloads
The RPM packages it contains can be converted to DEB packages with alien. The required
packages are (including native packages):
- pciutils
- linux-image-3.10.65-atlas
- linux-headers-3.10.65-atlas
- glibc2.12.2pkg-libmicaccesssdk-dev
- glibc2.12.2pkg-libmicaccesssdk0
- glibc2.12.2pkg-libodmdebug-dev
- glibc2.12.2pkg-libodmdebug0
- glibc2.12.2pkg-libsettings-dev
- glibc2.12.2pkg-libsettings0
- glibc2.12.2pkg-mpss-flash
- glibc2.12.2pkg-mpss-memdiag-kernel
- glibc2.12.2pkg-mpss-rasmm-kernel
- mpss-boot-files
- mpss-coi
- mpss-coi-dev
- mpss-coi-doc
- mpss-coi-staticdev
- mpss-core
- mpss-core-dev
- mpss-daemon
- mpss-daemon-dev
- mpss-eclipse-cdt-mpm
- mpss-hstreams
- mpss-hstreams-dev
- mpss-hstreams-doc
- mpss-license
- mpss-miccheck
- mpss-miccheck-bin
- mpss-micmgmt
- mpss-micmgmt-doc
- mpss-micmgmt-python
- mpss-micsmc-gui
- mpss-mpm
- mpss-mpm-doc
- mpss-myo
- mpss-myo-dev
- mpss-myo-doc
- mpss-offload
- mpss-offload-dev
- mpss-sciftutorials
- mpss-sciftutorials-doc
- mpss-sdk-k1om
- mpss-sysmgmt-micdiagnostic
- mpss-sysmgmt-micras
- mpss-modules-3.4.2-atlas
- mpss-sysmgmt-python
- libscif0
- intel-composerxe-compat-k1om
- bridge-utils *
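The conversion with alien can be done in a loop over all RPMs of the bundle. This is a sketch, not the procedure used for this installation: the function name convert_rpms and the ALIEN override are made up here, and it assumes that the install scripts of the RPMs should be carried over (--scripts).

```shell
#!/bin/sh
# Convert every RPM in a directory to a DEB package with alien.
# The DEB files are written to the current working directory.
# ALIEN can be overridden (for example ALIEN=echo) for a dry run.
convert_rpms() {
    dir="${1:-.}"
    for rpm in "$dir"/*.rpm; do
        [ -e "$rpm" ] || continue      # directory contains no RPMs
        "${ALIEN:-alien}" --to-deb --scripts "$rpm" || return 1
    done
}
```

alien usually has to be run as root (or under fakeroot) so that file ownership in the generated packages is correct.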
configuration
One possibility for booting the MICs is to use an NFS
root file system. Its location is "/var/mpss".
There is a "/var/mpss/common" directory which contains everything that should
appear in the root file system. It should contain all libraries required to
run OpenMP and OpenMPI, as well as the MKL library.
It also contains some self-compiled tools like "screen" and "htop".
The libraries are available on the Intel site and require a commercial licence.
Generate the "/etc/mpss/default.conf" file:
Version 1 1
CommonDir /var/mpss/common
ExtraCommandLine "highres=off"
Console "hvc0"
ShutdownTimeout 300
CrashDump /var/crash/mic 16
OSimage /usr/share/mpss/boot/bzImage-knightscorner /usr/share/mpss/boot/System.map-knightscorner
BootOnStart Enabled
Base CPIO /usr/share/mpss/boot/initramfs-knightscorner.cpio.gz
MacAddrs Serial
PowerManagement cpufreq_on;corec6_on;pc3_on;pc6_on
Cgroup memory=disabled
Bridge br0 External 192.168.0.254 16 64512
The NFS export for a MIC can be generated as follows (replace mic0 with the corresponding MIC device):
(micctrl --updatenfs mic0)
micctrl --initdefaults mic0
micctrl --network=static --bridge=br0 --ip=192.168.0.100 --mtu=64512 mic0
micctrl --rootdev=NFS -c -t /var/mpss/MIC0NFS -s 192.168.0.254 mic0
micctrl --addnfs=192.168.0.254:/local/user --dir=/home mic0
Here the MIC IP is 192.168.0.100. The host IP is missing and has to be added manually to the "/etc/mpss/mic0.conf" file.
Add
hostip=192.168.0.254
to the line
Network class=StaticBridge bridge=br0 micip=192.168.0.100 modhost=yes modcard=yes
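The manual edit can also be scripted. The sketch below appends the entry to the end of the Network line; that position is an assumption, since the text only says the entry belongs in that line. The function name add_hostip is made up here, and GNU sed is assumed for the -i option.

```shell
#!/bin/sh
# Append hostip=<ip> to the "Network class=StaticBridge" line of a MIC
# configuration file unless that line already carries a hostip entry.
# (Hypothetical helper; placing the entry at the end is an assumption.)
add_hostip() {
    conf="$1"
    hostip="$2"
    sed -i '/^Network class=StaticBridge/{/hostip=/!s/$/ hostip='"$hostip"'/;}' "$conf"
}
```

For the example above this would be add_hostip /etc/mpss/mic0.conf 192.168.0.254. Running it a second time leaves the file unchanged.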
NFS export
Add to "/etc/exports":
/local/user 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC0NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC1NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC2NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC3NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC4NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC5NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC6NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC7NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
if you have, e.g., 8 MICs.
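Since the MICnNFS lines differ only in the MIC number, they can be generated. A small sketch, with the function name nfs_export_lines made up here and the network and export options copied from the lines above:

```shell
#!/bin/sh
# Print one /etc/exports line per MIC NFS root for MICs 0..count-1.
# (Hypothetical helper; network and options are taken from the example.)
nfs_export_lines() {
    count="$1"
    opts="async,rw,no_subtree_check,no_root_squash,insecure"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '/var/mpss/MIC%dNFS 192.168.0.0/255.255.255.0(%s)\n' "$i" "$opts"
        i=$((i + 1))
    done
}
```

For 8 MICs one could run nfs_export_lines 8 >> /etc/exports and then exportfs -ra to activate the new exports. The /local/user line still has to be added by hand.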
small configurations
Udev rules
Create the file "/etc/udev/rules.d/50-udev-mic.rules":
KERNEL=="scif", ACTION=="add", NAME="mic/%k",MODE="0666", RUN+="/bin/chmod og+x /dev/mic"
KERNEL=="ctrl", ACTION=="add", NAME="mic/%k", MODE="0666"
library compatibility
Add "/usr/lib64" to LD_LIBRARY_PATH. The MIC software is designed to work on Red Hat derivatives.
/etc/passwd
The "/etc/passwd" file is copied from the host. Make sure that all "/usr/bin/zsh" entries are replaced
by "/bin/bash", or compile "zsh" for the MICs.
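The replacement of the login shell can be done with sed. This is a sketch only: the function name zsh_to_bash is made up here, and which copy of the passwd file to edit (it should be the one that ends up on the MICs, not the host's own) is left to the caller. GNU sed is assumed for the -i option.

```shell
#!/bin/sh
# Replace the login shell /usr/bin/zsh by /bin/bash in a passwd-format file.
# Only the last field of each line (the shell) is touched.
# (Hypothetical helper; pass the path of the MIC copy of the file.)
zsh_to_bash() {
    sed -i 's#:/usr/bin/zsh$#:/bin/bash#' "$1"
}
```

Entries with any other shell are left as they are.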
systemd
If the host runs systemd, create the "/lib/systemd/system/mpss.service" file:
[Unit]
Description=Intel(R) MPSS control service
After=nfs.target
[Service]
Type=forking
ExecStart=/etc/init.d/mpss start
ExecStop=/etc/init.d/mpss stop
[Install]
WantedBy=multi-user.target
init script
Generate the "/etc/init.d/mpss" file (here with Copyright info):
#!/bin/sh
# Copyright 2010-2013 Intel Corporation.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License, version 2,
# as published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# Disclaimer: The codes contained in these modules may be specific to
# the Intel Software Development Platform codenamed Knights Ferry,
# and the Intel product codenamed Knights Corner, and are not backward
# compatible with other Intel products. Additionally, Intel will NOT
# support the codes or instruction set in future products.
#
# Intel offers no warranty of any kind regarding the code. This code is
# licensed on an "AS IS" basis and Intel is not obligated to provide
# any support, assistance, installation, training, or other services
# of any kind. Intel is also not obligated to provide any updates,
# enhancements or extensions. Intel specifically disclaims any warranty
# of merchantability, non-infringement, fitness for any particular
# purpose, and any other warranty.
#
# Further, Intel disclaims all liability of any kind, including but
# not limited to liability for infringement of any proprietary rights,
# relating to the use of the code, even if Intel is notified of the
# possibility of such liability. Except as expressly stated in an Intel
# license agreement provided with this code and agreed upon with Intel,
# no license, express or implied, by estoppel or otherwise, to any
# intellectual property rights is granted herein.
#
# mpss Start mpssd.
#
### BEGIN INIT INFO
# Provides: mpss
# Required-Start: $time
# Required-Stop: iptables
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Intel(R) MPSS control
# Description: Intel(R) MPSS control
### END INIT INFO
PATH="/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin"
export LD_LIBRARY_PATH="/usr/lib64"
. /lib/lsb/init-functions
exec=/usr/sbin/mpssd
sysfs="/sys/class/mic"
start()
{
    [ -x $exec ] || exit 5
    # Ensure the driver is loaded
    [ -d "$sysfs" ] || modprobe mic
    log_action_begin_msg "Starting Intel(R) MPSS"
    # If mpssd is already running, only show the card status
    if [ "`ps -e | awk '{print $4}' | grep mpssd`" = "mpssd" ]; then
        echo
        micctrl -s
        return 0;
    fi
    $exec
    RETVAL=$?
    if [ "$RETVAL" = "0" ]; then
        # Wait until the cards have finished booting
        micctrl -w 1> /dev/null
        RETVAL=$?
    fi
    echo
    log_action_end_msg $RETVAL
    if [ $RETVAL = 0 ]; then
        micctrl -s
    fi
    return $RETVAL
}
stop()
{
    log_action_begin_msg "Shutting down Intel(R) MPSS"
    WAITRET=0
    # grep returns non-zero if mpssd is not running; do not treat that as an error
    MPSSD=`ps ax | grep $exec | grep -v grep` || true
    if [ "$MPSSD" = "" ]; then
        echo
        return 0;
    fi
    MPSSDPID=`echo $MPSSD | awk '{print $1}'`
    kill -s QUIT $MPSSDPID > /dev/null 2>/dev/null
    RETVAL=$?
    if [ $RETVAL = 0 ]; then
        # Wait for mpssd to exit, then for the cards to shut down
        while [ "`ps -e | awk '{print $4}' | grep mpssd`" = "mpssd" ]; do sleep 1; done
        micctrl -w 1> /dev/null
        WAITRET=$?
        if [ $WAITRET = 9 ]; then
            log_action_begin_msg "Shutting down Intel(R) MPSS by force"
            micctrl -r 1> /dev/null
            RETVAL=$?
            if [ $RETVAL = 0 ]; then
                micctrl -w 1> /dev/null
                WAITRET=$?
            fi
        fi
    fi
    # Report the result of the shutdown itself
    log_action_end_msg $RETVAL
    echo
    return $RETVAL
}
restart()
{
    stop
    start
}
status()
{
    if [ "`ps -e | awk '{print $4}' | grep mpssd`" = "mpssd" ]; then
        echo "mpss is running"
        STOPPED=0
    else
        echo "mpss is stopped"
        STOPPED=3
    fi
    return $STOPPED
}
unload()
{
    # Nothing to do if the driver is not loaded
    if [ ! -d "$sysfs" ]; then
        log_action_begin_msg "Removing MIC Module"
        log_action_end_msg $?
        echo
        return
    fi
    stop
    RETVAL=$?
    log_action_begin_msg "Removing MIC Module"
    if [ $RETVAL = 0 ]; then
        sleep 1
        modprobe -r mic
        RETVAL=$?
    fi
    # Report the result of stopping and unloading
    log_action_end_msg $RETVAL
    echo
    return $RETVAL
}
case $1 in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        status
        ;;
    unload)
        unload
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status|unload}"
        exit 2
        ;;
esac
exit $?
network and bridging
Bridging has already been configured for the MICs above, which makes it possible to use OpenMPI.
Run "micctrl --addbridge=br0 --type=external --ip=192.168.0.254 --netbits=16 --mtu=64512" once.
On the host side, run:
brctl addbr br0
ip a|gawk -F " |:" '/mic/ {print $3}'| xargs -i brctl addif br0 {}
ip a|gawk -F " |:" '/mic/ {print $3}'| xargs -i ifconfig {} 0.0.0.0
ifconfig br0 192.168.0.254 netmask 255.255.255.0 mtu 64512
for i in $( seq 0 7);do ssh mic$i "route add default gw 192.168.0.254";done
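The gawk expression above picks the interface name out of every "ip a" line that contains "mic". A somewhat stricter variant, sketched here under the assumption that the interfaces are named mic0, mic1, ... and that "ip a" prints them as "N: micN: <flags> ...", only accepts the interface header lines. The function name mic_interfaces is made up for this page.

```shell
#!/bin/sh
# Read "ip a" style output on stdin and print the names of all
# interfaces called mic<number> (mic0, mic1, ...), one per line.
# (Hypothetical helper; assumes the usual "N: name: <flags>" header lines.)
mic_interfaces() {
    awk -F': ' '/^[0-9]+: mic[0-9]+:/ { print $2 }'
}
```

It can be used in place of the gawk call, e.g. ip a | mic_interfaces | xargs -I{} brctl addif br0 {}. Address lines that merely mention a mic interface are ignored.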
- generate the softlink "/opt/intel/2015/impi/5.0.1.035/intel64/bin/pmi_proxy -> /bin/pmi_proxy" in /var/mpss/common before setting up the NFS root.
- add "hostname.domainname" to /etc/hosts, where hostname is the name of the host. Currently only the bare hostname is present.
- check whether "/bin/pmi_proxy" is present in /var/mpss/common. Otherwise copy it from the Intel visual studio bundle (buy a licence).
- copy the public user key to /local/user/username/.ssh/authorized_keys. This will be the home of the user on the MICs.
- compile some MPI code with the flag "-mmic". This is the code which runs on the MIC and has to be placed somewhere in /local/user/username/. The executable will then be in /home/username on the MICs.
- also create the host code, if you like.
- use ".../impi/5.0.1.035/intel64/bin/mpiicc" as the compiler.
Before compiling and running, set up the environment:
source /opt/intel/2015/composer_xe_2015.0.090/bin/compilervars.sh intel64
source /opt/intel/2015/impi_5.0.1/bin64/mpivars.sh
export DAPL_DBG_TYPE=0
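The pmi_proxy softlink from the list above can be created as follows. This is a sketch of one reading of that step: it assumes that the link lives below the common directory under the Intel MPI path and points to the absolute path /bin/pmi_proxy, which is resolved on the MIC once the directory has become part of its root file system. The function name link_pmi_proxy is made up here.

```shell
#!/bin/sh
# Create <root>/opt/intel/2015/impi/5.0.1.035/intel64/bin/pmi_proxy as a
# symbolic link to /bin/pmi_proxy inside the common directory <root>.
# (Hypothetical helper; the link layout is an interpretation of the text.)
link_pmi_proxy() {
    root="${1:-/var/mpss/common}"
    bindir="$root/opt/intel/2015/impi/5.0.1.035/intel64/bin"
    mkdir -p "$bindir" && ln -sfn /bin/pmi_proxy "$bindir/pmi_proxy"
}
```

Without an argument it works on /var/mpss/common. The link looks dangling on the host unless the host itself has a /bin/pmi_proxy.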
transparent huge pages
Disable transparent huge pages on the MICs:
dsh -g mic -c -M -r ssh "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
- the SSH host key changes every time the MICs get a new NFS root.
- the /etc/hosts on the MICs requires the hostname.domainname entry; currently only the hostname is present.
problems and solutions
coi does not start and reports the message "Can't drop priviledges to requested user 'micuser'". Add the user to /etc/passwd on the MICs and restart coi:
dsh -r ssh -c -M "echo \"micuser:x:400:400:MIC User:/home/micuser:/bin/false\" >> /etc/passwd"
dsh -r ssh -c -M "/etc/init.d/coi restart"
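The echo above appends the micuser line every time it is run. A variant that only adds the entry when it is missing could look like this sketch. The function name ensure_micuser is made up here, and the entry is the one from the command above.

```shell
#!/bin/sh
# Append the micuser entry to a passwd-format file unless an entry
# for micuser is already there.
# (Hypothetical helper; the entry is copied from the dsh command above.)
ensure_micuser() {
    passwd_file="${1:-/etc/passwd}"
    entry='micuser:x:400:400:MIC User:/home/micuser:/bin/false'
    grep -q '^micuser:' "$passwd_file" || printf '%s\n' "$entry" >> "$passwd_file"
}
```

The function has to be available on the MICs (for example in a script below /var/mpss/common) before it can be called there through dsh.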
--
HenningFehrmann - 29 May 2015