Setting up Xeon Phi (MIC) on hosts

required kernel

Only a very limited number of kernels are supported. Intel usually supports the kernels of certain RHEL and SLES versions, and the MIC driver only compiles on these particular kernels. At the time of writing this documentation it is kernel 3.10.
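
Before building the driver it can help to check the running kernel against the supported series. The following is a minimal sketch, assuming that "supported" simply means a 3.10.x release as stated above; the names SUPPORTED_SERIES and kernel_is_supported are made up for this example.

```shell
#!/bin/sh
# Sketch: check whether a kernel release string belongs to the supported series.
# SUPPORTED_SERIES is an assumption taken from the text above (kernel 3.10).
SUPPORTED_SERIES="3.10"

kernel_is_supported() {
    # $1: a release string as printed by `uname -r`, e.g. 3.10.65-atlas
    case "$1" in
        "$SUPPORTED_SERIES"|"$SUPPORTED_SERIES".*|"$SUPPORTED_SERIES"-*) return 0 ;;
        *) return 1 ;;
    esac
}

if kernel_is_supported "$(uname -r)"; then
    echo "kernel $(uname -r): in the $SUPPORTED_SERIES series"
else
    echo "kernel $(uname -r): not in the $SUPPORTED_SERIES series, the MIC driver may not compile"
fi
```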

required software

One has to download the MIC bundle. See https://software.intel.com/de-de/mic-developer/tools-and-downloads

With alien one can convert the RPM packages to DEB packages. The required packages are (including native packages):

  • pciutils
  • linux-image-3.10.65-atlas
  • linux-headers-3.10.65-atlas
  • glibc2.12.2pkg-libmicaccesssdk-dev
  • glibc2.12.2pkg-libmicaccesssdk0
  • glibc2.12.2pkg-libodmdebug-dev
  • glibc2.12.2pkg-libodmdebug0
  • glibc2.12.2pkg-libsettings-dev
  • glibc2.12.2pkg-libsettings0
  • glibc2.12.2pkg-mpss-flash
  • glibc2.12.2pkg-mpss-memdiag-kernel
  • glibc2.12.2pkg-mpss-rasmm-kernel
  • mpss-boot-files
  • mpss-coi
  • mpss-coi-dev
  • mpss-coi-doc
  • mpss-coi-staticdev
  • mpss-core
  • mpss-core-dev
  • mpss-daemon
  • mpss-daemon-dev
  • mpss-eclipse-cdt-mpm
  • mpss-hstreams
  • mpss-hstreams-dev
  • mpss-hstreams-doc
  • mpss-license
  • mpss-miccheck
  • mpss-miccheck-bin
  • mpss-micmgmt
  • mpss-micmgmt-doc
  • mpss-micmgmt-python
  • mpss-micsmc-gui
  • mpss-mpm
  • mpss-mpm-doc
  • mpss-myo
  • mpss-myo-dev
  • mpss-myo-doc
  • mpss-offload
  • mpss-offload-dev
  • mpss-sciftutorials
  • mpss-sciftutorials-doc
  • mpss-sdk-k1om
  • mpss-sysmgmt-micdiagnostic
  • mpss-sysmgmt-micras
  • mpss-modules-3.4.2-atlas
  • mpss-sysmgmt-python
  • libscif0
  • intel-composerxe-compat-k1om
  • bridge-utils *
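
The conversion with alien can be scripted. This is only a sketch: RPM_DIR and DRY_RUN are hypothetical names, the directory layout of the unpacked bundle is an assumption, and by default the script only prints the commands it would run. alien itself usually has to be run as root.

```shell
#!/bin/sh
# Sketch: convert every RPM of the unpacked MIC bundle to a DEB with alien.
# RPM_DIR and DRY_RUN are hypothetical names used only in this example.
RPM_DIR="${RPM_DIR:-./mpss-rpms}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

convert_rpms() {
    # $1: directory that contains the *.rpm files
    for rpm in "$1"/*.rpm; do
        [ -e "$rpm" ] || continue          # no RPMs found: nothing to do
        if [ "$DRY_RUN" = "1" ]; then
            echo "alien --to-deb --scripts $rpm"
        else
            # --scripts also converts the install scripts of the RPM
            alien --to-deb --scripts "$rpm"
        fi
    done
}

convert_rpms "$RPM_DIR"
```

The resulting DEB files can then be installed with "dpkg -i".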

NFS RootFS

configuration

One possibility for booting the MICs is to use an NFS root file system. Its location is "/var/mpss". The directory "/var/mpss/common" contains everything that should appear in the root file system. It should contain all libraries required to run OpenMP and OpenMPI, and the MKL library. It also contains some self-compiled tools like "screen" and "htop".

The libraries are available on the Intel site and require a commercial licence.

Generate the "/etc/mpss/default.conf" file:
Version 1 1
CommonDir /var/mpss/common
ExtraCommandLine "highres=off"
Console "hvc0"
ShutdownTimeout 300
CrashDump /var/crash/mic 16
OSimage /usr/share/mpss/boot/bzImage-knightscorner /usr/share/mpss/boot/System.map-knightscorner
BootOnStart Enabled
Base CPIO /usr/share/mpss/boot/initramfs-knightscorner.cpio.gz
MacAddrs Serial
PowerManagement cpufreq_on;corec6_on;pc3_on;pc6_on
Cgroup memory=disabled
Bridge br0 External 192.168.0.254 16 64512

The NFS export file can be generated with the following commands (replace mic0 by the corresponding MIC device):
(micctrl --updatenfs mic0)
micctrl --initdefaults mic0
micctrl --network=static --bridge=br0 --ip=192.168.0.100 --mtu=64512 mic0
micctrl --rootdev=NFS -c -t /var/mpss/MIC0NFS -s 192.168.0.254 mic0
micctrl --addnfs=192.168.0.254:/local/user --dir=/home mic0

Here the MIC IP is 192.168.0.100. The host IP is missing and has to be added manually to the "/etc/mpss/mic0.conf" file. Add
hostip=192.168.0.254
to the line
Network class=StaticBridge bridge=br0 micip=192.168.0.100 modhost=yes modcard=yes
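
The manual edit can also be done with sed. The following is a sketch only: add_hostip is a hypothetical helper, and putting the key at the end of the Network line is an assumption, since the text only says that it belongs on that line. The function does nothing if the line already contains a hostip entry.

```shell
#!/bin/sh
# Sketch: append hostip=<address> to the "Network" line of a MIC config file.
# add_hostip is a hypothetical helper; the position at the end of the line
# is an assumption.
add_hostip() {
    conf="$1"      # e.g. /etc/mpss/mic0.conf
    hostip="$2"    # e.g. 192.168.0.254
    # Leave the file alone if the Network line already carries a hostip key.
    if grep -q '^Network .*hostip=' "$conf"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Write through a temporary file, then copy back to keep the file's permissions.
    sed "/^Network /s/\$/ hostip=$hostip/" "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (as root): add_hostip /etc/mpss/mic0.conf 192.168.0.254
```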

NFS export

Add to "/etc/exports":

/local/user 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC0NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC1NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC2NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC3NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC4NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC5NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC6NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)
/var/mpss/MIC7NFS 192.168.0.0/255.255.255.0(async,rw,no_subtree_check,no_root_squash,insecure)

if you have, e.g., 8 MICs.
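
For a different number of MICs the export lines can be generated instead of typed. A small sketch, where NUM_MICS and print_exports are names invented for this example and the network and options are copied from the lines above:

```shell
#!/bin/sh
# Sketch: print the /etc/exports lines for the shared home and N MIC roots.
# NUM_MICS is a hypothetical variable; network and options are copied from above.
NUM_MICS="${NUM_MICS:-8}"
NET="192.168.0.0/255.255.255.0"
OPTS="async,rw,no_subtree_check,no_root_squash,insecure"

print_exports() {
    # $1: number of MICs in the host
    echo "/local/user $NET($OPTS)"
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "/var/mpss/MIC${i}NFS $NET($OPTS)"
        i=$((i + 1))
    done
}

print_exports "$NUM_MICS"
```

After reviewing the output and adding it to "/etc/exports", "exportfs -ra" re-exports the directories.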

small configurations

Udev rules

Create the file "/etc/udev/rules.d/50-udev-mic.rules":
KERNEL=="scif", ACTION=="add", NAME="mic/%k",MODE="0666", RUN+="/bin/chmod og+x /dev/mic"
KERNEL=="ctrl", ACTION=="add", NAME="mic/%k", MODE="0666"

library compatibility

Add "/usr/lib64" to LD_LIBRARY_PATH. The MIC software is designed to work on Red Hat derivatives.

/etc/passwd

The "/etc/passwd" file is copied from the host. Make sure that all "/usr/bin/zsh" entries are replaced by "/bin/bash", or compile "zsh" for the MICs.
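
The replacement of the login shell can be sketched with sed. fix_login_shells is a hypothetical helper, and the location of the passwd copy used by the MICs depends on the setup, so it is passed as an argument. Only the last field of each entry is changed.

```shell
#!/bin/sh
# Sketch: replace zsh login shells by bash in a passwd-format file.
# fix_login_shells is a hypothetical helper; the path of the passwd copy
# used by the MICs is an assumption and is passed as an argument.
fix_login_shells() {
    passwd_file="$1"
    tmp=$(mktemp) || return 1
    # Only touch the last field (the login shell) of each entry.
    sed 's|:/usr/bin/zsh$|:/bin/bash|' "$passwd_file" > "$tmp" && cat "$tmp" > "$passwd_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage: fix_login_shells <path to the passwd copy of the MICs>
```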

systemd

If you have systemd running, create the "/lib/systemd/system/mpss.service" file:
[Unit]
Description=Intel(R) MPSS control service
After=nfs.target

[Service]
Type=forking
ExecStart=/etc/init.d/mpss start
ExecStop=/etc/init.d/mpss stop

[Install]
WantedBy=multi-user.target

init script

Generate the "/etc/init.d/mpss" file (here with Copyright info):
#!/bin/sh -e
# Copyright 2010-2013 Intel Corporation.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License, version 2,
# as published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# Disclaimer: The codes contained in these modules may be specific to
# the Intel Software Development Platform codenamed Knights Ferry,
# and the Intel product codenamed Knights Corner, and are not backward
# compatible with other Intel products. Additionally, Intel will NOT
# support the codes or instruction set in future products.
#
# Intel offers no warranty of any kind regarding the code. This code is
# licensed on an "AS IS" basis and Intel is not obligated to provide
# any support, assistance, installation, training, or other services
# of any kind. Intel is also not obligated to provide any updates,
# enhancements or extensions. Intel specifically disclaims any warranty
# of merchantability, non-infringement, fitness for any particular
# purpose, and any other warranty.
#
# Further, Intel disclaims all liability of any kind, including but
# not limited to liability for infringement of any proprietary rights,
# relating to the use of the code, even if Intel is notified of the
# possibility of such liability. Except as expressly stated in an Intel
# license agreement provided with this code and agreed upon with Intel,
# no license, express or implied, by estoppel or otherwise, to any
# intellectual property rights is granted herein.
#
# mpss  Start mpssd.
#
### BEGIN INIT INFO
# Provides: mpss
# Required-Start: $time
# Required-Stop: iptables
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Intel(R) MPSS control
# Description: Intel(R) MPSS control
### END INIT INFO

PATH="/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin"
export LD_LIBRARY_PATH="/usr/lib64"

. /lib/lsb/init-functions

exec=/usr/sbin/mpssd
sysfs="/sys/class/mic"

start()
{
        [ -x $exec ] || exit 5

        # Ensure the driver is loaded
        [ -d "$sysfs" ] || modprobe mic

        log_action_begin_msg "Starting Intel(R) MPSS"

        if [ "`ps -e | awk '{print $4}' | grep mpssd`" = "mpssd" ]; then
                echo
                micctrl -s
                return 0;
        fi

        $exec
        RETVAL=$?
        if [ "$RETVAL" = "0" ]; then
                micctrl -w 1> /dev/null
                RETVAL=$?
        fi

        echo

        log_action_end_msg $RETVAL

        if [ $RETVAL = 0 ]; then
                micctrl -s
        fi

        return $RETVAL
}

stop()
{
        log_action_begin_msg "Shutting down Intel(R) MPSS"

        WAITRET=0
        MPSSD=`ps ax | grep $exec | grep -v grep`

        if [ "$MPSSD" = "" ]; then
                echo
                return 0;
        fi

        MPSSDPID=`echo $MPSSD | awk '{print $1}'`
        kill -s QUIT $MPSSDPID > /dev/null 2>/dev/null
        RETVAL=$?

        if [ $RETVAL = 0 ]; then
                while [ "`ps -e | awk '{print $4}' | grep mpssd`" = "mpssd" ]; do sleep 1; done
                micctrl -w 1> /dev/null
                WAITRET=$?
                if [ $WAITRET = 9 ]; then
                        log_action_begin_msg "Shutting down Intel(R) MPSS by force"
                        micctrl -r 1> /dev/null
                        RETVAL=$?
                        if [ $RETVAL = 0 ]; then
                                micctrl -w 1> /dev/null
                                WAITRET=$?
                        fi
                fi
        fi

        log_action_end_msg $?
        echo
        return $RETVAL
}

restart()
{
        stop
        start
}

status()
{
        if [ "`ps -e | awk '{print $4}' | grep mpssd`" = "mpssd" ]; then
                echo "mpss is running"
                STOPPED=0
        else
                echo "mpss is stopped"
                STOPPED=3
        fi
        return $STOPPED
}

unload()
{
        if [ ! -d "$sysfs" ]; then
                log_action_begin_msg "Removing MIC Module"
                log_action_end_msg $?
                echo
                return
        fi

        stop
        RETVAL=$?

        log_action_begin_msg "Removing MIC Module"

        if [ $RETVAL = 0 ]; then
                sleep 1
                modprobe -r mic
                RETVAL=$?
        fi

        log_action_end_msg $?
        echo
        return $RETVAL
}

case $1 in
        start)
                start
                ;;
        stop)
                stop
                ;;
        restart)
                restart
                ;;
        status)
                status
                ;;
        unload)
                unload
                ;;
        *)
                echo "Usage: $0 {start|stop|restart|status|unload}"
                exit 2
esac

exit $?

network and bridging

To make it possible to use OpenMPI, a bridging configuration for the MICs has already been set up.

Run "micctrl --addbridge=br0 --type=external --ip=192.168.0.254 --netbits=16 --mtu=64512" once.

On the host side do:

brctl addbr br0
ip a|gawk -F " |:" '/mic/ {print $3}'| xargs -i brctl addif br0 {}
ip a|gawk -F " |:" '/mic/ {print $3}'| xargs -i ifconfig {} 0.0.0.0
ifconfig  br0 192.168.0.254 netmask 255.255.255.0 mtu 64512
for i in $( seq 0 7);do ssh mic$i "route add default gw 192.168.0.254";done
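
The pipeline above picks the interface name out of the "ip a" output by splitting each line on spaces and colons. The sketch below wraps the same idea in a function (mic_interfaces is a made-up name) and uses a stricter pattern, so that only the interface header lines match; the stricter pattern is an assumption of this example, not part of the original commands. The loop at the end only prints the brctl commands.

```shell
#!/bin/sh
# Sketch: list the mic* network interfaces from `ip a`-style output on stdin.
mic_interfaces() {
    # Header lines look like "4: mic0: <BROADCAST,...>"; splitting on " " or ":"
    # leaves the index in $1, an empty $2 and the interface name in $3.
    awk -F ' |:' '/^[0-9]+: mic[0-9]+:/ {print $3}'
}

# Dry run: show which interfaces would be added to the bridge br0.
ip a 2>/dev/null | mic_interfaces | while read -r ifc; do
    echo "brctl addif br0 $ifc"
done
```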

run OpenMPI

  • generate the softlink "/opt/intel/2015/impi/5.0.1.035/intel64/bin/pmi_proxy -> /bin/pmi_proxy" in /var/mpss/common before setting up the NFS root.
  • add "hostname.domainname" to /etc/hosts. Currently only the hostname is present. Here hostname is the name of the host.
  • check whether "/bin/pmi_proxy" is present in /var/mpss/common. Otherwise copy it from the Intel visual studio bundle (buy a licence)
  • copy the user's public key to /local/user/username/.ssh/authorized_keys. This will be the home of the user on the MICs
  • compile some MPI code with the flag "-mmic". This is the code which runs on the MIC and has to be placed somewhere in /local/user/username/. The executable will then be in /home/username on the MICs
  • create also the host code, if you like
  • use ".../impi/5.0.1.035/intel64/bin/mpiicc" as the compiler

Set up the environment with:
source /opt/intel/2015/composer_xe_2015.0.090/bin/compilervars.sh intel64
source /opt/intel/2015/impi_5.0.1/bin64/mpivars.sh
export DAPL_DBG_TYPE=0

transparent huge page

Disable transparent huge pages on all machines of the dsh group "mic":
dsh -g mic -c -M -r ssh "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
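
To verify the setting, the active mode can be read back from the same sysfs file; the kernel marks it with square brackets, for example "always madvise [never]". A small sketch, with thp_mode being a name chosen for this example:

```shell
#!/bin/sh
# Sketch: report which transparent huge page mode is active.
# The kernel marks the active mode with brackets, e.g. "always madvise [never]".
thp_mode() {
    # $1: path of the sysfs file (defaults to the one used above)
    file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}

# Print the mode of the local machine if the file exists.
if [ -r /sys/kernel/mm/transparent_hugepage/enabled ]; then
    thp_mode
fi
```

On the MICs the file can be inspected with the same dsh options as above, for example dsh -g mic -c -M -r ssh "cat /sys/kernel/mm/transparent_hugepage/enabled".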

open problems (not yet configured automatically)

  • the SSH host key changes every time the MICs get a new NFS root
  • the /etc/hosts on the MICs requires the hostname.domainname entry; currently only the hostname is present.
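
Until the second point is automated, the missing entry could be added with a small script. This is a sketch under assumptions: add_fqdn is a hypothetical helper, the hosts file is expected to have lines of the form "IP name ...", and appending the full name as an additional alias (rather than as the first name) is a choice made for this example.

```shell
#!/bin/sh
# Sketch: add "hostname.domain" as an alias to the host's line in a hosts file.
# add_fqdn is a hypothetical helper; it expects lines of the form "IP name ...".
add_fqdn() {
    hosts_file="$1"   # path of the hosts file to modify
    short="$2"        # host name without domain
    domain="$3"       # domain name
    tmp=$(mktemp) || return 1
    awk -v h="$short" -v f="$short.$domain" '
        {
            has_short = 0; has_fqdn = 0
            for (i = 2; i <= NF; i++) {
                if ($i == h) has_short = 1
                if ($i == f) has_fqdn = 1
            }
            # Append the full name only to lines that name the host and lack it.
            if ($1 !~ /^#/ && has_short && !has_fqdn) print $0 " " f
            else print $0
        }' "$hosts_file" > "$tmp" && cat "$tmp" > "$hosts_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage: add_fqdn <hosts file of the MIC> <hostname> <domain>
```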

problems and solutions

coi does not start, with the message "Can't drop priviledges to requested user 'micuser'":

     dsh -r ssh -c -M "echo \"micuser:x:400:400:MIC User:/home/micuser:/bin/false\" >> /etc/passwd"
     dsh -r ssh -c -M "/etc/init.d/coi restart"

-- HenningFehrmann - 29 May 2015
Topic revision: r4 - 18 Mar 2019, HenningFehrmann