I have ported my application to nodes with two Intel Xeon Phi cards, and the performance is very disappointing.
As it is an MPI application, I need to give some more information about how it works (sorry for the long text).
The MPI parallelization uses a classical 3D domain decomposition on a Cartesian grid of subdomains (one process per subdomain). Each subdomain has ghost cells (26 neighbours) that need to be refreshed several times per time iteration (explicit multi-step scheme in time).
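To make the 26-neighbour layout concrete, here is a minimal sketch in plain Python (no MPI calls) that enumerates the neighbours of one subdomain in a 3D Cartesian grid. It is not the application's actual code: the function name `neighbour_ranks`, the `periodic` argument, and the row-major rank numbering are assumptions made for illustration.

```python
from itertools import product


def neighbour_ranks(coords, dims, periodic=(False, False, False)):
    """Map each of the 26 offsets to the rank of the neighbouring subdomain.

    coords   -- (i, j, k) position of this subdomain in the grid
    dims     -- number of subdomains along each axis
    periodic -- per-axis wrap-around flag (assumed non-periodic by default)
    Offsets that fall outside a non-periodic grid are left out.
    """
    neighbours = {}
    for offset in product((-1, 0, 1), repeat=3):
        if offset == (0, 0, 0):
            continue  # skip the subdomain itself
        target = []
        for c, d, n, wrap in zip(coords, offset, dims, periodic):
            t = c + d
            if wrap:
                t %= n  # wrap around on periodic axes
            elif not 0 <= t < n:
                target = None  # neighbour would lie outside the grid
                break
            target.append(t)
        if target is None:
            continue
        i, j, k = target
        # Row-major rank numbering (assumed for this sketch).
        neighbours[offset] = (i * dims[1] + j) * dims[2] + k
    return neighbours
```

An interior subdomain gets all 26 entries (6 faces, 12 edges, 8 corners), while a corner subdomain of a non-periodic grid only gets 7, so the number of ghost-cell messages per refresh varies from process to process.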
I need your help setting up Xeon Phi coprocessors in a cluster, where I am facing some issues. I have listed the points where I am having problems below.
I’m setting up a Rocks cluster running CentOS 6.6. The MPSS and OFED versions are:
MPSS version - mpss-3.6-linux
OFED version - MLNX_OFED_LINUX-2.4-1.0.4-rhel6.6-x86_64
I’ve configured the Ethernet network on the MICs with external bridging, which is working on all the MICs in the cluster.
After recently upgrading to mpss 3.6, I've encountered the following behavior and hope someone has some insights or can help debug. When either starting or restarting the mpss service, the procedure fails. Immediately following the service start, the cards show:
[root@host ~]# micctrl -s
mic0: boot failed
mic1: boot failed
mic2: boot failed
mic3: boot failed
However, after several (on the order of 10) seconds, the cards appear to successfully complete boot:
I have had several issues rebuilding MPSS on CentOS 7/RHEL and later; here is a proposed change to the manual to fix it.
Problem symptoms: the command given in the manual,
yum install kernel-headers kernel-devel
does not install all of the right sources, and the build fails.
This command should be used in its place:
sudo yum install "kernel-devel-uname-r == $(uname -r)"
This will put everything in the right place under
/lib/modules/$(uname -r)/build (which is a link)
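Before starting the rebuild, it may help to confirm that the build link for the running kernel actually resolves. The following is a minimal sketch, assuming the standard /lib/modules layout; the helper names `kernel_build_dir` and `build_dir_ready` are made up for this example and are not part of MPSS or yum.

```python
import os
import platform


def kernel_build_dir(release=None, root="/lib/modules"):
    """Return the expected build path for a kernel release.

    With no argument, the running kernel's release is used, which is
    the same string that `uname -r` prints.
    """
    release = release or platform.release()
    return os.path.join(root, release, "build")


def build_dir_ready(path):
    """True only if the path exists and resolves to a directory.

    os.path.isdir follows symbolic links, so a dangling `build` link
    (kernel-devel package missing or for another kernel) gives False.
    """
    return os.path.isdir(path)


if __name__ == "__main__":
    path = kernel_build_dir()
    status = "ok" if build_dir_ready(path) else "missing or dangling"
    print(f"{path}: {status}")
```

If the script reports the link as missing or dangling, the installed kernel-devel package most likely does not match the running kernel.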
I am attempting to use LIBXSMM on a Xeon Phi and it fails to compile when I use
make install OFFLOAD=1 PREFETCH=1 MNK="2,4,6,8,10,12,14,16,18,20,23"
I am getting the error
~/libxsmm/samples/smm/specialized.cpp(143): error: call of an object of a class type without appropriate operator() or conversion functions to pointer-to-function type
xmm(pa, pb, pc LIBXSMM_PREFETCH_ARGA(pa + asize) LIBXSMM_PREFETCH_ARGB(pb + bsize) LIBXSMM_PREFETCH_ARGC(tmp) LIBXSMM_PREFETCH_ARGC(pc + csize_act));