dapl with MPSS 3.5 and Qlogic HCA

I am running MPSS 3.5 and OFED+ and have 2 nodes with Phi cards and Qlogic. I believe that if I set I_MPI_FABRICS to use either tcp or tmi, everything works, but I've heard that dapl is faster, and I'm having problems getting that to work everywhere. It works when MPI tasks are either only on the hosts or only on cards in a single host. If there are tasks on both the host and a card, it appears to have problems connecting to the IP address that is added during the ofed-mic service startup (

functions like pow() used in offload programming


I am using the offload programming model on Xeon Phi, and the code that I want to offload needs some math functions such as pow(). However, when I compile the code, I get errors at the link stage. The errors suggest that the compiler cannot find the reference to the pow function. Which path or environment variable should I set?

Thank you!


New Jim Dempsey article: Elusive Algorithms – Parallel Scan


Since I haven't seen a notification of this elsewhere, the ever knowledgeable Jim Dempsey (QuickThreadProgramming.com) just published one of his great technical articles entitled, "Elusive Algorithms – Parallel Scan".

I believe this was an outgrowth of another discussion on the forums, "how to perform inclusive scan in C cilk".


Error getting OFED to compile when mic is selected.

Compiling OFED with Phi support and --all fails while compiling compat-rdma.
Compiling without the "--with-xeon-phi" option works.

./install.pl --with-xeon-phi --all

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.1 (Maipo)

# uname -r

rpm -qi mpss-sdk-k1om-3.5-1.x86_64
Name        : mpss-sdk-k1om
Version     : 3.5
Release     : 1
Architecture: x86_64
Install Date: Thu 09 Apr 2015 10:00:28 PM CDT
Group       : base
Size        : 484359036
License     : various
Signature   : DSA/SHA1, Thu 02 Apr 2015 06:57:59 AM CDT, Key ID

MICs appear to crash

I'm trying to set up a new system:

SuperMicro 5018GR-T

2 Intel Xeon Phis:

                Coprocessor Stepping     : B1
                Board SKU                : B1PRQ-31S1P

MPSS 3.5 and Scientific Linux 7.1


Using MKL to generate random data on Xeon Phi


I am trying to use MKL to repeatedly generate large amounts of random data on Xeon Phi, but the performance is very poor compared with a Xeon CPU (E5620).

The attachment is the original code. The compile options for Xeon Phi are -O3 -mkl -mmic, and the run takes about 115 seconds; on the Xeon CPU it takes only 3.5 seconds. I do not know why the difference is so large. Am I using the Xeon Phi incorrectly, or is its real performance this poor?

Thank you!


