MKL not finding mics


mic0: online (mode: linux image: /lib/firmware/mic/uos.img)

mic1: online (mode: linux image: /lib/firmware/mic/uos.img)

 

ifort version 14.0.1

ifort -o dgemm-test.x -openmp dgemm-test.f90 -mkl -offload -offload-option,mic,ld,"--no-undefined"

 

Fortran code starts with:

num_devices = OFFLOAD_NUMBER_OF_DEVICES()
if (num_devices==0) then
   call exit
endif

 

There are no errors, even when variables such as:

MKL_MIC_ENABLE=1

MIC_MKLPATH=/phi/intel/mkl/lib/mic/

MIC_LD_LIBRARY_PATH=/phi/intel/lib (I have copied the lib files here)

MIC_OMP_NUM_THREADS=240

MIC_MKL_ROOT=/phi/intel/lib

MIC_MKLROOT=/phi/intel/mkl (in addition I also copied the lib files here)

are set.
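
One thing worth ruling out with a list like this is a variable that is set in the shell but not exported, since the offload runtime only sees what is in the process environment. A minimal sketch, assuming a POSIX shell on the host; the helper name `report_env` is hypothetical, and the variable names are the ones listed above:

```shell
# Print each named variable as seen by child processes (exported only).
report_env() {
    for name in "$@"; do
        if value=$(printenv "$name"); then
            printf '%-20s = %s\n' "$name" "$value"
        else
            printf '%-20s is not exported\n' "$name"
        fi
    done
}

report_env MKL_MIC_ENABLE MIC_MKLPATH MIC_LD_LIBRARY_PATH \
           MIC_OMP_NUM_THREADS MIC_MKL_ROOT MIC_MKLROOT
```

This only shows what the host process would inherit; it says nothing about whether the paths are valid on the card.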

 

Running in native mode works fine; ssh works, and applications built on the host run fine. All dependencies are resolved when I run in native mode.

The next step would be to run on the host and use offload to share work between the host and the co-processor. But so far the applications do not see any of the mics.

 

Am I missing something that links the host application with the mic co-processor?

 

 


Perhaps the -offload in the command-line shown was just a typo?   If not, then remove that but I doubt that would be the cause. The other options used are fine.

Could you share your dgemm-test.f90?

Is the Composer XE 2013 installed in the default location under /opt/intel?

Hi, 

Are you using Compiler Assisted Offload or explicit offload? Are you seeing any messages or warnings? Could you share a reproducer?

-Sumedh

I use the Intel example dgemm-with-timing.f; it is well known and simplifies the problem. I have reinstalled the compilers under an NFS directory; I cannot use the cluster file system because the production version of FhGFS does not support NFS export. So, to keep things simple, I installed the compilers (2013.sp1.1) under /phi/intel and mounted /phi on both the hosts and the mics. The documentation states that MKL should find the co-processors and launch offload automatically if MKL_MIC_ENABLE is set. The problem is that the mics are not even discovered by MKL or the compiler. There are no warnings, and all libraries are resolved. If they were not, there should be warnings or errors.

I have attached some information from tests; I could not post it inline as it triggered the spam filter.

The mics are there and can be ssh'ed into. Native runs work fine; see the attachment.

 

It might just be some missing paths or some libraries etc, but it is a real showstopper right now.

 

Regards,
Ole

 

Attachment: mic-attachment.txt (text/plain, 3.48 KB)

Check that your LD_LIBRARY_PATH contains /opt/intel/mic/coi/host-linux-release/lib. I found before (discussed here) the behavior shown in your mic-attachment.txt could occur if that path was not included.
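
A quick way to make that check on the host is sketched below. It assumes a POSIX shell; the helper name `path_contains` is hypothetical, and the directory is the one quoted above, which may differ on other installations:

```shell
# Return 0 if directory $2 is an exact entry of the colon-separated list $1.
# Entries written with a trailing slash are not treated as equal.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

coi_lib=/opt/intel/mic/coi/host-linux-release/lib   # path quoted in this thread

if path_contains "${LD_LIBRARY_PATH:-}" "$coi_lib"; then
    echo "COI host library directory is on LD_LIBRARY_PATH"
else
    echo "COI host library directory is missing from LD_LIBRARY_PATH"
fi

# The directory itself may be absent if the MPSS install is incomplete.
[ -d "$coi_lib" ] || echo "note: $coi_lib does not exist on this host"
```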

Hi,

The path /opt/intel/mic/coi/host-linux-release/lib does not exist. Very little coi content exists under /opt/intel/mic.

Could it be that my co-worker who initially installed and built the mpss omitted this by answering no to a question ?

He installed mpss_gold_update_3-2.1.6720-16-rhel-6.4.tar before taking off to San Diego, CA.

Maybe I should install the newer, updated mpss stack? If I understand correctly, it is the applications under coi that deal with the co-processor? coi stands for co-processor interface, right?

 

Regards,

Ole

 

Installing mpss-3.1.2-rhel-6.4.tar worked.

 

 This example measures performance of computing the real
 matrix product C=alpha*A*B+beta*C using
 Intel(R) MKL subroutine DGEMM, where A, B, and C are
 matrices. alpha and beta are double precision scalars
 
 Initializing data for matrix multiplication C=A*B for
 matrix A( 4000 x 4000) and matrix B( 4000 x 4000)
 
  Checking for Intel(R) MIC Architecture (Target CPU) devices...

    Number of Target devices installed:      1

           0           1
           1  threads          32  processors
 Intializing matrix data
 
 Making the first run of matrix product using
 Intel(R) MKL DGEMM subroutine to get stable
 run time measurements
 
 Measuring performance of matrix product using
 Intel(R) MKL DGEMM subroutine
 
 == Matrix multiplication using Intel(R) MKL DGEMM ==
 == completed at    367.27743 milliseconds ==
 
 Example completed.
 

One comment is that the installation instructions could be somewhat clearer. A rebuild of the modules was needed.

rpmbuild --rebuild mpss-modules-3.1.2-1.el6.src.rpm

After rebuild of modules it worked fine.

 

Regards,

Ole

 

 

 

Glad to read that you resolved the issue.

I'm afraid I don't understand the need for rebuilding modules. I've never had to do that. Was there an error/warning during the MPSS installation that prompted you to know to do that?

About rebuilding modules:

The two files that come with the tar file are:

mpss-modules-2.6.32-358.el6.x86_64-3.1.2-1.el6.x86_64.rpm
mpss-modules-dev-2.6.32-358.el6.x86_64-3.1.2-1.el6.x86_64.rpm

It turns out that my kernel is slightly different, 2.6.32-358.6.2.el6.x86_64; only the minor numbers (6.2) differ. This is enough to put the modules in another directory, /lib/modules/2.6.32-358.el6.x86_64/, when they should have been in /lib/modules/2.6.32-358.6.2.el6.x86_64/. Maybe the module files could have been copied, but then again that might not work. Modules tend to be sensitive. Better to rebuild.
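
A mismatch like this can be spotted by checking whether the running kernel's module tree contains the MPSS driver at all. The following is a rough sketch, assuming a POSIX shell and assuming the driver file is named `mic.ko` (that file name and the helper `has_mic_module` are assumptions, not taken from the posts above):

```shell
# Return 0 if the module tree for kernel release $1 (under $2, default
# /lib/modules) contains a file named mic.ko. The name mic.ko is an assumption.
has_mic_module() {
    release=$1
    base=${2:-/lib/modules}
    [ -d "$base/$release" ] || return 1
    [ -n "$(find "$base/$release" -name 'mic.ko*' 2>/dev/null | head -n 1)" ]
}

running=$(uname -r)
if has_mic_module "$running"; then
    echo "mic module found for running kernel $running"
else
    echo "no mic module under /lib/modules/$running"
fi
```

If the check fails for the running kernel but succeeds for a neighbouring release directory, the prebuilt module RPMs were built for a different kernel.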

This is easy to resolve by rebuilding with: rpmbuild --rebuild mpss-modules-3.1.2-1.el6.src.rpm, then copying the resulting rpm in with the others and installing them with yum. Problem fixed. I have written a script that installs the servers after a reinstall by Rocks; this script uses the trick above and presents the node ready for use after an unattended reinstall.

 

Ole

 

Quote:

Ole Saastad wrote:

I use the Intel example dgemm-with-timing.f; it is well known and simplifies the problem. I have reinstalled the compilers under an NFS directory; I cannot use the cluster file system because the production version of FhGFS does not support NFS export. So, to keep things simple, I installed the compilers under /phi/intel and

[...]

Er, all recent Fhgfs versions (2012.10 and 2014.01) do support NFS export. You should use NFSv4, though.
