BLAS, OpenMP and Phi


Hi,

Is it possible to use a BLAS library on Intel Phi in such a way that each OpenMP thread calls a BLAS function on a different data set independently?

This requirement is different from using all the threads to perform a single BLAS operation.

Thank you.

It makes little sense to attempt hundreds of simultaneous independent BLAS calls. Are you interested in the batch GEMM facility of the current beta test?

You can link with the single thread version of MKL (this means each caller uses a single thread within MKL).

You may also want to look at a different thread on the MKL forum relating to NUMA node affinity pinning (aggregation within a NUMA node), which can be used on MIC by logically partitioning the 60 cores (60, 120, 180, or 240 threads) into an arbitrary number of functional nodes: 2x30C, 3x20C, 4x15C, 5x12C, 6x10C, 10x6C, 12x5C, ... if you want evenly sized partitions. Your program's threads would then interact with BLAS via a queue serviced by your BLAS (MKL) virtual node master. The forum thread is rather brief in its discussion of this topic.
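As a rough illustration of the partition arithmetic only (not of the affinity pinning itself), the evenly sized splits of a given core count can be enumerated as below. The function names even_partitions and print_partitions are hypothetical helpers made up for this sketch; they are not part of MKL or OpenMP.

```c
#include <assert.h>
#include <stdio.h>

/* Store in nodes[] every node count (2 or more) that splits `cores`
   into equal parts; returns the number of entries written.
   Hypothetical helper, for illustration only. */
int even_partitions(int cores, int *nodes, int max_out)
{
    assert(cores > 0 && max_out >= 0);
    int count = 0;
    for (int n = 2; n <= cores && count < max_out; ++n)
        if (cores % n == 0)
            nodes[count++] = n;
    return count;
}

/* Print each partition as "<nodes>x<cores per node>C". */
void print_partitions(int cores)
{
    int nodes[64];
    int count = even_partitions(cores, nodes, 64);
    for (int i = 0; i < count; ++i)
        printf("%dx%dC\n", nodes[i], cores / nodes[i]);
}
```

Calling print_partitions(60) lists the node count times the cores per node for each even split, starting with 2x30C. With 1 to 4 hardware threads per core, each node then has between 1 and 4 times that many threads available.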

Jim Dempsey

 

Hi Jim,

When you say "You can link with the single thread version of MKL (this means each caller uses a single thread within MKL)",

  1. Does this mean that each thread uses BLAS-MKL simultaneously? If so, do you have sample code or an introduction on how to do that?
  2. Can we use OpenMP and assign an MKL-BLAS routine to each thread to run in parallel?
  3. Can we call a BLAS routine on Phi from an already offloaded function? Or do we have to call BLAS as offload functions from the CPU?
  4. Where can I find the BLAS batch functions for the MIC architecture?

Thank you.

See: Intel Math Kernel Library | User's Guide | Linking Your Application | Linking Examples | (pick IA-32 or Intel 64).

You would pick the sequential version.

You should also read the other forum thread, as you may find it beneficial to hybridize the OpenMP of the application with multiple parallel versions of MKL. But this is tricky: read the hints and understand what is necessary. The method you propose (many application threads, each with single-threaded MKL) will certainly be the easiest to implement, but it may not yield the best performance. The fact that 4 threads share a core (L1 and L2 caches) may cause performance problems when these threads are not working together on closely related data. Do not take this to mean I am suggesting one thread per core, as that typically is not good either.
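A minimal sketch of the simplest approach (many application threads, each making its own single-threaded BLAS call on its own data) might look like the following. The names gemm_one and gemm_many and the USE_MKL macro are hypothetical, chosen for this example. With USE_MKL defined, the code calls cblas_dgemm and is meant to be linked against the sequential MKL libraries; without it, a plain triple loop stands in so the sketch compiles anywhere. It assumes square row-major matrices stored back to back.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_MKL              /* hypothetical macro for this sketch */
#include <mkl.h>            /* link with the sequential MKL libraries */
#endif

/* C = A * B for one n-by-n row-major problem, run by the calling thread only. */
static void gemm_one(int n, const double *A, const double *B, double *C)
{
#ifdef USE_MKL
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);
#else
    /* Portable stand-in so the sketch builds without MKL. */
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
#endif
}

/* Run `batch` independent multiplications. Problem p uses the p-th
   n*n slice of each array, so no two threads write the same memory. */
void gemm_many(int batch, int n, const double *A, const double *B, double *C)
{
    assert(batch >= 0 && n > 0);
    const size_t stride = (size_t)n * (size_t)n;

    /* Each OpenMP thread picks up whole problems and makes its own call. */
    #pragma omp parallel for schedule(dynamic)
    for (int p = 0; p < batch; ++p)
        gemm_one(n, A + p * stride, B + p * stride, C + p * stride);
}
```

Build with your compiler's OpenMP option enabled; without it the pragma is ignored and the loop simply runs serially. The dynamic schedule is a guess that suits problems of uneven cost; for equally sized problems a static schedule may do as well.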

MKL batch - Tim said it is in the current Beta version. You will have to look on the MKL forum as to how to sign up for the beta program. I strongly suggest you get familiar with what you can do with the current release version with respect to advanced threading concepts.

Jim Dempsey

The following is from the MKL User's guide:

The following examples illustrate linking that uses Intel® compilers.

The examples use the .f Fortran source file. C/C++ users should instead specify a .cpp (C++) or .c (C) file and replace ifort with icc:

  • Static linking of myprog.f and parallel Intel MKL supporting the LP64 interface:

    ifort myprog.f mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib

  • Dynamic linking of myprog.f and parallel Intel MKL supporting the LP64 interface:

    ifort myprog.f mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib

  • Static linking of myprog.f and sequential version of Intel MKL supporting the LP64 interface:

    ifort myprog.f mkl_intel_lp64.lib mkl_sequential.lib mkl_core.lib

  • Dynamic linking of myprog.f and sequential version of Intel MKL supporting the LP64 interface:

    ifort myprog.f mkl_intel_lp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib

  • Static linking of myprog.f and parallel Intel MKL supporting the ILP64 interface:

    ifort myprog.f mkl_intel_ilp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib

  • Dynamic linking of myprog.f and parallel Intel MKL supporting the ILP64 interface:

    ifort myprog.f mkl_intel_ilp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib

  • Dynamic linking of user code myprog.f and parallel or sequential Intel MKL supporting the LP64 or ILP64 interface (Call appropriate functions or set environment variables to choose threaded or sequential mode and to set the interface):

    ifort myprog.f mkl_rt.lib

  • Static linking of myprog.f, Fortran 95 LAPACK interface, and parallel Intel MKL supporting the LP64 interface:

    ifort myprog.f mkl_lapack95_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib

  • Static linking of myprog.f, Fortran 95 BLAS interface, and parallel Intel MKL supporting the LP64 interface:

    ifort myprog.f mkl_blas95_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib
