Using Intel® MKL in your Python program

Introduction

This article describes how to use the Intel® Math Kernel Library (Intel® MKL) from a Python program. There's more than one way to write Python programs to interface with native libraries. I've simply chosen one so that I can emphasize what might be less commonly known: how to build a custom shared library from Intel MKL so that you can call it from your script.

I'll run through the basic steps of accessing Intel MKL from Python 2.6 on a 64-bit Linux OS. The example program calls the CBLAS interface to the DGEMM function, which performs a multiplication (and optional add) on general, double-precision matrices. Much more about these functions can be found in the Intel® MKL reference manual (available online here).

Update: With Intel MKL 10.3 or 11.0 there is a new dynamic library (libmkl_rt.so) which removes the need to create your own custom library, so if you're using 10.3 or later you don't need to do step 1 below. To change the behavior of this library, look up these routines in the reference manual: mkl_set_interface_layer, mkl_set_threading_layer, mkl_set_xerbla, and mkl_set_progress.
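
For example, the interface and threading layers of this runtime library can be selected from Python through ctypes before any other Intel MKL routine is called. The following is only a minimal sketch; the constant values are assumptions taken from the mkl.h/mkl_service.h headers of my installation, so verify them against your own headers:

    from ctypes import cdll, c_int

    mkl = cdll.LoadLibrary("./libmkl_rt.so")

    # Assumed constant values -- check mkl.h / mkl_service.h in your installation
    MKL_INTERFACE_LP64 = 0
    MKL_THREADING_SEQUENTIAL = 1

    # These layer selections must be made before any other Intel MKL call
    mkl.mkl_set_interface_layer(c_int(MKL_INTERFACE_LP64))
    mkl.mkl_set_threading_layer(c_int(MKL_THREADING_SEQUENTIAL))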

    1. Build a custom library (now unnecessary with Intel MKL 10.3 or later): To interface with Intel MKL from Python we recommend you use the custom library builder in the tools/builder sub-directory of the Intel MKL package. The Intel® MKL User's Guide has documentation on this tool (docs online). Briefly, here are the steps I took to do this (a sample export list is sketched after these steps):

      1. Set up your environment to use the desired version of Intel MKL:
        source /<MKLpath>/tools/environment/mklvarsem64t.sh
      2. Build the shared library (.so):
        cd /<MKLpath>/tools/builder
        make em64t name=~/libmkl4py export=cblas_list
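
      The export file (cblas_list above) is just a list of the entry points to expose, one per line. A minimal sketch for this article's example could contain only the routine the script calls:

        cblas_dgemm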

    2. Add library paths to LD_LIBRARY_PATH: All of the Intel MKL libraries needed must be in directories listed in the LD_LIBRARY_PATH environment variable. The library built above depends on the OpenMP* threading runtime library used by Intel MKL (libiomp5.so), so make sure that both libraries, libmkl4py.so and libiomp5.so, are in directories specified in LD_LIBRARY_PATH. If you're using Intel MKL 10.3 or later, you need to add the directories for both libmkl_rt.so and libiomp5.so (if you want it to run on multiple cores). A quick way to check that both libraries resolve is sketched below.
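
    The following is a small sketch (assuming the 10.3+ library names; substitute libmkl4py.so for earlier versions) that verifies the dynamic loader can resolve the libraries before you run the real script:

      from ctypes import CDLL
      import os

      print "LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "")
      # Try to load each required library and report any that cannot be found
      for name in ("libmkl_rt.so", "libiomp5.so"):
          try:
              CDLL(name)
              print name, "loaded OK"
          except OSError, err:
              print name, "NOT found:", err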

  3. Call Intel MKL in your Python script: The following is a simple script (also available here) that loads the shared library just created and calls the matrix multiplication function.
    from ctypes import *
    
    # Load the shared library
    mkl = cdll.LoadLibrary("./libmkl_rt.so")
    # For Intel MKL prior to version 10.3, use the custom .so created above instead
    # mkl = cdll.LoadLibrary("./libmkl4py.so")
    cblas_dgemm = mkl.cblas_dgemm
    
    def print_mat(mat, m, n):
      for i in xrange(0,m):
        print " ",
        for j in xrange(0,n):
          print mat[i*n+j],
        print 
    
    # Initialize scalar data
    Order = 101  # 101 for row-major, 102 for column major data structures
    TransA = 111 # 111 for no transpose, 112 for transpose, and 113 for conjugate transpose
    TransB = 111
    m = 2
    n = 4
    k = 3
    lda = k
    ldb = n
    ldc = n
    alpha = 1.0
    beta = -1.0
    
    # Define array types to hold contiguous double-precision data for each matrix
    amat = c_double * 6      
    bmat = c_double * 12
    cmat = c_double * 8
    
    # Initialize the data arrays
    a = amat(1,2,3, 4,5,6)
    b = bmat(0,1,0,1, 1,0,0,1, 1,0,1,0)
    c = cmat(5,1,3,3, 11,4,6,9)
    
    print "nMatrix A ="
    print_mat(a,2,3) 
    print "nMatrix B ="
    print_mat(b,3,4)
    print "nMatrix C ="
    print_mat(c,2,4)
    
    print "nCompute", alpha, "* A * B + ", beta, "* C"
    
    # Call Intel MKL by casting scalar parameters and passing arrays by reference
    cblas_dgemm( c_int(Order), c_int(TransA), c_int(TransB), 
                 c_int(m), c_int(n), c_int(k), c_double(alpha), byref(a), c_int(lda), 
                 byref(b), c_int(ldb), c_double(beta), byref(c), c_int(ldc))
    
    print_mat(c,2,4)
    print
  4. A few notes:
    • Matrices in the BLAS and LAPACK parts of Intel MKL are stored in one-dimensional arrays, and integers are used to specify their geometry.
    • I've actually used the CBLAS interface to the general matrix multiply function here, which lets you choose how the matrix is laid out. In my script I've listed the matrix by rows (row-major ordering). If you do not use the CBLAS interface to the BLAS, or if you use LAPACK, keep in mind that those functions assume the Fortran method of listing matrices by columns (column-major ordering); a column-major version of the same call is sketched below.
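
    For illustration, here is a minimal, self-contained sketch of the same computation expressed with column-major storage (assuming the 10.3+ libmkl_rt.so; substitute the custom library for earlier versions). The data are laid out column by column and the leading dimensions become the row counts; with these particular inputs alpha*A*B equals the original C, so the result is a zero matrix in either layout:

      from ctypes import *

      mkl = cdll.LoadLibrary("./libmkl_rt.so")
      cblas_dgemm = mkl.cblas_dgemm

      Order  = 102            # 102 selects column-major (Fortran-style) storage
      TransA = 111            # no transpose
      TransB = 111
      m, n, k = 2, 4, 3
      lda, ldb, ldc = m, k, m # leading dimensions are now the numbers of rows
      alpha, beta = 1.0, -1.0

      # The same matrices as above, stored column by column
      a = (c_double * 6)(1,4, 2,5, 3,6)
      b = (c_double * 12)(0,1,1, 1,0,0, 0,0,1, 1,1,0)
      c = (c_double * 8)(5,11, 1,4, 3,6, 3,9)

      cblas_dgemm( c_int(Order), c_int(TransA), c_int(TransB),
                   c_int(m), c_int(n), c_int(k), c_double(alpha), byref(a), c_int(lda),
                   byref(b), c_int(ldb), c_double(beta), byref(c), c_int(ldc))

      print list(c)   # all zeros for these inputs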

Here is the Python code I created that implements the steps above: matmult.py

Example code:

We extended the list of examples to demonstrate how to call routines from other Intel MKL domains (not only a widespread example like dgemm) from a Python program:

See the three examples attached:
dft.zip - shows a Python program that calls the 1D DFTI API
spblas.zip - shows how to call the matrix-matrix multiplication routine for a sparse matrix stored in the block compressed sparse row (BSR) format
vsl.zip - shows how to call the vdRngGaussian routine (which generates normally distributed random numbers) from the VSL domain; a rough ctypes sketch of a similar call appears after the notes below

Notes:
Each zip file contains a *.res file and a *_list file: the input file and the export list used for building the custom library, respectively.
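
As a rough illustration of the VSL case, here is a minimal ctypes sketch of generating normally distributed numbers with vdRngGaussian. The constant values below are assumptions taken from mkl_vsl_defines.h in my installation and should be verified against yours:

    from ctypes import *

    mkl = cdll.LoadLibrary("./libmkl_rt.so")

    # Assumed constant values -- verify against mkl_vsl_defines.h
    VSL_BRNG_MCG31 = 0x100000               # basic random number generator
    VSL_RNG_METHOD_GAUSSIAN_BOXMULLER = 0   # Box-Muller transformation

    n = 10
    stream = c_void_p()
    r = (c_double * n)()

    # Create a stream, generate n N(0,1) numbers, then delete the stream
    mkl.vslNewStream(byref(stream), c_int(VSL_BRNG_MCG31), c_uint(777))
    mkl.vdRngGaussian(c_int(VSL_RNG_METHOD_GAUSSIAN_BOXMULLER), stream,
                      c_int(n), r, c_double(0.0), c_double(1.0))
    mkl.vslDeleteStream(byref(stream))

    print list(r)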
