Intel® oneAPI Math Kernel Library Cookbook

ID 758503
Date 9/27/2021
Public

Speeding up Python* scientific computations

Goal

Use Intel® oneAPI Math Kernel Library (oneMKL) to boost Python* applications that perform heavy mathematical computations.

Solution

Python applications that perform heavy mathematical computations typically rely on these packages:

NumPy*

Provides an N-dimensional array object, a multi-dimensional container of generic data.

SciPy*

Includes modules for linear algebra, statistics, integration, Fourier transforms, ordinary differential equations solvers, and more. Depends on NumPy for fast N-dimensional array manipulation.

To speed up NumPy/SciPy computations, build these packages from source with oneMKL and run an example to measure the performance. To get a further performance boost on systems with Intel® Xeon Phi™ coprocessors, enable Automatic Offload.
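Before rebuilding anything, it can be worth checking whether the NumPy and SciPy you already have were built against oneMKL. A minimal sketch, assuming standard NumPy and SciPy installations:

import numpy as np
import scipy

# Both calls print the BLAS/LAPACK build configuration of the packages.
# A oneMKL-backed build lists MKL libraries such as mkl_rt.
np.show_config()
scipy.show_config()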

Building NumPy and SciPy with oneMKL

IMPORTANT:

To benefit from NumPy and SciPy prebuilt with oneMKL, download Intel® Distribution for Python* from https://software.intel.com/en-us/intel-distribution-for-python.

These steps assume a Linux* or Windows* operating system, Intel® 64 architecture, and ILP64 interface.

  1. Get the latest NumPy and SciPy packages from http://www.scipy.org/Download and unpack them

  2. Install the latest versions of oneMKL and the Intel® C++ and Intel® Fortran Compilers

  3. Set the environment variables for Intel C++ and Fortran compilers:

    • Linux*:

      Execute the command:

      $source <intel tools installation dir>/bin/compilervars.sh intel64
    • Windows*:

      Launch environment setters to specify the Visual Studio* mode for your Intel64 build binaries:

      1. (Windows 8:) Place the mouse pointer in the bottom-left corner of the screen, click the right mouse button, select Search, and click anywhere in the screen white space.

      2. Navigate to the Intel Parallel Studio 2016 section and select Intel64 Visual Studio 20XX mode.

  4. Change directory to <numpy dir>

  5. Make a copy of the existing site.cfg.example and save it as site.cfg

  6. Open site.cfg, uncomment the [mkl] section, and modify it to look as follows:

    • Linux:

      [mkl]
      library_dirs = /opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
      include_dirs = /opt/intel/compilers_and_libraries_2016/linux/mkl/include
      mkl_libs = mkl_rt
      lapack_libs =
      
    • Windows:

      [mkl]
      library_dirs = C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016\windows\mkl\lib\intel64;C:\Program Files (x86)\Intel\Composer XE 2015.x.yyy\compiler\lib\intel64
      include_dirs = C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016\windows\mkl\include
      mkl_libs = mkl_lapack95_lp64,mkl_blas95_lp64,mkl_intel_lp64,mkl_intel_thread,mkl_core,libiomp5md
      lapack_libs = mkl_lapack95_lp64,mkl_blas95_lp64,mkl_intel_lp64,mkl_intel_thread,mkl_core,libiomp5md

  7. Modify intelccompiler.py in <numpy dir>/distutils to pass optimization options to Intel C++ Compiler:

    • Linux:

      self.cc_exe = 'icc -O3 -g -xhost -fPIC -fomit-frame-pointer -openmp -DMKL_ILP64'
      
    • Windows:

      self.compile_options = [ '/nologo', '/O3', '/MD', '/W3', '/Qstd=c99', 
      '/QxHost', '/fp:strict', '/Qopenmp']
  8. Modify intel.py in the <numpy dir>/distutils/fcompiler folder to pass optimization options to Intel Fortran Compiler:

    • Linux:

      ifort -xhost -openmp -i8 -fPIC
      
    • Windows:

      def get_flags(self):
          opt = ['/nologo', '/MD', '/nbs', '/names:lowercase', '/assume:underscore']
          return opt
      
  9. Change directory to <numpy dir> and build and install NumPy:

    • Linux:

      $python setup.py config --compiler=intelem build_clib 
      --compiler=intelem 
      build_ext --compiler=intelem install
    • Windows:

      python setup.py config --compiler=intelemw build_clib 
      --compiler=intelemw build_ext --compiler=intelemw install
  10. Change directory to <scipy dir> and build and install SciPy:

    • Linux:

      $python setup.py config --compiler=intelem --fcompiler=intelem build_clib  
      --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem 
      --fcompiler=intelem install
      
    • Windows:

      python setup.py config --compiler=intelemw --fcompiler=intelvem build_clib 
      --compiler=intelemw --fcompiler=intelvem build_ext --compiler=intelemw 
      --fcompiler=intelvem install
      
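After steps 9 and 10 complete, a quick smoke test can confirm that the rebuilt SciPy exposes the BLAS dgemm binding before you run the full benchmark below. A minimal sketch:

import numpy as np
from scipy.linalg import blas

# Multiply two small matrices through the BLAS dgemm binding.
a = np.ones((2, 3))
b = np.ones((3, 4))
c = blas.dgemm(1.0, a=a, b=b)
print(c)   # expect a 2x4 matrix filled with 3.0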

Code Example

import numpy as np
import scipy.linalg.blas as slb
import time

M = 10000
N = 6000
k_list = [64, 128, 256, 512, 1024, 2048, 4096, 8192]

# Print the BLAS/LAPACK configuration NumPy was built with.
np.show_config()

for K in k_list:
        # Create random M x N and N x K input matrices in double precision.
        a = np.array(np.random.random((M, N)), dtype=np.double, order='C', copy=False)
        b = np.array(np.random.random((N, K)), dtype=np.double, order='C', copy=False)
        A = np.matrix(a, dtype=np.double, copy=False)
        B = np.matrix(b, dtype=np.double, copy=False)

        # Time the matrix-matrix product C = 1.0 * A * B computed by dgemm.
        start = time.time()
        C = slb.dgemm(1.0, a=A, b=B)
        end = time.time()

        tm = end - start
        print('{0:4}, {1:9.7}'.format(K, tm))

Source code: see the dgemm_python folder in the samples archive available at https://software.intel.com/content/dam/develop/external/us/en/documents/mkl-cookbook-samples-120115.zip.

Enabling Automatic Offload

If Intel® Xeon Phi™ coprocessors are available on your system, set the environment variable MKL_MIC_ENABLE to 1 to enable Automatic Offload of computations to the coprocessors.
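The variable only needs to be visible to the process before oneMKL initializes. A minimal sketch that sets it from Python rather than exporting it in the shell; the assumption is that the assignment happens before NumPy/SciPy make their first oneMKL call:

import os

# Assumption: equivalent to exporting MKL_MIC_ENABLE=1 in the shell, provided
# it runs before the first oneMKL call in this process.
os.environ['MKL_MIC_ENABLE'] = '1'

import numpy as np
import scipy.linalg.blas as slb

a = np.random.random((5000, 5000))
b = np.random.random((5000, 5000))
c = slb.dgemm(1.0, a=a, b=b)   # large dgemm calls are candidates for Automatic Offload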

Discussion

The build steps install NumPy and SciPy in the default Python path. To install them in your home directory or another specific folder, pass --prefix=$HOME or the folder path to the commands in steps 9 and 10. If you install into $HOME, after building NumPy and before building SciPy, set the PYTHONPATH environment variable to $HOME/lib/pythonY.Z/site-packages, where Y.Z is the Python version.
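After a --prefix install, a quick way to confirm that the intended copy of NumPy is being imported is to check where the module was loaded from. A minimal sketch; the expected location follows the layout described above:

import numpy

# With PYTHONPATH set as described, this should point into
# $HOME/lib/pythonY.Z/site-packages rather than the system site-packages.
print(numpy.__file__)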

Specific instructions in step 3 for selecting the Visual Studio* mode for your Intel64 build binaries depend on the Windows version. For example:

On Windows 7, go to All Programs -> Intel Parallel Studio XE 20XX -> Command Prompt and select Intel64 Visual Studio 20XX mode, where 20XX is the Visual Studio version.

The code example uses dgemm, the most common matrix-matrix multiplication routine, from SciPy, and uses NumPy arrays to create and initialize the input matrices. If NumPy and SciPy are built with oneMKL, this code actually calls the oneMKL BLAS dgemm routine.

If Intel® Xeon Phi™ coprocessors are available on your system, some oneMKL routines can take advantage of the coprocessors (for the list of Automatic Offload enabled oneMKL functions, see [AO]). If Automatic Offload is enabled, these routines split the computations between the host CPU(s) and coprocessor(s).