One of the big new features introduced in the Intel® Math Kernel Library (Intel® MKL) 11.2 is the greatly improved performance for small problem sizes. In 11.2, this improvement focuses on xGEMM functions (matrix multiplication). Out of the box, there is already a version-to-version improvement (from Intel MKL 11.1 to Intel MKL 11.2). But on top of it, Intel MKL introduces a new control that can lead to further significant performance boost for small matrices. Users can enable this control when linking with Intel MKL by specifying "-DMKL_DIRECT_CALL" or "-DMKL_DIRECT_CALL_SEQ". At the run time, the execution will be dispatched to a fast path for small input matrices. The fast path skips error checking and multiple layers of function calls, therefore improves performance by reducing associated overhead. The matrix sizes have to be small, for example, only a few dozens of rows and columns. For larger matrices the regular execution path is taken. MKL_DIRECT_CALL and MKL_DIRECT_CALL_SEQ do not help, but do not do any harm either.
The chart below is a comparison between 4 scenarios of computing double-precision matrix-matrix multiplication for small matrices:
The matrices used in this chart are all square. The version-to-version improvement of Intel MKL 11.2 over 11.1.1, as well as the additional benefit brought by MKL_DIRECT_CALL, are evident.
These are the pre-processor macros to be defined to instruct Intel MKL to pick the fast path for small matrices. The first macro, MKL_DIRECT_CALL, is used when you plan to link to the parallel Intel MKL library. The second, MKL_DIRECT_CALL_SEQ, is used when you plan to link to the sequential Intel MKL library. These macros do not have effects on larger matrices.
For a program in the C language on Linux system, simply add -DMKL_DIRECT_CALL or -DMKL_DIRECT_CALL_SEQ. On Windows, the syntax is /DMKL_DIRECT_CALL or /DMKL_DIRECT_CALL_SEQ. Usually, the flag -std=c99 (/Qstd=c99 on Windows) is also needed. This has been tested on mainstream C and C++ compilers such as Intel C++ Compiler, GCC, Microsoft Visual Studio, etc. Note that this also works for the CBLAS interface (Intel MKL 11.2 Update 2 and later).
For a program in Fortran, first inlcude "mkl_direct_call.fi". See below for an example from the "Intel MKL User's Guide". Then, add -DMKL_DIRECT_CALL (/DMKL_DIRECT_CALL on Windows) or -DMKL_DIRECT_CALL_SEQ (/DMKL_DIRECT_CALL_SEQ on Windows). If you are using Intel Fortran Compiler then pass -fpp (/fpp on Windows) to enable Fortran pre-processing. If you are using PGI Fortran compiler then pass -Mpreprocess instead. This feature does not work with GNU Fortran compiler.
# include "mkl_direct_call.fi" program DGEMM_MAIN .... * Call Intel MKL DGEMM .... call sub1() stop 1 end * A subroutine that calls DGEMM subroutine sub1 * Call Intel MKL DGEMM end
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804