Improving Performance of Math Functions with Intel® Math Kernel Library

Introduction

Intel® Math Kernel Library (Intel® MKL) [1] is a library of optimized math routines that increases application performance on systems equipped with Intel® processors. Intel MKL includes linear algebra, fast Fourier transform (FFT), vector math, and statistics functions.

To illustrate the performance improvement available with Intel MKL, this paper uses matrix multiplication as an example. Matrix multiplication was chosen because it is a fundamental mathematical operation with applications across most scientific fields.

Performance Test Procedure

To demonstrate how Intel MKL can improve the performance of matrix operations, we used a code sample downloaded from GitHub.

The tests were done on two systems: one equipped with the Intel® Xeon® processor E5-2699 v4 and the other equipped with the Intel® Xeon® Platinum 8180 processor.

The performance was measured by comparing the time, in seconds, it takes to compute the matrix multiplication.
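A minimal sketch of this kind of measurement is shown below. It is not the timing code from the GitHub sample: the names now_seconds, matmul_naive, and time_matmul are hypothetical, and the naive triple loop only stands in for whichever method is being timed. The sketch reads wall-clock time with the C11 function timespec_get, which is the relevant quantity when the routine under test may run on several threads.

```c
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds (C11 timespec_get). */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Naive row-major n x n multiply, c = a * b; a stand-in for the method under test. */
void matmul_naive(int n, const float *a, const float *b, float *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
}

/* Time one n x n multiplication; returns elapsed seconds, or -1.0 on failure. */
double time_matmul(int n)
{
    size_t count = (size_t)n * (size_t)n;
    float *a = malloc(count * sizeof *a);
    float *b = malloc(count * sizeof *b);
    float *c = malloc(count * sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }
    for (size_t i = 0; i < count; i++) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    double start = now_seconds();
    matmul_naive(n, a, b, c);
    double elapsed = now_seconds() - start;

    /* Check the result so the compiler cannot discard the computation. */
    if (c[0] != 2.0f * (float)n)
        elapsed = -1.0;

    free(a); free(b); free(c);
    return elapsed;
}
```

Calling time_matmul(2000) would give one data point comparable in kind to those reported here. Only the multiplication itself sits between the two clock reads, so allocation and initialization are excluded from the measured time.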

The tests were done using the following steps:

  1. Measuring the time (in seconds) it takes to complete 2000 x 2000, 4000 x 4000, and 10000 x 10000 matrix multiplications using different optimization methods. Figure 1 shows how to specify the matrix sizes and the optimization method options. More information about these methods can be found in the GitHub repository for the code sample.

     


    Figure 1. Matrix size specifications and optimization method options.

    Figure 1 shows the available optimization methods. Option 2 optimizes the matrix multiplication using a vectorized sdot with Intel® Streaming SIMD Extensions (Intel® SSE), and option 7 combines option 2 with loop tiling. All measurements in this step were collected on the system equipped with the Intel Xeon processor E5-2699 v4.

  2. Comparing results and selecting the best results to be used for later steps.
  3. Repeating steps 1 and 2 on the system equipped with the Intel Xeon Platinum 8180 processor.
  4. Creating a new matrix multiplication function using Intel MKL in the file matmul.c.

    This involved two steps:

    a) Adding the mkl include file as follows:

         #include <mkl.h>

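    b) Making a call to the Intel MKL function cblas_sgemm as follows:

         cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_a_rows, n_b_cols, n_a_cols, 1.0f, a[0], n_a_cols, b[0], n_b_cols, 0.0f, m[0], n_b_cols);

       With CblasRowMajor and no transposition, the leading dimension of each matrix is its number of columns. For the square matrices used in these tests, all of these dimensions are equal.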

  5. Running the test again with the Intel MKL function in place, and measuring the time it takes to do the matrix multiplication for 2000 x 2000, 4000 x 4000, and 10000 x 10000 matrices.
  6. Comparing the results in step 5 with the best results in steps 2 and 3.
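The argument list of the cblas_sgemm call in step 4 is easier to read next to a plain C statement of what the call computes. The sketch below is not Intel MKL code. ref_sgemm_rowmajor is a hypothetical helper written for this illustration; it mirrors only the row-major, no-transpose case, C = alpha * A * B + beta * C, where A is M x K, B is K x N, C is M x N, and each leading dimension is the distance in elements between the starts of consecutive rows.

```c
#include <stddef.h>

/*
 * Hypothetical reference routine (not part of Intel MKL).
 * Computes C = alpha * A * B + beta * C for row-major, non-transposed
 * operands. A is M x K, B is K x N, C is M x N; lda, ldb, and ldc are
 * the row strides of A, B, and C in elements.
 */
void ref_sgemm_rowmajor(int M, int N, int K, float alpha,
                        const float *A, int lda,
                        const float *B, int ldb,
                        float beta, float *C, int ldc)
{
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < K; k++)
                sum += A[(size_t)i * lda + k] * B[(size_t)k * ldb + j];

            float *out = &C[(size_t)i * ldc + j];
            /* BLAS convention: with beta == 0, C is overwritten, not scaled. */
            *out = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * *out;
        }
    }
}
```

With alpha = 1.0f and beta = 0.0f, as in step 4, this reduces to a plain product C = A * B. For the square matrices used in these tests, M, N, K, and all three leading dimensions have the same value.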

Test Configurations

Hardware

   System #1

  • System: Preproduction
  • Processor: Intel Xeon processor E5-2699 v4 @ 2.2 GHz
  • Cores: 22
  • Memory: 256 GB DDR4

   System #2

  • System: Preproduction
  • Processor: Intel Xeon Platinum 8180 processor @ 2.5 GHz
  • Cores: 28
  • Memory: 256 GB DDR4

Software

  • Ubuntu* 16.04 LTS
  • GCC* version 5.4.0
  • Intel MKL 2017

Test Results


Figure 2. Results of different optimized methods on different-sized matrices.

Figure 2 shows that the optimized method using explicit vectorized sdot with loop tiling performed the best on all sizes of matrices. The results of this method will be compared against those of the Intel MKL method.
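To make the loop tiling part of that method concrete, here is a minimal scalar sketch. It is not the code from the GitHub sample, which also uses an SSE-vectorized sdot; matmul_tiled, min_int, and TILE are hypothetical names, and the tile size of 32 is an arbitrary assumption rather than a tuned value.

```c
#include <stddef.h>

#define TILE 32  /* assumed block edge; tune for the cache of the target CPU */

static int min_int(int x, int y) { return x < y ? x : y; }

/* c = a * b for row-major n x n matrices, computed one tile at a time. */
void matmul_tiled(int n, const float *a, const float *b, float *c)
{
    /* The tiled loops accumulate into c, so it must start at zero. */
    for (size_t i = 0; i < (size_t)n * (size_t)n; i++)
        c[i] = 0.0f;

    for (int ii = 0; ii < n; ii += TILE)
        for (int kk = 0; kk < n; kk += TILE)
            for (int jj = 0; jj < n; jj += TILE) {
                /* Clamp tile bounds so n need not be a multiple of TILE. */
                int i_end = min_int(ii + TILE, n);
                int k_end = min_int(kk + TILE, n);
                int j_end = min_int(jj + TILE, n);

                /* Work on one block at a time so its data can stay in cache. */
                for (int i = ii; i < i_end; i++)
                    for (int k = kk; k < k_end; k++) {
                        float aik = a[(size_t)i * n + k];
                        for (int j = jj; j < j_end; j++)
                            c[(size_t)i * n + j] += aik * b[(size_t)k * n + j];
                    }
            }
}
```

The arithmetic is the same as in an untiled triple loop; only the order in which elements are visited changes, so that each block of a, b, and c is reused several times before the loops move on.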


Figure 3. Results of Intel® MKL on systems equipped with Intel® Xeon® processor E5-2699 v4 and Intel® Xeon® Platinum 8180 processor.

Figure 3 shows the results of the matrix multiplications using the Intel MKL method on systems equipped with the Intel Xeon processor E5-2699 v4 and the Intel Xeon Platinum 8180 processor.


Figure 4. Results with and without Intel® MKL on system equipped with the Intel® Xeon® Platinum 8180 processor.

Figure 4 shows the results of the matrix multiplications with and without Intel MKL on a system equipped with the Intel Xeon Platinum 8180 processor.

Conclusion

Intel MKL greatly improves the performance of Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) functions because it takes advantage of features in the newest generation of Intel processors, such as Intel® Advanced Vector Extensions 512 (Intel® AVX-512), that speed up matrix operations. With Intel MKL, you don’t need to modify your source code to benefit from new processor features. Linking the code against the latest version of Intel MKL is enough: the library automatically detects and uses the new features available in Intel Xeon processors.

References

1. Intel® Math Kernel Library
