Performance Comparison of OpenBLAS* and Intel® Math Kernel Library in R

Introduction

Today, scientific and business organizations collect large amounts of data, analyze them, and make decisions based on the results. They employ data visualization techniques and predictive analytics to estimate future probabilities and trends. R is a programming language for computational statistics, data visualization, and predictive analytics [1]. Because data visualization and predictive analytics are compute intensive, it’s important to find ways to speed up the computation in order to allow faster business and scientific decision making. This paper compares the performance in R of two Basic Linear Algebra Subprograms (BLAS) [2] libraries: OpenBLAS [3] and Intel® Math Kernel Library (Intel® MKL) [4].

Performance Test Procedure

Performance is measured as the elapsed time (in seconds) needed to run each test. To compare the libraries, we ran the tests on a system equipped with the Intel® Xeon® processor E5-2697 v4. We first loaded OpenBLAS and ran the tests. Next, we loaded Revolution R with Intel MKL and reran the tests. We created simple tests to measure how long certain R functions take. For example, to measure the performance of the cross product and the Cholesky decomposition [5] of a matrix, we followed these steps:

  1. Create a matrix A.
  2. Measure the time of the cross product of A using the following command:
    system.time(crossprod(A))
  3. Measure the time of the Cholesky decomposition of A (chol requires A to be symmetric positive definite) using the following command:
    system.time(chol(A))
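
The steps above can be sketched as a short R script. This is a minimal illustration, not the exact script used for the measurements: the matrix dimensions and the random seed are assumptions, and because chol needs a symmetric positive-definite input, the sketch factors the cross product B rather than the raw random matrix A.

```r
# Minimal sketch of the timing steps. The dimensions (m, n) and the seed
# are assumptions for illustration, not values taken from the article.
set.seed(42)
m <- 1000
n <- 500

# Step 1: create a matrix A with random normal entries.
A <- matrix(rnorm(m * n), nrow = m, ncol = n)

# Step 2: time the cross product t(A) %*% A.
t_cross <- system.time(B <- crossprod(A))

# Step 3: time the Cholesky decomposition. B = crossprod(A) is symmetric
# and, for a full-rank A, positive definite, so chol() can factor it.
t_chol <- system.time(R <- chol(B))

# "elapsed" is the wall-clock time in seconds.
print(t_cross["elapsed"])
print(t_chol["elapsed"])
```

Running the same script once with each BLAS library loaded gives directly comparable elapsed times.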

The following tests were performed:

  • The cross product of a matrix (R function crossprod)
  • The Cholesky decomposition of a matrix (R function chol)
  • Singular value decomposition (R function svd) [6]
  • Principal component analysis (R function prcomp) [7]
  • R-benchmark-25 (R benchmark version 2.5, which has a total of 15 tests)
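
The svd and prcomp tests in the list above can be timed in the same way as the cross product. The following is a sketch under assumed inputs: the matrix size and seed are illustrative choices, not the ones used for the published results.

```r
# Sketch: timing svd() and prcomp() on a random matrix.
# The size (400 x 100) and the seed are assumptions for illustration.
set.seed(1)
X <- matrix(rnorm(400 * 100), nrow = 400, ncol = 100)

# Singular value decomposition: X = U diag(d) t(V).
t_svd <- system.time(s <- svd(X))

# Principal component analysis; prcomp() centers the columns by default.
t_pca <- system.time(p <- prcomp(X))

print(t_svd["elapsed"])
print(t_pca["elapsed"])
```

Both functions spend most of their time in LAPACK and BLAS routines, which is why the choice of BLAS library affects them.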

Test Configurations

Hardware

  • System: Preproduction
  • Processor: Intel Xeon processor E5-2697 v4 @2.3 GHz
  • Cores: 18
  • Memory: 128 GB DDR4

Software

  • Red Hat Enterprise Linux* 7.0
  • R 3.2.2
  • Revolution R* 3.2.2
  • OpenBLAS 0.2.14
  • Intel MKL (from revomath-3.2.2)

Note: Revolution R [8] was used here as a means to test R functions with Intel MKL because it is linked to Intel MKL by default.
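
Before timing anything, it helps to confirm which linear algebra libraries a given R session actually uses. The sketch below relies only on base R functions; note that La_library() and the BLAS/LAPACK lines printed by sessionInfo() are features of newer R releases and may not be available in R 3.2.2, which is why the call is guarded.

```r
# Sketch: inspect the linear algebra libraries behind the current R session.

# La_version() returns the version of the LAPACK library in use.
lapack_version <- La_version()
print(lapack_version)

# La_library() (newer R releases only) returns the path of the LAPACK
# library; the guard keeps the script working on older releases.
if (exists("La_library")) {
  print(La_library())
}

# On newer R releases, sessionInfo() also prints the BLAS and LAPACK
# library paths; older releases print only platform and package details.
print(sessionInfo())
```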

Test Results


Figure 1: The elapsed time of the tests OpenBLAS* versus Intel® Math Kernel Library.

For the R-benchmark-25 [14] test, Figure 1 shows only the total elapsed time. The results are sorted in ascending order of the performance improvement obtained with Intel MKL.


Figure 2: The R-benchmark-25 detail results of OpenBLAS* versus Intel® Math Kernel Library.

Figure 2 shows the individual results of R-benchmark-25. The results are sorted in ascending order of the performance improvement obtained with Intel MKL. Intel MKL outperformed OpenBLAS on every test except the final test, Escoufier’s method on a 45x45 matrix. More information about Eigenvalues, Fibonacci, Hilbert, and Toeplitz can be found at [9], [10], [11], and [12] in the reference section, respectively.
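
The ordering used in the figures can be reproduced from a table of elapsed times. The sketch below assumes that "performance improvement" means the ratio of OpenBLAS time to Intel MKL time, which the article does not state explicitly. The test names and timings are made-up placeholders, not measured results.

```r
# Sketch: rank tests by speedup. All names and numbers here are
# hypothetical placeholders, NOT results from the article.
elapsed <- data.frame(
  test     = c("test_a", "test_b", "test_c"),
  openblas = c(2.0, 1.5, 0.9),   # elapsed seconds (placeholder values)
  mkl      = c(0.5, 1.0, 1.0),   # elapsed seconds (placeholder values)
  stringsAsFactors = FALSE
)

# Assumed definition: speedup > 1 means Intel MKL was faster.
elapsed$speedup <- elapsed$openblas / elapsed$mkl

# Sort in ascending order of speedup, as in the figures.
sorted <- elapsed[order(elapsed$speedup), ]
print(sorted)
```

With this definition, a test where OpenBLAS is faster has a speedup below 1 and sorts first.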

Note that the tests were not run with the latest version of Intel MKL, which has already been optimized for small matrices.

Benefits of Using Intel® Math Kernel Library

The results in Figures 1 and 2 show that, compared to OpenBLAS, Intel MKL speeds up R functions such as the cross product, Cholesky decomposition, and singular value decomposition (SVD) on systems equipped with the Intel® Xeon® processor E5-2697 v4. These functions are important in machine learning (ML) methods and modern data analysis. Intel MKL improves the performance of those functions by taking advantage of a feature of the Intel Xeon processor E5 v4 called Intel® Advanced Vector Extensions 2 (Intel® AVX2), which boosts the performance of matrix manipulation. The Intel Xeon processor E5 v4 also implements a hardware feature called fused multiply-add (FMA) [13], which greatly speeds up the multiply-add operation that is used extensively in matrix manipulation. For more information about FMA, go to www.software.intel.com. As new Intel® Xeon® processors launch with improved architectures, newer versions of Intel MKL will make use of the new features to optimize the above functions even further, without the need for user intervention.
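
To see why the multiply-add operation matters, consider a dot product, the building block of the cross product and of matrix multiplication in general. The sketch below writes it as an explicit loop to expose the arithmetic pattern. The helper dot_naive is a hypothetical name introduced for illustration; the R interpreter does not necessarily execute this loop with FMA instructions, whereas an optimized BLAS library can map the same pattern onto vectorized hardware instructions.

```r
# Sketch: the multiply-add pattern inside a dot product.
# dot_naive is a hypothetical helper written for illustration only.
dot_naive <- function(x, y) {
  stopifnot(length(x) == length(y))
  acc <- 0
  for (i in seq_along(x)) {
    # One multiply and one add per element: the operation that FMA
    # hardware fuses into a single instruction.
    acc <- acc + x[i] * y[i]
  }
  acc
}

x <- c(1, 2, 3)
y <- c(4, 5, 6)

print(dot_naive(x, y))         # 1*4 + 2*5 + 3*6 = 32
print(drop(crossprod(x, y)))   # same value, computed through BLAS
```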

Conclusion

R plays an important role in analyzing data, so speeding up R helps improve the performance of data analysis tools. Because these tools rely heavily on matrix computation, Intel MKL will in general speed them up by taking advantage of features such as Intel AVX2 that greatly accelerate matrix calculation. With Intel MKL, you don’t need to modify R source code. Just make sure that R is linked to the latest version of Intel MKL to take advantage of new features in new Intel Xeon processors.

This is the first article in the ML series. Upcoming articles will discuss how Intel MKL helps speed up computation not only at the function level but also at the application level. We will also have articles discussing how to use and optimize ML applications using Python*.

References

[1] What is R? https://www.r-project.org/about.html

[2] Basic Linear Algebra Subprograms - Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms

[3] An optimized BLAS library http://www.openblas.net/

[4] Intel® Math Kernel Library https://software.intel.com/en-us/intel-mkl?wapkw=intel%20mkl

[5] Cholesky decomposition - Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Cholesky_decomposition

[6] Singular value decomposition - Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Singular_value_decomposition

[7] Principal component analysis - Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Principal_component_analysis

[8] Revolution R http://www.revolutionanalytics.com/

[9] Eigenvalue – from Wolfram MathWorld http://mathworld.wolfram.com/Eigenvalue.html

[10] What is the Fibonacci sequence? http://www.livescience.com/37470-fibonacci-sequence.html

[11] Hilbert matrix - Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Hilbert_matrix

[12] Toeplitz matrix - Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Toeplitz_matrix

[13] Multiply–accumulate operation: Fused multiply–add - Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation#Fused_multiply.E2.80.93add

[14] R-benchmark-25 http://r.research.att.com/benchmarks/R-benchmark-25.R
