Tutorial

  • 327357-009
  • 04/15/2019

Measuring Performance with Intel® MKL Support Functions

Intel MKL provides support functions for measuring performance. They let you quantify the performance improvement gained by using the Intel MKL routines in this tutorial.

Measure Performance of dgemm

Use the dsecnd routine to return the elapsed time in seconds.

The dgemm routine executes so quickly that its speed is difficult to measure, even for an operation on a large matrix. For this reason, the exercises perform the multiplication multiple times. Set the value of the LOOP_COUNT constant so that the total execution time is about one second.

* Fortran source code is found in dgemm_with_timing.f

      PRINT *, "Making the first run of matrix product using "
      PRINT *, "Intel(R) MKL DGEMM subroutine to get stable "
      PRINT *, "run time measurements"
      PRINT *, ""
      CALL DGEMM('N','N',M,N,K,ALPHA,A,M,B,K,BETA,C,M)

      PRINT *, "Measuring performance of matrix product using "
      PRINT *, "Intel(R) MKL DGEMM subroutine"
      PRINT *, ""
      S_INITIAL = DSECND()
      DO R = 1, LOOP_COUNT
         CALL DGEMM('N','N',M,N,K,ALPHA,A,M,B,K,BETA,C,M)
      END DO
      S_ELAPSED = (DSECND() - S_INITIAL) / LOOP_COUNT

      PRINT *, "== Matrix multiplication using Intel(R) MKL DGEMM =="
      PRINT 50, " == completed at ",S_ELAPSED*1000," milliseconds =="
 50   FORMAT(A,F12.5,A)
      PRINT *, ""
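
The advice to pick LOOP_COUNT so that the whole timed loop runs for about one second can be turned into a small calculation: time one call, then divide the target duration by that time. The following is only a sketch of that arithmetic. The module LOOP_CALIBRATION and the function PICK_LOOP_COUNT are hypothetical names that are not part of Intel MKL or of the tutorial sources, and the single-call time is assumed to have been measured separately, for example with DSECND around the warm-up call.

```fortran
      MODULE LOOP_CALIBRATION
      IMPLICIT NONE
      CONTAINS

!     Hypothetical helper (not an Intel MKL routine).
!     T_SINGLE: measured time of one call, in seconds.
!     T_TARGET: desired total time of the timed loop, in seconds.
!     Returns a repetition count of at least 1.
      INTEGER FUNCTION PICK_LOOP_COUNT(T_SINGLE, T_TARGET)
      DOUBLE PRECISION, INTENT(IN) :: T_SINGLE, T_TARGET
      DOUBLE PRECISION :: RATIO
      IF (T_SINGLE .LE. 0.0D0) THEN
!        The timer could not resolve a single call. Return 1 so the
!        caller can time a larger batch and try again.
         PICK_LOOP_COUNT = 1
      ELSE
!        Cap the ratio so that NINT cannot overflow a default integer.
         RATIO = MIN(T_TARGET / T_SINGLE, 1.0D9)
         PICK_LOOP_COUNT = MAX(1, NINT(RATIO))
      END IF
      END FUNCTION PICK_LOOP_COUNT

      END MODULE LOOP_CALIBRATION
```

As an illustration with a made-up measurement, a single call that takes about 4 milliseconds gives PICK_LOOP_COUNT(0.004D0, 1.0D0) = 250, which you could then use as the value of the LOOP_COUNT constant.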

Measure Performance Without Using dgemm

To show the improvement gained by using dgemm, perform the same measurement, but use a triply nested loop to multiply the matrices.

* Fortran source code is found in matrix_multiplication.f

      PRINT *, "Making the first run of matrix product using "
      PRINT *, "triple nested loop to get stable run time"
      PRINT *, "measurements"
      PRINT *, ""
      DO I = 1, M
         DO J = 1, N
            TEMP = 0.0
            DO L = 1, K
               TEMP = TEMP + A(I,L) * B(L,J)
            END DO
            C(I,J) = TEMP
         END DO
      END DO

      PRINT *, "Measuring performance of matrix product using "
      PRINT *, "triple nested loop"
      PRINT *, ""
      S_INITIAL = DSECND()
      DO R = 1, LOOP_COUNT
         DO I = 1, M
            DO J = 1, N
               TEMP = 0.0
               DO L = 1, K
                  TEMP = TEMP + A(I,L) * B(L,J)
               END DO
               C(I,J) = TEMP
            END DO
         END DO
      END DO
      S_ELAPSED = (DSECND() - S_INITIAL) / LOOP_COUNT

      PRINT *, "== Matrix multiplication using triple nested loop =="
      PRINT 50, " == completed at ",S_ELAPSED*1000," milliseconds =="
 50   FORMAT(A,F12.5,A)
      PRINT *, ""

Compare the results of the first exercise, which uses dgemm, with the results of the second exercise, which does not.
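
One way to make that comparison concrete is to convert the two S_ELAPSED values into a speedup ratio and an approximate floating-point rate. The sketch below assumes the conventional operation count of about 2*M*N*K for an M-by-K times K-by-N product (one multiply and one add per inner-loop step) and ignores the scaling by ALPHA and BETA. The module TIMING_COMPARE and the functions GFLOPS and SPEEDUP are hypothetical names introduced for illustration. They are not part of Intel MKL or of the tutorial sources.

```fortran
      MODULE TIMING_COMPARE
      IMPLICIT NONE
      CONTAINS

!     Approximate rate in GFLOP/s for one M-by-K times K-by-N product
!     that took SECONDS, counting 2*M*N*K floating-point operations.
!     The dimensions are converted to double precision before they
!     are multiplied so that the product cannot overflow an integer.
      DOUBLE PRECISION FUNCTION GFLOPS(M, N, K, SECONDS)
      INTEGER, INTENT(IN) :: M, N, K
      DOUBLE PRECISION, INTENT(IN) :: SECONDS
      GFLOPS = 2.0D0 * DBLE(M) * DBLE(N) * DBLE(K) / SECONDS / 1.0D9
      END FUNCTION GFLOPS

!     Ratio of the per-multiplication times of the two exercises.
!     A value above 1 means the DGEMM version was faster.
      DOUBLE PRECISION FUNCTION SPEEDUP(T_LOOP, T_DGEMM)
      DOUBLE PRECISION, INTENT(IN) :: T_LOOP, T_DGEMM
      SPEEDUP = T_LOOP / T_DGEMM
      END FUNCTION SPEEDUP

      END MODULE TIMING_COMPARE
```

Both functions expect times in seconds, so pass S_ELAPSED itself rather than the millisecond value that the exercises print. Reporting the rate alongside the ratio makes results from different matrix sizes easier to compare.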
For more information about measuring Intel MKL performance, see the article "A simple example to measure the performance of an Intel MKL function" in the Intel Math Kernel Library Knowledge Base.

Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
