Verbose Mode Supported in Intel® MKL

By Gennady Fedorov, Dmitrii Zarukin, Ying Hu, Published: 03/30/2014, Last Updated: 01/15/2020

Introduction

Intel® Math Kernel Library (Intel® MKL) has supported verbose mode for the BLAS and LAPACK domains since version 11.2.

Since version 2018, Intel MKL also supports verbose mode for the Fourier Transform (FFT) domain.

Since version 2019 Update 3, Intel MKL supports MKL_VERBOSE for the following ScaLAPACK functions: P?POTRF, P?TRTRI, PDSYEV{D, R, X} and PZHEEV{D, R, X}. All MPI ranks print MKL_VERBOSE output.

This feature helps developers understand how their programs use Intel MKL functions at run time. Verbose mode reports the version of Intel MKL in use, the instruction set supported by the run-time processor, the Intel MKL functions called, the parameters passed to them, and the time spent in each function call.

Using Intel® MKL Verbose Mode

To enable the Intel MKL Verbose mode for an application, do one of the following:

•  Set the environment variable MKL_VERBOSE to 1

•  Call the support function mkl_verbose(1)

Verbose mode is disabled by default. When it is on, every call of a verbose-enabled function finishes by printing a log line that includes the name of the function, the values of its arguments, the time taken by the function, and other run-time settings. A version information line is printed as well.
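The environment-variable route can be scripted. The following is a minimal sketch, assuming the verbose lines reach standard output or standard error and each begins with MKL_VERBOSE. The helper names verbose_env and run_with_mkl_verbose are hypothetical, and a Python one-liner stands in for a real MKL-linked binary so the sketch runs anywhere.

```python
import os
import subprocess
import sys


def verbose_env(base=None):
    """Return a copy of the environment with MKL_VERBOSE set to 1."""
    env = dict(os.environ if base is None else base)
    env["MKL_VERBOSE"] = "1"
    return env


def run_with_mkl_verbose(cmd):
    """Run cmd with MKL_VERBOSE=1 and return the lines that start with MKL_VERBOSE."""
    result = subprocess.run(
        cmd,
        env=verbose_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: capture both streams to be safe
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines()
            if line.startswith("MKL_VERBOSE")]


# Stand-in child process; a real run would pass the path of an MKL-linked binary.
demo = [sys.executable, "-c",
        "import os; print('MKL_VERBOSE demo, flag =', os.environ['MKL_VERBOSE'])"]
verbose_lines = run_with_mkl_verbose(demo)
```

Setting the variable only in the child environment keeps verbose output, and its overhead, out of other processes started from the same shell.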

Example 1: Using Verbose Mode for DGEMM 

The following example calls the matrix-matrix multiplication function dgemm() with MKL_VERBOSE switched on to obtain run-time information about the call.

The version information line:

MKL_VERBOSE Intel(R) MKL 11.2 build 20140312 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors, Lnx 2.70GHz lp64 intel_thread NMICDev:0

This line indicates that the MKL version is 11.2, the processor is Intel(R) AVX enabled, the operating system is Linux, the CPU frequency is 2.70GHz, the program uses the lp64 interface and the threaded MKL library (intel_thread), and no coprocessor is in use (NMICDev:0).

The call description line:

MKL_VERBOSE DGEMM(N,N,1000,1000,1000,0x7fff10ff6560,0x7f9d09f20010,1000,0x7f9d0a6c2010,1000,0x7fff10ff6568,0x7f9d0977e010,1000) 15.79ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:16 WDiv:HOST:+0.000

This line shows that the program calls DGEMM with the input parameters N,N,1000,1000,1000,0x7fff10ff6560,0x7f9d09f20010,1000,0x7f9d0a6c2010,1000,0x7fff10ff6568,0x7f9d0977e010,1000. The call takes 15.79ms. The MKL_CBWR setting (CNR, conditional numerical reproducibility) is OFF, and MKL_DYNAMIC and the Fast Memory Manager are on. The ID of the thread that printed the line is 0, and 16 threads are used in total. Ignore WDiv:HOST:+0.000, as it applies only to coprocessors.
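A call description line like the one above can be split into its parts mechanically. The following is a minimal sketch, assuming the layout seen in this article: MKL_VERBOSE, the function name, a parenthesized comma-separated argument list, a time with a unit, and space-separated key:value settings. The function name parse_call_line and the regular expression are illustrative and not part of Intel MKL.

```python
import re

# Layout assumed from the examples in this article, not from a formal grammar.
CALL_RE = re.compile(
    r"^MKL_VERBOSE\s+(?P<name>\w+)\((?P<args>.*?)\)\s+"
    r"(?P<time>[\d.]+)(?P<unit>ns|us|ms|s)\s*(?P<rest>.*)$"
)
UNIT_TO_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_call_line(line):
    """Parse one call description line; return None for any other line."""
    match = CALL_RE.match(line.strip())
    if match is None:
        return None
    settings = {}
    for token in match.group("rest").split():
        # Split on the first colon only, so "WDiv:HOST:+0.000" keeps its value whole.
        key, _, value = token.partition(":")
        settings[key] = value
    return {
        "name": match.group("name"),
        "args": match.group("args").split(","),
        "seconds": float(match.group("time")) * UNIT_TO_SECONDS[match.group("unit")],
        "settings": settings,
    }


dgemm_line = (
    "MKL_VERBOSE DGEMM(N,N,1000,1000,1000,0x7fff10ff6560,0x7f9d09f20010,1000,"
    "0x7f9d0a6c2010,1000,0x7fff10ff6568,0x7f9d0977e010,1000) 15.79ms "
    "CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:16 WDiv:HOST:+0.000"
)
call = parse_call_line(dgemm_line)
```

For the DGEMM line, call["name"] is DGEMM, call["args"] holds the 13 arguments as strings, and call["settings"] maps CNR, Dyn, FastMM, TID, NThr and WDiv to their printed values. The version information line does not match the pattern and yields None.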

Example 2: Using Verbose Mode for 2D real FFT 

The following example calls FFT functions. Build the FFT program to produce a binary, and set MKL_VERBOSE=1 before running it. The program then prints the verbose information:

MKL_VERBOSE Intel(R) MKL 2019.0 Update 1 Product build 20180928 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.50GHz intel_thread

MKL_VERBOSE FFT(scfi7x13,tLim:1,desc:0x514a0c0) 32.49us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 

FFT(scfi7x13,tLim:1,desc:0x514a0c0) is a functional dump of the MKL FFT descriptor. Its content is interpreted as follows:

    scfi7x13 is the problem description:

    s/d for single/double precision;

    c/r for complex/real forward domain;

    f/b for forward/backward compute direction;

    i/o for in-place/out-of-place output memory placement;

    7x13 for the dimension lengths, listed from the biggest^ dimension to the smallest^ dimension, with "x" as the delimiter between dimensions.

    ^ The smallest dimension is the one whose transform points are located most densely in memory.

    tLim:1 is the DFTI_THREAD_LIMIT setting, the number of threads to be used at run time (if available) to compute the FFT problem. Verbose mode prints the value that was set or, if none was set, the value the library chose itself to achieve the best performance on the given system.

    desc:0x514a0c0 is the address of the descriptor handle in memory.

    32.49us is the run time.
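The letter codes above map directly onto a small decoder. The following is a minimal sketch that handles only the plain form shown here (four letters followed by "x"-separated lengths). It does not handle batch or stride suffixes, and the name decode_fft_problem is hypothetical.

```python
import re

# Four one-letter fields followed by dimension lengths separated by "x".
PROBLEM_RE = re.compile(r"^([sd])([cr])([fb])([io])(\d+(?:x\d+)*)$")


def decode_fft_problem(code):
    """Decode a plain FFT problem description such as 'scfi7x13'."""
    match = PROBLEM_RE.match(code)
    if match is None:
        raise ValueError(f"unsupported problem description: {code!r}")
    precision, domain, direction, placement, dims = match.groups()
    return {
        "precision": {"s": "single", "d": "double"}[precision],
        "forward_domain": {"c": "complex", "r": "real"}[domain],
        "direction": {"f": "forward", "b": "backward"}[direction],
        "placement": {"i": "in-place", "o": "out-of-place"}[placement],
        # Lengths run from the biggest dimension to the smallest one.
        "lengths": [int(n) for n in dims.split("x")],
    }


problem = decode_fft_problem("scfi7x13")
```

Raising ValueError on anything outside the plain form is a deliberate choice: silently misreading a batch or stride suffix would be harder to notice than an explicit failure.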

What else may appear in the verbose output:

Problem description:

*16 for the DFTI_NUMBER_OF_TRANSFORMS setting (or batch setting) when the input distance between two transforms equals the product of all dimension lengths, the so-called standard memory layout; "*" separates the problem from the batch and distance settings.

v512 for the DFTI_NUMBER_OF_TRANSFORMS setting (or batch setting) when the input distance between two transforms equals 1, the so-called compact memory layout; "v" separates the problem from the batch and distance settings.

7:30:30x13:1:1 for non-standard strides. If the strides differ from the standard ones (in this particular case, a stride of 13 would be standard, not 30), the full problem is dumped as <length>:<inputStride>:<outputStride> for each dimension, from the biggest dimension to the smallest dimension. If there is also a batch setting, the batch size plus the input and output distances appear at the end of the problem description.

fScale:x / bScale:x are the DFTI_FORWARD_SCALE/DFTI_BACKWARD_SCALE settings and reflect the value provided by the user. Default values of 1.0 for each setting are not printed.

pack:perm is the DFTI_PACKED_FORMAT setting. The default value, CCE format, is not printed.

input:unaligned reports the result of a check for input data alignment on a 64-byte boundary. If the data is aligned, this is not printed. The same is true for output memory. The out-of-place split complex case (when DFTI_COMPLEX_STORAGE = DFTI_REAL_REAL) is not supported.

Other parameters that can be set through the Intel(R) MKL FFT API are printed if non-default values were used.
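The non-standard stride form described above, <length>:<inputStride>:<outputStride> per dimension, can be unpacked as follows. This is a minimal sketch that covers only the dimension part, for example 7:30:30x13:1:1; it does not handle a trailing batch setting, and the name parse_strided_dims is hypothetical.

```python
def parse_strided_dims(text):
    """Split 'len:inStride:outStride' groups joined by 'x' into dictionaries."""
    dims = []
    for part in text.split("x"):
        fields = part.split(":")
        if len(fields) != 3:
            raise ValueError(f"expected length:inputStride:outputStride, got {part!r}")
        length, input_stride, output_stride = (int(value) for value in fields)
        dims.append({
            "length": length,
            "input_stride": input_stride,
            "output_stride": output_stride,
        })
    return dims


dims = parse_strided_dims("7:30:30x13:1:1")
```

For the example string, the first entry describes the biggest dimension (length 7, input and output strides of 30) and the second the smallest dimension (length 13, strides of 1).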

 

MKL Verbose Toolkit

A researcher at Argonne National Laboratory has written a parsing tool that summarizes MKL_VERBOSE output. The tool is useful for customers who need a summary of many MKL calls and their statistics. It is available on GitHub at https://github.com/TApplencourt/mkl-verbose-toolkit
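To illustrate the kind of summary such a tool produces, the sketch below totals the call count and time per function from a list of verbose lines. It is not the toolkit's implementation. The line layout is assumed from the examples in this article, the name summarize is hypothetical, and the second DGEMM line is invented sample data.

```python
import re
from collections import defaultdict

# Function name, parenthesized arguments, then a time value with its unit.
LINE_RE = re.compile(r"^MKL_VERBOSE\s+(\w+)\(.*?\)\s+([\d.]+)(ns|us|ms|s)\b")
UNIT_TO_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def summarize(lines):
    """Return {function name: (call count, total seconds)} for call lines."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # version information lines and unrelated output are skipped
        name, value, unit = match.groups()
        totals[name][0] += 1
        totals[name][1] += float(value) * UNIT_TO_SECONDS[unit]
    return {name: (count, seconds) for name, (count, seconds) in totals.items()}


sample = [
    "MKL_VERBOSE DGEMM(N,N,1000,1000,1000,0x7fff10ff6560,0x7f9d09f20010,1000,"
    "0x7f9d0a6c2010,1000,0x7fff10ff6568,0x7f9d0977e010,1000) 15.79ms CNR:OFF",
    # Invented line, present only to show aggregation over repeated calls.
    "MKL_VERBOSE DGEMM(N,N,500,500,500,0x1,0x2,500,0x3,500,0x4,0x5,500) 4.21ms CNR:OFF",
    "MKL_VERBOSE FFT(scfi7x13,tLim:1,desc:0x514a0c0) 32.49us CNR:OFF Dyn:1",
]
summary = summarize(sample)
```

Sorting the result by total seconds gives a quick view of which MKL functions dominate the run time.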

Some Limitations:

Because every call to a verbose-enabled function requires an output operation, the performance of the application may degrade with the verbose mode enabled.

In addition, MKL Verbose mode has the following limitations:

  • Input values of parameters passed by reference are not printed if the values were changed by the function.  For example, if a LAPACK function is called with a workspace query, that is, the value of the lwork parameter equals -1 on input, the call description line prints the result of the query and not -1.
  • Return values of functions are not printed. For example, the value returned by the function ilaenv is not printed.
  • Floating-point scalars passed by reference are not printed.

Please see the MKL Developer Guide for more details about the verbose mode of MKL.

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804