Verbose Mode Supported in Intel® MKL

Introduction

Verbose mode has been supported in the Intel® Math Kernel Library (Intel® MKL) since version 11.2 for the BLAS and LAPACK domains.

Since version 2018, Intel MKL also supports verbose mode for the Fourier Transform functions (FFT) domain.

Since version 2019 Update 3, Intel MKL supports MKL_VERBOSE for the following ScaLAPACK functions: P?POTRF, P?TRTRI, PDSYEV{D,R,X}, and PZHEEV{D,R,X}. All MPI ranks print MKL_VERBOSE output.

This feature helps developers understand how their programs use Intel MKL functions at run time. Verbose mode reports the version of Intel MKL in use, the instruction set supported by the processor the program runs on, the Intel MKL functions called and the parameters passed to them, and the amount of time spent in each function call.

Using Intel® MKL Verbose Mode

To enable the Intel MKL Verbose mode for an application, do one of the following:

•  Set the environment variable MKL_VERBOSE  to 1 

•  Call the support function mkl_verbose(1)

Verbose mode is disabled by default. When it is enabled, every call to a verbose-enabled function ends by printing a verbose log that includes version information, the function name, the argument values, the time taken by the function, and other details.
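As an illustration of the environment-variable route, the following Python sketch launches a program with MKL_VERBOSE set to 1 and collects its output. The helper names verbose_env and run_with_verbose are hypothetical and are not part of Intel MKL. The sketch assumes that the verbose lines are written to standard output.

```python
import os
import subprocess


def verbose_env(base=None):
    """Return a copy of the environment with MKL_VERBOSE set to "1"."""
    env = dict(os.environ if base is None else base)
    env["MKL_VERBOSE"] = "1"
    return env


def run_with_verbose(cmd):
    """Run cmd with verbose mode enabled and return its stdout lines.

    Assumption: the verbose log is written to standard output, so it is
    captured together with the program's own output.
    """
    result = subprocess.run(
        cmd, env=verbose_env(), capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

For a binary linked against Intel MKL (the path ./my_app is a placeholder), the verbose lines can then be selected with [line for line in run_with_verbose(["./my_app"]) if line.startswith("MKL_VERBOSE")].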

Example 1: Using Verbose Mode for DGEMM 

The following example calls the matrix-matrix multiplication function dgemm() with MKL_VERBOSE enabled to obtain run-time information about the call.

The version information line:

MKL_VERBOSE Intel(R) MKL 11.2 build 20140312 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors, Lnx 2.70GHz lp64 intel_thread NMICDev:0

This line indicates that the Intel MKL version is 11.2, the processor is Intel(R) AVX enabled, the operating system is Linux, the CPU frequency is 2.70GHz, the application uses the lp64 interface and the threaded Intel MKL library (intel_thread), and no coprocessor is in use (NMICDev:0).

The call description line:

MKL_VERBOSE DGEMM(N,N,1000,1000,1000,0x7fff10ff6560,0x7f9d09f20010,1000,0x7f9d0a6c2010,1000,0x7fff10ff6568,0x7f9d0977e010,1000) 15.79ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:16 WDiv:HOST:+0.000

This line shows that the program called DGEMM with the input parameters N,N,1000,1000,1000,0x7fff10ff6560,0x7f9d09f20010,1000,0x7f9d0a6c2010,1000,0x7fff10ff6568,0x7f9d0977e010,1000 and that the call took 15.79ms. Conditional Numerical Reproducibility (CNR, controlled by the MKL_CBWR environment variable) is OFF; MKL_DYNAMIC (Dyn) and the fast memory manager (FastMM) are on. The thread that printed the line has ID 0 (TID), and 16 threads were used in total (NThr). Ignore WDiv:HOST:+0.000, which applies only to coprocessors.
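When a log contains many such lines, it can help to split them into fields automatically. The following Python sketch parses a call description line of the shape shown above. It is not an Intel tool: the function name parse_call_line is hypothetical, and the time units ns and s are assumptions (only ms and us appear in this article).

```python
import re

# Shape assumed from the examples: MKL_VERBOSE NAME(args) <time><unit> KEY:VALUE ...
_CALL_RE = re.compile(
    r"^MKL_VERBOSE\s+(?P<name>\w+)\((?P<args>.*?)\)\s+"
    r"(?P<time>\d+(?:\.\d+)?)(?P<unit>ns|us|ms|s)\b\s*(?P<flags>.*)$"
)
_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_call_line(line):
    """Return the fields of a call description line, or None if it is not one."""
    match = _CALL_RE.match(line.strip())
    if match is None:
        return None  # e.g. the version information line
    flags = {}
    for token in match.group("flags").split():
        # Split on the first colon only, so WDiv:HOST:+0.000 keeps "HOST:+0.000".
        key, _, value = token.partition(":")
        flags[key] = value
    return {
        "name": match.group("name"),
        "args": match.group("args").split(","),
        "seconds": float(match.group("time")) * _SECONDS[match.group("unit")],
        "flags": flags,
    }
```

Applied to the DGEMM line above, the result has name "DGEMM", 13 arguments, and flags such as NThr mapped to "16". Summing the seconds field per function name gives a simple time profile.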

Example 2: Using Verbose Mode for 2D real FFT 

The following example calls FFT functions. Build the FFT program to produce a binary and set MKL_VERBOSE=1 before running it. The program then prints the verbose information:

MKL_VERBOSE Intel(R) MKL 2019.0 Update 1 Product build 20180928 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.50GHz intel_thread

MKL_VERBOSE FFT(scfi7x13,tLim:1,desc:0x514a0c0) 32.49us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 

FFT(scfi7x13,tLim:1,desc:0x514a0c0) is a functional dump of the Intel MKL FFT descriptor. Its content is interpreted as follows:

    scfi7x13 is the problem description:

    s/d for single/double precision;

    c/r for complex/real forward domain;

    f/b for forward/backward compute direction;

    i/o for in-place/out-of-place output memory placement;

    7x13 for the dimension lengths, listed from the biggest^ dimension to the smallest^ dimension, with "x" as the delimiter between dimensions.

    ^ The smallest dimension is the one whose transform points are located most densely in memory.

    tLim:1 is the DFTI_THREAD_LIMIT setting, the number of threads to use at run time (if available) to compute the FFT problem. The value that was set is printed; if none was set, the library adjusts it to a value that gives the best performance on the given system.

    desc:0x514a0c0 is the address of the descriptor handle in memory.

    32.49us is the run time.
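The letter codes above can be decoded mechanically. The following Python sketch handles only the simple form of the problem description (four letters followed by dimension lengths, as in scfi7x13). The function name decode_fft_problem is hypothetical, and descriptions that carry batch or stride settings are rejected rather than guessed at.

```python
import re

# Four one-letter codes followed by lengths separated by "x", e.g. scfi7x13.
_PROBLEM_RE = re.compile(r"^([sd])([cr])([fb])([io])(\d+(?:x\d+)*)$")


def decode_fft_problem(problem):
    """Decode a simple FFT problem description such as 'scfi7x13'."""
    match = _PROBLEM_RE.match(problem)
    if match is None:
        raise ValueError("unsupported problem description: %r" % problem)
    precision, domain, direction, placement, dims = match.groups()
    return {
        "precision": {"s": "single", "d": "double"}[precision],
        "forward_domain": {"c": "complex", "r": "real"}[domain],
        "direction": {"f": "forward", "b": "backward"}[direction],
        "placement": {"i": "in-place", "o": "out-of-place"}[placement],
        # Lengths keep the printed order: biggest dimension first.
        "lengths": [int(n) for n in dims.split("x")],
    }
```

For scfi7x13 this yields single precision, complex forward domain, forward direction, in-place placement, and lengths [7, 13].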

The verbose output may also contain the following:

Problem description:

*16 for the DFTI_NUMBER_OF_TRANSFORMS setting (batch setting) when the input distance between two transforms equals the product of all dimension lengths, the so-called standard memory layout; "*" separates the problem from the batch and distance settings.

v512 for the DFTI_NUMBER_OF_TRANSFORMS setting (batch setting) when the input distance between two transforms equals 1, the so-called compact memory layout; "v" separates the problem from the batch and distance settings.

7:30:30x13:1:1 for non-standard strides. If the strides differ from the standard ones (in this case 13 is the standard value, not 30), the full problem is dumped as <length>:<inputStride>:<outputStride> for each dimension, from the biggest dimension to the smallest. If there is also a batch setting, the batch size and the input and output distances appear at the end of the problem description.

fScale:x / bScale:x are the DFTI_FORWARD_SCALE / DFTI_BACKWARD_SCALE settings and reflect the values provided by the user. The default value of 1.0 is not printed for either setting.

pack:perm is the DFTI_PACKED_FORMAT setting. The default CCE format is not printed.

input:unaligned reports the result of checking input data alignment on a 64-byte boundary. Nothing is printed if the data is aligned. The same applies to output memory. The out-of-place split complex case (DFTI_COMPLEX_STORAGE = DFTI_REAL_REAL) is not supported.

Other parameters that can be set through the Intel(R) MKL FFT API are printed if non-default values were used.
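To see why 13 rather than 30 is the standard stride in the 7:30:30x13:1:1 entry above, the following Python sketch splits the strided dimension list and computes the strides of a densely packed layout for comparison. The function names are hypothetical, the input is only the dimension part of the problem description (without the four-letter prefix or any batch suffix), and a plain dense layout is assumed. Real-domain transforms with packed formats may have different defaults.

```python
def parse_strided_dims(text):
    """Split '7:30:30x13:1:1' into (length, input_stride, output_stride) tuples."""
    dims = []
    for part in text.split("x"):
        length, in_stride, out_stride = (int(v) for v in part.split(":"))
        dims.append((length, in_stride, out_stride))
    return dims


def standard_strides(lengths):
    """Strides of a dense layout: the last (smallest) dimension has stride 1."""
    strides = [1] * len(lengths)
    # Each outer stride is the product of all inner dimension lengths.
    for i in range(len(lengths) - 2, -1, -1):
        strides[i] = strides[i + 1] * lengths[i + 1]
    return strides
```

For lengths [7, 13], standard_strides returns [13, 1], while the dump reports an input stride of 30 for the first dimension, which is why the full form is printed.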

Some Limitations:

Because every call to a verbose-enabled function requires an output operation, the performance of the application may degrade with the verbose mode enabled.

In addition, Intel MKL verbose mode has the following limitations:

  • Input values of parameters passed by reference are not printed if the values were changed by the function.  For example, if a LAPACK function is called with a workspace query, that is, the value of the lwork parameter equals -1 on input, the call description line prints the result of the query and not -1.
  • Return values of functions are not printed. For example, the value returned by the function ilaenv is not printed.
  • Floating-point scalars passed by reference are not printed.

See the Intel MKL Developer Guide for more details about verbose mode.
