Developer Reference

  • 0.9
  • 09/09/2020
  • Public Content

Graph Operations

The graph API provides optimized kernels for the following computationally intensive operations:

Routine               Description
mkl_graph_mxv         Compute a (masked) matrix-vector product
mkl_graph_vxm         Compute a (masked) vector-matrix product
mkl_graph_mxm         Compute a (masked) matrix-matrix product
mkl_graph_transpose   Compute a (masked) transpose of a matrix
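To make the word "masked" concrete, here is a minimal plain-C sketch of a masked matrix-vector product over a zero-based CSR matrix. It does not call MKL. The type csr_view and the function masked_mxv are hypothetical. The masking convention used here is an assumption chosen for illustration, not necessarily the exact semantics of the graph API's matrix-vector routine: rows whose mask entry is zero are skipped, and their output entries are left unchanged. The sketch uses ordinary plus-times arithmetic.

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical read-only CSR view (not an MKL type), zero-based:
   row i holds entries rows_start[i] .. rows_start[i + 1] - 1. */
typedef struct {
    int nrows, ncols;
    const int *rows_start;   /* length nrows + 1 */
    const int *col_indx;     /* length nnz */
    const double *values;    /* length nnz */
} csr_view;

/* y = A * x, restricted by an optional mask (assumed convention):
   rows with mask[i] == 0 are skipped and y[i] keeps its old value.
   Passing mask == NULL computes the full, non-masked product. */
void masked_mxv(double *y, const unsigned char *mask,
                const csr_view *A, const double *x)
{
    assert(y != NULL && A != NULL && x != NULL);
    for (int i = 0; i < A->nrows; ++i) {
        if (mask != NULL && mask[i] == 0)
            continue;                         /* masked out: no work for this row */
        double sum = 0.0;
        for (int k = A->rows_start[i]; k < A->rows_start[i + 1]; ++k)
            sum += A->values[k] * x[A->col_indx[k]];
        y[i] = sum;
    }
}
```

Skipping masked-out rows entirely is what makes the mask useful for performance. No work is done for rows the caller does not need. A vector-matrix product follows the same idea with the roles of rows and columns exchanged.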
Graph operations (except mkl_graph_transpose) support the following modes:
  • Single-stage mode. Single-stage execution computes the output object in a single call to a graph operation with an appropriate value for the parameter of type mkl_graph_request_t. See Graph API Glossary for a list of all possible options.
    If the output object is sparse and the sizes of its arrays are not known in advance, the memory for the output object is allocated inside the graph operation and can be deallocated only by calling the appropriate mkl_graph_<object>_destroy routine. To allocate all memory for the output on your side, use multistage execution instead.
  • Multistage mode. Multistage execution constructs the output object over several calls to a graph operation, each call requesting a specific stage. Unlike single-stage mode, multistage execution lets you allocate all memory for the output object yourself. Only temporary memory is allocated inside the graph routine. Before each stage, you must pass pointers to your allocations by calling the appropriate mkl_graph_<object>_set_<format> routine. These calls also specify the format of the final output object. The stage is specified through the parameter of type mkl_graph_request_t. See Graph API Glossary for a list of all possible options.
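The ownership difference between the two modes can be sketched in plain C with a deliberately simple stand-in operation. The operation keeps only the entries of a CSR matrix whose value is at least a threshold, so its output size is unknown until the input has been scanned. Everything here is hypothetical and is not the MKL graph API: csr_t, select_fill_nnz, select_fill_entries, select_single_stage and csr_destroy. The sketch only mirrors the pattern. In the single-stage form, the routine allocates the output and a matching destroy call frees it. In the two-stage form, the caller allocates every output buffer before the stage that fills it.

```c
#include <stdlib.h>
#include <assert.h>

/* Hypothetical owning CSR type (not an MKL type), zero-based. */
typedef struct {
    int nrows, ncols;
    int *rows_start;   /* nrows + 1 entries */
    int *col_indx;     /* nnz entries */
    double *values;    /* nnz entries */
} csr_t;

/* Stage 1 (two-stage style): fill the caller-allocated out_rows_start
   with the row offsets of the entries that pass the threshold. */
void select_fill_nnz(int nrows, const int *rows_start, const double *values,
                     double threshold, int *out_rows_start)
{
    assert(out_rows_start != NULL);       /* caller must provide the buffer first */
    out_rows_start[0] = 0;
    for (int i = 0; i < nrows; ++i) {
        int cnt = 0;
        for (int k = rows_start[i]; k < rows_start[i + 1]; ++k)
            if (values[k] >= threshold) ++cnt;
        out_rows_start[i + 1] = out_rows_start[i] + cnt;
    }
}

/* Stage 2 (two-stage style): fill caller-allocated buffers that hold
   out_rows_start[nrows] entries each. */
void select_fill_entries(int nrows, const int *rows_start, const int *col_indx,
                         const double *values, double threshold,
                         const int *out_rows_start, int *out_col_indx,
                         double *out_values)
{
    assert(out_col_indx != NULL && out_values != NULL);
    for (int i = 0; i < nrows; ++i) {
        int dst = out_rows_start[i];
        for (int k = rows_start[i]; k < rows_start[i + 1]; ++k)
            if (values[k] >= threshold) {
                out_col_indx[dst] = col_indx[k];
                out_values[dst] = values[k];
                ++dst;
            }
    }
}

/* Releases a matrix created by select_single_stage (free(NULL) is a no-op). */
void csr_destroy(csr_t *m)
{
    if (m == NULL) return;
    free(m->rows_start);
    free(m->col_indx);
    free(m->values);
    free(m);
}

/* Single-stage style: one call allocates every output array internally,
   so the result can only be released with csr_destroy. */
csr_t *select_single_stage(int nrows, int ncols, const int *rows_start,
                           const int *col_indx, const double *values,
                           double threshold)
{
    csr_t *c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->nrows = nrows;
    c->ncols = ncols;
    c->col_indx = NULL;
    c->values = NULL;
    c->rows_start = malloc(((size_t)nrows + 1) * sizeof(int));
    if (c->rows_start == NULL) { csr_destroy(c); return NULL; }
    select_fill_nnz(nrows, rows_start, values, threshold, c->rows_start);
    size_t nnz = (size_t)c->rows_start[nrows];
    c->col_indx = malloc((nnz ? nnz : 1) * sizeof(int));
    c->values = malloc((nnz ? nnz : 1) * sizeof(double));
    if (c->col_indx == NULL || c->values == NULL) { csr_destroy(c); return NULL; }
    select_fill_entries(nrows, rows_start, col_indx, values, threshold,
                        c->rows_start, c->col_indx, c->values);
    return c;
}
```

In the two-stage form, the caller reads out_rows_start[nrows] after the first stage to size the column-index and value buffers. The single-stage form performs the same step hidden inside one call, which is why it must own the memory it returns.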
To choose the best-performing format for the output, you can specify the computational method through the parameter of type mkl_graph_method_t. For each graph operation that supports this parameter, the preferred output format is described for a given configuration of input arguments. If you specify a format that the graph operation does not consider the best, the operation still uses your specified format internally.
As an example, consider computing a non-masked matrix-matrix product using mkl_graph_mxm in multistage mode. Assume also that you want the output in CSR format, which is the preferred choice if both input matrices are also in CSR and the method is set to the Gustavson algorithm. The workflow is shown in the following pseudo-code:
    // Prepare the input matrices A and B.
    // Create an empty matrix object for the output.
    mkl_graph_matrix_create(&C)
    // Allocate a rows_start buffer of the chosen type for the output.
    // Set the user-allocated rows_start in the output matrix object.
    mkl_graph_matrix_set_csr(C, nrows, ncols, rows_start, rows_start_type, NULL, …)
    // Fill rows_start for the output.
    mkl_graph_mxm(C, …, A, B, …, MKL_GRAPH_REQUEST_FILL_NNZ, …)
    // Use rows_start to deduce the number of nonzero entries nnz.
    // Allocate buffers for the column indices and values to hold nnz entries of the desired types.
    // Set the allocated buffers for column indices and values in the output matrix object.
    mkl_graph_matrix_set_csr(C, …, col_indx, col_indx_type, values, values_type)
    // Fill buffers col_indx and values with the computed column indices and values.
    mkl_graph_mxm(C, …, A, B, …, MKL_GRAPH_REQUEST_FILL_ENTRIES, …)
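The two stages in the pseudo-code above can be imitated without MKL to show what each stage computes under Gustavson's algorithm. In this minimal sketch, spgemm_fill_nnz plays the role of the MKL_GRAPH_REQUEST_FILL_NNZ stage and spgemm_fill_entries plays the role of MKL_GRAPH_REQUEST_FILL_ENTRIES. Both function names and their signatures are hypothetical. The matrices are assumed to be zero-based CSR with int indices and double values, and the product uses ordinary plus-times arithmetic.

```c
#include <stdlib.h>
#include <assert.h>

/* Stage 1 (analogue of MKL_GRAPH_REQUEST_FILL_NNZ, not the MKL routine):
   count the distinct columns in each row of C = A * B and write the row
   offsets into the caller-allocated c_rs (length m + 1).
   A is m x k and B is k x n. Returns 0 on success, -1 if scratch allocation fails. */
int spgemm_fill_nnz(int m, int n,
                    const int *a_rs, const int *a_ci,
                    const int *b_rs, const int *b_ci,
                    int *c_rs)
{
    assert(c_rs != NULL);                  /* caller must supply rows_start */
    int *marker = malloc((size_t)(n > 0 ? n : 1) * sizeof(int));
    if (marker == NULL) return -1;
    for (int j = 0; j < n; ++j) marker[j] = -1;   /* last row that touched column j */
    c_rs[0] = 0;
    for (int i = 0; i < m; ++i) {
        int cnt = 0;
        for (int p = a_rs[i]; p < a_rs[i + 1]; ++p) {
            int kk = a_ci[p];
            for (int q = b_rs[kk]; q < b_rs[kk + 1]; ++q) {
                int j = b_ci[q];
                if (marker[j] != i) { marker[j] = i; ++cnt; }
            }
        }
        c_rs[i + 1] = c_rs[i] + cnt;
    }
    free(marker);
    return 0;
}

/* Stage 2 (analogue of MKL_GRAPH_REQUEST_FILL_ENTRIES): Gustavson's
   row-by-row accumulation into caller buffers holding c_rs[m] entries. */
int spgemm_fill_entries(int m, int n,
                        const int *a_rs, const int *a_ci, const double *a_val,
                        const int *b_rs, const int *b_ci, const double *b_val,
                        const int *c_rs, int *c_ci, double *c_val)
{
    assert(c_ci != NULL && c_val != NULL);  /* caller must supply both buffers */
    int *pos = malloc((size_t)(n > 0 ? n : 1) * sizeof(int));
    if (pos == NULL) return -1;
    for (int j = 0; j < n; ++j) pos[j] = -1;      /* slot of column j in C, if any */
    for (int i = 0; i < m; ++i) {
        int next = c_rs[i];
        for (int p = a_rs[i]; p < a_rs[i + 1]; ++p) {
            int kk = a_ci[p];
            double a = a_val[p];
            for (int q = b_rs[kk]; q < b_rs[kk + 1]; ++q) {
                int j = b_ci[q];
                if (pos[j] < c_rs[i]) {           /* first hit of column j in row i */
                    pos[j] = next;
                    c_ci[next] = j;
                    c_val[next] = a * b_val[q];
                    ++next;
                } else {                           /* accumulate into existing slot */
                    c_val[pos[j]] += a * b_val[q];
                }
            }
        }
    }
    free(pos);
    return 0;
}
```

Between the two calls, the caller reads nnz = c_rs[m] and allocates c_ci and c_val. This matches the "deduce nnz, then allocate" step of the workflow. Column indices within each output row come out in first-encounter order rather than sorted. A caller that needs sorted rows would sort each row segment afterwards.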
For full working code using multistage mode, refer to graphc_mxm_multistage.c in the examples for graph functionality.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
This notice covers the following instruction sets: SSE2, SSE4.2, AVX2, AVX-512.