Developer Reference

  • 0.10
  • 10/21/2020

Two-stage Algorithm for Inspector-executor Sparse BLAS routines

You can use a two-stage algorithm for Inspector-executor Sparse BLAS routines that produce a sparse matrix, such as the mkl_sparse_sp2m routine used in the example in this section.
In the two-stage algorithm:
  1. The first stage constructs the structure of the output matrix.
    • For the BSR/CSR storage formats, this stage fills out the rows_start array and either the rows_end array (4-array variation) or the rowIndex array (3-array variation).
    • For the CSC storage format, this stage fills out the cols_start array and either the cols_end array (4-array variation) or the colIndex array (3-array variation).
    This stage also lets you estimate the memory required for the desired operation.
  2. The second stage constructs other arrays and performs the desired operation.
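To make the split between the two stages concrete, here is a minimal sketch in plain C of a CSR product C = A * B done in two passes. It does not call Intel MKL and is not how the library implements its routines; the `demo_csr` type and the `demo_stage_nnz_count` and `demo_stage_finalize` functions are hypothetical names introduced only for illustration, and the sketch assumes zero-based indexing and the 3-array variation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CSR container (3-array variation, zero-based); not an MKL type. */
typedef struct {
    int rows, cols;
    int *row_ptr;    /* rows + 1 entries */
    int *col_idx;    /* row_ptr[rows] entries */
    double *val;     /* row_ptr[rows] entries */
} demo_csr;

/* Stage 1: compute only the row pointer array of C = A * B.
   Returns 0 on success, -1 if an allocation fails. */
int demo_stage_nnz_count(const demo_csr *a, const demo_csr *b, demo_csr *c)
{
    assert(a->cols == b->rows);
    c->rows = a->rows;
    c->cols = b->cols;
    c->col_idx = NULL;   /* not computed in this stage */
    c->val = NULL;       /* not computed in this stage */
    c->row_ptr = calloc((size_t)a->rows + 1, sizeof *c->row_ptr);
    /* mark[j] holds the last row of C in which column j was seen */
    int *mark = malloc(((size_t)b->cols + 1) * sizeof *mark);
    if (!c->row_ptr || !mark) {
        free(c->row_ptr);
        free(mark);
        c->row_ptr = NULL;
        return -1;
    }
    for (int j = 0; j < b->cols; ++j) mark[j] = -1;
    for (int i = 0; i < a->rows; ++i) {
        int count = 0;
        for (int p = a->row_ptr[i]; p < a->row_ptr[i + 1]; ++p) {
            int k = a->col_idx[p];
            for (int q = b->row_ptr[k]; q < b->row_ptr[k + 1]; ++q) {
                int j = b->col_idx[q];
                if (mark[j] != i) { mark[j] = i; ++count; }
            }
        }
        c->row_ptr[i + 1] = c->row_ptr[i] + count;
    }
    free(mark);
    return 0;
}

/* Stage 2: allocate and fill the column index and value arrays of C.
   Requires row_ptr from stage 1. Column indices within a row are unsorted. */
int demo_stage_finalize(const demo_csr *a, const demo_csr *b, demo_csr *c)
{
    size_t nnz = (size_t)c->row_ptr[c->rows];
    c->col_idx = malloc((nnz + 1) * sizeof *c->col_idx);
    c->val = malloc((nnz + 1) * sizeof *c->val);
    /* pos[j] is the slot of column j in C; slots below the row start are stale */
    int *pos = malloc(((size_t)b->cols + 1) * sizeof *pos);
    if (!c->col_idx || !c->val || !pos) {
        free(c->col_idx); free(c->val); free(pos);
        c->col_idx = NULL; c->val = NULL;
        return -1;
    }
    for (int j = 0; j < b->cols; ++j) pos[j] = -1;
    for (int i = 0; i < a->rows; ++i) {
        int start = c->row_ptr[i];
        int next = start;
        for (int p = a->row_ptr[i]; p < a->row_ptr[i + 1]; ++p) {
            int k = a->col_idx[p];
            double av = a->val[p];
            for (int q = b->row_ptr[k]; q < b->row_ptr[k + 1]; ++q) {
                int j = b->col_idx[q];
                if (pos[j] < start) {          /* first hit in this row */
                    pos[j] = next;
                    c->col_idx[next] = j;
                    c->val[next] = av * b->val[q];
                    ++next;
                } else {                        /* accumulate into existing slot */
                    c->val[pos[j]] += av * b->val[q];
                }
            }
        }
    }
    free(pos);
    return 0;
}
```

After `demo_stage_nnz_count` returns, only `row_ptr` is valid, so `row_ptr[rows]` gives the number of entries the second pass will need before any index or value storage is allocated. The caller frees `row_ptr`, `col_idx`, and `val` with `free` when done.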
You can separate the calls for each stage. You can also perform the entire computation in a single call using the sparse_request_t parameter:
Values for the sparse_request_t parameter (value, then description):
  • SPARSE_STAGE_NNZ_COUNT: In the first stage, the algorithm computes only the row (CSR/BSR format) or column (CSC format) pointer array of the matrix storage format. The computed number of non-zeroes in the output matrix helps to calculate the amount of memory required.
  • SPARSE_STAGE_FINALIZE_MULT: In the second stage, the algorithm computes the remaining column (CSR/BSR format) or row (CSC format) index and value arrays for the output matrix. Use this value only after calling the function with SPARSE_STAGE_NNZ_COUNT first.
  • SPARSE_STAGE_FULL_MULT: Combine the two stages by performing the entire computation in a single step.
This example uses the two-stage algorithm for the mkl_sparse_sp2m routine with a matrix in CSR format:
First stage (sparse_request_t = SPARSE_STAGE_NNZ_COUNT)
  1. The algorithm calls the mkl_sparse_sp2m routine with the request parameter set to SPARSE_STAGE_NNZ_COUNT.
  2. The algorithm exports the computed rows_start and rows_end arrays using the mkl_sparse_x_export_csr routine.
  3. These arrays are used to calculate the number of non-zeroes (nnz) of the resulting output matrix.
Note that at this stage, the arrays related to column index and values for the output matrix have not been computed.
status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_NNZ_COUNT, &csrC);

/* Optional: calculate nnz of the resulting output matrix to compute the memory requirement */
status = mkl_sparse_x_export_csr (csrC, &indexing, &rows, &cols, &rows_start, &rows_end, &col_indx, &values);
MKL_INT nnz = rows_end[rows-1] - rows_start[0];
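The nnz calculation above can be turned into a memory estimate. The following is a minimal sketch in plain C that does not call Intel MKL; `demo_nnz_from_pointers` and `demo_index_value_bytes` are hypothetical helper names, and the sketch assumes the rows are stored contiguously (each row ends where the next one begins), so that the difference between the last end pointer and the first start pointer equals the number of stored entries.

```c
#include <assert.h>
#include <stddef.h>

/* Number of stored entries from 4-array row pointers.
   Only the difference is used, so zero-based and one-based
   indexing give the same result. Assumes contiguous rows. */
long demo_nnz_from_pointers(const int *rows_start, const int *rows_end, int rows)
{
    assert(rows > 0);
    return (long)rows_end[rows - 1] - (long)rows_start[0];
}

/* Bytes needed for the column index and value arrays of a CSR
   matrix with nnz stored entries (one index and one value each). */
size_t demo_index_value_bytes(long nnz, size_t index_size, size_t value_size)
{
    assert(nnz >= 0);
    return (size_t)nnz * (index_size + value_size);
}
```

For a 3-array rowIndex array, passing rowIndex as rows_start and rowIndex + 1 as rows_end gives the same count, for example `demo_nnz_from_pointers(rowIndex, rowIndex + 1, rows)`. The byte estimate covers only the two arrays that the second stage fills in; any workspace the library needs internally is not included.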
Second stage (sparse_request_t = SPARSE_STAGE_FINALIZE_MULT)
The algorithm computes the remaining storage arrays (related to column index and values for the output matrix) and performs the desired operation.
status = mkl_sparse_sp2m ( opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FINALIZE_MULT, &csrC);
Alternatively, you can perform both operations in a single step:
Single stage operation (sparse_request_t = SPARSE_STAGE_FULL_MULT)
status = mkl_sparse_sp2m ( opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FULL_MULT, &csrC);
