Developer Reference

Two-stage Algorithm in Inspector-Executor Sparse BLAS Routines

You can use a two-stage algorithm in Inspector-executor Sparse BLAS routines which produce a sparse matrix. The applicable routines are:
The two-stage algorithm allows you to split computations into stages. The main purpose of the splitting is to provide an estimate for the memory required for the output prior to allocating the largest part of the memory (for the indices and values of the non-zero elements). Additionally, the two-stage approach extends the functionality and allows more complex usage models.
The multistage approach currently does not allow you to allocate memory for the output matrix outside oneMKL.
In the two-stage algorithm:
  1. The first stage allocates the data needed for the memory estimate (the arrays rows_start/rows_end or cols_start/cols_end, depending on the format; see Sparse Matrix Storage Formats) and computes the number of entries or the full structure of the matrix.
    The format of the output is decided internally but can be checked using the export functionality mkl_sparse_?_export_<format>.
  2. The second stage allocates data and computes column or row indices (depending on the format) of non-zero elements and/or values of the output matrix.
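The split between the two stages can be illustrated with a minimal, self-contained sketch in plain C. It does not call oneMKL and is not how the library is implemented: the csr_t struct and the functions stage_nnz_count and stage_finalize are hypothetical names, the matrices are zero-based CSR with a single row-pointer array (so rows_start corresponds to row_ptr and rows_end to row_ptr + 1), and column indices within a row are left unsorted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical zero-based CSR container; not the oneMKL sparse_matrix_t. */
typedef struct {
    int nrows, ncols;
    int *row_ptr;   /* nrows + 1 entries: rows_start = row_ptr, rows_end = row_ptr + 1 */
    int *col_idx;   /* NULL until the second stage */
    double *val;    /* NULL until the second stage */
} csr_t;

/* Stage 1: allocate and fill only the row pointers of C = A * B. */
static int stage_nnz_count(const csr_t *A, const csr_t *B, csr_t *C)
{
    assert(A->ncols == B->nrows);
    int *mark = malloc(sizeof(int) * ((size_t)B->ncols + 1));
    int *row_ptr = calloc((size_t)A->nrows + 1, sizeof(int));
    if (!mark || !row_ptr) { free(mark); free(row_ptr); return -1; }
    for (int j = 0; j < B->ncols; ++j) mark[j] = -1;

    for (int i = 0; i < A->nrows; ++i) {
        int count = 0;   /* distinct columns reached in row i of C */
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            int k = A->col_idx[p];
            for (int q = B->row_ptr[k]; q < B->row_ptr[k + 1]; ++q) {
                int j = B->col_idx[q];
                if (mark[j] != i) { mark[j] = i; ++count; }
            }
        }
        row_ptr[i + 1] = row_ptr[i] + count;
    }
    free(mark);
    C->nrows = A->nrows;
    C->ncols = B->ncols;
    C->row_ptr = row_ptr;
    C->col_idx = NULL;
    C->val = NULL;
    return 0;
}

/* Stage 2: allocate column indices and values from the stage 1 counts, then fill them. */
static int stage_finalize(const csr_t *A, const csr_t *B, csr_t *C)
{
    assert(C->row_ptr != NULL);
    size_t nnz = (size_t)C->row_ptr[C->nrows];
    int *pos = malloc(sizeof(int) * ((size_t)B->ncols + 1));
    C->col_idx = malloc(sizeof(int) * (nnz + 1));
    C->val = malloc(sizeof(double) * (nnz + 1));
    if (!pos || !C->col_idx || !C->val) {
        free(pos); free(C->col_idx); free(C->val);
        C->col_idx = NULL; C->val = NULL;
        return -1;
    }
    for (int j = 0; j < B->ncols; ++j) pos[j] = -1;

    for (int i = 0; i < A->nrows; ++i) {
        int begin = C->row_ptr[i], next = begin;
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            int k = A->col_idx[p];
            double a = A->val[p];
            for (int q = B->row_ptr[k]; q < B->row_ptr[k + 1]; ++q) {
                int j = B->col_idx[q];
                if (pos[j] < begin) {            /* first hit of column j in this row */
                    pos[j] = next;
                    C->col_idx[next] = j;
                    C->val[next] = a * B->val[q];
                    ++next;
                } else {                          /* column already present: accumulate */
                    C->val[pos[j]] += a * B->val[q];
                }
            }
        }
        assert(next == C->row_ptr[i + 1]);       /* must match the stage 1 count */
    }
    free(pos);
    return 0;
}
```

After stage_nnz_count returns, C->row_ptr[C->nrows] already gives the number of non-zeros, so the caller can judge the size of the index and value arrays before stage_finalize allocates them.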
Specifying the stage for execution is supported through the sparse_request_t parameter in the API with the following options:
Values for the sparse_request_t parameter

Value
    Description

SPARSE_STAGE_NNZ_COUNT
    Allocates and computes only the rows_start/rows_end (CSR/BSR format) or cols_start/cols_end (CSC format) arrays for the output matrix. After this stage, by calling mkl_sparse_?_export_<format>, you can obtain the number of non-zeros in the output matrix and calculate the amount of memory required for the output matrix.

SPARSE_STAGE_FINALIZE_MULT_NO_VAL
    Allocates and computes row/column indices provided that rows_start/rows_end or cols_start/cols_end have already been computed in a prior call with the request SPARSE_STAGE_NNZ_COUNT. The values of the output matrix are not computed.

SPARSE_STAGE_FINALIZE_MULT
    Depending on the state of the output matrix C on entry to the routine, this stage does one of the following:
      • Allocates and computes row/column indices and values of non-zero elements, if only rows_start/rows_end or cols_start/cols_end are present
      • Allocates and computes values of non-zero elements, if rows_start/rows_end or cols_start/cols_end and row/column indices of non-zero elements are present

SPARSE_STAGE_FULL_MULT_NO_VAL
    Allocates and computes the output matrix structure in a single step. The values of the output matrix are not computed.

SPARSE_STAGE_FULL_MULT
    Allocates and computes the entire output matrix (structure and values) in a single step.
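The request values above can be read as transitions on the set of output arrays that exist. The following plain C sketch models that reading only; it does not use oneMKL, the names request_t, REQ_*, HAS_* and arrays_after are hypothetical, and returning -1 for an unmet precondition is a convention of this sketch, not documented library behavior.

```c
#include <assert.h>

/* Which parts of the output matrix are present (bit flags). */
enum { HAS_ROW_PTRS = 1, HAS_INDICES = 2, HAS_VALUES = 4 };

/* Hypothetical stand-ins for the sparse_request_t values in the table. */
typedef enum {
    REQ_NNZ_COUNT,
    REQ_FINALIZE_MULT_NO_VAL,
    REQ_FINALIZE_MULT,
    REQ_FULL_MULT_NO_VAL,
    REQ_FULL_MULT
} request_t;

/* Returns the arrays present after the request, or -1 (sketch convention)
   when the request needs pointer arrays from a prior NNZ_COUNT call. */
static int arrays_after(int present, request_t req)
{
    assert(present >= 0);
    switch (req) {
    case REQ_NNZ_COUNT:
        return HAS_ROW_PTRS;
    case REQ_FINALIZE_MULT_NO_VAL:
        return (present & HAS_ROW_PTRS) ? (HAS_ROW_PTRS | HAS_INDICES) : -1;
    case REQ_FINALIZE_MULT:
        /* Fills in indices if they are missing, then values. */
        return (present & HAS_ROW_PTRS)
                   ? (HAS_ROW_PTRS | HAS_INDICES | HAS_VALUES) : -1;
    case REQ_FULL_MULT_NO_VAL:
        return HAS_ROW_PTRS | HAS_INDICES;
    case REQ_FULL_MULT:
        return HAS_ROW_PTRS | HAS_INDICES | HAS_VALUES;
    }
    return -1;
}
```

In this model, the sequence REQ_NNZ_COUNT, REQ_FINALIZE_MULT_NO_VAL, REQ_FINALIZE_MULT ends in the same state as a single REQ_FULL_MULT, which is the equivalence the table describes.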
The example below shows how you can use the two-stage approach for estimating the memory requirements for the output matrix in CSR format:
First stage (sparse_request_t = SPARSE_STAGE_NNZ_COUNT)
  1. The routine mkl_sparse_sp2m is called with the request parameter SPARSE_STAGE_NNZ_COUNT.
  2. The arrays rows_start and rows_end are exported using the mkl_sparse_?_export_csr routine.
  3. These arrays are used to calculate the number of non-zeros (nnz) of the resulting output matrix.
Note that by the end of the first stage, the arrays associated with column indices and values of the output matrix have not been allocated or computed yet.
sparse_matrix_t csrC = NULL;
status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_NNZ_COUNT, &csrC);
/* optional calculation of nnz in the output matrix for getting a memory estimate */
status = mkl_sparse_?_export_csr (csrC, &indexing, &nrows, &ncols, &rows_start, &rows_end, &col_indx, &values);
MKL_INT nnz = rows_end[nrows-1] - rows_start[0];
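The nnz computed above can be converted into a byte estimate for the arrays that the second stage still has to allocate. The sketch below is plain C and does not call oneMKL; csr_nnz and csr_stage2_bytes are hypothetical helper names, idx_t is a stand-in for MKL_INT (assumed here to be a 32-bit int), and the estimate covers only one column index and one value per non-zero, not any internal workspace the library may use.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for MKL_INT in this sketch (an assumption). */
typedef int idx_t;

/* Number of non-zeros implied by the exported CSR row pointers.
   The indexing base (0 or 1) cancels out in the difference. */
static idx_t csr_nnz(const idx_t *rows_start, const idx_t *rows_end, idx_t nrows)
{
    assert(nrows > 0);
    return rows_end[nrows - 1] - rows_start[0];
}

/* Bytes needed for the column-index and value arrays:
   one index and one value per non-zero element. */
static size_t csr_stage2_bytes(idx_t nnz, size_t index_size, size_t value_size)
{
    assert(nnz >= 0);
    return (size_t)nnz * (index_size + value_size);
}
```

For a double-precision matrix, a call such as csr_stage2_bytes(nnz, sizeof(idx_t), sizeof(double)) gives the figure to compare against available memory before requesting SPARSE_STAGE_FINALIZE_MULT.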
Second stage (sparse_request_t = SPARSE_STAGE_FINALIZE_MULT)
This stage allocates and computes the remaining output arrays (associated with column indices and values of output matrix entries) and completes the matrix-matrix multiplication.
status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FINALIZE_MULT, &csrC);
When the two-stage approach is not needed, you can perform both stages in a single call:
Single stage operation (sparse_request_t = SPARSE_STAGE_FULL_MULT)
status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FULL_MULT, &csrC);
