Developer Reference

  • 2021.1
  • 12/04/2020
  • Public Content

cblas_?gemm_pack

Performs scaling and packing of the matrix into the previously allocated buffer.

Syntax

void cblas_sgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const float *src, const MKL_INT ld, float *dest);
void cblas_dgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double alpha, const double *src, const MKL_INT ld, double *dest);
Include Files
  • mkl.h
Description
The cblas_?gemm_pack routine is one of a set of related routines that enable use of an internal packed storage. Call cblas_?gemm_pack after you allocate a buffer whose size is given by cblas_?gemm_pack_get_size. The cblas_?gemm_pack routine scales the identified matrix by alpha and packs it into the previously allocated buffer.
Do not copy the packed matrix to a different address because the internal implementation depends on the alignment of internally-stored metadata.
The cblas_?gemm_pack routine performs this operation:
    dest := alpha*op(src)
as part of the computation
    C := alpha*op(A)*op(B) + beta*C
where:

  • op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,
  • alpha and beta are scalars,
  • src is a matrix,
  • A, B, and C are matrices,
  • op(src) is an m-by-k matrix if identifier = CblasAMatrix,
  • op(src) is a k-by-n matrix if identifier = CblasBMatrix,
  • dest is an internal packed storage buffer.
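The arithmetic above can be sketched in plain C. The block below does not call the library and does not reproduce the internal packed format, which is opaque. It only illustrates, for column-major single-precision data, what it means to fold alpha into the packed operand so that the later multiplication needs beta alone. The function names scale_op_colmajor and compute_with_prescaled_a are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library: writes alpha*op(src) into a
   plain dense column-major rows-by-cols array `out` (leading dimension rows).
   rows and cols are the dimensions of op(src). The real packed buffer uses
   an internal layout that this sketch does not model. */
void scale_op_colmajor(int transpose, size_t rows, size_t cols, float alpha,
                       const float *src, size_t ld, float *out)
{
    assert(src != NULL && out != NULL);
    for (size_t j = 0; j < cols; ++j) {
        for (size_t i = 0; i < rows; ++i) {
            /* op(src)(i,j) is src(i,j) without transposition, src(j,i) with it. */
            float v = transpose ? src[j + i * ld] : src[i + j * ld];
            out[i + j * rows] = alpha * v;
        }
    }
}

/* Hypothetical helper: C := P*B + beta*C, where P is the m-by-k result of
   scale_op_colmajor (alpha already applied) and B is k-by-n, column-major. */
void compute_with_prescaled_a(size_t m, size_t n, size_t k, const float *p,
                              const float *b, size_t ldb, float beta,
                              float *c, size_t ldc)
{
    assert(p != NULL && b != NULL && c != NULL);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (size_t l = 0; l < k; ++l)
                acc += p[i + l * m] * b[l + j * ldb];
            c[i + j * ldc] = acc + beta * c[i + j * ldc];
        }
    }
}
```

Calling scale_op_colmajor and then compute_with_prescaled_a gives the same C as evaluating alpha*op(A)*op(B) + beta*C directly (with op(B) = B in this sketch), which is why alpha appears in the packing step rather than in the multiplication.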

You must use the same value of the Layout parameter for the entire sequence of related cblas_?gemm_pack and cblas_?gemm_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
Input Parameters
Layout
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
identifier
Specifies which matrix is to be packed:
  • If identifier = CblasAMatrix, the routine packs matrix A.
  • If identifier = CblasBMatrix, the routine packs matrix B.
trans
Specifies the form of op(src) used in the packing:
  • If trans = CblasNoTrans, op(src) = src.
  • If trans = CblasTrans, op(src) = src^T.
  • If trans = CblasConjTrans, op(src) = src^H.
m
Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
n
Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
k
Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
alpha
Specifies the scalar alpha.
src
Array. The required size and contents depend on Layout, identifier, and trans.
Layout = CblasColMajor:
  • identifier = CblasAMatrix, trans = CblasNoTrans: Size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A.
  • identifier = CblasAMatrix, trans = CblasTrans or trans = CblasConjTrans: Size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A.
  • identifier = CblasBMatrix, trans = CblasNoTrans: Size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B.
  • identifier = CblasBMatrix, trans = CblasTrans or trans = CblasConjTrans: Size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B.
Layout = CblasRowMajor:
  • identifier = CblasAMatrix, trans = CblasNoTrans: Size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A.
  • identifier = CblasAMatrix, trans = CblasTrans or trans = CblasConjTrans: Size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A.
  • identifier = CblasBMatrix, trans = CblasNoTrans: Size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B.
  • identifier = CblasBMatrix, trans = CblasTrans or trans = CblasConjTrans: Size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B.
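ld
Specifies the leading dimension of src as declared in the calling (sub)program.
Layout = CblasColMajor:
  • identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, m).
  • identifier = CblasAMatrix, trans = CblasTrans or trans = CblasConjTrans: ld must be at least max(1, k).
  • identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
  • identifier = CblasBMatrix, trans = CblasTrans or trans = CblasConjTrans: ld must be at least max(1, n).
Layout = CblasRowMajor:
  • identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
  • identifier = CblasAMatrix, trans = CblasTrans or trans = CblasConjTrans: ld must be at least max(1, m).
  • identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, n).
  • identifier = CblasBMatrix, trans = CblasTrans or trans = CblasConjTrans: ld must be at least max(1, k).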
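The size and leading-dimension rules for src and ld follow one pattern, which the sketch below encodes in plain C so that a caller could validate arguments before packing. It is an illustration only: the enum, struct, and function names (layout_t, ident_t, trans_t, src_shape_t, src_shape, src_elems) are hypothetical stand-ins and are not declared in mkl.h. TRANS here covers both CblasTrans and CblasConjTrans, which share the same rules.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the CBLAS enums (hypothetical names, not from mkl.h). */
typedef enum { COL_MAJOR, ROW_MAJOR } layout_t;
typedef enum { A_MATRIX, B_MATRIX } ident_t;
typedef enum { NO_TRANS, TRANS } trans_t; /* TRANS also covers ConjTrans */

typedef struct {
    size_t min_ld;    /* smallest valid ld: max(1, leading extent) */
    size_t other_dim; /* src must hold ld * other_dim elements */
} src_shape_t;

/* Derives the ld lower bound and the size factor from the rules listed
   for the src and ld parameters. */
src_shape_t src_shape(layout_t layout, ident_t id, trans_t trans,
                      size_t m, size_t n, size_t k)
{
    /* op(src) is m-by-k for A and k-by-n for B. */
    size_t rows = (id == A_MATRIX) ? m : k;
    size_t cols = (id == A_MATRIX) ? k : n;

    /* With transposition, the stored matrix has the two extents swapped. */
    if (trans == TRANS) {
        size_t t = rows;
        rows = cols;
        cols = t;
    }

    /* Column-major: ld spans the stored rows. Row-major: the stored columns. */
    size_t lead = (layout == COL_MAJOR) ? rows : cols;

    src_shape_t s;
    s.min_ld = lead > 1 ? lead : 1;
    s.other_dim = (layout == COL_MAJOR) ? cols : rows;
    return s;
}

/* Number of elements src must provide for a given valid ld. */
size_t src_elems(src_shape_t s, size_t ld)
{
    assert(ld >= s.min_ld);
    return ld * s.other_dim;
}
```

For example, with m = 3, n = 4, and k = 5, a column-major, untransposed A yields a minimum ld of 3 and a required size of ld*5 elements, matching the first CblasColMajor entry in each list.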
dest
Scaled and packed internal storage buffer.
Output Parameters
dest
Overwritten by the matrix alpha*op(src).
