cblas_?gemm_pack

Performs scaling and packing of the matrix into the previously allocated buffer.

Syntax

void cblas_sgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const float *src, const MKL_INT ld, float *dest);
void cblas_dgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double alpha, const double *src, const MKL_INT ld, double *dest);
Include Files
  • mkl.h
Description
The cblas_?gemm_pack routine is one of a set of related routines that enable the use of internal packed storage. Call cblas_?gemm_pack after you allocate a buffer whose size is given by cblas_?gemm_pack_get_size. The cblas_?gemm_pack routine scales the identified matrix by alpha and packs it into the previously allocated buffer.
Do not copy the packed matrix to a different address, because the internal implementation depends on the alignment of internally stored metadata.
The cblas_?gemm_pack routine performs this operation:
dest := alpha*op(src)
as part of the computation
C := alpha*op(A)*op(B) + beta*C
where:

    op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,

    alpha and beta are scalars,

    src is a matrix,

    A, B, and C are matrices,

    op(src) is an m-by-k matrix if identifier = CblasAMatrix,

    op(src) is a k-by-n matrix if identifier = CblasBMatrix,

    dest is an internal packed storage buffer.
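
To make the operation dest := alpha*op(src) concrete, here is a plain C sketch of the mathematical result for a column-major source. It only illustrates the values that the packed buffer represents; the actual layout of dest is internal to the library and is not reproduced here. The function name scale_op_colmajor and its do_trans flag are hypothetical and are not part of the library.

```c
#include <stddef.h>

/* Illustrative reference only: writes alpha*op(src) into out as a dense
   rows-by-cols column-major matrix (leading dimension = rows).
   rows and cols are the dimensions of op(src).
   src is column-major with leading dimension ld.
   do_trans == 0: op(src) = src.
   do_trans != 0: op(src) = src^T (for real data, src^H equals src^T). */
void scale_op_colmajor(int do_trans, size_t rows, size_t cols,
                       float alpha, const float *src, size_t ld, float *out)
{
    for (size_t j = 0; j < cols; ++j) {
        for (size_t i = 0; i < rows; ++i) {
            /* Element (i, j) of op(src) is src(i, j) or src(j, i). */
            float v = do_trans ? src[j + i * ld] : src[i + j * ld];
            out[i + j * rows] = alpha * v;
        }
    }
}
```

For example, with identifier = CblasAMatrix the caller would pass rows = m and cols = k, and with identifier = CblasBMatrix it would pass rows = k and cols = n.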

You must use the same value of the Layout parameter for the entire sequence of related cblas_?gemm_pack and cblas_?gemm_compute calls.
For best performance, use the same number of threads for packing and for computing.
If packing both the A and B matrices, you must use the same number of threads for packing A as for packing B.
Input Parameters
Layout
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
identifier
Specifies which matrix is to be packed:
If identifier = CblasAMatrix, the routine packs matrix A.
If identifier = CblasBMatrix, the routine packs matrix B.
trans
Specifies the form of op(src) used in the packing:
If trans = CblasNoTrans, op(src) = src.
If trans = CblasTrans, op(src) = src^T.
If trans = CblasConjTrans, op(src) = src^H.
m
Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
n
Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
k
Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
alpha
Specifies the scalar alpha.
src
Array:
Layout = CblasColMajor
  • identifier = CblasAMatrix, trans = CblasNoTrans: Size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A.
  • identifier = CblasAMatrix, trans = CblasTrans or trans = CblasConjTrans: Size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A.
  • identifier = CblasBMatrix, trans = CblasNoTrans: Size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B.
  • identifier = CblasBMatrix, trans = CblasTrans or trans = CblasConjTrans: Size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B.
Layout = CblasRowMajor
  • identifier = CblasAMatrix, trans = CblasNoTrans: Size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A.
  • identifier = CblasAMatrix, trans = CblasTrans or trans = CblasConjTrans: Size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A.
  • identifier = CblasBMatrix, trans = CblasNoTrans: Size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B.
  • identifier = CblasBMatrix, trans = CblasTrans or trans = CblasConjTrans: Size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B.
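ld
Specifies the leading dimension of src as declared in the calling (sub)program.
Layout = CblasColMajor
  • identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, m).
  • identifier = CblasAMatrix, trans = CblasTrans or trans = CblasConjTrans: ld must be at least max(1, k).
  • identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
  • identifier = CblasBMatrix, trans = CblasTrans or trans = CblasConjTrans: ld must be at least max(1, n).
Layout = CblasRowMajor
  • identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
  • identifier = CblasAMatrix, trans = CblasTrans or trans = CblasConjTrans: ld must be at least max(1, m).
  • identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, n).
  • identifier = CblasBMatrix, trans = CblasTrans or trans = CblasConjTrans: ld must be at least max(1, k).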
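The size and leading-dimension requirements for src listed above appear to follow one rule: take the shape of src as stored (before op is applied); in column-major layout ld must cover the stored rows and the array holds ld times the stored columns, while in row-major layout the roles swap. The sketch below encodes that reading so a caller can validate arguments before packing. The enum and function names (demo_layout, demo_min_ld, demo_src_elems, and so on) are hypothetical stand-ins chosen so that the code builds without mkl.h; they are not library identifiers.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enums (not the real MKL types). */
enum demo_layout { DEMO_COL_MAJOR, DEMO_ROW_MAJOR };
enum demo_ident  { DEMO_A_MATRIX, DEMO_B_MATRIX };
enum demo_trans  { DEMO_NO_TRANS, DEMO_TRANS };  /* DEMO_TRANS also covers ConjTrans */

/* Shape of src as stored in memory, before op() is applied. */
static void demo_stored_shape(enum demo_ident id, enum demo_trans tr,
                              size_t m, size_t n, size_t k,
                              size_t *rows, size_t *cols)
{
    size_t r = (id == DEMO_A_MATRIX) ? m : k;   /* op(A) is m-by-k */
    size_t c = (id == DEMO_A_MATRIX) ? k : n;   /* op(B) is k-by-n */
    if (tr == DEMO_TRANS) {                     /* stored shape is swapped */
        size_t t = r; r = c; c = t;
    }
    *rows = r;
    *cols = c;
}

/* Minimum leading dimension of src: max(1, x) as in the ld list. */
size_t demo_min_ld(enum demo_layout layout, enum demo_ident id,
                   enum demo_trans tr, size_t m, size_t n, size_t k)
{
    size_t rows, cols;
    demo_stored_shape(id, tr, m, n, k, &rows, &cols);
    size_t x = (layout == DEMO_COL_MAJOR) ? rows : cols;
    return x > 0 ? x : 1;
}

/* Number of elements src must hold for a given ld, as in the src list. */
size_t demo_src_elems(enum demo_layout layout, enum demo_ident id,
                      enum demo_trans tr, size_t m, size_t n, size_t k,
                      size_t ld)
{
    size_t rows, cols;
    demo_stored_shape(id, tr, m, n, k, &rows, &cols);
    return ld * ((layout == DEMO_COL_MAJOR) ? cols : rows);
}
```

A caller could compare its ld against demo_min_ld and its allocation against demo_src_elems before invoking the packing routine, for example rejecting ld = 1 for a column-major, non-transposed A with m = 2.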
dest
Scaled and packed internal storage buffer.
Output Parameters
dest
Overwritten by the matrix alpha*op(src).

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804