Developer Reference

p?gemm

Computes a scalar-matrix-matrix product and adds the result to a scalar-matrix product for distributed matrices.

Syntax

void psgemm (const char *transa, const char *transb,
             const MKL_INT *m, const MKL_INT *n, const MKL_INT *k,
             const float *alpha,
             const float *a, const MKL_INT *ia, const MKL_INT *ja, const MKL_INT *desca,
             const float *b, const MKL_INT *ib, const MKL_INT *jb, const MKL_INT *descb,
             const float *beta,
             float *c, const MKL_INT *ic, const MKL_INT *jc, const MKL_INT *descc);
void pdgemm (const char *transa, const char *transb,
             const MKL_INT *m, const MKL_INT *n, const MKL_INT *k,
             const double *alpha,
             const double *a, const MKL_INT *ia, const MKL_INT *ja, const MKL_INT *desca,
             const double *b, const MKL_INT *ib, const MKL_INT *jb, const MKL_INT *descb,
             const double *beta,
             double *c, const MKL_INT *ic, const MKL_INT *jc, const MKL_INT *descc);
void pcgemm (const char *transa, const char *transb,
             const MKL_INT *m, const MKL_INT *n, const MKL_INT *k,
             const MKL_Complex8 *alpha,
             const MKL_Complex8 *a, const MKL_INT *ia, const MKL_INT *ja, const MKL_INT *desca,
             const MKL_Complex8 *b, const MKL_INT *ib, const MKL_INT *jb, const MKL_INT *descb,
             const MKL_Complex8 *beta,
             MKL_Complex8 *c, const MKL_INT *ic, const MKL_INT *jc, const MKL_INT *descc);
void pzgemm (const char *transa, const char *transb,
             const MKL_INT *m, const MKL_INT *n, const MKL_INT *k,
             const MKL_Complex16 *alpha,
             const MKL_Complex16 *a, const MKL_INT *ia, const MKL_INT *ja, const MKL_INT *desca,
             const MKL_Complex16 *b, const MKL_INT *ib, const MKL_INT *jb, const MKL_INT *descb,
             const MKL_Complex16 *beta,
             MKL_Complex16 *c, const MKL_INT *ic, const MKL_INT *jc, const MKL_INT *descc);
Include Files
  • mkl_pblas.h
Description
The p?gemm routines perform a matrix-matrix operation with general distributed matrices. The operation is defined as
sub(C) := alpha*op(sub(A))*op(sub(B)) + beta*sub(C),
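where:
  • op(x) is one of op(x) = x, op(x) = x' (transpose), or op(x) = conjg(x') (conjugate transpose),
  • alpha and beta are scalars,
  • sub(A) = A(ia:ia+m-1, ja:ja+k-1), sub(B) = B(ib:ib+k-1, jb:jb+n-1), and sub(C) = C(ic:ic+m-1, jc:jc+n-1) are distributed matrices.
The index ranges shown for sub(A) and sub(B) apply when the corresponding matrix is not transposed. Otherwise the row and column extents are swapped, so that op(sub(A)) is always m-by-k and op(sub(B)) is always k-by-n.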
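The operation above can be checked against a serial reference on ordinary (undistributed) column-major arrays. The following is only a sketch: `ref_gemm_sub` and the `ELEM` macro are hypothetical names, not part of MKL, and the sketch covers the real double-precision case with plain `int` indices and leading dimensions `lda`, `ldb`, `ldc` in place of array descriptors.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Element (i, j) of a column-major matrix, 1-based indices as in the text. */
#define ELEM(x, ld, i, j) \
    ((x)[((size_t)(j) - 1) * (size_t)(ld) + ((size_t)(i) - 1)])

/* Hypothetical serial reference (not an MKL routine):
 * sub(C) := alpha*op(sub(A))*op(sub(B)) + beta*sub(C)
 * on undistributed, column-major real matrices. */
void ref_gemm_sub(char transa, char transb, int m, int n, int k,
                  double alpha, const double *a, int ia, int ja, int lda,
                  const double *b, int ib, int jb, int ldb,
                  double beta, double *c, int ic, int jc, int ldc)
{
    /* For real data 'T' and 'C' coincide, so anything but 'N' transposes. */
    int ta = toupper((unsigned char)transa) != 'N';
    int tb = toupper((unsigned char)transb) != 'N';
    assert(m >= 0 && n >= 0 && k >= 0);

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            /* When alpha is zero, A and B are never read. */
            if (alpha != 0.0) {
                for (int l = 0; l < k; ++l) {
                    double av = ta ? ELEM(a, lda, ia + l, ja + i)
                                   : ELEM(a, lda, ia + i, ja + l);
                    double bv = tb ? ELEM(b, ldb, ib + j, jb + l)
                                   : ELEM(b, ldb, ib + l, jb + j);
                    sum += av * bv;
                }
            }
            /* When beta is zero, sub(C) is overwritten without being read. */
            double *cij = &ELEM(c, ldc, ic + i, jc + j);
            *cij = (beta == 0.0) ? alpha * sum : alpha * sum + beta * *cij;
        }
    }
}
```

Gathering a small distributed result onto one process and comparing it with this reference is one way to validate descriptor and index arguments before scaling up.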
Input Parameters
transa
(global) Specifies the form of op(sub(A)) used in the matrix multiplication:
  • if transa = 'N' or 'n', then op(sub(A)) = sub(A);
  • if transa = 'T' or 't', then op(sub(A)) = sub(A)' (transpose);
  • if transa = 'C' or 'c', then op(sub(A)) = conjg(sub(A)') (conjugate transpose; identical to 'T' for the real flavors psgemm and pdgemm).
transb
(global) Specifies the form of op(sub(B)) used in the matrix multiplication:
  • if transb = 'N' or 'n', then op(sub(B)) = sub(B);
  • if transb = 'T' or 't', then op(sub(B)) = sub(B)' (transpose);
  • if transb = 'C' or 'c', then op(sub(B)) = conjg(sub(B)') (conjugate transpose; identical to 'T' for the real flavors psgemm and pdgemm).
m
(global) Specifies the number of rows of the distributed matrices op(sub(A)) and sub(C), m ≥ 0.
n
(global) Specifies the number of columns of the distributed matrices op(sub(B)) and sub(C), n ≥ 0.
k
(global) Specifies the number of columns of the distributed matrix op(sub(A)) and the number of rows of the distributed matrix op(sub(B)), k ≥ 0.
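Because m, n, and k describe op(sub(A)) and op(sub(B)) rather than the stored submatrices, the global shape of sub(A) and sub(B) depends on the transposition flags. A minimal sketch of that bookkeeping follows; the type `sub_shape` and both helper functions are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper type: global row and column counts of a submatrix. */
typedef struct {
    int rows;
    int cols;
} sub_shape;

/* Shape of sub(A) as stored: m-by-k when not transposed, k-by-m otherwise. */
sub_shape sub_a_shape(char transa, int m, int k)
{
    assert(m >= 0 && k >= 0);
    sub_shape s;
    int not_trans = toupper((unsigned char)transa) == 'N';
    s.rows = not_trans ? m : k;
    s.cols = not_trans ? k : m;
    return s;
}

/* Shape of sub(B) as stored: k-by-n when not transposed, n-by-k otherwise. */
sub_shape sub_b_shape(char transb, int k, int n)
{
    assert(k >= 0 && n >= 0);
    sub_shape s;
    int not_trans = toupper((unsigned char)transb) == 'N';
    s.rows = not_trans ? k : n;
    s.cols = not_trans ? n : k;
    return s;
}
```

Checking these shapes against the global dimensions recorded in each descriptor, together with the starting indices, is a cheap way to catch out-of-range submatrix arguments before the call.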
alpha
(global) Specifies the scalar alpha.
When alpha is equal to zero, the local entries of the arrays a and b corresponding to the entries of the submatrices sub(A) and sub(B), respectively, need not be set on input.
a
(local) Array, size lld_a by kla, where kla is LOCq(ja+k-1) when transa = 'N' or 'n', and is LOCq(ja+m-1) otherwise. Before entry this array must contain the local pieces of the distributed matrix sub(A).
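The local column count in the array size can be computed from the block-cyclic layout. The sketch below assumes that LOCq(x) follows the usual ScaLAPACK convention, that is, the number of columns of an x-column global matrix owned by the calling process column, as counted by the ScaLAPACK tool routine NUMROC. The function `local_count` is a hypothetical stand-in written for illustration, not an MKL entry point.

```c
#include <assert.h>

/* Hypothetical helper using the block-cyclic counting rule of NUMROC:
 * how many of n global rows or columns, split into blocks of size nb and
 * dealt cyclically over nprocs processes starting at process isrcproc,
 * land on process iproc. */
int local_count(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    assert(iproc >= 0 && iproc < nprocs);
    assert(isrcproc >= 0 && isrcproc < nprocs);

    /* Distance of this process from the one that owns the first block. */
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;                 /* number of complete blocks */
    int count = (nblocks / nprocs) * nb;  /* full rounds of dealing    */
    int extra = nblocks % nprocs;         /* leftover complete blocks  */

    if (mydist < extra)
        count += nb;                      /* one more complete block   */
    else if (mydist == extra)
        count += n % nb;                  /* the trailing partial block */
    return count;
}
```

Under these assumptions, kla for a non-transposed sub(A) would be `local_count(ja + k - 1, nb_a, mycol, csrc_a, npcol)`, where `nb_a` and `csrc_a` come from the descriptor of A and `mycol`, `npcol` from the process grid. The local array a then needs at least lld_a * kla elements.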
ia, ja
(global) The row and column indices in the distributed matrix A indicating the first row and the first column of the submatrix sub(A), respectively.
desca
(global and local) Array of dimension 9. The array descriptor of the distributed matrix A.
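For orientation, the nine descriptor entries can be laid out as below. This sketch assumes the standard ScaLAPACK dense-matrix descriptor ordering (type, context, global sizes, block sizes, source process coordinates, local leading dimension). The enum names and `fill_dense_desc` are hypothetical, and plain `int` stands in for `MKL_INT`. In real code the descriptor is normally filled by the ScaLAPACK routine descinit rather than by hand.

```c
#include <assert.h>

/* Hypothetical 0-based names for the nine descriptor entries. */
enum {
    DESC_DTYPE = 0, /* descriptor type, 1 for a dense block-cyclic matrix */
    DESC_CTXT,      /* BLACS context handle                               */
    DESC_M,         /* global number of rows                              */
    DESC_N,         /* global number of columns                           */
    DESC_MB,        /* row block size                                     */
    DESC_NB,        /* column block size                                  */
    DESC_RSRC,      /* process row owning the first row                   */
    DESC_CSRC,      /* process column owning the first column             */
    DESC_LLD,       /* leading dimension of the local array               */
    DESC_LEN        /* total length, 9                                    */
};

/* Hypothetical helper: fill a dense-matrix descriptor by hand. */
void fill_dense_desc(int desc[DESC_LEN], int ictxt, int m, int n,
                     int mb, int nb, int rsrc, int csrc, int lld)
{
    assert(m >= 0 && n >= 0 && mb > 0 && nb > 0 && lld >= 1);
    desc[DESC_DTYPE] = 1;
    desc[DESC_CTXT] = ictxt;
    desc[DESC_M] = m;
    desc[DESC_N] = n;
    desc[DESC_MB] = mb;
    desc[DESC_NB] = nb;
    desc[DESC_RSRC] = rsrc;
    desc[DESC_CSRC] = csrc;
    desc[DESC_LLD] = lld;
}
```

The same layout applies to descb and descc. The quantities written as lld_a, lld_b, and lld_c in the array sizes correspond to the last entry of the respective descriptor.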
b
(local) Array, size lld_b by klb, where klb is LOCq(jb+n-1) when transb = 'N' or 'n', and is LOCq(jb+k-1) otherwise. Before entry this array must contain the local pieces of the distributed matrix sub(B).
ib, jb
(global) The row and column indices in the distributed matrix B indicating the first row and the first column of the submatrix sub(B), respectively.
descb
(global and local) Array of dimension 9. The array descriptor of the distributed matrix B.
beta
(global) Specifies the scalar beta.
When beta is equal to zero, sub(C) need not be set on input.
c
(local) Array, size (lld_c, LOCq(jc+n-1)). Before entry this array must contain the local pieces of the distributed matrix sub(C).
ic, jc
(global) The row and column indices in the distributed matrix C indicating the first row and the first column of the submatrix sub(C), respectively.
descc
(global and local) Array of dimension 9. The array descriptor of the distributed matrix C.
Output Parameters
c
Overwritten by the m-by-n distributed matrix alpha*op(sub(A))*op(sub(B)) + beta*sub(C).
