p?gemr2d
Copies a submatrix from one general rectangular matrix to another.
Syntax
void psgemr2d(MKL_INT *m, MKL_INT *n, float *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca, float *b, MKL_INT *ib, MKL_INT *jb, MKL_INT *descb, MKL_INT *ictxt);
void pdgemr2d(MKL_INT *m, MKL_INT *n, double *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca, double *b, MKL_INT *ib, MKL_INT *jb, MKL_INT *descb, MKL_INT *ictxt);
void pcgemr2d(MKL_INT *m, MKL_INT *n, MKL_Complex8 *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca, MKL_Complex8 *b, MKL_INT *ib, MKL_INT *jb, MKL_INT *descb, MKL_INT *ictxt);
void pzgemr2d(MKL_INT *m, MKL_INT *n, MKL_Complex16 *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca, MKL_Complex16 *b, MKL_INT *ib, MKL_INT *jb, MKL_INT *descb, MKL_INT *ictxt);
void pigemr2d(MKL_INT *m, MKL_INT *n, MKL_INT *a, MKL_INT *ia, MKL_INT *ja, MKL_INT *desca, MKL_INT *b, MKL_INT *ib, MKL_INT *jb, MKL_INT *descb, MKL_INT *ictxt);
Include Files
- mkl_scalapack.h
Description
The p?gemr2d function copies the indicated matrix or submatrix of A to the indicated matrix or submatrix of B. It provides a truly general copy from any block-cyclically distributed matrix or submatrix to any other block-cyclically distributed matrix or submatrix. Together with p?trmr2d, these functions are the only ones in the ScaLAPACK library that provide inter-context operations: they can take a matrix or submatrix A in context A (distributed over process grid A) and copy it to a matrix or submatrix B in context B (distributed over process grid B).
There does not need to be a relationship between the two operand matrices or submatrices other than their global size and the fact that they are both legal block-cyclically distributed matrices or submatrices. This means that they can, for example, be distributed across different process grids, have varying block sizes and differing matrix starting points, or be contained in different sized distributed matrices.
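To make "block-cyclically distributed" concrete, the following minimal sketch maps a global index to its owning process and local position in one grid dimension. The type bc_owner and the function bc_locate are hypothetical names introduced here for illustration and are not part of ScaLAPACK or Intel MKL; indices are 0-based, unlike the 1-based ia, ja, ib, and jb arguments.

```c
#include <assert.h>

/* Owner of one global index in a single grid dimension (illustrative type). */
typedef struct {
    int proc;   /* process coordinate that owns the entry */
    int local;  /* 0-based position in that process's local storage */
} bc_owner;

/* Hypothetical helper: locate 0-based global index g for block size nb,
   nprocs processes, and the first block stored on process src. */
bc_owner bc_locate(int g, int nb, int src, int nprocs)
{
    assert(g >= 0 && nb > 0 && nprocs > 0 && src >= 0 && src < nprocs);
    bc_owner o;
    int block = g / nb;                        /* global block number */
    o.proc  = (src + block) % nprocs;          /* blocks are dealt round-robin */
    o.local = (block / nprocs) * nb + g % nb;  /* earlier local blocks + offset */
    return o;
}
```

With nb = 2 on 3 processes, global index 7 is owned by process 0; with nb = 4 on 2 processes, the same index is owned by process 1. In both cases it sits at local position 3. Moving every entry between two such layouts is the kind of remapping that p?gemr2d performs.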
Take care when context A is disjoint from context B. The general rules for which parameters need to be set are:
- All calling processes must have the correct m and n.
- Processes in context A must correctly define all parameters describing A.
- Processes in context B must correctly define all parameters describing B.
- Processes which are not members of context A must pass ctxt_a = -1 and need not set other parameters describing A.
- Processes which are not members of context B must pass ctxt_b = -1 and need not set other parameters describing B.
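The rules above can be sketched as a descriptor-filling helper. This is an illustration only: fill_desc and its is_member flag are hypothetical, plain int stands in for MKL_INT (an assumption that holds only when MKL_INT is a 32-bit int), and the entry order is assumed to be the standard ScaLAPACK layout for dtype_ = 1 descriptors. Real code would normally build descriptors for member processes with the library's own descriptor initialization routine.

```c
#include <assert.h>

/* Zero-based positions of the entries in a dtype_ = 1 descriptor
   (assumed standard ScaLAPACK layout). */
enum { DTYPE_ = 0, CTXT_ = 1, M_ = 2, N_ = 3, MB_ = 4, NB_ = 5,
       RSRC_ = 6, CSRC_ = 7, LLD_ = 8, DLEN_ = 9 };

/* Hypothetical helper. is_member tells whether the calling process
   belongs to the BLACS context ctxt that holds the matrix. */
void fill_desc(int desc[DLEN_], int is_member, int ctxt,
               int m, int n, int mb, int nb, int rsrc, int csrc, int lld)
{
    for (int i = 0; i < DLEN_; ++i)
        desc[i] = 0;                 /* start from a defined state */
    desc[DTYPE_] = 1;                /* only dtype_ = 1 is supported */
    if (!is_member) {
        desc[CTXT_] = -1;            /* non-members only signal "not in context" */
        return;
    }
    assert(m >= 0 && n >= 0 && mb > 0 && nb > 0 && lld >= 1);
    desc[CTXT_] = ctxt;
    desc[M_]    = m;     desc[N_]    = n;     /* global matrix size */
    desc[MB_]   = mb;    desc[NB_]   = nb;    /* block size */
    desc[RSRC_] = rsrc;  desc[CSRC_] = csrc;  /* process holding the first block */
    desc[LLD_]  = lld;                        /* local leading dimension */
}
```

A process that belongs to context A but not to context B would call the helper once with is_member set for desca and once with is_member cleared for descb, then pass both arrays to p?gemr2d together with the correct m and n.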
Because of its generality, p?gemr2d can be used for many operations not usually associated with copy functions. For instance, it can be used to take a matrix on one process and distribute it across a process grid, or the reverse. If a supercomputer is grouped into a virtual parallel machine with a workstation, for instance, this function can be used to move the matrix from the workstation to the supercomputer and back. In ScaLAPACK, it is called to copy matrices from a two-dimensional process grid to a one-dimensional process grid. It can also be used to redistribute matrices so that distributions providing maximal performance can be used by various component libraries.
Note that this function requires an array descriptor with dtype_ = 1.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Input Parameters
- m
- (global) The number of rows of matrix A to be copied (m ≥ 0).
- n
- (global) The number of columns of matrix A to be copied (n ≥ 0).
- a
- (local) Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1) containing the source matrix A.
- ia,ja
- (global) The row and column indices in the array A indicating the first row and the first column, respectively, of the submatrix of A to copy. 1 ≤ ia ≤ total_rows_in_a - m + 1, 1 ≤ ja ≤ total_columns_in_a - n + 1.
- desca
- (global and local) Array of size dlen_. The array descriptor for the distributed matrix A. Only dtype_a = 1 is supported, so dlen_ = 9. If the calling process is not part of the context of A, ctxt_a must be equal to -1.
- ib,jb
- (global) The row and column indices in the array B indicating the first row and the first column, respectively, of the submatrix of B to which the matrix is copied. 1 ≤ ib ≤ total_rows_in_b - m + 1, 1 ≤ jb ≤ total_columns_in_b - n + 1.
- descb
- (global and local) Array of size dlen_. The array descriptor for the distributed matrix B. Only dtype_b = 1 is supported, so dlen_ = 9. If the calling process is not part of the context of B, ctxt_b must be equal to -1.
- ictxt
- (global) The context encompassing at least the union of all processes in context A and context B. All processes in the context ictxt must call this function, even if they do not own a piece of either matrix.
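The index constraints on ia, ja, ib, and jb can be checked before the call. The sketch below is a plain restatement of those inequalities; submatrix_fits is a hypothetical helper, not a library function, and plain int is used in place of MKL_INT.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when an m-by-n submatrix whose first
   entry is at 1-based global position (i, j) lies inside a matrix with
   rows global rows and cols global columns, and 0 otherwise. */
int submatrix_fits(int m, int n, int i, int j, int rows, int cols)
{
    assert(m >= 0 && n >= 0);
    return i >= 1 && i <= rows - m + 1 &&
           j >= 1 && j <= cols - n + 1;
}
```

Running the same check once against the global size of A with (ia, ja) and once against the global size of B with (ib, jb) catches out-of-range submatrices on every process before any communication starts.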
Output Parameters
- b
- Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1). Overwritten by the submatrix from A.
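The local array sizes lld_a*LOCc(ja+n-1) and lld_b*LOCc(jb+n-1) depend on how many of the first ja+n-1 (or jb+n-1) global columns land on the calling process. The sketch below computes that count with the usual block-cyclic formula, which is assumed here to match what ScaLAPACK's NUMROC tool function returns. local_extent and min_local_elems are hypothetical names, and plain int stands in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Number of the first n rows or columns stored on process iproc when
   blocks of size nb are dealt round-robin to nprocs processes, starting
   at process isrc. Illustrative stand-in for NUMROC. */
int local_extent(int n, int nb, int iproc, int isrc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int mydist  = (nprocs + iproc - isrc) % nprocs; /* distance from source process */
    int nblocks = n / nb;                           /* complete blocks overall */
    int extent  = (nblocks / nprocs) * nb;          /* blocks every process gets */
    int extra   = nblocks % nprocs;                 /* leftover complete blocks */
    if (mydist < extra)
        extent += nb;        /* one additional complete block */
    else if (mydist == extra)
        extent += n % nb;    /* the trailing partial block, if any */
    return extent;
}

/* Minimum element count of a local array: lld * LOCc(j + n - 1). */
size_t min_local_elems(int lld, int j, int n, int nb,
                       int mycol, int csrc, int npcol)
{
    return (size_t)lld * (size_t)local_extent(j + n - 1, nb, mycol, csrc, npcol);
}
```

For example, 10 columns with block size 3 on 2 process columns give 6 local columns on the first process column and 4 on the second, so with a local leading dimension of 5 the local arrays need at least 30 and 20 elements respectively.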