p?trmr2d
Copies a submatrix from one trapezoidal matrix to another.
Syntax

void pstrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *ictxt );
void pdtrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *ictxt );
void pctrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex8 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *ictxt );
void pztrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_Complex16 *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *ictxt );
void pitrmr2d (char *uplo , char *diag , MKL_INT *m , MKL_INT *n , MKL_INT *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , MKL_INT *b , MKL_INT *ib , MKL_INT *jb , MKL_INT *descb , MKL_INT *ictxt );
Include Files
 mkl_scalapack.h
Description
The p?trmr2d function copies the indicated matrix or submatrix of A to the indicated matrix or submatrix of B. It provides a truly general copy from any block-cyclically distributed matrix or submatrix to any other block-cyclically distributed matrix or submatrix. Together with p?gemr2d, these functions are the only ones in the ScaLAPACK library that provide inter-context operations: they can take a matrix or submatrix A in context A (distributed over process grid A) and copy it to a matrix or submatrix B in context B (distributed over process grid B).
The p?trmr2d function assumes the matrix or submatrix to be trapezoidal. Only the upper or lower part is copied, and the other part is unchanged.
The two operand matrices or submatrices need no relationship other than having the same global size and both being legal block-cyclically distributed matrices or submatrices. This means that they can, for example, be distributed across different process grids, have different block sizes and different matrix starting points, or be contained in distributed matrices of different sizes.
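To make the "only the upper or lower part is copied" rule concrete, the following serial sketch applies the same selection to an ordinary in-memory matrix. It is an illustration only, not the distributed implementation: copy_triangle is a hypothetical helper, and the sketch assumes a square n-by-n, column-major matrix, with uplo and diag interpreted as in the Input Parameters section.

```c
#include <assert.h>
#include <stddef.h>

/* Serial illustration only (hypothetical helper, not an MKL function).
 * Copies the upper or lower triangle of the n-by-n column-major matrix
 * src into dst and leaves every other element of dst untouched.
 *   uplo = 'U' : upper part,  uplo = 'L' : lower part
 *   diag = 'N' : copy the diagonal, diag = 'U' : do not copy it */
void copy_triangle(char uplo, char diag, int n,
                   const double *src, int ld_src,
                   double *dst, int ld_dst)
{
    assert(uplo == 'U' || uplo == 'L');
    assert(diag == 'U' || diag == 'N');

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            /* Is element (i, j) inside the selected triangle? */
            int selected = (uplo == 'U') ? (i <= j) : (i >= j);

            /* diag = 'U': the diagonal itself is excluded. */
            if (i == j && diag == 'U')
                selected = 0;

            if (selected)
                dst[(size_t)i + (size_t)j * (size_t)ld_dst] =
                    src[(size_t)i + (size_t)j * (size_t)ld_src];
        }
    }
}
```

With a 3-by-3 source, uplo = 'U' and diag = 'N' copy six elements, while uplo = 'L' and diag = 'U' copy only the three elements strictly below the diagonal.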
Take care when context A is disjoint from context B. The general rules for which parameters need to be set are:

All calling processes must have the correct m and n.

Processes in context A must correctly define all parameters describing A.

Processes in context B must correctly define all parameters describing B.

Processes that are not members of context A must pass ctxt_a = -1 and need not set other parameters describing A.

Processes that are not members of context B must pass ctxt_b = -1 and need not set other parameters describing B.
Because of its generality, p?trmr2d can be used for many operations not usually associated with copy functions. For instance, it can be used to take a matrix on one process and distribute it across a process grid, or the reverse. If a supercomputer is grouped into a virtual parallel machine with a workstation, for instance, this function can be used to move the matrix from the workstation to the supercomputer and back. In ScaLAPACK, it is called to copy matrices from a two-dimensional process grid to a one-dimensional process grid. It can also be used to redistribute matrices so that various component libraries can each use the distribution that gives them maximal performance.
Note that this function requires an array descriptor with dtype_ = 1.
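The descriptor rules above can be sketched in plain C. The helper below, fill_desc, is hypothetical and is not part of MKL; real code would normally build descriptors with the ScaLAPACK descinit routine. The sketch assumes the usual nine-entry layout for a dtype_ = 1 descriptor (dtype_, ctxt_, m_, n_, mb_, nb_, rsrc_, csrc_, lld_) and uses int as a stand-in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

enum { DESC_LEN = 9 };  /* dlen_ for a dtype_ = 1 descriptor */

/* Hypothetical helper (not an MKL function): fill a dense-matrix
 * descriptor. On a process that is not a member of the matrix's
 * context, pass is_member = 0: the context entry becomes -1 and the
 * remaining entries are zeroed, since they need not be set. */
void fill_desc(int desc[DESC_LEN], int is_member, int ctxt,
               int m, int n, int mb, int nb,
               int rsrc, int csrc, int lld)
{
    assert(desc != NULL);

    desc[0] = 1;                      /* dtype_: dense matrix       */
    desc[1] = is_member ? ctxt : -1;  /* ctxt_: -1 for non-members  */
    desc[2] = is_member ? m    : 0;   /* m_:    global rows         */
    desc[3] = is_member ? n    : 0;   /* n_:    global columns      */
    desc[4] = is_member ? mb   : 0;   /* mb_:   row block size      */
    desc[5] = is_member ? nb   : 0;   /* nb_:   column block size   */
    desc[6] = is_member ? rsrc : 0;   /* rsrc_: first process row   */
    desc[7] = is_member ? csrc : 0;   /* csrc_: first process col   */
    desc[8] = is_member ? lld  : 0;   /* lld_:  local leading dim   */
}
```

A process that belongs only to context B would call fill_desc with is_member = 0 for desca and is_member = 1 for descb, then pass both arrays to p?trmr2d together with the correct m and n.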
Input Parameters
 uplo

(global) Specifies whether to copy the upper or lower part of the matrix or submatrix.
uplo = 'U'
Copy the upper triangular part.
uplo = 'L'
Copy the lower triangular part.
 diag

(global) Specifies whether to copy the diagonal of the matrix or submatrix.
diag = 'U'
Do not copy the diagonal.
diag = 'N'
Copy the diagonal.
 m

(global) The number of rows of matrix A to be copied (m≥0).
 n

(global) The number of columns of matrix A to be copied (n≥0).
 a

(local)
Pointer into the local memory to an array of size lld_a*LOCc(ja+n-1) containing the source matrix A.
 ia, ja

(global) The row and column indices in the array A indicating the first row and the first column, respectively, of the submatrix of A to copy. 1 ≤ ia ≤ total_rows_in_a - m + 1, 1 ≤ ja ≤ total_columns_in_a - n + 1.
 desca

(global and local) array of size dlen_. The array descriptor for the distributed matrix A.
Only dtype_a = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of A, ctxt_a must be equal to -1.
 ib, jb

(global) The row and column indices in the array B indicating the first row and the first column, respectively, of the submatrix of B to which the matrix is copied. 1 ≤ ib ≤ total_rows_in_b - m + 1, 1 ≤ jb ≤ total_columns_in_b - n + 1.
 descb

(global and local) array of size dlen_. The array descriptor for the distributed matrix B.
Only dtype_b = 1 is supported, so dlen_ = 9.
If the calling process is not part of the context of B, ctxt_b must be equal to -1.
 ictxt

(global).
The context encompassing at least the union of all processes in context A and context B. All processes in the context ictxt must call this function, even if they do not own a piece of either matrix.
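The index constraints on ia, ja, ib, and jb say that the m-by-n submatrix must lie entirely inside its parent distributed matrix. A caller may want to check this before the call; the sketch below shows one way. The helper submatrix_fits is hypothetical and not part of MKL, and int is used as a stand-in for MKL_INT.

```c
#include <assert.h>

/* Hypothetical helper (not an MKL function): returns 1 if an m-by-n
 * submatrix whose first element is at 1-based global position (i0, j0)
 * lies inside a rows-by-cols distributed matrix, and 0 otherwise.
 * Mirrors 1 <= i0 <= rows - m + 1 and 1 <= j0 <= cols - n + 1. */
int submatrix_fits(int m, int n, int i0, int j0, int rows, int cols)
{
    assert(m >= 0 && n >= 0);

    return i0 >= 1 && i0 <= rows - m + 1 &&
           j0 >= 1 && j0 <= cols - n + 1;
}
```

The same check applies to both operands: once with (ia, ja) and the global size of A, and once with (ib, jb) and the global size of B.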
Output Parameters
 b

Pointer into the local memory to an array of size lld_b*LOCc(jb+n-1).
Overwritten by the submatrix from A.
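The local array sizes quoted for a and b depend on LOCc, the number of global columns that land on the calling process column. ScaLAPACK programs usually obtain this count from the numroc tool function. The sketch below re-implements the same block-cyclic arithmetic in plain C so the size formula can be followed by hand. The names local_count and local_elems are hypothetical, and int stands in for MKL_INT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many of n global rows or columns, dealt out
 * in blocks of nb over nprocs processes starting at process isrc, are
 * owned by process iproc. */
int local_count(int n, int nb, int iproc, int isrc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);

    int dist    = (nprocs + iproc - isrc) % nprocs; /* distance from source   */
    int nblocks = n / nb;                           /* number of full blocks  */
    int count   = (nblocks / nprocs) * nb;          /* full rounds of blocks  */
    int extra   = nblocks % nprocs;                 /* leftover full blocks   */

    if (dist < extra)
        count += nb;        /* this process gets one more full block */
    else if (dist == extra)
        count += n % nb;    /* this process gets the partial block   */

    return count;
}

/* Hypothetical helper: element count lld * LOCc(j0 + n - 1) for the
 * local array on process column mycol of an npcol-wide grid. */
size_t local_elems(int lld, int j0, int n, int nb,
                   int mycol, int csrc, int npcol)
{
    return (size_t)lld *
           (size_t)local_count(j0 + n - 1, nb, mycol, csrc, npcol);
}
```

For example, with jb = 1, n = 9, a column block size of 2, and two process columns, process column 0 owns five of the nine columns, so with lld_b = 5 its local array needs at least 25 elements.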