ScaLAPACK Array Descriptors

ScaLAPACK uses a two-dimensional block-cyclic data distribution as the layout for dense matrix computations. This distribution balances the work well across the available processes and allows the use of Level 3 BLAS routines for efficient local computations. The information required to map each global matrix to its corresponding processes and memory locations is stored in an array called the array descriptor, which is associated with each global matrix. The size of the array descriptor is denoted as dlen_.
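To make the block-cyclic idea concrete, the following is a minimal one-dimensional sketch in Python. It is not ScaLAPACK code: the function name `owner_and_local_index`, the 0-based indexing, and the sample sizes are assumptions chosen for illustration. The two-dimensional distribution applies the same rule independently to rows and to columns.

```python
def owner_and_local_index(g, nb, nprocs, src=0):
    """Map a 0-based global index g to (owning process, 0-based local index).

    nb is the blocking factor, nprocs the number of processes along this
    dimension, and src the process that holds the first block.
    """
    block = g // nb                             # global block that contains g
    proc = (src + block) % nprocs               # blocks are dealt out round-robin
    local = (block // nprocs) * nb + g % nb     # position inside the owner's storage
    return proc, local


# Illustrative sizes (assumptions): 10 indices, blocks of 2, 3 processes.
n, nb, nprocs = 10, 2, 3
owned = {p: [] for p in range(nprocs)}
for g in range(n):
    proc, _ = owner_and_local_index(g, nb, nprocs)
    owned[proc].append(g)

for p in range(nprocs):
    print(f"process {p} owns global indices {owned[p]}")
```

With these sizes, consecutive blocks of two indices go to processes 0, 1, 2, 0, 1 in turn, which is how the distribution spreads the work across processes.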
Let A be a two-dimensional block-cyclically distributed matrix with the array descriptor desca. The meaning of each array descriptor element depends on the type of the matrix A. The tables "Array descriptor for dense matrices" and "Array descriptor for narrow-band and tridiagonal matrices" describe the meaning of each element for the different types of matrices.
Array descriptor for dense matrices (dlen_=9)
| Element Name | Stored in | Description | Element Index Number |
|---|---|---|---|
| dtype_a | desca[dtype_] | Descriptor type (=1 for dense matrices). | 0 |
| ctxt_a | desca[ctxt_] | BLACS context handle for the process grid. | 1 |
| m_a | desca[m_] | Number of rows in the global matrix A. | 2 |
| n_a | desca[n_] | Number of columns in the global matrix A. | 3 |
| mb_a | desca[mb_] | Row blocking factor. | 4 |
| nb_a | desca[nb_] | Column blocking factor. | 5 |
| rsrc_a | desca[rsrc_] | Process row over which the first row of the global matrix A is distributed. | 6 |
| csrc_a | desca[csrc_] | Process column over which the first column of the global matrix A is distributed. | 7 |
| lld_a | desca[lld_] | Leading dimension of the local matrix A. | 8 |
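As an illustration of the dense layout, the sketch below fills a 9-element descriptor using named index constants that mirror the element index numbers in the table. The helper `make_dense_descriptor` and the sample values are hypothetical; in a real program the ScaLAPACK routine descinit fills the descriptor and validates its arguments.

```python
# Named positions in a dense-matrix descriptor (element index numbers 0..8).
DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_ = range(9)
DLEN_ = 9


def make_dense_descriptor(ctxt, m, n, mb, nb, rsrc, csrc, lld):
    """Hypothetical helper: return a 9-element descriptor as a Python list."""
    desc = [0] * DLEN_
    desc[DTYPE_] = 1      # descriptor type 1 marks a dense matrix
    desc[CTXT_] = ctxt    # BLACS context handle for the process grid
    desc[M_] = m          # rows of the global matrix
    desc[N_] = n          # columns of the global matrix
    desc[MB_] = mb        # row blocking factor
    desc[NB_] = nb        # column blocking factor
    desc[RSRC_] = rsrc    # process row holding the first global row
    desc[CSRC_] = csrc    # process column holding the first global column
    desc[LLD_] = lld      # leading dimension of the local matrix
    return desc


# Sample values are assumptions for illustration only.
desca = make_dense_descriptor(ctxt=0, m=1000, n=800, mb=64, nb=64,
                              rsrc=0, csrc=0, lld=512)
print(desca)
```

Using named constants instead of literal indices keeps code readable and matches the desca[m_] notation used in the table.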
Array descriptor for narrow-band and tridiagonal matrices (dlen_=7)
| Element Name | Stored in | Description | Element Index Number |
|---|---|---|---|
| dtype_a | desca[dtype_] | Descriptor type. dtype_a=501: 1-by-P grid; dtype_a=502: P-by-1 grid. | 0 |
| ctxt_a | desca[ctxt_] | BLACS context handle indicating the BLACS process grid over which the global matrix A is distributed. The context itself is global, but the handle (the integer value) can vary. | 1 |
| n_a | desca[n_] | The size of the matrix dimension being distributed. | 2 |
| nb_a | desca[nb_] | The blocking factor used to distribute the distributed dimension of the matrix A. | 3 |
| src_a | desca[src_] | The process row or column over which the first row or column of the matrix A is distributed. | 4 |
| lld_a | desca[lld_] | The leading dimension of the local matrix storing the local blocks of the distributed matrix A. The minimum value of lld_a depends on dtype_a. dtype_a=501: lld_a ≥ max(size of undistributed dimension, 1); dtype_a=502: lld_a ≥ max(nb_a, 1). | 5 |
| Not applicable | Not applicable | Reserved for future use. | 6 |
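The lower bound on lld_a in the table can be expressed as a small check. The sketch below is a hypothetical illustration, not a ScaLAPACK routine: the helper names, the `undistributed_size` parameter (the size of the dimension that is not distributed), and the sample values are assumptions.

```python
# Named positions in a narrow-band/tridiagonal descriptor (index numbers 0..6).
DTYPE_, CTXT_, N_, NB_, SRC_, LLD_, RESERVED_ = range(7)
DLEN_ = 7


def min_lld(dtype, nb, undistributed_size):
    """Smallest leading dimension allowed by the table for this descriptor type."""
    if dtype == 501:                       # 1-by-P grid
        return max(undistributed_size, 1)
    if dtype == 502:                       # P-by-1 grid
        return max(nb, 1)
    raise ValueError(f"unsupported descriptor type: {dtype}")


def make_band_descriptor(dtype, ctxt, n, nb, src, lld, undistributed_size):
    """Hypothetical helper: build a 7-element descriptor after checking lld."""
    required = min_lld(dtype, nb, undistributed_size)
    if lld < required:
        raise ValueError(f"lld={lld} is below the minimum {required}")
    desc = [0] * DLEN_                     # element 6 stays 0 (reserved)
    desc[DTYPE_] = dtype
    desc[CTXT_] = ctxt
    desc[N_] = n
    desc[NB_] = nb
    desc[SRC_] = src
    desc[LLD_] = lld
    return desc


# Sample values are assumptions for illustration only.
desc_band = make_band_descriptor(dtype=501, ctxt=0, n=1000, nb=100,
                                 src=0, lld=5, undistributed_size=5)
print(desc_band)
```

Rejecting an undersized leading dimension before the descriptor is used makes the error easier to trace than a failure inside a later computational call.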
Similar notations are used for different matrices. For example, lld_b is the leading dimension of the local matrix storing the local blocks of the distributed matrix B, and dtype_z is the type of the global matrix Z.
The number of rows and columns of a global dense matrix that a particular process in a grid receives after the data is distributed is denoted by LOCr() and LOCc(), respectively. To compute these numbers, you can use the ScaLAPACK tool routine numroc.
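The counting that numroc performs can be sketched in plain Python as follows. This is an illustrative reimplementation of the arithmetic under the assumption of 0-based process coordinates, not the library routine itself; the function name `local_count` and the sample sizes are hypothetical.

```python
def local_count(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of an n-element dimension owned by iproc.

    nb is the blocking factor, isrcproc the process that holds the first
    block, and nprocs the number of processes along this grid dimension.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # full rounds every process receives
    extra = nblocks % nprocs                       # complete blocks left after full rounds
    if mydist < extra:
        count += nb                                # one more complete block
    elif mydist == extra:
        count += n % nb                            # the trailing partial block, if any
    return count


# Illustrative sizes (assumptions): 11 rows, row blocks of 2, 3 process rows.
m_a, mb_a, nprow, rsrc_a = 11, 2, 3, 0
loc_r = [local_count(m_a, mb_a, p, rsrc_a, nprow) for p in range(nprow)]
print(loc_r)
```

Applying the same function with n_a, nb_a, the process column, csrc_a, and the number of process columns gives the local column count.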
After the block-cyclic distribution of the global data is done, you may choose to perform an operation on a submatrix sub(A) of the global matrix A defined by the following six values (for dense matrices):
  • m: The number of rows of sub(A).
  • n: The number of columns of sub(A).
  • a: A pointer to the local array that holds the calling process's part of the global matrix A.
  • ia: The row index of sub(A) in the global matrix A.
  • ja: The column index of sub(A) in the global matrix A.
  • desca: The array descriptor for the global matrix A.
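To show how ia and ja relate to the distributed storage, the sketch below finds the process that owns the first element of sub(A) and that element's position in the local array. It assumes 1-based global and local indices, 0-based process coordinates, and the usual block-cyclic index arithmetic; the helper names and sample values are hypothetical.

```python
def owning_process(ig, nb, isrc, nprocs):
    """Process coordinate that owns 1-based global index ig."""
    return (isrc + (ig - 1) // nb) % nprocs


def local_index(ig, nb, nprocs):
    """1-based local index of 1-based global index ig on its owning process."""
    return nb * ((ig - 1) // (nb * nprocs)) + (ig - 1) % nb + 1


def locate_submatrix_corner(ia, ja, mb, nb, rsrc, csrc, nprow, npcol):
    """Return ((process row, process column), (local row, local column))."""
    owner = (owning_process(ia, mb, rsrc, nprow),
             owning_process(ja, nb, csrc, npcol))
    local = (local_index(ia, mb, nprow), local_index(ja, nb, npcol))
    return owner, local


# Sample values are assumptions: 2-by-2 blocks on a 2-by-2 process grid.
owner, local = locate_submatrix_corner(ia=5, ja=3, mb=2, nb=2,
                                       rsrc=0, csrc=0, nprow=2, npcol=2)
print(owner, local)
```

In this example, global rows 5 and 6 form the second row block held by process row 0, so global row 5 is local row 3 there. Global column 3 starts the first column block held by process column 1, so it is local column 1.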
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804