Developer Guide

Coding Techniques

This section discusses coding techniques to improve performance on processors based on supported architectures.
To improve performance, properly align arrays in your code. Additional conditions can improve performance for specific function domains.

Data Alignment and Leading Dimensions

To improve performance of your application that calls Intel® oneAPI Math Kernel Library, align your arrays on 64-byte boundaries and ensure that the leading dimensions of the arrays are divisible by 64/element_size, where element_size is the number of bytes for the matrix elements (4 for single-precision real, 8 for double-precision real and single-precision complex, and 16 for double-precision complex). For more details, see Example of Data Alignment.
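The divisibility rule above can be checked with a little integer arithmetic. The following is a minimal sketch, not part of the library: the helper name padded_leading_dimension is hypothetical, and it only computes a leading dimension that satisfies the rule. It does not allocate aligned memory.

```python
def padded_leading_dimension(n, element_size, alignment=64):
    """Round n up to the nearest multiple of alignment / element_size.

    Hypothetical helper for illustration; n is the matrix dimension along
    the leading dimension, element_size is the size of one element in bytes.
    """
    if alignment % element_size != 0:
        raise ValueError("element_size must divide the alignment")
    elements_per_boundary = alignment // element_size
    # Integer ceiling division, then scale back up to a whole multiple.
    return -(-n // elements_per_boundary) * elements_per_boundary


if __name__ == "__main__":
    # Element sizes taken from the text: 4, 8, and 16 bytes.
    for label, size in [("single real", 4),
                        ("double real", 8),
                        ("double complex", 16)]:
        print(label, padded_leading_dimension(1001, size))
```

A dimension that already satisfies the rule, such as 1024 double-precision elements, is returned unchanged.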
For Intel® Xeon Phi™ processor x200 product family, codenamed Knights Landing, align your matrices on 4096-byte boundaries and set the leading dimension to the following integer expression:
(((n * element_size + 511) / 512) * 512 + 64) / element_size,
where n is the matrix dimension along the leading dimension.
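As a worked illustration, the expression above rounds the size of one row or column in bytes up to the next multiple of 512, adds 64 bytes, and converts the result back to a number of elements. The sketch below transcribes it with Python integer division. The function name knl_leading_dimension is hypothetical.

```python
def knl_leading_dimension(n, element_size):
    """Evaluate (((n * element_size + 511) / 512) * 512 + 64) / element_size
    using integer division, as in the expression from the text.

    Hypothetical helper for illustration only.
    """
    row_bytes = n * element_size
    # Round up to a multiple of 512 bytes, then add 64 bytes of padding.
    padded_bytes = ((row_bytes + 511) // 512) * 512 + 64
    return padded_bytes // element_size


if __name__ == "__main__":
    # 1000 double-precision elements: 8000 bytes -> 8192 + 64 = 8256 bytes.
    print(knl_leading_dimension(1000, 8))
```

For n = 1000 and 8-byte elements, this evaluates to 1032 elements.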

LAPACK Packed Routines

The routines with names that contain the letters HP, OP, PP, SP, TP, UP in the matrix type and storage position (the second and third letters respectively) operate on matrices in the packed format (see LAPACK "Routine Naming Conventions" sections in the Intel® oneAPI Math Kernel Library Developer Reference). Their functionality is strictly equivalent to the functionality of the unpacked routines with names containing the letters HE, OR, PO, SY, TR, UN in the same positions, but the performance is significantly lower.
If the memory restriction is not too tight, use an unpacked routine for better performance. In this case, you need to allocate N²/2 more memory than the memory required by a respective packed routine, where N is the problem size (the number of equations).
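To make the memory trade-off concrete, the sketch below compares element counts for the two storage schemes. It assumes the unpacked matrix uses a leading dimension equal to N unless stated otherwise. The helper names are hypothetical. The exact difference is N*(N-1)/2 elements, which is close to the N²/2 figure above for large N.

```python
def packed_elements(n):
    """Elements in packed storage of one triangle: N*(N+1)/2."""
    return n * (n + 1) // 2


def unpacked_elements(n, lda=None):
    """Elements in full lda-by-n storage; lda defaults to n (an assumption)."""
    lda = n if lda is None else lda
    if lda < n:
        raise ValueError("lda must be at least n")
    return lda * n


def extra_bytes(n, element_size=8):
    """Additional bytes needed by the unpacked layout over the packed one."""
    return (unpacked_elements(n) - packed_elements(n)) * element_size


if __name__ == "__main__":
    n = 1000
    print("packed:  ", packed_elements(n))
    print("unpacked:", unpacked_elements(n))
    print("extra bytes (double precision):", extra_bytes(n))
```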
For example, to speed up solving a symmetric eigenproblem with an expert driver, use the unpacked routine:
call dsyevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z, ldz, work, lwork, iwork, ifail, info)
where a has dimensions lda-by-n, which is at least N² elements, instead of the packed routine:
call dspevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz, work, iwork, ifail, info)
where ap has dimension N*(N+1)/2.
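If the data already exists in packed form, it has to be expanded into full storage before an unpacked routine can be called. The following is a minimal sketch of that expansion, assuming uplo = 'U' and the standard LAPACK packed layout, in which the columns of the upper triangle are stored one after another. The function name unpack_upper is hypothetical and is not a oneMKL routine. The result is a flat column-major array with lda equal to n. The strictly lower triangle is left at zero, on the assumption that a routine called with uplo = 'U' reads only the upper triangle.

```python
def unpack_upper(ap, n):
    """Expand a packed upper triangle into a flat column-major n-by-n array.

    Assumes the LAPACK packed layout for uplo = 'U': with 0-based indices,
    element (i, j) with i <= j is stored at ap[i + j*(j+1)//2].
    Hypothetical helper for illustration only.
    """
    if len(ap) != n * (n + 1) // 2:
        raise ValueError("ap must hold n*(n+1)/2 elements")
    lda = n  # assumption: no padding of the leading dimension
    a = [0.0] * (lda * n)
    for j in range(n):          # column index
        for i in range(j + 1):  # rows on or above the diagonal
            a[i + j * lda] = ap[i + j * (j + 1) // 2]
    return a


if __name__ == "__main__":
    # Packed upper triangle of a 3-by-3 symmetric matrix, column by column.
    ap = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(unpack_upper(ap, 3))
```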
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201
