Developer Guide

OpenMP* Threaded Functions and Problems

The following Intel® oneAPI Math Kernel Library function domains are threaded with the OpenMP* technology:
  • Direct sparse solver.
  • LAPACK.
    For a list of threaded routines, see LAPACK Routines.
  • Level1 and Level2 BLAS.
    For a list of threaded routines, see Threaded BLAS Level1 and Level2 Routines.
  • All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers.
  • All Vector Mathematics functions (except service functions).
  • FFT.
    For a list of FFT transforms that can be threaded, see Threaded FFT Problems.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201

LAPACK Routines

In this section, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.
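The ? convention can be illustrated with a short C sketch that substitutes each of s, d, c, and z for the leading ? of a routine pattern. The helper expand_precision_prefix and the constant ROUTINE_NAME_CAP are hypothetical names chosen for this example; they are not part of oneMKL. Not every expansion names an existing routine: for example, ?ormqr applies to real data (s, d), while its complex counterpart is ?unmqr (c, z).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size, long enough for the routine names in this section. */
#define ROUTINE_NAME_CAP 16

/* The four values "?" may take: s = single real, d = double real,
   c = single complex, z = double complex. */
static const char PRECISION_PREFIXES[] = "sdcz";

/* Hypothetical helper (not a oneMKL function).
   Expands a pattern such as "?getrf" into candidate routine names and
   returns how many names were written to out[]: 4 for a pattern that
   starts with '?', otherwise 1 (the name is copied unchanged). */
static int expand_precision_prefix(const char *pattern,
                                   char out[4][ROUTINE_NAME_CAP]) {
    assert(pattern != NULL && out != NULL);
    assert(strlen(pattern) < ROUTINE_NAME_CAP);
    if (pattern[0] != '?') {
        snprintf(out[0], ROUTINE_NAME_CAP, "%s", pattern);
        return 1;
    }
    for (int i = 0; i < 4; ++i) {
        /* Replace the leading '?' with one concrete precision prefix. */
        snprintf(out[i], ROUTINE_NAME_CAP, "%c%s",
                 PRECISION_PREFIXES[i], pattern + 1);
    }
    return 4;
}
```

For example, "?getrf" expands to sgetrf, dgetrf, cgetrf, and zgetrf, while an explicit name such as chgeqz is returned as is.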
The following LAPACK routines are threaded with OpenMP*:
  • Linear equations, computational routines:
    • Factorization:
      ?getrf, ?getrfnpi, ?gbtrf, ?potrf, ?pptrf, ?sytrf, ?hetrf, ?sptrf, ?hptrf
    • Solving:
      ?dttrsb, ?gbtrs, ?gttrs, ?pptrs, ?pbtrs, ?pttrs, ?sytrs, ?sptrs, ?hptrs, ?tptrs, ?tbtrs
  • Orthogonal factorization, computational routines:
    ?geqrf, ?ormqr, ?unmqr, ?ormlq, ?unmlq, ?ormql, ?unmql, ?ormrq, ?unmrq
  • Singular Value Decomposition, computational routines:
    ?gebrd, ?bdsqr
  • Symmetric Eigenvalue Problems, computational routines:
    ?sytrd, ?hetrd, ?sptrd, ?hptrd, ?steqr, ?stedc
  • Generalized Nonsymmetric Eigenvalue Problems, computational routines:
    chgeqz/zhgeqz
Several other LAPACK routines that are built on threaded LAPACK or BLAS routines also make effective use of OpenMP* parallelism, for example:
?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, cggevx/zggevx.

Threaded BLAS Level1 and Level2 Routines

In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.
The following routines are threaded with OpenMP*:
  • Level1 BLAS:
    ?axpy, ?copy, ?swap, ddot/sdot, cdotc, drot/srot
  • Level2 BLAS:
    ?gemv, ?trsv, ?trmv, dsyr/ssyr, dsyr2/ssyr2, dsymv/ssymv
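To check whether a particular Level1 or Level2 routine appears in the list above, the list can be encoded as data in which a leading ? matches any of s, d, c, or z. The following C sketch assumes lowercase base routine names such as "daxpy" (not CBLAS names such as cblas_daxpy); matches_pattern and is_threaded_blas_l1l2 are hypothetical helpers, not oneMKL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Threaded Level1 and Level2 BLAS routines, transcribed from the list above.
   A leading '?' matches any precision prefix s, d, c, or z. */
static const char *const THREADED_BLAS_L1L2[] = {
    /* Level1 BLAS */
    "?axpy", "?copy", "?swap", "ddot", "sdot", "cdotc", "drot", "srot",
    /* Level2 BLAS */
    "?gemv", "?trsv", "?trmv", "dsyr", "ssyr", "dsyr2", "ssyr2",
    "dsymv", "ssymv",
};

/* Hypothetical helper: returns true if name matches pattern, where a
   leading '?' in the pattern matches exactly one of s, d, c, z. */
static bool matches_pattern(const char *pattern, const char *name) {
    if (pattern[0] == '?') {
        /* Check for the empty string first: strchr would match the terminator. */
        if (name[0] == '\0' || strchr("sdcz", name[0]) == NULL) {
            return false;
        }
        return strcmp(pattern + 1, name + 1) == 0;
    }
    return strcmp(pattern, name) == 0;
}

/* Hypothetical helper: returns true if name appears in the threaded list. */
static bool is_threaded_blas_l1l2(const char *name) {
    assert(name != NULL);
    size_t count = sizeof THREADED_BLAS_L1L2 / sizeof THREADED_BLAS_L1L2[0];
    for (size_t i = 0; i < count; ++i) {
        if (matches_pattern(THREADED_BLAS_L1L2[i], name)) {
            return true;
        }
    }
    return false;
}
```

With this encoding, zaxpy and dgemv match the ?-patterns, while zdotc and csymv do not, because the list names only cdotc and the real-precision dsymv/ssymv.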

Threaded FFT Problems

The following characteristics of a specific problem determine whether your FFT computation may be threaded with OpenMP*:
  • rank
  • domain
  • size/length
  • precision (single or double)
  • placement (in-place or out-of-place)
  • strides
  • number of transforms
  • layout (for example, interleaved or split layout of complex data)
Most FFT problems are threaded. In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow.
One-dimensional (1D) transforms
1D transforms are threaded in many cases.
1D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions, depending on the architecture:
Architecture   Conditions
Intel® 64      N is a power of 2, log2(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1.
IA-32          N is a power of 2, log2(N) > 13, and the transform is single-precision.
               N is a power of 2, log2(N) > 14, and the transform is double-precision.
Any            N is composite, log2(N) > 16, and input/output strides equal 1.
1D complex-to-complex transforms using split-complex layout are not threaded.
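The 1D c2c conditions above can be expressed as a single predicate. The C sketch below is a hypothetical model for illustration only: the types and the function c2c_1d_is_threaded are not part of the oneMKL API, and the sketch models one transform per call (number of transforms = 1). It relies on the identity log2(N) > t if and only if N > 2^t, which avoids floating-point logarithms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical problem description for illustration only; not oneMKL types. */
typedef enum { ARCH_INTEL64, ARCH_IA32 } fft_arch_t;
typedef enum { PREC_SINGLE, PREC_DOUBLE } fft_precision_t;
typedef enum { LAYOUT_INTERLEAVED, LAYOUT_SPLIT } fft_layout_t;

typedef struct {
    size_t n;                   /* transform size N */
    fft_arch_t arch;
    fft_precision_t precision;
    bool out_of_place;
    bool unit_strides;          /* input and output strides both equal 1 */
    fft_layout_t layout;
} fft1d_c2c_problem;

static bool is_power_of_two(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* log2(n) > t holds exactly when n > 2^t. */
static bool log2_exceeds(size_t n, unsigned t) {
    return n > ((size_t)1 << t);
}

/* n is composite if it has a divisor d with 2 <= d <= sqrt(n). */
static bool is_composite(size_t n) {
    for (size_t d = 2; d <= n / d; ++d) {
        if (n % d == 0) {
            return true;
        }
    }
    return false;
}

/* Returns true if a single 1D c2c transform meets one of the table rows. */
static bool c2c_1d_is_threaded(const fft1d_c2c_problem *p) {
    assert(p != NULL);
    if (p->layout == LAYOUT_SPLIT) {
        return false; /* split-complex 1D c2c transforms are not threaded */
    }
    /* Row "Any": N composite, log2(N) > 16, unit strides. */
    if (is_composite(p->n) && log2_exceeds(p->n, 16) && p->unit_strides) {
        return true;
    }
    if (!is_power_of_two(p->n)) {
        return false;
    }
    if (p->arch == ARCH_INTEL64) {
        /* Row "Intel 64": log2(N) > 9, double precision, out-of-place, unit strides. */
        return log2_exceeds(p->n, 9) && p->precision == PREC_DOUBLE
            && p->out_of_place && p->unit_strides;
    }
    /* Rows "IA-32": log2(N) > 13 in single precision, > 14 in double precision. */
    return p->precision == PREC_SINGLE ? log2_exceeds(p->n, 13)
                                       : log2_exceeds(p->n, 14);
}
```

For example, a double-precision out-of-place transform of size 1024 with unit strides on Intel® 64 satisfies the first row, while the same problem of size 512 does not, because log2(512) = 9 is not greater than 9.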
Multidimensional transforms
All multidimensional transforms on large-volume data are threaded.
