OpenMP* Threaded Functions and Problems

The following Intel MKL function domains are threaded with the OpenMP* technology:

  • Direct sparse solver.

  • LAPACK.

    For a list of threaded routines, see LAPACK Routines.

  • Level1 and Level2 BLAS.

    For a list of threaded routines, see Threaded BLAS Level1 and Level2 Routines.

  • All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers.

  • All Vector Mathematics functions (except service functions).

  • FFT.

    For a list of FFT transforms that can be threaded, see Threaded FFT Problems.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

LAPACK Routines

In this section, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.
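
As a small illustration of the ? convention, the sketch below expands a routine pattern into its concrete flavor names. The helper name `expand_flavors` is hypothetical and not part of Intel MKL. The expansion is purely mechanical: it does not check that a given flavor actually exists (Hermitian routines such as ?hetrf, for instance, are normally provided only for the complex prefixes c and z).

```python
def expand_flavors(pattern, precisions="sdcz"):
    """Expand a '?'-prefixed routine pattern into concrete routine names.

    Hypothetical helper for illustration only; it does not verify that
    each resulting routine exists in the library.
    """
    if not pattern.startswith("?"):
        # Already a concrete name such as "chgeqz".
        return [pattern]
    return [prefix + pattern[1:] for prefix in precisions]


# All four flavors of the LU factorization routine.
print(expand_flavors("?getrf"))
# Restrict the prefixes when only the complex flavors apply.
print(expand_flavors("?hetrf", precisions="cz"))
```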

The following LAPACK routines are threaded with OpenMP*:

  • Linear equations, computational routines:
    • Factorization: ?getrf, ?getrfnpi, ?gbtrf, ?potrf, ?pptrf, ?sytrf, ?hetrf, ?sptrf, ?hptrf
    • Solving: ?dttrsb, ?gbtrs, ?gttrs, ?pptrs, ?pbtrs, ?pttrs, ?sytrs, ?sptrs, ?hptrs, ?tptrs, ?tbtrs
  • Orthogonal factorization, computational routines:
    ?geqrf, ?ormqr, ?unmqr, ?ormlq, ?unmlq, ?ormql, ?unmql, ?ormrq, ?unmrq
  • Singular Value Decomposition, computational routines:
    ?gebrd, ?bdsqr
  • Symmetric Eigenvalue Problems, computational routines:
    ?sytrd, ?hetrd, ?sptrd, ?hptrd, ?steqr, ?stedc
  • Generalized Nonsymmetric Eigenvalue Problems, computational routines:
    chgeqz/zhgeqz

A number of other LAPACK routines that are built on threaded LAPACK or BLAS routines also make effective use of OpenMP* parallelism, for example:
?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, and cggevx/zggevx.

Threaded BLAS Level1 and Level2 Routines

In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z.

The following routines are threaded with OpenMP* for Intel® Core™2 Duo and Intel® Core™ i7 processors:

  • Level1 BLAS:
    ?axpy, ?copy, ?swap, ddot/sdot, cdotc, drot/srot
  • Level2 BLAS:
    ?gemv, ?trsv, ?trmv, dsyr/ssyr, dsyr2/ssyr2, dsymv/ssymv
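
The two lists above can be turned into a simple lookup, which may be handy when auditing which BLAS calls in an application are candidates for OpenMP* threading. This is only a sketch: the names `THREADED_PATTERNS`, `expand`, and `threaded_level` are hypothetical, the entries are copied verbatim from the lists, and the result says nothing about whether a call is threaded on a particular processor or for a particular problem size.

```python
PRECISIONS = ("s", "d", "c", "z")

# Entries copied from the Level1 and Level2 lists above.
THREADED_PATTERNS = {
    "Level1": ["?axpy", "?copy", "?swap", "ddot/sdot", "cdotc", "drot/srot"],
    "Level2": ["?gemv", "?trsv", "?trmv", "dsyr/ssyr", "dsyr2/ssyr2", "dsymv/ssymv"],
}


def expand(entry):
    """Expand one list entry ('?axpy' or 'ddot/sdot') into routine names."""
    names = []
    for part in entry.split("/"):
        if part.startswith("?"):
            names.extend(prefix + part[1:] for prefix in PRECISIONS)
        else:
            names.append(part)
    return names


def threaded_level(routine):
    """Return 'Level1' or 'Level2' if the routine is listed, else None."""
    name = routine.lower()
    for level, patterns in THREADED_PATTERNS.items():
        for entry in patterns:
            if name in expand(entry):
                return level
    return None


print(threaded_level("daxpy"))
print(threaded_level("zdotc"))
```

In this sketch, `zdotc` is reported as not listed because only `cdotc` appears in the Level1 list.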

Threaded FFT Problems

The following characteristics of a specific problem determine whether your FFT computation may be threaded with OpenMP*:

  • rank
  • domain
  • size/length
  • precision (single or double)
  • placement (in-place or out-of-place)
  • strides
  • number of transforms
  • layout (for example, interleaved or split layout of complex data)

Most FFT problems are threaded. In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow.

One-dimensional (1D) transforms

1D transforms are threaded in many cases.

1D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions depending on the architecture:

Architecture    Conditions
------------    ----------
Intel® 64       N is a power of 2, log2(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1.
IA-32           N is a power of 2, log2(N) > 13, and the transform is single-precision.
                N is a power of 2, log2(N) > 14, and the transform is double-precision.
Any             N is composite, log2(N) > 16, and input/output strides equal 1.

1D complex-to-complex transforms using split-complex layout are not threaded.
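
The conditions above for a single 1D c2c transform can be written as a predicate. This is a minimal sketch, not an Intel MKL API: the function name, the parameter names, and the string values `"intel64"`, `"ia32"`, `"single"`, `"double"`, `"interleaved"`, and `"split"` are all assumptions made for illustration. It also assumes that "composite" means any non-prime N, so a power of 2 with log2(N) > 16 and unit strides matches the "Any" row. Calls that compute multiple transforms are not covered here.

```python
import math


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def is_composite(n):
    """True if n has a divisor other than 1 and itself (assumed reading)."""
    if n < 4:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return True
    return False


def c2c_1d_is_threaded(n, arch, precision, in_place,
                       in_stride=1, out_stride=1, layout="interleaved"):
    """Check the documented conditions for one 1D c2c transform of size n."""
    if layout != "interleaved":
        # Split-complex 1D c2c transforms are not threaded.
        return False

    unit_strides = in_stride == 1 and out_stride == 1
    pow2 = is_power_of_two(n)

    if arch == "intel64":
        # Power of 2, log2(N) > 9, double precision, out-of-place, unit strides.
        if (pow2 and n > 2**9 and precision == "double"
                and not in_place and unit_strides):
            return True
    elif arch == "ia32":
        # Power of 2 with a precision-dependent size threshold.
        if pow2 and precision == "single" and n > 2**13:
            return True
        if pow2 and precision == "double" and n > 2**14:
            return True

    # "Any" architecture: composite N, log2(N) > 16, unit strides.
    return is_composite(n) and n > 2**16 and unit_strides


print(c2c_1d_is_threaded(1024, "intel64", "double", in_place=False))
print(c2c_1d_is_threaded(1024, "intel64", "double", in_place=True))
```

Comparing N against a power of 2 (for example `n > 2**9`) is equivalent to the log2(N) threshold and avoids floating-point rounding.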

Multidimensional transforms

All multidimensional transforms on large-volume data are threaded.
