Examples of Using OpenMP* Threading for FFT Computation

The following sample program shows how to use the internal OpenMP* threading of Intel® MKL for FFT computation.

To specify the number of threads inside Intel® MKL, use the following settings:

set MKL_NUM_THREADS = 1 for single-threaded mode;
set MKL_NUM_THREADS = 4 for multi-threaded mode.

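
Because these settings are ordinary environment variables, a program can report the values it was launched with, which helps confirm that a run used the intended mode. The following is a minimal sketch in plain C. The helper names parse_thread_count and env_thread_count are hypothetical and are not part of Intel® MKL; Intel® MKL reads the variables on its own regardless of this code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert text such as "4" to a positive thread count.
   Returns `fallback` when the text is missing, malformed, or not positive. */
int parse_thread_count(const char *text, int fallback)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1 || value > INT_MAX)
        return fallback;

    return (int)value;
}

/* Hypothetical helper: read a thread-count variable such as
   MKL_NUM_THREADS or OMP_NUM_THREADS from the environment. */
int env_thread_count(const char *name, int fallback)
{
    assert(name != NULL);
    return parse_thread_count(getenv(name), fallback);
}
```

For example, env_thread_count("MKL_NUM_THREADS", 1) yields 4 when the program is started with MKL_NUM_THREADS set to 4, and 1 when the variable is unset.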
Using Intel® MKL Internal Threading Mode (C Example)
    /* C99 example */
    #include "mkl_dfti.h"

    float data[200][100];
    DFTI_DESCRIPTOR_HANDLE fft = NULL;
    MKL_LONG dim_sizes[2] = {200, 100};

    /* ...put values into data[i][j] 0<=i<=199, 0<=j<=99 */

    DftiCreateDescriptor(&fft, DFTI_SINGLE, DFTI_REAL, 2, dim_sizes);
    DftiCommitDescriptor(fft);
    DftiComputeForward(fft, data);
    DftiFreeDescriptor(&fft);

For the next example, specify the number of threads as follows:

set MKL_NUM_THREADS = 1 for Intel® MKL to work in the single-threaded mode (recommended);
set OMP_NUM_THREADS = 4 for the customer program to work in the multi-threaded mode.

Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region
Note that this example can be transformed so that the customer program is single-threaded while Intel® MKL runs in parallel mode internally. To achieve this, set the parameter DFTI_NUMBER_OF_TRANSFORMS = 4 and the corresponding parameter DFTI_INPUT_DISTANCE = 5000 (the 50 x 100 = 5000 elements of one transform).

    /* C99 example */
    #include "mkl_dfti.h"
    #include <omp.h>
    #define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))

    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 data[4][50][100];
    int nth = ARRAY_LEN(data);
    MKL_LONG dim_sizes[2] = { ARRAY_LEN(data[0]), ARRAY_LEN(data[0][0]) }; /* {50, 100} */
    int th;

    /* ...put values into data[i][j][k] 0<=i<=3, 0<=j<=49, 0<=k<=99 */

    // assume data is initialized and do 2D FFTs
    #pragma omp parallel for shared(dim_sizes, data)
    for (th = 0; th < nth; ++th)
    {
        // each thread creates, uses, and frees its own descriptor
        DFTI_DESCRIPTOR_HANDLE myFFT = NULL;

        DftiCreateDescriptor(&myFFT, DFTI_SINGLE, DFTI_COMPLEX, 2, dim_sizes);
        DftiCommitDescriptor(myFFT);
        DftiComputeForward(myFFT, data[th]);
        DftiFreeDescriptor(&myFFT);
    }

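
The note at the start of this example can be sketched as follows. This is a hedged illustration under stated assumptions, not a definitive implementation: the helper names batch_distance, batch_offset, and batched_fft_2d, as well as the USE_MKL build macro, are introduced here for illustration only. The first two helpers are plain C and show where the value 5000 comes from. The Intel® MKL part is compiled only when USE_MKL is defined, and it uses a single descriptor for all four transforms.

```c
#include <assert.h>

/* Hypothetical helper: number of elements between the starts of two
   consecutive rows x cols transforms stored back to back. */
long batch_distance(long rows, long cols)
{
    assert(rows > 0 && cols > 0);
    return rows * cols;
}

/* Hypothetical helper: element offset of transform number k in the batch. */
long batch_offset(long k, long rows, long cols)
{
    assert(k >= 0);
    return k * batch_distance(rows, cols);
}

#ifdef USE_MKL /* assumption: defined only when building against Intel MKL */
#include "mkl_dfti.h"

/* Runs `count` in-place 2D complex FFTs with one descriptor. The customer
   code stays single-threaded; any threading happens inside Intel MKL. */
MKL_LONG batched_fft_2d(MKL_Complex8 *data, MKL_LONG count,
                        MKL_LONG rows, MKL_LONG cols)
{
    DFTI_DESCRIPTOR_HANDLE fft = NULL;
    MKL_LONG sizes[2];
    MKL_LONG distance = (MKL_LONG)batch_distance((long)rows, (long)cols);
    MKL_LONG status;

    sizes[0] = rows;
    sizes[1] = cols;

    status = DftiCreateDescriptor(&fft, DFTI_SINGLE, DFTI_COMPLEX, 2, sizes);
    if (status == DFTI_NO_ERROR)
        status = DftiSetValue(fft, DFTI_NUMBER_OF_TRANSFORMS, count);
    if (status == DFTI_NO_ERROR)
        status = DftiSetValue(fft, DFTI_INPUT_DISTANCE, distance);
    if (status == DFTI_NO_ERROR)
        status = DftiCommitDescriptor(fft);
    if (status == DFTI_NO_ERROR)
        status = DftiComputeForward(fft, data);

    if (fft != NULL)
        DftiFreeDescriptor(&fft);
    return status;
}
#endif
```

With the data array from the example, the call would look like batched_fft_2d(&data[0][0][0], 4, 50, 100). The sketch sets only DFTI_INPUT_DISTANCE, as named in the note, and computes in place; an out-of-place variant would also need DFTI_OUTPUT_DISTANCE.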
For the next example, specify the number of threads as follows:

set MKL_NUM_THREADS = 1 for Intel® MKL to work in the single-threaded mode (obligatory);
set OMP_NUM_THREADS = 4 for the customer program to work in the multi-threaded mode.

Using Parallel Mode with Multiple Descriptors Initialized in One Thread
    /* C99 example */
    #include "mkl_dfti.h"
    #include <omp.h>
    #define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))

    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 data[4][50][100];
    int nth = ARRAY_LEN(data);
    MKL_LONG dim_sizes[2] = { ARRAY_LEN(data[0]), ARRAY_LEN(data[0][0]) }; /* {50, 100} */
    DFTI_DESCRIPTOR_HANDLE FFT[ARRAY_LEN(data)];
    int th;

    /* ...put values into data[i][j][k] 0<=i<=3, 0<=j<=49, 0<=k<=99 */

    // create and commit all descriptors in one thread
    for (th = 0; th < nth; ++th)
        DftiCreateDescriptor(&FFT[th], DFTI_SINGLE, DFTI_COMPLEX, 2, dim_sizes);
    for (th = 0; th < nth; ++th)
        DftiCommitDescriptor(FFT[th]);

    // assume data is initialized and do 2D FFTs
    #pragma omp parallel for shared(FFT, data)
    for (th = 0; th < nth; ++th)
        DftiComputeForward(FFT[th], data[th]);

    for (th = 0; th < nth; ++th)
        DftiFreeDescriptor(&FFT[th]);

The following example, “Using Parallel Mode with a Common Descriptor”, illustrates a parallel customer program in which several threads share a common descriptor.

Using Parallel Mode with a Common Descriptor
    /* C99 example */
    #include "mkl_dfti.h"
    #include <omp.h>
    #define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))

    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 data[4][50][100];
    int nth = ARRAY_LEN(data);
    MKL_LONG len[2] = {ARRAY_LEN(data[0]), ARRAY_LEN(data[0][0])};
    DFTI_DESCRIPTOR_HANDLE FFT;
    int th;

    /* ...put values into data[i][j][k] 0<=i<=3, 0<=j<=49, 0<=k<=99 */

    // one descriptor, created and committed before the parallel region
    DftiCreateDescriptor(&FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len);
    DftiCommitDescriptor(FFT);

    // assume data is initialized and do 2D FFTs
    #pragma omp parallel for shared(FFT, data)
    for (th = 0; th < nth; ++th)
        DftiComputeForward(FFT, data[th]);

    DftiFreeDescriptor(&FFT);
