Developer Reference

  • 2021.1
  • 12/04/2020
  • Public Content

Examples of Using OpenMP* Threading for FFT Computation

The following sample program shows how to employ internal OpenMP* threading in Intel® oneAPI Math Kernel Library for FFT computation.

To specify the number of threads inside Intel® oneAPI Math Kernel Library, use the following settings:

  • set MKL_NUM_THREADS = 1 for one-threaded mode;
  • set MKL_NUM_THREADS = 4 for multi-threaded mode.
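Because the examples here depend on environment settings, it can help to confirm from inside a program what a variable such as MKL_NUM_THREADS is actually set to. The following is a minimal sketch, not part of oneMKL: the helper names `parse_thread_count` and `env_thread_count` and the upper bound of 1024 are assumptions made for illustration. It only parses the text of the variable and says nothing about how the library itself interprets it.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of oneMKL: parse a thread-count string
   such as the value of MKL_NUM_THREADS. Returns fallback when the text
   is missing, is not a plain decimal number, or is outside 1..1024
   (the upper bound is an arbitrary sanity limit chosen for this sketch). */
int parse_thread_count(const char *text, int fallback)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;
    value = strtol(text, &end, 10);
    if (*end != '\0' || value < 1 || value > 1024)
        return fallback;
    return (int)value;
}

/* Hypothetical helper: read a thread-count environment variable by name,
   returning fallback when it is unset or unusable. */
int env_thread_count(const char *name, int fallback)
{
    return parse_thread_count(getenv(name), fallback);
}
```

A program could call `env_thread_count("MKL_NUM_THREADS", 1)` at startup and print the result next to its timing output, so that one-threaded and multi-threaded runs are easy to tell apart.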
Using oneMKL Internal Threading Mode (C Example)

    /* C99 example */
    #include "mkl_dfti.h"

    float data[200][100];
    DFTI_DESCRIPTOR_HANDLE fft = NULL;
    MKL_LONG dim_sizes[2] = {200, 100};

    /* ...put values into data[i][j] 0<=i<=199, 0<=j<=99 */

    DftiCreateDescriptor(&fft, DFTI_SINGLE, DFTI_REAL, 2, dim_sizes);
    DftiCommitDescriptor(fft);
    DftiComputeForward(fft, data);
    DftiFreeDescriptor(&fft);
The following examples, “Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region” and “Using Parallel Mode with Multiple Descriptors Initialized in One Thread”, illustrate a parallel customer program with each descriptor instance used only in a single thread.

Specify the number of threads as follows:

  • set MKL_NUM_THREADS = 1 for Intel® oneAPI Math Kernel Library to work in the single-threaded mode (recommended);
  • set OMP_NUM_THREADS = 4 for the customer program to work in the multi-threaded mode.

Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region
Note that the program in this example can be made single-threaded at the customer level while still using parallel mode within Intel® oneAPI Math Kernel Library. To achieve this, set the parameter DFTI_NUMBER_OF_TRANSFORMS = 4 and the corresponding parameter DFTI_INPUT_DISTANCE = 5000, which is the 50 x 100 = 5000 elements that separate consecutive transforms in data.

    /* C99 example */
    #include "mkl_dfti.h"
    #include <omp.h>
    #define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))

    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 data[4][50][100];
    int nth = ARRAY_LEN(data);
    MKL_LONG dim_sizes[2] = { ARRAY_LEN(data[0]), ARRAY_LEN(data[0][0]) }; /* {50, 100} */
    int th;

    /* ...put values into data[i][j][k] 0<=i<=3, 0<=j<=49, 0<=k<=99 */

    // assume data is initialized and do 2D FFTs
    // each thread creates, commits, uses, and frees its own descriptor
    #pragma omp parallel for shared(dim_sizes, data)
    for (th = 0; th < nth; ++th)
    {
        DFTI_DESCRIPTOR_HANDLE myFFT = NULL;

        DftiCreateDescriptor(&myFFT, DFTI_SINGLE, DFTI_COMPLEX, 2, dim_sizes);
        DftiCommitDescriptor(myFFT);
        DftiComputeForward(myFFT, data[th]);
        DftiFreeDescriptor(&myFFT);
    }

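The values 4 and 5000 mentioned in the note for this example follow directly from the layout of the data array: there are four 50x100 transforms stored back to back, and each one starts 5000 complex elements after the previous one. The sketch below works through that arithmetic in plain C. The `complex8` struct is a hypothetical stand-in for MKL_Complex8, and the function names are made up for illustration, so the block builds without the oneMKL headers and does not call the library.

```c
#include <stddef.h>

/* Hypothetical stand-in for MKL_Complex8 (a pair of floats); an assumption
   made so this sketch compiles without the oneMKL headers. */
typedef struct { float real; float imag; } complex8;

enum { BATCH = 4, ROWS = 50, COLS = 100 };

/* Same shape as the data array in the example: 4 transforms of 50x100. */
static complex8 data[BATCH][ROWS][COLS];

/* Number of transforms stored in data: the value the note suggests
   for DFTI_NUMBER_OF_TRANSFORMS. */
size_t number_of_transforms(void)
{
    return sizeof(data) / sizeof(data[0]);
}

/* Elements between the starts of consecutive transforms: the value the
   note suggests for DFTI_INPUT_DISTANCE. */
size_t input_distance(void)
{
    return sizeof(data[0]) / sizeof(data[0][0][0]);
}

/* Offset, in elements, of transform t from the start of data. */
size_t transform_offset(size_t t)
{
    return (size_t)((const char *)data[t] - (const char *)data)
           / sizeof(complex8);
}
```

With a real descriptor, such values would be passed through DftiSetValue before DftiCommitDescriptor is called, since configuration changes take effect only at commit time.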
For the example that follows, specify the number of threads as follows:

  • set MKL_NUM_THREADS = 1 for Intel® oneAPI Math Kernel Library to work in the single-threaded mode (obligatory);
  • set OMP_NUM_THREADS = 4 for the customer program to work in the multi-threaded mode.

Using Parallel Mode with Multiple Descriptors Initialized in One Thread
    /* C99 example */
    #include "mkl_dfti.h"
    #include <omp.h>
    #define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))

    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 data[4][50][100];
    int nth = ARRAY_LEN(data);
    MKL_LONG dim_sizes[2] = { ARRAY_LEN(data[0]), ARRAY_LEN(data[0][0]) }; /* {50, 100} */
    DFTI_DESCRIPTOR_HANDLE FFT[ARRAY_LEN(data)];
    int th;

    /* ...put values into data[i][j][k] 0<=i<=3, 0<=j<=49, 0<=k<=99 */

    // create and commit all descriptors in one thread
    for (th = 0; th < nth; ++th)
        DftiCreateDescriptor(&FFT[th], DFTI_SINGLE, DFTI_COMPLEX, 2, dim_sizes);
    for (th = 0; th < nth; ++th)
        DftiCommitDescriptor(FFT[th]);

    // assume data is initialized and do 2D FFTs
    #pragma omp parallel for shared(FFT, data)
    for (th = 0; th < nth; ++th)
        DftiComputeForward(FFT[th], data[th]);

    for (th = 0; th < nth; ++th)
        DftiFreeDescriptor(&FFT[th]);

The following example, “Using Parallel Mode with a Common Descriptor”, illustrates a parallel customer program with a common descriptor used in several threads.

Using Parallel Mode with a Common Descriptor
    #include "mkl_dfti.h"
    #include <omp.h>
    #define ARRAY_LEN(a) (sizeof(a)/sizeof((a)[0]))

    // 4 OMP threads, each does 2D FFT 50x100 points
    MKL_Complex8 data[4][50][100];
    int nth = ARRAY_LEN(data);
    MKL_LONG len[2] = {ARRAY_LEN(data[0]), ARRAY_LEN(data[0][0])};
    DFTI_DESCRIPTOR_HANDLE FFT;
    int th;

    /* ...put values into data[i][j][k] 0<=i<=3, 0<=j<=49, 0<=k<=99 */

    // one descriptor, created and committed once, shared by all threads
    DftiCreateDescriptor(&FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len);
    DftiCommitDescriptor(FFT);

    // assume data is initialized and do 2D FFTs
    #pragma omp parallel for shared(FFT, data)
    for (th = 0; th < nth; ++th)
        DftiComputeForward(FFT, data[th]);

    DftiFreeDescriptor(&FFT);
