Setting number_of_user_threads for Intel® Math Kernel Library FFTW3 wrappers

Consider the case where you:

  • Create an FFTW3 plan and use the plan for sequential DFT computation on each thread in your parallel region
  • Use Intel Math Kernel Library (Intel MKL) FFTW3 wrappers
  • Want the best performance

Intel MKL FFTW3 wrappers are thread safe by default. However, to get the best performance when several threads share a plan, you should set one additional Intel MKL variable: number_of_user_threads. Set it as described below.

In C:

#include "fftw3.h"

/* Added for Intel MKL wrappers to set number of user threads */
#include "fftw3_mkl.h"

/*nthreads -- number of threads sharing the same plan; should be set before the plan is created*/
fftw3_mkl.number_of_user_threads = nthreads;
plan = fftw_plan_dft(...);
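
As a fuller illustration of the C steps above, here is a minimal sketch in which several OpenMP threads share one plan through the new-array execute function fftw_execute_dft. The function name dft_rows_shared_plan and the macro USE_MKL_FFTW3 are hypothetical, introduced only for this example. When the macro is not defined, the block substitutes stand-ins that merely copy input to output, so the control flow can be compiled without Intel MKL. This sketch is not taken from the attached examples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifdef USE_MKL_FFTW3   /* hypothetical macro: define it when building against Intel MKL */
#include "fftw3.h"
#include "fftw3_mkl.h" /* declares the global structure fftw3_mkl */
#else
/* Stand-ins so the sketch compiles without Intel MKL. They are NOT the real
   library: "execute" only copies the input, it does not compute a DFT. */
typedef double fftw_complex[2];
typedef struct { int n; } *fftw_plan;
#define FFTW_FORWARD (-1)
#define FFTW_ESTIMATE (1U << 6)
static struct { int number_of_user_threads; } fftw3_mkl;

static fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
                                  int sign, unsigned flags)
{
    (void)in; (void)out; (void)sign; (void)flags;
    fftw_plan p = malloc(sizeof *p);
    if (p) p->n = n;
    return p;
}
static void fftw_execute_dft(const fftw_plan p, fftw_complex *in, fftw_complex *out)
{
    memcpy(out, in, (size_t)p->n * sizeof(fftw_complex));
}
static void fftw_destroy_plan(fftw_plan p) { free(p); }
#endif

/* Transform `rows` contiguous signals of length n, with all threads sharing
   a single plan. Returns 0 on success, -1 if the plan could not be created. */
int dft_rows_shared_plan(fftw_complex *in, fftw_complex *out,
                         int rows, int n, int nthreads)
{
    assert(rows > 0 && n > 0 && nthreads > 0);

    /* Must be set BEFORE the plan is created. */
    fftw3_mkl.number_of_user_threads = nthreads;
    fftw_plan plan = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    if (plan == NULL) return -1;

    /* Each thread runs a sequential transform on its own row, reusing the plan.
       Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for num_threads(nthreads)
    for (int r = 0; r < rows; ++r)
        fftw_execute_dft(plan, in + (size_t)r * n, out + (size_t)r * n);

    fftw_destroy_plan(plan);
    return 0;
}
```

Building with USE_MKL_FFTW3 defined, OpenMP enabled, and Intel MKL linked would exercise the real wrappers; the exact compiler and link options depend on your installation.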

In Fortran:

1. Fortran programs should declare the global structure that is defined in mkl/include/fftw/fftw3_mkl.h (your compiler must support the BIND statement):

!DIR$ ATTRIBUTES ALIGN : 8 :: fftw3_mkl
COMMON/fftw3_mkl/ignore(4),mkl_dft_number_of_user_threads,ignore2(7)
INTEGER*4 :: ignore, mkl_dft_number_of_user_threads, ignore2
BIND (c) :: /fftw3_mkl/     

2. After the declaration, set the number of threads that will concurrently share an FFTW plan. Do this before the plan is created with any of the *fftw_plan_* functions:

mkl_dft_number_of_user_threads = nthreads

call dfftw_plan_dft_1d(...)

The attached examples demonstrate setting number_of_user_threads in both C and Fortran.

Note that this hint applies to FFTW3 wrappers only, not to FFTW2 wrappers. To get a performance advantage with FFTW2 wrappers, you should create a separate plan for each thread.

Attachment   Size
dp.c         6.07 KB
sp.c         6.09 KB
dp.f90       5.9 KB
sp.f90       5.89 KB

3 comments

wilf-kruggel:

Hi,

I downloaded the file sp_0.c, then increased N to 64000 and M to 10000 so that the program runs for a measurable amount of time. I then ran two timing tests: one with nthreads left at 4 and another with nthreads set to 1. Both tests were run on an idle 4-core machine. Here are the results of time sp_0 for both jobs:

# nthread = 4

real    0m25.538s
user    0m31.743s
sys    0m2.002s

# nthread = 1

real    0m29.116s
user    0m28.480s
sys    0m0.557s

As you can see, the 4 thread job barely managed to outperform the 1 thread job.  I'm struggling with the same issue in my own application and am wondering if my expectations are faulty or if there is some other kind of error at play.

Here is my compile line:

icc -std=c99 -g -fopenmp -xHost -fargument-noalias -qopt-subscript-in-range  -fno-inline-functions -ansi-alias     sp_0.c  -Wl,--start-group /data/intel_2018/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64/libmkl_intel_ilp64.a /data/intel_2018/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64/libmkl_intel_thread.a /data/intel_2018/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64/libmkl_core.a /data/intel_2018/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64/libmkl_blas95_ilp64.a /data/intel_2018/compilers_and_libraries/linux/lib/intel64/libiomp5.a -Wl,--end-group -lm -ldl -o sp_0

Thanks

Vinutha V (Intel):

Hi,

I have an issue with the Intel MKL FFTW functions: they do not seem to be threaded. I compiled the code for native Xeon Phi.

How can I get these MKL functions (FFTW_EXECUTE_DFT_C2R and FFTW_EXECUTE_DFT_R2C) threaded?

Thanks,

Vinutha

Maciej O.:

Thanks for the article. It would be great if you could update the MKL reference manual too, as it currently says:

FFTW3 wrappers are not fully thread safe. If the new-array execute functions, such as fftw_execute_dft(), share the same plan from parallel user threads, set the number of the sharing threads before creation of the plan. For this purpose, the FFTW3 wrappers provide a header file fftw3_mkl.h, which defines a global structure fftw3_mkl with a field to be set to the number of sharing threads. Below is an example of setting the number of sharing threads
