Building and Running 3D-FFT Code that Leverages MPI-3 Non-Blocking Collectives with the Intel® Parallel Studio XE Cluster Edition


This application note assists developers with using Intel® Software Development Tools with the 3D-FFT MPI-3 based code sample from the Scalable Parallel Computing Lab (SPCL), ETH Zurich.


The original 3D-FFT code, based on the prototype library libNBC, was developed to help optimize parallel high-performance applications by overlapping computation and communication [1]. An updated version of the code, based on MPI-3 Non-Blocking Collectives (NBC), has now been posted on the SPCL, ETH Zurich web site. This new version relies on the MPI-3 API and can therefore be used with any modern MPI library that implements it. One such implementation is the Intel® MPI Library, which fully supports the MPI-3 Standard [2].

Obtaining the Latest Version of Intel® Parallel Studio XE 2015 Cluster Edition

The Intel® Parallel Studio XE 2015 Cluster Edition software product includes the following components used to build the 3D-FFT code:

  •    Intel® C++ Compiler XE
  •    Intel® MPI Library (version 5.0 or above) which supports the MPI-3 Standard
  •    Intel® Math Kernel Library (Intel® MKL) that contains an optimized FFT (Fast Fourier Transform) solver and the wrappers for FFTW (Fastest Fourier Transform in the West)

 The latest versions of Intel® Parallel Studio XE 2015 Cluster Edition may be purchased, or evaluation copies requested, from the URL  Existing customers with current support for Intel® Parallel Studio XE 2015 Cluster Edition can download the latest software updates directly from     

Code Access

To download the 3D-FFT MPI-3 NBC code, please go to the URL 

Building the 3D-FFT NBC Binary      

To build the 3D-FFT NBC code:


  1. Set up the build environment, e.g.,

source /opt/intel/composer_xe_2015.2.164/bin/ intel64
source /opt/intel/impi/


   The versions shown above (.../composer_xe_2015.2.164 and .../impi/) are examples; source the corresponding versions that are installed on your system.

  2. Untar the 3D-FFT code downloaded from the link provided in the Code Access section above and build the 3D-FFT NBC binary:

mpiicc -o 3d-fft_nbc 3d-fft_nbc.cpp -I$MKLROOT/include/fftw/ -mkl


Running the 3D-FFT NBC Application      

The Intel® MPI Library supports asynchronous message progress, which makes it possible to overlap computation and communication in NBC operations [2]. To enable asynchronous progress in the Intel® MPI Library, set the environment variable MPICH_ASYNC_PROGRESS to 1:

export MPICH_ASYNC_PROGRESS=1


Run the application using the mpirun command as usual. For example, the command shown below starts the application with 32 ranks on two nodes (node1 and node2), with 16 processes per node:

mpirun -n 32 -ppn 16 -hosts node1,node2 ./3d-fft_nbc

and produces output like the following:

1 repetitions of N=320, testsize: 0, testint 0, tests: 0, max_n: 10
approx. size: 62.500000 MB
normal (MPI): 0.192095 (NBC_A2A: 0.037659/0.000000) (Test: 0.000000) (2x1d-fft: 0.069162) - 1x512000 byte
normal (NBC): 0.203643 (NBC_A2A: 0.047140/0.046932) (Test: 0.000000) (2x1d-fft: 0.069410) - 1x512000 byte
pipe (NBC): 0.173483 (NBC_A2A: 0.042651/0.031492) (Test: 0.000000) (2x1d-fft: 0.069383) - 1x512000 byte
tile (NBC): 0.155921 (NBC_A2A: 0.018214/0.010794) (Test: 0.000000) (2x1d-fft: 0.069577) - 1x512000 byte
win (NBC): 0.173479 (NBC_A2A: 0.042485/0.026085) (Pack: 0.000000) (2x1d-fft: 0.069385) - 1x512000 byte
wintile (NBC): 0.169248 (NBC_A2A: 0.028918/0.021769) (Pack: 0.000000) (2x1d-fft: 0.069290) - 1x512000 byte
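
To compare the variants in output like this, one option is to extract the first number on each result line and divide the baseline by it. The sketch below assumes that this first number is the total elapsed time of the variant in seconds, which the application note does not state explicitly. Both helper names are hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns the first number after the "label:" prefix
// of a result line, e.g. 0.155921 for "tile (NBC): 0.155921 (NBC_A2A: ...".
// Returns -1.0 if the line contains no colon.
double first_time_after_label(const std::string& line) {
    const std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) {
        return -1.0;
    }
    // std::stod skips leading whitespace and stops at the first character
    // that cannot be part of the number.
    return std::stod(line.substr(colon + 1));
}

// Ratio of baseline time to variant time; a value above 1 means the
// variant finished faster than the baseline.
double speedup(double baseline_seconds, double variant_seconds) {
    return baseline_seconds / variant_seconds;
}
```

Under that assumption, the tile (NBC) line at 0.155921 compared with the normal (MPI) line at 0.192095 gives a ratio of about 1.23 for this particular run.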



Thanks go to Torsten Hoefler for hosting the 3D-FFT distribution for Intel tools. Mikhail Brinskiy assisted in porting the libNBC version of the 3D-FFT code to the MPI-3 Standard. James Tullos and Steve Healey suggested corrections and improvements to the draft.


1. Torsten Hoefler, Peter Gottschling, Andrew Lumsdaine, Brief Announcement: Leveraging Non-Blocking Collectives Communication in High-Performance Applications, SPAA'08, pp. 113-115, June 14-16, 2008, Munich, Germany. 

2. Mikhail Brinskiy, Alexander Supalov, Michael Chuvelev, Evgeny Leksikov, Mastering Performance Challenges with the new MPI-3 Standard, PUM issue 18:



1 comment


The 3D-FFT MPI-3 NBC code has a bug that results in the following mess.  Seems someone changed the allocation of a2as and a2ar to use MPI_Alloc_mem instead of 'new' and forgot to change the cleanup routine to use MPI_Free_mem.

Running under SLES12sp2

Intel(R) 64, Version Build 20170811

mpicc -cc=icc -g -fexceptions -O0 -traceback -ftrapuv -check=uninit -fp-trap=common -o 3d-fft_nbc 3d-fft_nbc.cpp -I$MKLROOT/include/fftw/ -mkl

mpiexec -np 32 ./3d-fft_nbc

gdb -c core ./3d-fft_nbc

Program terminated with signal SIGABRT, Aborted.
#0  0x00002aaab000a8d7 in raise () from /lib64/
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.22-61.3.x86_64 libcxgb3-rdmav2-debuginfo-1.3.1-6.2.x86_64 libgcc_s1-debuginfo-6.2.1+r239768-2.4.x86_64 libmthca-rdmav2-debuginfo-1.0.6-5.2.x86_64 libnl3-200-debuginfo-3.2.23-2.21.x86_64 libstdc++6-debuginfo-6.2.1+r239768-2.4.x86_64
(gdb) where
#0  0x00002aaab000a8d7 in raise () from /lib64/
#1  0x00002aaab000bcaa in abort () from /lib64/
#2  0x00002aaab00481b4 in __libc_message () from /lib64/
#3  0x00002aaab004d706 in malloc_printerr () from /lib64/
#4  0x00002aaab004e453 in _int_free () from /lib64/
#5  0x0000000000405c2f in buffer_t::~buffer_t (this=0x1587bf0) at 3d-fft_nbc.cpp:161
#6  0x00000000004041cc in do_3dfft (comm=..., test=0x3e691c0, out=0x2ec91b0, t=0x7fffffffc2f8) at 3d-fft_nbc.cpp:472
#7  0x0000000000402c39 in main (argc=1, argv=0x7fffffffc518) at 3d-fft_nbc.cpp:359
(gdb) up 5
#5  0x0000000000405c2f in buffer_t::~buffer_t (this=0x1587bf0) at 3d-fft_nbc.cpp:161
161        free(a2as);
(gdb) list
156        }
157        std::fill(a2ar, a2ar + size, 0);
158      }
160      ~buffer_t() {
161        free(a2as);
162        free(a2ar);
163      }
165      int tile_size() const {
(gdb) whatis a2as
type = double *


Here's my solution which works on my platform

  ~buffer_t() {
