Building and Running 3D-FFT Code that Leverages MPI-3 Non-Blocking Collectives with the Intel® Parallel Studio XE Cluster Edition

By Mark I Lubin, Published: 05/26/2015, Last Updated: 05/26/2015


This application note helps developers use Intel® Software Development Tools to build and run the MPI-3-based 3D-FFT code sample from the Scalable Parallel Computing Lab (SPCL), ETH Zurich.


The original 3D-FFT code, based on the prototype library libNBC, was developed to help optimize parallel high-performance applications by overlapping computation and communication [1]. An updated version of the code based on MPI-3 Non-Blocking Collectives (NBC) is now available on the SPCL, ETH Zurich web site. Because this new version relies on the MPI-3 API, it can be used with any modern MPI library that implements that API. One such implementation is the Intel® MPI Library, which fully supports the MPI-3 Standard [2].

Obtaining the Latest Version of Intel® Parallel Studio XE 2015 Cluster Edition

The Intel® Parallel Studio XE 2015 Cluster Edition software product includes the following components used to build the 3D-FFT code:

  •    Intel® C++ Compiler XE
  •    Intel® MPI Library (version 5.0 or above), which supports the MPI-3 Standard
  •    Intel® Math Kernel Library (Intel® MKL), which contains an optimized FFT (Fast Fourier Transform) solver and wrappers for the FFTW (Fastest Fourier Transform in the West) interface

 The latest versions of Intel® Parallel Studio XE 2015 Cluster Edition may be purchased, or evaluation copies requested, from the URL  Existing customers with current support for Intel® Parallel Studio XE 2015 Cluster Edition can download the latest software updates directly from     

Code Access

To download the 3D-FFT MPI-3 NBC code, please go to the URL 

Building the 3D-FFT NBC Binary      

To build the 3D-FFT NBC code:


  1. Set up the build environment, e.g.,

source /opt/intel/composer_xe_2015.2.164/bin/ intel64
source /opt/intel/impi/


   The version-specific paths above (.../composer_xe_2015.2.164 and .../impi/) are examples; source the environment scripts for the versions installed on your system.

  2. Untar the 3D-FFT code downloaded from the link provided in the Code Access section above, and build the 3D-FFT NBC binary:

mpiicc -o 3d-fft_nbc 3d-fft_nbc.cpp -I$MKLROOT/include/fftw/ -mkl
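Before running the build command, it can help to confirm that the environment scripts from step 1 actually took effect. The following is a minimal sketch, assuming a bash shell; the function name check_build_env is hypothetical and not part of the 3D-FFT distribution. It checks only what the build command relies on: the MKLROOT variable (used in -I$MKLROOT/include/fftw/), the FFTW wrapper header directory under it, and mpiicc on the PATH.

```shell
#!/bin/bash
# Hypothetical pre-build check (a sketch): confirms that the variables and
# tools used by the mpiicc build command are available in the current shell.
# Returns 0 if everything is found, 1 otherwise; messages go to stderr.
check_build_env() {
  local status=0

  # MKLROOT is referenced by the -I$MKLROOT/include/fftw/ flag.
  if [ -z "${MKLROOT:-}" ]; then
    echo "MKLROOT is not set: source the compiler environment script" >&2
    status=1
  elif [ ! -d "$MKLROOT/include/fftw" ]; then
    echo "FFTW wrapper headers not found in $MKLROOT/include/fftw" >&2
    status=1
  fi

  # mpiicc is provided by the Intel MPI Library environment script.
  if ! command -v mpiicc >/dev/null 2>&1; then
    echo "mpiicc not found on PATH: source the Intel MPI environment script" >&2
    status=1
  fi

  return "$status"
}
```

For example, check_build_env && mpiicc -o 3d-fft_nbc 3d-fft_nbc.cpp -I$MKLROOT/include/fftw/ -mkl runs the build only when all checks pass.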


Running the 3D-FFT NBC Application      

Support for asynchronous message progress in the Intel® MPI Library allows computation and communication to overlap in NBC operations [2]. To enable asynchronous progress in the Intel® MPI Library, set the environment variable MPICH_ASYNC_PROGRESS to 1:

export MPICH_ASYNC_PROGRESS=1

Run the application with the mpirun command as usual. For example, the following command starts the application with 32 ranks on two nodes (node1 and node2), 16 processes per node:

mpirun -n 32 -ppn 16 -hosts node1,node2 ./3d-fft_nbc

This run produces the following output:

1 repetitions of N=320, testsize: 0, testint 0, tests: 0, max_n: 10
approx. size: 62.500000 MB
normal (MPI): 0.192095 (NBC_A2A: 0.037659/0.000000) (Test: 0.000000) (2x1d-fft: 0.069162) - 1x512000 byte
normal (NBC): 0.203643 (NBC_A2A: 0.047140/0.046932) (Test: 0.000000) (2x1d-fft: 0.069410) - 1x512000 byte
pipe (NBC): 0.173483 (NBC_A2A: 0.042651/0.031492) (Test: 0.000000) (2x1d-fft: 0.069383) - 1x512000 byte
tile (NBC): 0.155921 (NBC_A2A: 0.018214/0.010794) (Test: 0.000000) (2x1d-fft: 0.069577) - 1x512000 byte
win (NBC): 0.173479 (NBC_A2A: 0.042485/0.026085) (Pack: 0.000000) (2x1d-fft: 0.069385) - 1x512000 byte
wintile (NBC): 0.169248 (NBC_A2A: 0.028918/0.021769) (Pack: 0.000000) (2x1d-fft: 0.069290) - 1x512000 byte
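The variants in this output can be compared more easily by sorting them on their first timing value. The sketch below assumes that this first number (for example, 0.155921 for tile (NBC)) is the total time for that variant; that interpretation, the helper names run_fft and rank_variants, and the MPIRUN, HOSTS, NP, and PPN overrides are all hypothetical. run_fft repeats the example run but passes MPICH_ASYNC_PROGRESS to all ranks explicitly through the mpirun -genv option, so that runs with asynchronous progress disabled (0) and enabled (1) can be compared.

```shell
#!/bin/bash
# Hypothetical helpers for comparing 3D-FFT NBC runs (a sketch, not part of
# the 3D-FFT distribution). Override MPIRUN, HOSTS, NP, or PPN as needed.

# run_fft ASYNC: run the benchmark with MPICH_ASYNC_PROGRESS=ASYNC on all
# ranks, using the layout from the example (32 ranks, 16 per node).
run_fft() {
  local async=$1
  "${MPIRUN:-mpirun}" -genv MPICH_ASYNC_PROGRESS "$async" \
    -n "${NP:-32}" -ppn "${PPN:-16}" -hosts "${HOSTS:-node1,node2}" \
    ./3d-fft_nbc
}

# rank_variants: read benchmark output on stdin and print one line per
# variant, "<time> <variant> (<library>)", sorted from fastest to slowest.
# Assumes the first number after "name (MPI):" or "name (NBC):" is the
# variant's total time; other output lines are ignored.
rank_variants() {
  awk '/\((MPI|NBC)\):/ {
         sub(/:$/, "", $2)          # "(NBC):" -> "(NBC)"
         printf "%s %s %s\n", $3, $1, $2
       }' | LC_ALL=C sort -n -k1,1
}
```

For example, run_fft 0 | rank_variants and run_fft 1 | rank_variants rank the variants without and with asynchronous progress. Applied to the output shown above, rank_variants lists tile (NBC) first and normal (NBC) last. Sorting with LC_ALL=C keeps the decimal point interpretation independent of the user's locale.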

Acknowledgments

Thanks go to Torsten Hoefler for hosting the 3D-FFT distribution for Intel tools. Mikhail Brinskiy assisted in porting the libNBC version of the 3D-FFT code to the MPI-3 Standard. James Tullos and Steve Healey suggested corrections and improvements to the draft.

References

1. Torsten Hoefler, Peter Gottschling, Andrew Lumsdaine, Brief Announcement: Leveraging Non-Blocking Collective Communication in High-Performance Applications, SPAA '08, pp. 113-115, June 14-16, 2008, Munich, Germany.

2. Mikhail Brinskiy, Alexander Supalov, Michael Chuvelev, Evgeny Leksikov, Mastering Performance Challenges with the New MPI-3 Standard, The Parallel Universe Magazine (PUM), Issue 18:

