Building and Running 3D-FFT Code that Leverages MPI-3 Non-Blocking Collectives with the Intel® Parallel Studio XE Cluster Edition


This application note assists developers with using Intel® Software Development Tools with the 3D-FFT MPI-3 based code sample from the Scalable Parallel Computing Lab (SPCL), ETH Zurich.


The original 3D-FFT code, based on the prototype library libNBC, was developed to help optimize parallel high-performance applications by overlapping computation and communication [1]. The updated version of the code, based on MPI-3 Non-Blocking Collectives (NBC), has now been posted on the SPCL, ETH Zurich web site. This new version relies on the MPI-3 API and can therefore be used with any modern MPI library that implements it, such as the Intel® MPI Library, which fully supports the MPI-3 Standard [2].
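For background on the structure the code exploits: a 3D FFT factors into independent 1D FFTs applied along each axis in turn. A distributed implementation transforms the two locally stored axes, performs an all-to-all transpose, and then transforms the third axis; it is this transpose that the NBC version overlaps with computation. The factorization itself can be checked quickly with NumPy, used here purely as a stand-in for the MKL/FFTW transforms in the real code:

```python
import numpy as np

# Small random complex 3D array; the size is arbitrary for this check.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8, 8)) + 1j * rng.standard_normal((8, 8, 8))

# Full 3D FFT in one call.
full = np.fft.fftn(a)

# Same result from three passes of 1D FFTs, one axis at a time.
# A distributed code keeps two axes local and inserts an all-to-all
# transpose before transforming the third axis.
step = np.fft.fft(a, axis=0)
step = np.fft.fft(step, axis=1)
step = np.fft.fft(step, axis=2)

print(np.allclose(full, step))  # True
```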

Obtaining the Latest Version of Intel® Parallel Studio XE 2015 Cluster Edition

The Intel® Parallel Studio XE 2015 Cluster Edition software product includes the following components used to build the 3D-FFT code:

  •    Intel® C++ Compiler XE
  •    Intel® MPI Library (version 5.0 or above) which supports the MPI-3 Standard
  •    Intel® Math Kernel Library (Intel® MKL), which contains an optimized FFT (Fast Fourier Transform) solver and wrappers for FFTW (Fastest Fourier Transform in the West)

The latest version of Intel® Parallel Studio XE 2015 Cluster Edition may be purchased, or an evaluation copy requested, from the URL. Existing customers with current support for Intel® Parallel Studio XE 2015 Cluster Edition can download the latest software updates directly from the URL.

Code Access

To download the 3D-FFT MPI-3 NBC code, please go to the URL.

Building the 3D-FFT NBC Binary      

To build the 3D-FFT NBC code:


  1. Set up the build environment, e.g.,

source /opt/intel/composer_xe_2015.2.164/bin/compilervars.sh intel64
source /opt/intel/impi/<version>/intel64/bin/mpivars.sh


   Regarding the versions mentioned above (.../composer_xe_2015.2.164 and .../impi/), please source the corresponding versions that are installed on your system.

  2. Untar the 3D-FFT code downloaded from the link provided in the Code Access section above, and build the 3D-FFT NBC binary:

mpiicc -o 3d-fft_nbc 3d-fft_nbc.cpp -I$MKLROOT/include/fftw/ -mkl


Running the 3D-FFT NBC Application      

Intel® MPI Library support for asynchronous message progress makes it possible to overlap computation and communication in NBC operations [2]. To enable asynchronous progress in the Intel® MPI Library, set the environment variable MPICH_ASYNC_PROGRESS to 1:

export MPICH_ASYNC_PROGRESS=1

Run the application using the mpirun command as usual. For example, the command shown below starts the application with 32 ranks on two nodes (node1 and node2), with 16 processes per node:

mpirun -n 32 -ppn 16 -hosts node1,node2 ./3d-fft_nbc

and produces this output

1 repetitions of N=320, testsize: 0, testint 0, tests: 0, max_n: 10
approx. size: 62.500000 MB
normal (MPI): 0.192095 (NBC_A2A: 0.037659/0.000000) (Test: 0.000000) (2x1d-fft: 0.069162) - 1x512000 byte
normal (NBC): 0.203643 (NBC_A2A: 0.047140/0.046932) (Test: 0.000000) (2x1d-fft: 0.069410) - 1x512000 byte
pipe (NBC): 0.173483 (NBC_A2A: 0.042651/0.031492) (Test: 0.000000) (2x1d-fft: 0.069383) - 1x512000 byte
tile (NBC): 0.155921 (NBC_A2A: 0.018214/0.010794) (Test: 0.000000) (2x1d-fft: 0.069577) - 1x512000 byte
win (NBC): 0.173479 (NBC_A2A: 0.042485/0.026085) (Pack: 0.000000) (2x1d-fft: 0.069385) - 1x512000 byte
wintile (NBC): 0.169248 (NBC_A2A: 0.028918/0.021769) (Pack: 0.000000) (2x1d-fft: 0.069290) - 1x512000 byte



Thanks go to Torsten Hoefler for hosting the 3D-FFT distribution for the Intel tools. Mikhail Brinskiy assisted in porting the libNBC version of the 3D-FFT code to the MPI-3 Standard. James Tullos and Steve Healey suggested corrections and improvements to the draft.


1. Torsten Hoefler, Peter Gottschling, Andrew Lumsdaine, Brief Announcement: Leveraging Non-Blocking Collective Communication in High-Performance Applications, SPAA'08, pp. 113-115, June 14-16, 2008, Munich, Germany.

2. Mikhail Brinskiy, Alexander Supalov, Michael Chuvelev, Evgeny Leksikov, Mastering Performance Challenges with the New MPI-3 Standard, The Parallel Universe Magazine, issue 18.

