Using the Pause/Resume API in an MPI program?

I previously wrote an article about using the Pause/Resume API of VTune™ Amplifier XE in C/C++ and Fortran code. With my simple example, users can control sampling data collection from within their own code.

Some users are not sure whether they can use the Pause/Resume API in an MPI program. In fact, it should work: the same Pause/Resume calls are compiled into code that runs in different processes, so each call executes in its own process's memory space and the calls do not interact with one another.

Here is a simple example named mpi_pi.c (see attached). It is an MPI program that runs on a single node: it calculates PI in many processes and then performs a reduction to get the final result. In this example, we collect performance data only during the PI calculation. Below are the detailed steps to build the program and analyze it with VTune™ Amplifier.

1. Prepare the running environment.

# source /opt/intel/impi/4.1.0/bin64/mpivars.sh

# source /opt/intel/composer_xe_2013_sp1.0.051/bin/compilervars.sh intel64

# source /opt/intel/vtune_amplifier_xe_2013/amplxe-vars.sh

2. Compile the MPI program

# mpiicc -g mpi_pi.c -I/opt/intel/vtune_amplifier_xe_2013/include /opt/intel/vtune_amplifier_xe_2013/lib64/libittnotify.a -o mpi_pi

3. Run VTune Amplifier XE

# amplxe-cl -collect hotspots -start-paused -- mpirun -n 16 ./mpi_pi

4. Observe the result

However, if the user runs “amplxe-cl -collect advanced-hotspots -start-paused -- mpirun -n 16 ./mpi_pi”, the result will be unexpected. Why? Advanced Hotspots uses hardware PMU event-based sampling, so the Pause/Resume API has to communicate with the VTune drivers, and the many pause and resume requests sent by different processes confuse the drivers (note that my test environment is a single node, not a cluster). Hotspots, in contrast, is a user-mode collection based on OS timers, which are separate in each process, so its results are as expected.
