Why can't Hotspots Analysis trace the spawned processes of an MPI job?

Problem:

You can run an MPI job on the local host with the mpiexec command, for example "mpiexec -np 4 program".

You may find that the Hotspots Analysis in VTune™ Amplifier XE 2011 shows your MPI program as only one process, one thread, and one module, even when the "Analyze child processes" option is enabled. Note: the "Analyze system-wide" option is of no use for Hotspots Analysis.

Here is an example that shows the problem. It calculates pi using the MPI library; the code looks like this:

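#include <math.h>
#include <mpi.h>
#include <stdio.h>

#define PI25DT 3.141592653589793238462643  /* reference value of pi */

int main(int argc, char *argv[])
{
    int n = 10000;  /* number of intervals; example value, an assumption */
    int myid, numprocs, i;
    double mypi, pi, h, sum, x;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* Rank 0 shares the interval count with all ranks. */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Midpoint rule for 4/(1+x^2) on [0,1]; each rank takes every numprocs-th interval. */
    h   = 1.0 / (double) n;
    sum = 0.0;
    for (i = myid + 1; i <= n; i += numprocs) {
        x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* Add up the partial sums on rank 0. */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0)
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));
    MPI_Finalize();
    return 0;
}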

 


Normally you would use the following steps to compile and run your MPI program. (Note: this assumes that Intel® MPI Library 4.0.0 and Intel® VTune™ Amplifier XE 2011 are already installed on your local machine.)

1) export PATH=$PATH:/opt/intel/vtune_amplifier_xe_2011/bin64/
2) source /opt/intel/impi/4.0.0/bin64/mpivars.sh
3) mpicc -g pi.c -o pi.gcc
4) mpdboot
5) amplxe-cl -collect hotspots -r r0001hs -- mpiexec -np 4 ./pi.gcc
6) amplxe-cl -report hotspots -r r0001hs -group-by process

You can also view the results in the GUI with the command "amplxe-gui". You will find that only the process "python" is displayed, for example:

[Image: mpi1.jpg]


Root-cause:


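mpiexec does not run the MPI program directly. It connects to the MPI library's mpd daemon through a socket and passes all the parameters to it. The daemon side then starts the program, so the program is not a child process of mpiexec, and a collector that follows only the children of the command it launched never sees it.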

Solution: 

Run your MPI program on the local host with the command "mpiexec.hydra" instead of "mpiexec", and add the "-bootstrap fork" option. The MPI processes are then started on the local host through the operating system's fork mechanism, so they are descendants of the launcher. For example: "mpiexec.hydra -bootstrap fork -np 4 program".

So change steps 5 and 6 to:
5b) amplxe-cl -collect hotspots -r r0002hs -- mpiexec.hydra -bootstrap fork -np 4 ./pi.gcc
6b) amplxe-cl -report hotspots -r r0002hs -group-by process

Now the results are correct when you open them with the command "amplxe-gui". Besides the processes "mpiexec.hydra" and "pmi_proxy", four "pi.gcc" processes are displayed.

[Image: mpi2.jpg]


Note:
1) This method also helps with Lightweight Hotspots analysis if you do not use system-wide analysis.
2) It works only if your MPI program runs on a single local node (a multi-core or multi-processor machine).
