MPI_Barrier using 100% CPU

In my code, all ranks > 0 wait in an MPI_Barrier while rank 0 does some processing. While they wait at the barrier, however, each of those processes consumes 100% of a processor.

Below is sample code. Rank 0 uses 0% CPU while blocked in scanf, but all other ranks use 100% while waiting at the barrier.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    int dummy;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("MPI rank %i reporting in.\n", rank);

    if ( rank == 0 ) {
        /* Rank 0 blocks on user input, standing in for real work. */
        printf("waiting for input\n");
        scanf("%i", &dummy);
        printf("ok\n");
        MPI_Barrier(MPI_COMM_WORLD);
    } else {
        /* All other ranks wait here; this is where they use 100% CPU. */
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
 

 

James Tullos (Intel)

Hi italo,

If you set I_MPI_WAIT_MODE=1, this will cause the processes to wait at MPI_Barrier instead of constantly polling.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

It worked. Thank you.

James Tullos (Intel)

Hi italo,

Good. Please feel free to contact us again for any future concerns.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

My code is still not working the way I want. It uses both OpenMP and MPI, and I want it to do this:

- Start up the code. Put all ranks > 0 on a barrier while rank 0 starts to work.
- While all other ranks are on the barrier, rank 0 uses multiple OpenMP threads.
- Rank 0 finishes its job and reaches the barrier, and all ranks start to work. Each rank uses one thread.
- Work finishes. Ranks > 0 loop back to the barrier and rank 0 does more work with OpenMP.

The problem is that rank 0 launches only one OpenMP thread. I then called omp_set_num_threads(8) on rank 0. It now launches 8 threads, but they are all confined to one processor.

So, basically, I want rank 0 to be able to use all 8 processors with OpenMP while the other ranks are on the barrier. Is it possible?

James Tullos (Intel)

Hi italo,

How are you determining where the processes are pinned? Set I_MPI_DEBUG=4 to verify the pinning set for each rank. You can use I_MPI_PIN_DOMAIN to specify which cores are available to which ranks. I would recommend looking through the documentation for this environment variable (found in the Reference Manual) and choosing the appropriate settings for your system. If you want some advice on this, let me know your system's CPU configuration and how you want the ranks and threads pinned and I can offer assistance.
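
Besides the I_MPI_DEBUG output, the pinning can also be checked from inside the program by asking the operating system which CPUs the process is allowed to run on. A minimal sketch, assuming Linux with glibc (sched_getaffinity and CPU_COUNT are not portable); the helper names allowed_cpu_count and report_affinity are made up for this example:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: number of logical CPUs the calling thread is
 * allowed to run on, or -1 if the affinity mask cannot be read.
 * Linux/glibc only. */
int allowed_cpu_count(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)  /* 0 = caller */
        return -1;
    return CPU_COUNT(&mask);
}

/* Prints one line for the caller; pass the MPI rank as the label. */
void report_affinity(int rank)
{
    printf("rank %d may run on %d logical CPU(s)\n",
           rank, allowed_cpu_count());
}
```

Calling report_affinity(rank) right after MPI_Comm_rank shows the mask each rank starts with. If rank 0 reports a single CPU, its OpenMP threads will normally inherit that mask and share that one CPU, which would match the behaviour described above.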

Make certain you are using the multithreaded MPI library. Use the option -mt_mpi with your MPI compiler script to link to the correct library.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

I_MPI_PIN_DOMAIN=omp

That's what I was looking for. Thanks again.

James Tullos (Intel)

Glad to help!
