MPI is blocking on MPI_Test, Trace Analyzer question

Hi all,

I'm implementing a dynamic scheduler for solving several sparse matrices in parallel (using the well-known MUMPS solver). When a process completes its task, it asks the work manager for new work (a new matrix, actually just the number of the matrix). The manager code runs as a separate thread in the master process, so the master process can do some work as well. This works well 9 times out of 10, but sometimes everything just hangs. When I attach the debugger at that point, the processes appear to be blocked in MPI_Test for some reason. This should not happen, because MPI_Test is the non-blocking counterpart of MPI_Wait. Any idea what could be wrong, or how I can debug this?

I'm trying to use Intel Trace Analyzer, but I'm only able to get traces of working runs. When my program hangs (some kind of deadlock, I guess), I have to kill all processes, which also means I do not get a trace.

I tried using VTmt.lib to check for errors, but it reports none.
I tried using VTfs.lib to automatically detect deadlocks when tracing, but it is unable to detect this case.

Please advise me on what could cause MPI_Test to block, or how I can debug this case.

Thanks in advance
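
One way to narrow down a hang like this is to bound the polling loop, so that a request which never completes produces a log message instead of a silent stall. The sketch below is plain C with no MPI dependency. The names `poll_fn`, `poll_with_budget`, and `complete_after` are hypothetical and introduced only for illustration. In a real program the callback would wrap MPI_Test on a single request.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical callback type: returns nonzero once the operation has
 * completed. In an MPI program it would wrap MPI_Test on one request. */
typedef int (*poll_fn)(void *ctx);

/* Poll until completion or until max_polls attempts have been made.
 * Returns the number of polls used on success, or -1 if the budget ran out. */
long poll_with_budget(poll_fn test, void *ctx, long max_polls)
{
    assert(test != NULL);
    for (long i = 1; i <= max_polls; ++i) {
        if (test(ctx)) {
            return i;
        }
    }
    fprintf(stderr, "no completion after %ld polls; possible hang\n", max_polls);
    return -1;
}

/* Stand-in for an MPI_Test wrapper: reports completion on the n-th call,
 * where n is the integer that ctx points to. */
int complete_after(void *ctx)
{
    int *remaining = (int *)ctx;
    return --(*remaining) <= 0;
}
```

If the budget runs out, the test call keeps returning without completion, which points at a message that is never sent or matched. If the loop stalls without ever printing the message, the call itself is not returning, which points at something inside the library, such as a lock held by another thread.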

Hi Gert,

Are you linking with the multithreaded MPI library?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Yes, and I'm using MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &MpiThreadLevel);
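
Requesting MPI_THREAD_MULTIPLE does not guarantee it is granted, so it may be worth checking the level returned in MpiThreadLevel. The following is a minimal sketch, assuming a hypothetical helper `check_thread_level`. It relies only on the fact that the MPI standard defines the MPI_THREAD_* constants as monotonically increasing integers, so it compiles without <mpi.h>.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 when the thread level granted by the MPI
 * library is at least the requested one, otherwise logs a message and
 * returns 0. A plain integer comparison is sufficient because the
 * MPI_THREAD_* constants are ordered from SINGLE up to MULTIPLE. */
int check_thread_level(int provided, int required, FILE *log)
{
    assert(log != NULL);
    if (provided < required) {
        fprintf(log, "MPI granted thread level %d, but %d was requested\n",
                provided, required);
        return 0;
    }
    return 1;
}

/* Intended use in an MPI program (not compiled here, needs <mpi.h>):
 *
 *   int MpiThreadLevel;
 *   MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &MpiThreadLevel);
 *   if (!check_thread_level(MpiThreadLevel, MPI_THREAD_MULTIPLE, stderr)) {
 *       MPI_Abort(MPI_COMM_WORLD, 1);
 *   }
 */
```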

Hi Gert,

Do you have a small reproducer for this behavior? If you prefer, you can either post it in a private reply or email it to me directly.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Not yet, but I'm already trying to reproduce it in a smaller code example.

Hi Gert,

Could you run it with -verbose or link with VTmc.lib?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

It doesn't detect any errors, not even a deadlock situation. However, the master process is blocking on MPI_Test.

Edit: actually, after I made some changes, it detected a no-progress condition after 5 minutes.

Hi Gert,

Do you have the output after running with -verbose? Please send that and I'll see if there's anything obvious there. You can also use "-genv I_MPI_DEBUG 5" for more information.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

[0] WARNING: GLOBAL:DEADLOCK:NO_PROGRESS: warning
[0] WARNING: Processes have been blocked on average inside MPI for the last 5:05 minutes:
[0] WARNING: either the application has a load imbalance or a deadlock which is not detected
[0] WARNING: because at least one process polls for message completion instead of blocking
[0] WARNING: inside MPI.
[0] WARNING: [0] last MPI call:
[0] WARNING: MPI_COMM_FREE(*comm=0x0000000006dcc380, *ierr=0x00000000084da4cc)
[0] WARNING: ZMUMPS (sysnoise)
[0] WARNING: ZMUMPSCPP (...\mumpscpp.cpp:8)
[0] WARNING: SOLVERMUMPS_CLEAR (...\mumps.f:256)

Hi Gert,

Based on that, I would check for something still using the communicator that you are attempting to free. Ensure that you are not reaching a race condition somewhere. I don't think the -verbose (or I_MPI_DEBUG) output will help here, but if you want to send that, feel free to do so.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

The free is done by the MUMPS solver itself (on a duplicate of COMM_SELF, I believe). The debug output doesn't give more information. But the good news is that I managed to reproduce the issue where MPI_Test becomes blocking in a small code example. It's 7 MB including data and the MUMPS libs. How can I send this to you? I can also upload it to Intel Premier support.

Hi Gert,

Since you have Premier access, that would probably be the best option. Just attach it to a new issue and mention this thread, in case someone else gets the issue.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
