Segmentation Fault in MPI_Finalize()

John Cavanaugh

I've run into a problem using Intel MPI 4.0.2.003.  I'm running the "Hello world" program that comes with MPI over OFA on two nodes.  The IB Verbs layer is provided by a library we've developed at Cray.  It presents an IB verbs interface to MPI and uses Cray's proprietary high-speed network to do the data transfer within the cluster.  If I use DAPL, it works fine, but it crashes when I try to use OFA.

The backtrace for the crash is:
> (gdb) bt
> #0  0x00002aaaab49d6c1 in MPID_OFA_module_Finalize_CM ()
>    from /cray/css/iaa/mpi_images/impi/4.0.2.003/intel64/lib/libmpi.so.4
> #1  0x00002aaaab4870d3 in MPID_nem_gen2_module_finalize ()
>    from /cray/css/iaa/mpi_images/impi/4.0.2.003/intel64/lib/libmpi.so.4
> #2  0x00002aaaab46a31b in MPID_nem_finalize ()
>    from /cray/css/iaa/mpi_images/impi/4.0.2.003/intel64/lib/libmpi.so.4
> #3  0x00002aaaab37928b in MPIDI_CH3_Finalize ()
>    from /cray/css/iaa/mpi_images/impi/4.0.2.003/intel64/lib/libmpi.so.4
> #4  0x00002aaaab46624e in MPID_Finalize ()
>    from /cray/css/iaa/mpi_images/impi/4.0.2.003/intel64/lib/libmpi.so.4
> #5  0x00002aaaab420afa in PMPI_Finalize ()
>    from /cray/css/iaa/mpi_images/impi/4.0.2.003/intel64/lib/libmpi.so.4
> #6  0x00002aaaaaccf229 in MPI_Finalize ()
>    from /lus/scratch/jdc/795857/lib/libpibgni.so.1.0.0
> #7  0x0000000000400be4 in main ()

I examined MPID_OFA_module_Finalize_CM() in gdb.  It appears to load cm_pending_head, which has a value of zero, and then dereference it as a pointer.

This is speculation, but I wonder whether "CM" stands for Connection Manager.  Our software doesn't provide a connection manager, and I wonder whether this is the root of the problem.

Is this a known problem in 4.1.2?

Thanks.

James Tullos (Intel)

Hi John,

Looking through our reports, I have only found one error that could be related.  However, the solution there was to increase the locked memory limit, which you have already done.  I'll check with our developers to see if they have any better ideas.

Also, I would recommend that for specific or complex issues, you submit them through Intel® Premier Support instead of through the forums.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

James Tullos (Intel)

Hi John,

Have you tried this with Version 4.1 yet?  We have not released a Version 4.1.2 at this time.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

John Cavanaugh

Sorry, I had a typo in my original message.  I meant 4.0.2.003.

I have run with 4.1.0.024, and I don't hit this problem there.  Execution gets farther into MPID_OFA_module_Finalize_CM() and then hangs because of a bug in my own code.

Thanks for the information about Premier Support.  I'll be sure to use it in the future when it seems appropriate.

James Tullos (Intel)

Hi John,

Is there a particular reason you need to use 4.0.2?  If not, I'd recommend using 4.1.0.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

John Cavanaugh

My goal is to understand the cause of the problem (and to fix it, if it's in my code).  If we have customers who run into this, we need to be able to explain to them why there's a problem and what to do about it.

I found the problem that was causing the hang in 4.1.0, so using it is an option.

James Tullos (Intel)

Hi John,

Understood.  I'll see what can be done to find a root cause for this.  But if a customer runs into the issue, the simple answer is to ask them to try the latest version of the Intel® MPI Library.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
