PMPI_Bcast: Message truncated

Hi,

I am trying to debug some problems getting an executable developed by another group in our company to run on Intel MPI. I am using version 4.1 for Linux.

Debug output is below.

Does the error indicate a "programming error" on their part (buffers not sized correctly?) or some other issue?

Thanks

[0] MPI startup(): Intel(R) MPI Library, Version 4.1 Update 2  Build 20131023
[0] MPI startup(): Copyright (C) 2003-2013 Intel Corporation.  All rights reserved.
[0] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8

 

[0] MPI startup(): Rank    Pid      Node name     Pin cpu
[0] MPI startup(): 0       30601    linuxdev      {0,1,2,3}
[0] MPI startup(): 1       15240    centosserver  {0,1,2,3}
[0] MPI startup(): Recognition=2 Platform(code=8 ippn=0 dev=5) Fabric(intra=1 inter=6 flags=0x0)
[0] MPI startup(): Topology split mode = 1

[1] MPI startup(): Recognition=2 Platform(code=8 ippn=0 dev=5) Fabric(intra=1 inter=6 flags=0x0)
| rank | node | space=2
|  0  |  0  |
|  1  |  1  |
[0] MPI startup(): I_MPI_DEBUG=100
[0] MPI startup(): I_MPI_FABRICS=shm:tcp
[0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R)
[0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3
[0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3
[0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0
[0] MPI startup(): I_MPI_INFO_CACHES=3
[0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,16
[0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,6291456
[0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3
[0] MPI startup(): I_MPI_INFO_C_NAME=Wolfdale
[0] MPI startup(): I_MPI_INFO_DESC=1342208505
[0] MPI startup(): I_MPI_INFO_FLGB=0
[0] MPI startup(): I_MPI_INFO_FLGC=398124031
[0] MPI startup(): I_MPI_INFO_FLGD=-1075053569
[0] MPI startup(): I_MPI_INFO_LCPU=4
[0] MPI startup(): I_MPI_INFO_MODE=263
[0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0
[0] MPI startup(): I_MPI_INFO_SERIAL=E31225
[0] MPI startup(): I_MPI_INFO_SIGN=132775
[0] MPI startup(): I_MPI_INFO_STATE=0
[0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0
[0] MPI startup(): I_MPI_INFO_VEND=1
[0] MPI startup(): I_MPI_PIN_INFO=x0,1,2,3
[0] MPI startup(): I_MPI_PIN_MAPPING=1:0 0

.....

 

                     
Fatal error in PMPI_Bcast: Message truncated, error stack:
PMPI_Bcast(2112)......................: MPI_Bcast(buf=0x2ae6d2ef9010, count=1, dtype=USER<vector>, root=0, comm=0x84000000) failed
MPIR_Bcast_impl(1670).................:
I_MPIR_Bcast_intra(1887)..............: Failure during collective
MPIR_Bcast_intra(1524)................: Failure during collective
MPIR_Bcast_intra(1510)................:
MPIR_Bcast_scatter_ring_allgather(841):
MPIDI_CH3U_Receive_data_found(129)....: Message from rank 0 and tag 2 truncated; 50000000 bytes received but buffer size is 20000000
MPIR_Bcast_scatter_ring_allgather(789):
scatter_for_bcast(301)................:
MPIDI_CH3U_Receive_data_found(129)....: Message from rank 0 and tag 2 truncated; 50000000 bytes received but buffer size is 20000000
rank = 1, revents = 8, state = 8
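
The last lines of the stack report that rank 1 received 50000000 bytes (about 50 MB) from rank 0 into a buffer of 20000000 bytes. MPI_Bcast expects every rank to pass a count and datatype that describe the same amount of data, so one plausible reading is that rank 0 and rank 1 built the USER<vector> datatype with different sizes. The sketch below is plain Python with no MPI calls; it only redoes the byte arithmetic. The VectorType class, the helper names, and the block counts are hypothetical, chosen so the numbers match the log, and 8-byte elements are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorType:
    """Hypothetical stand-in for a datatype built with MPI_Type_vector."""
    count: int        # number of blocks
    blocklength: int  # elements per block
    elem_size: int    # bytes per element of the base type

    def payload_bytes(self) -> int:
        # Only the blocks carry data; the stride adds gaps, not payload.
        return self.count * self.blocklength * self.elem_size


def bcast_payload(bcast_count: int, dtype: VectorType) -> int:
    """Bytes a rank sends (root) or can accept (non-root) in the broadcast."""
    return bcast_count * dtype.payload_bytes()


def check_bcast(root_bytes: int, recv_bytes: int) -> str:
    """Mimic the size comparison behind the truncation message in the log."""
    if root_bytes > recv_bytes:
        return (f"truncated; {root_bytes} bytes received "
                f"but buffer size is {recv_bytes}")
    return "no truncation"


# Assumed values: 8-byte elements, block counts picked to reproduce the log.
root_type = VectorType(count=6_250_000, blocklength=1, elem_size=8)
rank1_type = VectorType(count=2_500_000, blocklength=1, elem_size=8)

# The failing call used count=1 with the user-defined vector type.
root_bytes = bcast_payload(1, root_type)
rank1_bytes = bcast_payload(1, rank1_type)

print(check_bcast(root_bytes, rank1_bytes))  # sizes disagree
print(check_bcast(root_bytes, root_bytes))   # sizes agree
```

If the sizes really do differ between ranks, a common remedy is for the root to broadcast the element count first, as a single integer, so that every rank builds the same datatype and allocates a large enough buffer before the payload broadcast.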


That error indicates that the MPI_Bcast call is trying to send too large a message. Keep the message under 2 GB and it should work.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Excellent, thanks for the tip!

Hi,

I have compiled espresso with Intel MPI and the MKL library, but I am getting a "Failure during collective" error, whereas it works fine with Open MPI.

Is there a problem with Intel MPI?

Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x516f460, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x5300310, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x6b295c0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x67183d0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x4f794c0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
[0:n125] unexpected disconnect completion event from [22:n122]
Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
internal ABORT - process 0
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x56bfe30, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
/var/spool/PBS/mom_priv/epilogue: line 30: kill: (5089) - No such process

Kindly help us resolve this.

Thanks
sanjiv
