Intel MPI with IMB-MPI1 on 64 nodes produces max amount of the cache entries exceeded

Running IMB-MPI1 with Intel MPI on 64 nodes produces the following result:

# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#

# List of Benchmarks to run:

# Alltoall

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 768
#----------------------------------------------------------------
#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
     0         1000         0.05         0.10         0.05
     1         1000       656.06       657.42       656.32
     2         1000       615.83       617.04       616.07
     4         1000       712.77       713.89       712.90
     8         1000       742.07       743.47       742.29
    16         1000       993.12       994.09       993.91
    32         1000      1608.20      1610.55      1609.40
    64         1000     11716.64     11725.01     11719.16
   128         1000     10610.38     10616.83     10612.43
   256         1000     14339.20     14346.34     14342.88
   512         1000    162409.88    162431.65    162418.70
  1024         1000    169927.96    169945.88    169936.23
  2048         1000    218936.47    218964.49    218951.46
  4096         1000    239129.04    239165.10    239149.55
  8192         1000    457376.07    457429.84    457412.10
 16384         1000    717348.74    717371.93    717360.16
register failed 2162688 RTC Error [57] error(0x210000): OpenIB-mlx4_0-1: max amount of the cache entries exceeded

register failed 2162688 RTC Error [705] error(0x210000): OpenIB-mlx4_0-1: max amount of the cache entries exceeded

register failed 2162688 RTC Error [219] error(0x210000): OpenIB-mlx4_0-1: max amount of the cache entries exceeded

register failed 2162688 RTC Error [365] error(0x210000): OpenIB-mlx4_0-1: max amount of the cache entries exceeded

register failed 2162688 RTC Error [112] error(0x210000): OpenIB-mlx4_0-1: max amount of the cache entries exceeded

register failed 2162688 RTC Error [581] error(0x210000): OpenIB-mlx4_0-1: max amount of the cache entries exceeded

You can try setting I_MPI_DAPL_TRANSLATION_CACHE_MAX_ENTRY_NUM=2000.
But remember that the maximum message size for Alltoall on a large number of processes depends on the amount of memory on each node.

Regards!
Dmitry
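
To put the memory remark in numbers, here is a rough sketch of the Alltoall buffer footprint for this run. It assumes the 768 ranks are spread evenly over the 64 nodes and that each rank holds one send and one receive buffer of (number of ranks x message size) bytes. MPI-internal buffers and registration overhead are ignored, so treat the result as a lower bound and not as a diagnosis of the error above.

```shell
#!/bin/sh
# Rough Alltoall buffer estimate (illustrative assumptions, not measured values).
nprocs=768        # "#processes" from the benchmark output
nodes=64          # node count from the thread title
max_msg=4194304   # "Maximum message length in bytes" from the IMB header

# Assumption: ranks are distributed evenly across nodes.
ranks_per_node=$((nprocs / nodes))

# Assumption: one send plus one receive buffer, each nprocs * max_msg bytes.
per_rank_bytes=$((2 * nprocs * max_msg))

# Convert the per-node total to GiB (1 GiB = 1073741824 bytes).
per_node_gib=$((ranks_per_node * per_rank_bytes / 1073741824))

echo "ranks per node: $ranks_per_node"
echo "buffer memory per node at max message size: ${per_node_gib} GiB"
```

Under these assumptions the 4 MiB step would need about 72 GiB of buffers per node, which is more than the 64 GB per node reported later in the thread, so the largest message sizes may not fit even if the cache limit is raised.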

Hi Dmitry,
I set the I_MPI_DAPL_TRANSLATION_CACHE_MAX_ENTRY_NUM=2000 environment variable, but it still gives the same error. Each node has 64 GB of memory. I also searched for the I_MPI_DAPL_TRANSLATION_CACHE_MAX_ENTRY_NUM variable but could not find it in the Intel MPI reference manual. Can you tell me where this variable is documented?

I_MPI_DAPL_TRANSLATION_CACHE_MAX_ENTRY_NUM is used in Intel MPI version 4.0 and higher.
In versions 3.x I_MPI_RDMA_TRANSLATION_CACHE_MAX_ENTRY_NUM was used.

Note that IMB may have been statically linked against the 3.2 library, in which case it doesn't matter which version you have installed. To check the version number, set I_MPI_DEBUG=6.

Regards!
Dmitry
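
A minimal sketch of how the advice above could be applied when the library version inside the IMB binary is unknown. It sets both spellings of the cache variable and enables debug output so the version can be read from the job's startup messages. The assumptions here are that a variable the library does not recognize is simply ignored, that the launcher passes the exported environment to all ranks, and that the binary path and process count match your setup; the launch line is left commented out for that reason.

```shell
#!/bin/sh
# Variable name used by Intel MPI 4.0 and higher (per the reply above).
export I_MPI_DAPL_TRANSLATION_CACHE_MAX_ENTRY_NUM=2000

# Variable name used by Intel MPI 3.x, in case IMB was linked statically
# against a 3.x library. Assumption: the other spelling is ignored.
export I_MPI_RDMA_TRANSLATION_CACHE_MAX_ENTRY_NUM=2000

# Debug level suggested above for checking the library version.
export I_MPI_DEBUG=6

# Hypothetical launch line; adjust the path, process count, and launcher
# options to your installation before uncommenting.
# mpirun -n 768 ./IMB-MPI1 Alltoall

echo "DAPL cache entries: $I_MPI_DAPL_TRANSLATION_CACHE_MAX_ENTRY_NUM"
echo "RDMA cache entries: $I_MPI_RDMA_TRANSLATION_CACHE_MAX_ENTRY_NUM"
echo "debug level: $I_MPI_DEBUG"
```

If the launcher in use does not forward the environment, the variables would have to be passed through the launcher's own mechanism instead.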
