Hello everyone,
My name is Leonardo, I'm from Brazil, and this is my first post.
I'm running an MPI program with the Intel implementation (Intel MPI Library for Linux, 64-bit applications, Version 4.0 Update 1 Build 20100910).
The "program" is a well-known algorithm called k-means. The algorithm is used to identify natural clusters within sets of data points; its input is a set of data points and an integer k, and its output is an assignment of each point to one of k clusters.
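For readers unfamiliar with k-means, here is a minimal serial sketch of the input/output contract described above. It is not the MPI program from this post: the function name `kmeans`, the `max_iters` parameter, and the choice of seeding the centroids with the first k points are all illustrative assumptions.

```python
def kmeans(points, k, max_iters=100):
    """Serial Lloyd-style k-means on a list of equal-length numeric tuples.

    Returns (assignment, centroids), where assignment[i] is the cluster
    index (0..k-1) of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Assumption: seed centroids with the first k points (simple, not robust).
    centroids = [tuple(p) for p in points[:k]]
    assignment = [-1] * len(points)

    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))

    return assignment, centroids
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` puts the first two points in cluster 0 and the last two in cluster 1. A parallel version would typically split the points across ranks and combine the per-cluster sums between iterations, but how the program in this post does that is not shown here.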
When I run it with 160 data points on 3 nodes, everything goes fine. With 1.6K data points it was OK too, but when I run it with 160K data points the following error appears:
Assertion failed in file ../../dapl_module_poll.c at line 3608: *p_vc_unsignal_sr_before_read == 0
internal ABORT - process 1
[2:super3] unexpected disconnect completion event from [1:super3]
Assertion failed in file ../../dapl_module_util.c at line 2682: 0
internal ABORT - process 2
[0:super3] unexpected disconnect completion event from [1:super3]
Assertion failed in file ../../dapl_module_util.c at line 2682: 0
internal ABORT - process 0
srun: error: super3: tasks 0-2: Exited with exit code 1
srun: Terminating job step 75987.0
I have no idea what the cause could be.
Has anyone experienced this before, or does anyone have an idea of what is going on?
Thanks, and sorry about my English.
Obrigado,
Leonardo Fernandes


