The error message of Intel MPI


Seifer Lin

Hi All:

I ran the following command on Windows 7 x64:

mpiexec.exe -localonly -n 2 MyMPIApp.exe

I got the following errors:

result command received but the wait_list is empty.

unable to handle the command: "cmd=result src=1 dest=0 tag=0 cmd_tag=0 cmd_orig=start_dbs kvs_name=30BA3E56-CB7A-40dd-9D07-8F5342D03976 domain_name=CB42CD80-05B0-4e8b-9F76-CCD5882F9593 result=SUCCESS "

error closing the unknown context socket: Error = -1

sock_op_close returned while unknown context is in state: SMPD_IDLE

Is there any way to do further debugging? Thank you!


regards,

Seifer

James Tullos (Intel)

Hi Seifer,

There are several debugging options you can try. Setting the environment variable I_MPI_DEBUG at runtime will generate debugging information; 5 is generally a good starting value.

mpiexec -n 2 -env I_MPI_DEBUG 5 test.exe
You can compile with -check_mpi to link the correctness-checking libraries. You can get logs from smpd by using:

smpd -traceon              (as administrator)

mpiexec -n 2 test.exe

smpd -traceoff                           (as administrator)
I would recommend using this, as the error you are getting appears to come from smpd. What version of the Intel MPI Library are you using? Have you been able to run one of the provided sample programs located in the test folder in the installation path?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Seifer Lin

Hi James:

The following is the log from smpd:

[01:183868]./SMPDU_Sock_post_connect [01:183868].\smpd_enter_at_state [01:183868]..sock_waiting for the next event. [01:183868]..\SMPDU_Sock_wait [01:183868]../SMPDU_Sock_wait [01:183868]..SOCK_OP_CONNECT [01:183868]..\smpd_handle_op_connect [01:183868]../smpd_handle_op_connect [01:183868]./smpd_enter_at_state [01:183868].\smpd_create_command [01:183868]..\smpd_init_command [01:183868]../smpd_init_command [01:183868]./smpd_create_command [01:183868].\smpd_add_command_int_arg [01:183868]./smpd_add_command_int_arg [01:183868].\smpd_add_command_arg [01:183868]./smpd_add_command_arg [01:183868].\smpd_add_command_arg [01:183868]./smpd_add_command_arg [01:183868].\smpd_add_command_arg [01:183868]./smpd_add_command_arg [01:183868].\smpd_add_command_int_arg [01:183868]./smpd_add_command_int_arg [01:183868].\smpd_post_write_command [01:183868]..\smpd_package_command [01:183868]../smpd_package_command [01:183868]..\SMPDU_Sock_get_sock_id [01:183868]../SMPDU_Sock_get_sock_id [01:183868]..smpd_post_write_command on the pmi context sock 880: 118 bytes for command: "cmd=init src=1 dest=0 tag=0 ctx_key=0 name=57D3EA7D-F5DC-40ef-BEB6-A661AA3784B1 key=7 value=8 node_id=1 " [01:183868]..\SMPDU_Sock_post_writev [01:183868]../SMPDU_Sock_post_writev [01:183868]./smpd_post_write_command [01:183868].\smpd_post_read_command [01:183868]..\SMPDU_Sock_get_sock_id [01:183868]../SMPDU_Sock_get_sock_id [01:183868]..posting a read for a command header on the pmi context, sock 880 [01:183868]..\SMPDU_Sock_post_read [01:183868]...\SMPDU_Sock_post_readv [01:183868].../SMPDU_Sock_post_readv [01:183868]../SMPDU_Sock_post_read [01:183868]./smpd_post_read_command [01:183868].\smpd_enter_at_state [01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait [01:183868]../SMPDU_Sock_wait [01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15 [01:183868]..\smpd_handle_op_read [01:183868]...\smpd_state_reading_cmd_header [01:183868]....read command header [01:183868]....command header read, posting read for data: 66 bytes [01:183868]....\SMPDU_Sock_post_read [01:183868].....\SMPDU_Sock_post_readv [01:183868]...../SMPDU_Sock_post_readv [01:183868]..../SMPDU_Sock_post_read [01:183868].../smpd_state_reading_cmd_header [01:183868]../smpd_handle_op_read [01:183868]..sock_waiting for the next event. [01:183868]..\SMPDU_Sock_wait [01:183868]../SMPDU_Sock_wait [01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15 [01:183868]..\smpd_handle_op_read [01:183868]...\smpd_state_reading_cmd [01:183868]....read command [01:183868]....\smpd_parse_command [01:183868]..../smpd_parse_command [01:183868]....read command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS " [01:183868]....\smpd_handle_command [01:183868].....handling command: [01:183868]..... src = 0 [01:183868]..... dest = 1 [01:183868]..... cmd = result [01:183868]..... tag = 9 [01:183868]..... ctx = pmi [01:183868]..... len = 66 [01:183868]..... str = cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS [01:183868].....\smpd_command_destination [01:183868]......1 -> 1 : returning NULL context [01:183868]...../smpd_command_destination [01:183868].....\smpd_handle_result [01:183868]......ERROR:result command received but the wait_list is empty. [01:183868]...../smpd_handle_result [01:183868]..../smpd_handle_command [01:183868]....ERROR:unable to handle the command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS " [01:183868].../smpd_state_reading_cmd [01:183868]../smpd_handle_op_read [01:183868]..SOCK_OP_READ failed - result = -1, closing pmi context. 
[01:183868]..\SMPDU_Sock_post_close [01:183868]...\SMPDU_Sock_post_read [01:183868]....\SMPDU_Sock_post_readv [01:183868]..../SMPDU_Sock_post_readv [01:183868].../SMPDU_Sock_post_read [01:183868]../SMPDU_Sock_post_close [01:183868]..sock_waiting for the next event. [01:183868]..\SMPDU_Sock_wait [01:183868]../SMPDU_Sock_wait [01:183868]..SOCK_OP_CLOSE [01:183868]..\smpd_handle_op_close [01:183868]...\smpd_get_state_string [01:183868].../smpd_get_state_string [01:183868]...op_close received - SMPD_CLOSING state. [01:183868]...Unaffiliated pmi context closing. [01:183868]...\smpd_free_context [01:183868]....freeing pmi context. [01:183868]....\smpd_init_context [01:183868].....\smpd_init_command [01:183868]...../smpd_init_command [01:183868]..../smpd_init_context [01:183868].../smpd_free_context [01:183868]../smpd_handle_op_close [01:183868]..sock_waiting for the next event. [01:183868]..\SMPDU_Sock_wait [01:183868]../SMPDU_Sock_wait [01:183868]..SOCK_OP_CLOSE [01:183868]..\smpd_handle_op_close [01:183868]...\smpd_get_state_string [01:183868].../smpd_get_state_string [01:183868]...op_close received - SMPD_IDLE state. [01:183868]...\smpd_get_state_string [01:183868].../smpd_get_state_string [01:183868]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE [01:183868]...\smpd_free_context [01:183868]....freeing a context not in the global list - this should be impossible. [01:183868].../smpd_free_context [01:183868]../smpd_handle_op_close [01:183868]..sock_waiting for the next event. [01:183868]..\SMPDU_Sock_wait

[01:183868]./SMPDU_Sock_post_connect [01:183868].\smpd_enter_at_state [01:183868]..sock_waiting for the next event. [01:183868]..\SMPDU_Sock_wait [01:183868]../SMPDU_Sock_wait [01:183868]..SOCK_OP_CONNECT [01:183868]..\smpd_handle_op_connect [01:183868]../smpd_handle_op_connect [01:183868]./smpd_enter_at_state [01:183868].\smpd_create_command [01:183868]..\smpd_init_command [01:183868]../smpd_init_command [01:183868]./smpd_create_command [01:183868].\smpd_add_command_int_arg [01:183868]./smpd_add_command_int_arg [01:183868].\smpd_add_command_arg [01:183868]./smpd_add_command_arg [01:183868].\smpd_add_command_arg [01:183868]./smpd_add_command_arg [01:183868].\smpd_add_command_arg [01:183868]./smpd_add_command_arg [01:183868].\smpd_add_command_int_arg [01:183868]./smpd_add_command_int_arg [01:183868].\smpd_post_write_command [01:183868]..\smpd_package_command [01:183868]../smpd_package_command [01:183868]..\SMPDU_Sock_get_sock_id [01:183868]../SMPDU_Sock_get_sock_id [01:183868]..smpd_post_write_command on the pmi context sock 880: 118 bytes for command: "cmd=init src=1 dest=0 tag=0 ctx_key=0 name=57D3EA7D-F5DC-40ef-BEB6-A661AA3784B1 key=7 value=8 node_id=1 " [01:183868]..\SMPDU_Sock_post_writev [01:183868]../SMPDU_Sock_post_writev [01:183868]./smpd_post_write_command [01:183868].\smpd_post_read_command [01:183868]..\SMPDU_Sock_get_sock_id [01:183868]../SMPDU_Sock_get_sock_id [01:183868]..posting a read for a command header on the pmi context, sock 880 [01:183868]..\SMPDU_Sock_post_read [01:183868]...\SMPDU_Sock_post_readv [01:183868].../SMPDU_Sock_post_readv [01:183868]../SMPDU_Sock_post_read [01:183868]./smpd_post_read_command [01:183868].\smpd_enter_at_state [01:183868]..sock_waiting for the next event. 
[01:183868]..\SMPDU_Sock_wait [01:183868]../SMPDU_Sock_wait [01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15 [01:183868]..\smpd_handle_op_read [01:183868]...\smpd_state_reading_cmd_header [01:183868]....read command header [01:183868]....command header read, posting read for data: 66 bytes [01:183868]....\SMPDU_Sock_post_read [01:183868].....\SMPDU_Sock_post_readv [01:183868]...../SMPDU_Sock_post_readv [01:183868]..../SMPDU_Sock_post_read [01:183868].../smpd_state_reading_cmd_header [01:183868]../smpd_handle_op_read [01:183868]..sock_waiting for the next event. [01:183868]..\SMPDU_Sock_wait [01:183868]../SMPDU_Sock_wait [01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15 [01:183868]..\smpd_handle_op_read [01:183868]...\smpd_state_reading_cmd [01:183868]....read command [01:183868]....\smpd_parse_command [01:183868]..../smpd_parse_command [01:183868]....read command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS " [01:183868]....\smpd_handle_command [01:183868].....handling command: [01:183868]..... src = 0 [01:183868]..... dest = 1 [01:183868]..... cmd = result [01:183868]..... tag = 9 [01:183868]..... ctx = pmi [01:183868]..... len = 66 [01:183868]..... str = cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS [01:183868].....\smpd_command_destination [01:183868]......1 -> 1 : returning NULL context [01:183868]...../smpd_command_destination [01:183868].....\smpd_handle_result [01:183868]......ERROR:result command received but the wait_list is empty. [01:183868]...../smpd_handle_result[01:183868]..../smpd_handle_command [01:183868]....ERROR:unable to handle the command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS " [01:183868].../smpd_state_reading_cmd [01:183868]../smpd_handle_op_read [01:183868]..SOCK_OP_READ failed - result = -1, closing pmi context. 
[01:183868]..\SMPDU_Sock_post_close [01:183868]...\SMPDU_Sock_post_read [01:183868]....\SMPDU_Sock_post_readv [01:183868]..../SMPDU_Sock_post_readv [01:183868].../SMPDU_Sock_post_read [01:183868]../SMPDU_Sock_post_close [01:183868]..sock_waiting for the next event. [01:183868]..\SMPDU_Sock_wait [01:183868]../SMPDU_Sock_wait [01:183868]..SOCK_OP_CLOSE [01:183868]..\smpd_handle_op_close [01:183868]...\smpd_get_state_string [01:183868].../smpd_get_state_string [01:183868]...op_close received - SMPD_CLOSING state. [01:183868]...Unaffiliated pmi context closing. [01:183868]...\smpd_free_context [01:183868]....freeing pmi context. [01:183868]....\smpd_init_context [01:183868].....\smpd_init_command [01:183868]...../smpd_init_command [01:183868]..../smpd_init_context [01:183868].../smpd_free_context [01:183868]../smpd_handle_op_close [01:183868]..sock_waiting for the next event. [01:183868]..\SMPDU_Sock_wait [01:183868]../SMPDU_Sock_wait [01:183868]..SOCK_OP_CLOSE [01:183868]..\smpd_handle_op_close [01:183868]...\smpd_get_state_string [01:183868].../smpd_get_state_string [01:183868]...op_close received - SMPD_IDLE state. [01:183868]...\smpd_get_state_string [01:183868].../smpd_get_state_string [01:183868]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE [01:183868]...\smpd_free_context [01:183868]....freeing a context not in the global list - this should be impossible. [01:183868].../smpd_free_context [01:183868]../smpd_handle_op_close [01:183868]..sock_waiting for the next event. [01:183868]..\SMPDU_Sock_wait

regards,

Seifer

Seifer Lin

Hi James:

Yesterday we ran another test with -genv I_MPI_FABRICS shm, and we still got the same error:

[01:441348]......ERROR:result command received but the wait_list is empty. [01:441348]....ERROR:unable to handle the command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS " unable to read the cmd header on the pmi context, Error = -1. [01:441348]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE

regards,

Seifer

James Tullos (Intel)

Hi Seifer,

Have you tried using

mpiexec -n 2 -env I_MPI_DEBUG 5 test.exe

both with your program and with the test program provided with the Intel MPI Library? What output do you get from this?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Seifer Lin

Hi James:

The options for mpiexec.exe are:

-genv I_MPI_DEBUG 5 -genv I_MPI_PLATFORM auto -genv I_MPI_FABRICS shm -genv I_MPI_WAIT_MODE 1

Since the problems occurred on our customer's computer, I tested only our program. But the problems don't occur every time in batch jobs. Is "the test program provided with the Intel MPI Library" you mentioned IMB-MPI1.exe?

regards,

Seifer

James Tullos (Intel)

Hi Seifer,

I was intending for you to compile one of the files in \test\ and use it. These are simple hello-world MPI programs that can test basic connectivity and MPI functionality.
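For reference, a minimal MPI hello-world in the same spirit as those samples might look like the sketch below (this is a generic example, not the exact source shipped in \test\):

```c
/* hello.c - minimal MPI smoke test, similar in spirit to the sample
 * programs in the Intel MPI Library \test\ folder (generic sketch).
 * Build with your MPI compiler wrapper, e.g.:  mpicc hello.c -o hello.exe
 * Run:  mpiexec -localonly -n 2 hello.exe
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                      /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* this process's rank   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);        /* total process count   */
    MPI_Get_processor_name(name, &name_len);     /* host name of the rank */

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();                              /* clean shutdown        */
    return 0;
}
```

If this sample runs cleanly under mpiexec but your application does not, the problem is more likely in the application or its environment than in the MPI installation itself.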

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
