Error when trying to start a parallel process using the systemqq or system functions in Windows 7

Hi all!

We have model-fitting software written in Fortran that runs as a parallel job. We got the software compiled and it runs nicely under Windows. However, certain models require that two models be solved at the same time. We handle this by having the first parallel run make system calls at certain points of the algorithm to start a second parallel run that solves the second model. After the second model has been solved, the information for the first model is updated and the first parallel run continues solving the first model. This process continues until both models are solved.

When the second model solver is called, the following error occurs:
[01:5952]...ERROR:Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
mpiexec aborting job...

The first program is started with the command
mpiexec -localroot -n 2 path\\program.exe < directivefile > output.log

Within the program, the second model is called with
vmrunint=system("solve_2ndmodel.bat")
and we also tried systemqq() without success.

Within that .bat file, among other things, the second model solver is started with the command:
mpiexec -localroot -n 2 path\\program.exe< directivefile

The executable is the same, but it works with different data. This system works perfectly in Linux, and we hope it is possible to run it the same way in Windows. Are there any tricks that we could try?

The MPI library is
Intel MPI Library for Windows* OS, Version 4.0 Update 3 Build 8/24/2011 3:07:12 PM

The Fortran compiler is:
Intel Visual Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.3.300 Build 20120130

The program is compiled with:
mpifc.bat /O2 program.f90

Thanks in advance
TimoP

Hi Timo,

Try adding -verbose to the mpiexec arguments in your batch file. What does that give as output?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

here is the output:

[01:6100]...ERROR:Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
[01:6100]...ERROR:Error while connecting to host, No connection could be made because the target machine actively refused it. (10061)
[01:6100]...ERROR:Connect on sock (host=hostname fe80::1463:8205:ea48:3abd%13 192.168.106.74 , port=61630) failed, exhaused all end points
SMPDU_Sock_post_connect failed.
[0] PMI_ConnectToHost failed: unable to post a connect to hostname fe80::1463:8205:ea48:3abd%13 192.168.106.74 :61630, error: Undefined dynamic error code
uPMI_ConnectToHost returning PMI_FAIL
[0] PMI_Init failed.
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(645): Initialization failed
MPID_Init(134).......: channel initialization failed
MPID_Init(430).......: PMI_Init returned -1

Yours,
TimoP

Hi Timo,

I am able to reproduce this behavior on both Linux* and Windows*. I would recommend changing from system calls to using MPI_Comm_spawn.

integer parent, child, nsub, ierr
character*256 comstr, args(nargs)
logical launch_sub
...
call MPI_Comm_get_parent(parent,ierr)
if (parent .eq. MPI_COMM_NULL) then
   ! This job was not launched by MPI_Comm_spawn, do appropriate init
   launch_sub=.true.
else
   ! This job was launched by MPI_Comm_spawn, do appropriate init
   launch_sub=.false.
end if
...
if (launch_sub) then
   comstr="program.exe"
   args(1)=" arg1 "
   ...
   ! In Fortran, the last element of args must be blank to end the list
   nsub=2
   call MPI_Comm_spawn(comstr,args,nsub,MPI_INFO_NULL,0,MPI_COMM_WORLD,child,MPI_ERRCODES_IGNORE,ierr)
   call MPI_Barrier(child,ierr) ! Waits for the child to finish
end if
...
! Finishing up
if (parent .ne. MPI_COMM_NULL) then
   call MPI_Barrier(parent,ierr) ! Allows the parent to proceed
end if
call MPI_Finalize(ierr)

I cannot get redirection of standard input to work here, so you might need to change your input method (if I recall correctly, it should be fairly simple to pass a filename as an argument and open that file on standard input from Fortran).
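
A minimal sketch of that idea, assuming the directive file name is passed as the first command-line argument. The program name read_directives and the echo loop are only illustrative and are not part of the original software. Whether read(*,...) statements follow a reopened standard input unit may depend on the compiler, so this sketch reads from the unit number explicitly:

```fortran
program read_directives
  use, intrinsic :: iso_fortran_env, only: input_unit, error_unit
  implicit none
  character(len=256) :: fname, line
  integer :: arglen, stat, ios

  ! Assumption: the first command-line argument is the directive file name
  call get_command_argument(1, fname, arglen, stat)
  if (stat /= 0 .or. arglen == 0) then
     write(error_unit, '(a)') 'usage: program.exe directivefile'
     stop 1
  end if

  ! Reconnect the standard input unit to the named file
  open(unit=input_unit, file=trim(fname), status='old', action='read', iostat=ios)
  if (ios /= 0) then
     write(error_unit, '(2a)') 'cannot open ', trim(fname)
     stop 1
  end if

  ! Echo the file line by line to show that reads on input_unit
  ! now come from the file instead of the console
  do
     read(input_unit, '(a)', iostat=ios) line
     if (ios /= 0) exit
     write(*, '(a)') trim(line)
  end do
end program read_directives
```

With this approach the file name would go into args(1) of the MPI_Comm_spawn call instead of relying on the < redirection.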

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

We will try to modify our inner program call in the way you suggested.

Thank you for your answer
Regards
Timo P
