Where is the /Qcoarray:distributed option?

Hi All,

I have installed Intel Cluster Studio XE 2012 for Windows (file "w_ics_2012.0.033.exe") using the evaluation license file received from Intel, but I can't evaluate the cluster operation of Fortran. In the properties of my new project (right-click --> Properties --> Configuration Properties --> Fortran --> Language --> Enable Coarrays) I don't see the option for Distributed Memory (/Qcoarray:distributed), only "No" and "For Shared Memory (/Qcoarray:shared)", for both the Win32 and x64 solution platforms.

My cluster system consists of 2 computers:
1) Head node: Windows Server 2008 R2 with SP1 + HPC Pack 2008 R2 with SP3 + Visual Studio 2010 with SP1;
2) Workstation node: Windows 7 (x64) with SP1 + HPC Pack 2008 R2 with SP3.

Intel Cluster Studio was installed on the head node, and the installer automatically installed it on the workstation node as well.

If I insert the /Qcoarray:distributed option manually (right-click --> Properties --> Configuration Properties --> Fortran --> Command Line --> Additional Options: /Qcoarray:distributed), a test program runs on the head node only, even though the corresponding machines.Windows file (assigned via the FOR_COARRAY_MACHINEFILE system environment variable) has two lines with the node names.
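
For reference, the machine file is a plain text file with one node name per line; given the cluster described above, the machines.Windows file here presumably contains just:

WinSer2008R2
Win7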

The result of the command "clusrun smpd status" is:

----- Summary ----
2 Nodes succeeded
0 Nodes failed

What is wrong and what should I do to see the "/Qcoarray:distributed" option?


Please ask Windows Fortran questions in the Windows Fortran forum - questions posted elsewhere may not get proper attention. For information on how to build and run a coarray program on a Windows cluster, please read this article (linked from the compiler release notes), though you may have already seen it. A critical aspect is that the full path to the executable must be valid on all nodes of the cluster. This means that the same drive letters must also be defined on the cluster nodes.

If you still need help, please post in the Windows Fortran forum.

Steve - Intel Developer Support

Thanks for your advice, but I think my question concerns the Cluster Studio environment/installer and/or the integration of Cluster Studio into Visual Studio rather than the Fortran compiler; the compiler itself is working well.

In addition, regarding the integration: if I assign a file name in the MPI Configuration File option (right-click --> Properties --> Configuration Properties --> Fortran --> Language --> MPI Configuration File), for example MPIConfigFile, the build looks for MPIConfigFile\\Node0\CcpSpoolDir\Coar1\x64\Debug\Coar1.exe, where \\Node0\CcpSpoolDir\ is a shared directory on the head node accessible to the workstation node, and \\Node0\CcpSpoolDir\Coar1\x64\Debug\ is the correct path to the executable file (Coar1.exe). The result is: Can't open config file MPIConfigFile\\Node0\CcpSpoolDir\Coar1\x64\Debug\Coar1.exe: No such file or directory.
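
For comparison, such a configuration file is not meant to be concatenated with the executable path; it holds mpiexec-style arguments. A minimal sketch for this project (the machine file name and the image count here are assumptions; the executable path is taken from above) might be:

-n 8 -machinefile machines.Windows \\Node0\CcpSpoolDir\Coar1\x64\Debug\Coar1.exe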

Intel Cluster Studio is a bundle of products, one of which is Intel Visual Fortran. It is the Fortran product that provides the VS integration and the coarray features.

I would urge you to try the experiments from the command line as described in the article I pointed to. I'm not sure how thoroughly we tested the VS integration for distributed coarray support. I will ask our developer who worked most with this support to read this thread and see what she can suggest.

Steve - Intel Developer Support

Of course I had read that article before I asked the question here (a link to the article is in the documentation).

Below are the results of some experiments with the "coarray_samples" sample included in the software. WinSer2008R2 is the head node, Win7 is the workstation node. The "Additional Options" for the compiler are: /Qcoarray:distributed /Qcoarray-num-images:8
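
For readers without the sample at hand: judging from its output below, its core is presumably a minimal coarray "hello" along these lines (a sketch under that assumption, not the shipped source; the program name is made up):

      program hello_caf
      ! Each image prints its index; with /Qcoarray:distributed the
      ! images are spread across the nodes in the machine file.
      write(*,*) 'Hello from image', this_image(), &
                 'out of', num_images(), 'total images'
      end program hello_caf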

1) Start from the Visual Studio: Debug --> Start Debugging
Result: task hangs.
Ctrl+C gives:

mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: Win7: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: WinSer2008R2.mynet.dom: 123
5: Win7: 123
6: WinSer2008R2.mynet.dom: 123
7: Win7: 123

2) Command: mpiexec -host WinSer2008R2 -n 3 -genv FOR_ICAF_STATUS launched -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe
Result: OK:

[1#6880:5060@WinSer2008R2] MPI startup(): shm data transfer mode
[2#2040:6184@WinSer2008R2] MPI startup(): shm data transfer mode
[0#5100:7192@WinSer2008R2] MPI startup(): shm data transfer mode
[2#2040:6184@WinSer2008R2] MPI startup(): process is pinned to CPU02 on node WinSer2008R2
[0#5100:7192@WinSer2008R2] MPI startup(): process is pinned to CPU00 on node WinSer2008R2
[1#6880:5060@WinSer2008R2] MPI startup(): process is pinned to CPU01 on node WinSer2008R2

[0#5100:7192@WinSer2008R2] Rank Pid Node name Pin cpu
[0#5100:7192@WinSer2008R2] 0 5100 WinSer2008R2 0
[0#5100:7192@WinSer2008R2] 1 6880 WinSer2008R2 1
[0#5100:7192@WinSer2008R2] 2 2040 WinSer2008R2 2
[0#5100:7192@WinSer2008R2] MPI startup(): I_MPI_DEBUG=+5
[0#5100:7192@WinSer2008R2] MPI startup(): NUMBER_OF_PROCESSORS=4
[0#5100:7192@WinSer2008R2] MPI startup(): PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 23 Stepping 7, GenuineIntel

Hello from image 2 out of 3 total images
Hello from image 1 out of 3 total images
Hello from image 3 out of 3 total images

3) Command: mpiexec -host Win7 -n 3 -genv FOR_ICAF_STATUS launched -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe
Result: OK:

[2#14200:10520@Win7] MPI startup(): shm data transfer mode
[0#14816:3836@Win7] MPI startup(): shm data transfer mode
[1#11572:4816@Win7] MPI startup(): shm data transfer mode
[2#14200:10520@Win7] MPI startup(): set domain to {4,5} on node Win7
[0#14816:3836@Win7] MPI startup(): set domain to {0,1} on node Win7
[1#11572:4816@Win7] MPI startup(): set domain to {2,3} on node Win7

[0#14816:3836@Win7] Rank Pid Node name Pin cpu
[0#14816:3836@Win7] 0 14816 Win7 {0,1}
[0#14816:3836@Win7] 1 11572 Win7 {2,3}
[0#14816:3836@Win7] 2 14200 Win7 {4,5}
[0#14816:3836@Win7] MPI startup(): I_MPI_DEBUG=+5
[0#14816:3836@Win7] MPI startup(): NUMBER_OF_PROCESSORS=8
[0#14816:3836@Win7] MPI startup(): PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 42 Stepping 7, GenuineIntel

Hello from image 2 out of 3 total images
Hello from image 3 out of 3 total images
Hello from image 1 out of 3 total images

4) Command: mpiexec -hosts 2 WinSer2008R2 3 Win7 3 -genv FOR_ICAF_STATUS launched -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe
Result: task hangs.
Ctrl+C gives:

[0#7356:5672@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[1#7588:7328@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[2#7752:7672@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[3#15240:15828@Win7] MPI startup(): shm and tcp data transfer modes
[5#13376:14660@Win7] MPI startup(): shm and tcp data transfer modes
[4#13488:13232@Win7] MPI startup(): shm and tcp data transfer modes
mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: Win7: 123
5: Win7: 123

As item 4 suggests, I have some problem with TCP. What should I check and adjust?

Thanks

Hi --

I just wanted to let you know that Steve pointed me at this thread; I was the lucky developer who hooked up DCAF on Windows.

I have reproduced the situation you've found, and am looking at how to resolve it. Note that I've reproduced it in a straight MPI program, with no coarrays to be seen, so that complication is removed.

You did put this in the right forum; there are some really good people here, and actually, you might see a question I post too, looking for help to resolve this.

As an aside, I can use a machinefile if there is only one node in the file; it doesn't have to be the current node. So yes, I have to agree with you that there is an interesting configuration issue.

By the way, this link (also in this forum) has some interesting info:
http://software.intel.com/en-us/forums/showthread.php?t=81922

I'll post more as I learn more ---

Thanks for using the Windows DCAF -

--Lorri

Hi Lorri,

Thank you for your time.
I have tried a straight MPI program too (although it is not the subject of this thread), the test.f90 included in the software, but the result shown below is the same: the two nodes (item 3) do not work together.

1) Command: mpiexec -host WinSer2008R2 -n 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe
Result: OK:

[1#5168:2528@WinSer2008R2] MPI startup(): shm data transfer mode
[0#5612:2732@WinSer2008R2] MPI startup(): shm data transfer mode
[2#5744:3288@WinSer2008R2] MPI startup(): shm data transfer mode
[2#5744:3288@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done
[0#5612:2732@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done
[1#5168:2528@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done

[0#5612:2732@WinSer2008R2] MPI startup(): Rank Pid Node name Pin cpu
[0#5612:2732@WinSer2008R2] MPI startup(): 0 5612 WinSer2008R2 0
[0#5612:2732@WinSer2008R2] MPI startup(): 1 5168 WinSer2008R2 1
[0#5612:2732@WinSer2008R2] MPI startup(): 2 5744 WinSer2008R2 2
[0#5612:2732@WinSer2008R2] MPI startup(): I_MPI_DEBUG=+5
[0#5612:2732@WinSer2008R2] MPI startup(): I_MPI_PIN_MAPPING=3:0 0,1 1,2 2
[0#5612:2732@WinSer2008R2] MPI startup(): PMI_RANK=0

Hello world: rank 0 of 3 running on
WinSer2008R2.mynet.dom

Hello world: rank 1 of 3 running on
WinSer2008R2.mynet.dom

Hello world: rank 2 of 3 running on
WinSer2008R2.mynet.dom

2) Command: mpiexec -host Win7 -n 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe
Result: OK:

[1#11724:10472@Win7] MPI startup(): shm data transfer mode
[0#11556:5968@Win7] MPI startup(): shm data transfer mode
[2#9576:1500@Win7] MPI startup(): shm data transfer mode
[1#11724:10472@Win7] MPI startup(): Internal info: pinning initialization was done
[0#11556:5968@Win7] MPI startup(): Internal info: pinning initialization was done
[2#9576:1500@Win7] MPI startup(): Internal info: pinning initialization was done
[0#11556:5968@Win7] MPI startup(): Rank Pid Node name Pin cpu
[0#11556:5968@Win7] MPI startup(): 0 11556 Win7 {0,1}
[0#11556:5968@Win7] MPI startup(): 1 11724 Win7 {2,3}
[0#11556:5968@Win7] MPI startup(): 2 9576 Win7 {4,5}
[0#11556:5968@Win7] MPI startup(): I_MPI_DEBUG=+5
[0#11556:5968@Win7] MPI startup(): I_MPI_PIN_MAPPING=3:0 0,1 2,2 4
[0#11556:5968@Win7] MPI startup(): PMI_RANK=0

Hello world: rank 0 of 3 running on
Win7.mynet.dom

Hello world: rank 1 of 3 running on
Win7.mynet.dom

Hello world: rank 2 of 3 running on
Win7.mynet.dom

3) Command: mpiexec -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe
Result: task hangs.
Ctrl+C gives:

[2#4792:4804@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[0#3356:3696@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[1#5956:6004@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[4#10340:5736@Win7] MPI startup(): shm and tcp data transfer modes
[5#9228:9816@Win7] MPI startup(): shm and tcp data transfer modes
[3#11112:8964@Win7] MPI startup(): shm and tcp data transfer modes
mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: Win7: 123
5: Win7: 123

In accordance with the advice in http://software.intel.com/en-us/forums/showthread.php?t=81922 that you referred to, I used -genv I_MPI_PLATFORM 0 and added the DNS suffix to the node names in the mpiexec command; it did not help.

Thanks

Hi obmeninfor,

I've seen similar behavior while doing some testing for a different issue. Could you try running the following commands?

mpiexec -genvnone -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 hostname
mpiexec -genvnone -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe

Adding -genvnone is a quick check to prevent copying the environment variables from one system to another. If the MPI installations are in different locations on each computer, the environment variables from one will prevent it from being located on the other. See the thread http://software.intel.com/en-us/forums/showthread.php?t=85990&o=a&s=lr for more detail on the mismatch.

The first command will just ensure that you can run across multiple hosts simultaneously. The second will ensure that the processes can communicate with each other. Please let me know what happens with these commands.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

Thank you for your advice.
The results of the commands:

1. mpiexec -genvnone -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 hostname
Result:

WinSer2008R2
WinSer2008R2
WinSer2008R2
Win7
Win7
Win7

2. mpiexec -genvnone -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe
Result: task hangs.
Ctrl+C gives:

[2#1052:4080@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[0#780:3824@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[1#4788:4988@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[3#12960:13116@Win7] MPI startup(): shm and tcp data transfer modes
[5#3604:9880@Win7] MPI startup(): shm and tcp data transfer modes
[4#12716:10456@Win7] MPI startup(): shm and tcp data transfer modes
mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: Win7: 123
5: Win7: 123

The MPI location is "c:\Program Files (x86)\Intel\MPI" on each node.

Thanks

Hi obmeninfor,

I believe that the problem you are experiencing is due to your firewall. As one more check, please allow the program test.exe through your firewall on both computers, and try running the second command again. You can leave off the -genvnone option; it should have no effect here.
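
One quick way to add such a rule from an elevated command prompt is the built-in netsh tool; the rule name and program path below are placeholders, so substitute the local path of test.exe on each node:

netsh advfirewall firewall add rule name="MPI test" dir=in action=allow program="C:\path\to\test.exe" enable=yes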

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

The firewall rule for the program was already enabled on WinSer2008R2. Adding a similar rule on Win7 changed the output but didn't change the result.

Command: mpiexec -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe
Result: task hangs.
Ctrl+C gives:

[0#5924:6008@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[2#288:1144@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[1#5596:3536@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[4#3768:5408@Win7] MPI startup(): shm and tcp data transfer modes
[5#4036:736@Win7] MPI startup(): shm and tcp data transfer modes
[3#1236:4204@Win7] MPI startup(): shm and tcp data transfer modes

[2#288:1144@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done
[4#3768:5408@Win7] MPI startup(): Internal info: pinning initialization was done
[0#5924:6008@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done
[1#5596:3536@WinSer2008R2] MPI startup(): Internal info: pinning initialization was done
[5#4036:736@Win7] MPI startup(): Internal info: pinning initialization was done
[3#1236:4204@Win7] MPI startup(): Internal info: pinning initialization was done
mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: Win7: 123
5: Win7: 123

Thanks

Hi obmeninfor,

Do you also have your firewalls set to allow smpd and mpiexec? Are you using the native Windows* firewall, or a different one?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

I removed -genv I_MPI_DEBUG +5 from the command and the straight MPI program (test.exe) began to work (without firewall rules for smpd and mpiexec)! Thank you very much for your previous advice about the firewall.

The behavior of the coarray_samples program (see above) changed as well, but one problem remains: the program does not terminate:

1) start from VS: Debug --> Start Debugging
Result: the program prints 8 "Hello" lines (it works) and hangs on both computers:

Hello from image 3 out of 8 total images
Hello from image 1 out of 8 total images
Hello from image 7 out of 8 total images
Hello from image 5 out of 8 total images
Hello from image 2 out of 8 total images
Hello from image 6 out of 8 total images
Hello from image 8 out of 8 total images
Hello from image 4 out of 8 total images

Ctrl+C gives:

mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: Win7: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: WinSer2008R2.mynet.dom: 123
5: Win7: 123
6: WinSer2008R2.mynet.dom: 123
7: Win7: 123

2) command: mpiexec -hosts 2 WinSer2008R2 4 Win7 4 -genv FOR_ICAF_STATUS launched \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe
Result: the program does its main work (8 "Hello" lines) and hangs on both computers:

Hello from image 3 out of 8 total images
Hello from image 1 out of 8 total images
Hello from image 2 out of 8 total images
Hello from image 7 out of 8 total images
Hello from image 8 out of 8 total images
Hello from image 6 out of 8 total images
Hello from image 5 out of 8 total images
Hello from image 4 out of 8 total images

Ctrl+C gives:

mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: WinSer2008R2.mynet.dom: 123
4: Win7: 123
5: Win7: 123
6: Win7: 123
7: Win7: 123

3) command: mpiexec -hosts 2 WinSer2008R2 4 Win7 4 -genv FOR_ICAF_STATUS launched -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe
Result: the program does its main work (8 "Hello" lines) and hangs on both computers:

[3#6788:6184@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[2#6500:4260@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[0#5300:6652@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[1#5448:5812@WinSer2008R2] MPI startup(): shm and tcp data transfer modes
[5#7796:7544@Win7] MPI startup(): shm and tcp data transfer modes
[7#7920:7668@Win7] MPI startup(): shm and tcp data transfer modes
[6#3568:6996@Win7] MPI startup(): shm and tcp data transfer modes
[4#7816:7808@Win7] MPI startup(): shm and tcp data transfer modes
[5#7796:7544@Win7] MPI startup(): set domain to {2,3} on node Win7
[6#3568:6996@Win7] MPI startup(): set domain to {4,5} on node Win7
[3#6788:6184@WinSer2008R2] MPI startup(): process is pinned to CPU03 on node WinSer2008R2
[1#5448:5812@WinSer2008R2] MPI startup(): process is pinned to CPU01 on node WinSer2008R2
[2#6500:4260@WinSer2008R2] MPI startup(): process is pinned to CPU02 on node WinSer2008R2
[0#5300:6652@WinSer2008R2] MPI startup(): process is pinned to CPU00 on node WinSer2008R2
[7#7920:7668@Win7] MPI startup(): set domain to {6,7} on node Win7
[4#7816:7808@Win7] MPI startup(): set domain to {0,1} on node Win7

[0#5300:6652@WinSer2008R2] Rank Pid Node name Pin cpu
[0#5300:6652@WinSer2008R2] 0 5300 WinSer2008R2 0
[0#5300:6652@WinSer2008R2] 1 5448 WinSer2008R2 1
[0#5300:6652@WinSer2008R2] 2 6500 WinSer2008R2 2
[0#5300:6652@WinSer2008R2] 3 6788 WinSer2008R2 3
[0#5300:6652@WinSer2008R2] 4 7816 Win7 {0,1}
[0#5300:6652@WinSer2008R2] 5 7796 Win7 {2,3}
[0#5300:6652@WinSer2008R2] 6 3568 Win7 {4,5}
[0#5300:6652@WinSer2008R2] 7 7920 Win7 {6,7}
[0#5300:6652@WinSer2008R2] MPI startup(): I_MPI_DEBUG=+5
[0#5300:6652@WinSer2008R2] MPI startup(): NUMBER_OF_PROCESSORS=4
[0#5300:6652@WinSer2008R2] MPI startup(): PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 23 Stepping 7, GenuineIntel

Hello from image 1 out of 8 total images
Hello from image 2 out of 8 total images
Hello from image 4 out of 8 total images
Hello from image 3 out of 8 total images
Hello from image 8 out of 8 total images
Hello from image 6 out of 8 total images
Hello from image 5 out of 8 total images
Hello from image 7 out of 8 total images

Ctrl+C gives:

mpiexec aborting job...

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: WinSer2008R2.mynet.dom: 123
2: WinSer2008R2.mynet.dom: 123
3: WinSer2008R2.mynet.dom: 123
4: Win7: 123
5: Win7: 123
6: Win7: 123
7: Win7: 123

The "Allow" firewall rules for smpd and mpiexec do not help. I am using the native Windows firewall.

Please let me know: what else should I do?

Thanks

Hi obmeninfor,

What happens if you run from the command line without mpiexec? I have not worked with coarrays before, but the sample does not run for me if I use mpiexec; it does run without it. This was only on a single computer; I will try it on multiple.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

I. Below is the result of running \\WinSer2008R2\CcpSpoolDir\coarray_samples\x64\Debug\coarray_samples.exe from the command line. The program does its main work (8 "Hello" lines) and hangs on both computers:

Hello from image 5 out of 8 total images
Hello from image 3 out of 8 total images
Hello from image 1 out of 8 total images
Hello from image 8 out of 8 total images
Hello from image 2 out of 8 total images
Hello from image 7 out of 8 total images
Hello from image 4 out of 8 total images
Hello from image 6 out of 8 total images

Ctrl+C gives:

mpiexec aborting job...
forrtl: error (200): program aborting due to control-C event
In coarray image 1
Image              PC                Routine            Line        Source
libifcoremdd.dll   00000000100E0407  Unknown            Unknown     Unknown
libifcoremdd.dll   00000000100DA252  Unknown            Unknown     Unknown
libifcoremdd.dll   00000000100C3261  Unknown            Unknown     Unknown
libifcoremdd.dll   0000000010028316  Unknown            Unknown     Unknown
libifcoremdd.dll   000000001003BC54  Unknown            Unknown     Unknown
kernel32.dll       0000000076AA47C3  Unknown            Unknown     Unknown
kernel32.dll       0000000076A6652D  Unknown            Unknown     Unknown
ntdll.dll          0000000076CFC521  Unknown            Unknown     Unknown

c:\Program Files\Microsoft HPC Pack 2008 R2\Data\SpoolDir\coarray_samples\x64\Debug>

job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: Win7: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: WinSer2008R2.mynet.dom: 123
5: Win7: 123
6: WinSer2008R2.mynet.dom: 123
7: Win7: 123

II. About the -genv I_MPI_DEBUG +5 option in mpiexec -hosts 2 WinSer2008R2 3 Win7 3 -genv I_MPI_DEBUG +5 \\WinSer2008R2\CcpSpoolDir\test\x64\Debug\test.exe: why does it cause the program to hang on both computers?

Thanks

Hi obmeninfor,

Setting I_MPI_DEBUG to 5 should not cause a hang. This is possibly indicative of a deeper problem. What are your environment variables (just run set in a command prompt)?

As a side note, I am able to run the coarray sample program on a pair of Windows* 7 virtual machines with no problems. I did have to specifically tell the firewall to allow the coarray program, but with the firewall blocking it the program would hang at start, not at exit.

I have used two different methods for compiling and running the program. The first was

ifort /Qcoarray=distributed /Qcoarray-num-images=8 hello_image.f90 -o hello_image1.exe
ifort /Qcoarray=distributed /Qcoarray-config-file=cafconfig.txt hello_image.f90 -o hello_image2.exe

The file cafconfig.txt contained the following:

-n 8 -machinefile mpd.hosts hello_image2.exe

And mpd.hosts contained the names of the two computers, one per line. FOR_COARRAY_MACHINEFILE was set to point to the mpd.hosts file. Both of these forms ran with no problems. Could you try compiling from the command line (just to be certain there are no stray flags causing a problem from VS)? Either of these methods should lead to the same result.
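
In this setup, FOR_COARRAY_MACHINEFILE can be set in the same command prompt before launching; the path below is a placeholder:

set FOR_COARRAY_MACHINEFILE=C:\path\to\mpd.hosts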

Sincerely,
JamesTullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

The environment variables are set by c:\Program Files (x86)\Intel\icsxe\2012.0.033\bin\ictvars.bat. I have only added FOR_COARRAY_MACHINEFILE.

Unfortunately, I can't do without VS, because "c:\Program Files (x86)\Intel\Composer XE 2011 SP1\bin\intel64\ifort" /Qcoarray=distributed /Qcoarray-num-images=8 hello_image.f90 -o hello_image.exe requires link, which is in the VS directory only. So I will continue my coarray experiments with VS.

Thank you very much for your help.

obmeninfor

Hi obmeninfor,

Running ictvars.bat should automatically set the Path to include link. If it does not, try running

C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64\vcvars64.bat

Or the equivalent for your desired architecture target. I need to do some more testing, but attempting to compile the coarray sample program in Visual Studio* 2010 with distributed coarrays does not allow me to run across multiple computers at all.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi obmeninfor,

My problem with running the executable from Visual Studio* was just that, my problem. The default for the sample is to compile 32-bit, and one of my test systems only had the 64-bit runtime libraries available. Once this was corrected (compiled 64-bit within VS), everything works as expected. So this is not likely to be the cause of what you are experiencing (though it would be prudent to verify that you do have the correct runtime libraries available for each system).

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi obmeninfor,

Let's take a look at the SMPD now. On each of the computers, run (as Administrator) the command

smpd -traceon <logfile>

You can name the logfile whatever you want; just make it distinct for each computer. This will turn on SMPD logging. Run the coarray_samples program. When it hangs and you've killed it, run

smpd -traceoff

to turn logging off. Attach the two files and I'll see if there's anything in there that could help diagnose what's happening.
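
For example, on the head node the sequence might look like this (the log file name is arbitrary):

smpd -traceon C:\smpd_WinSer2008R2.log

and, after reproducing and killing the hung run,

smpd -traceoff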

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Edit note: edited to correct code type in first code section

Hi James,

Please see the three attached files.

Thank you.

Hi obmeninfor,

What is the output from "smpd -get binary" on each of the computers?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

1) smpd -get binary on WinSer2008R2:
C:\Program Files (x86)\Intel\MPI\4.0.3.009\em64t\bin\smpd.exe
2) smpd -get binary on Win7:
C:\Program Files (x86)\Intel\icsxe\2012.0.033\mpi\em64t\bin\smpd.exe

What is about the log files?

Thank you.

Hi obmeninfor,

I've attached the smpd log files from one of my runs if you are interested. What appears to be happening is that the processes are either not launching correctly or not attaching to the smpd correctly. In the logfile, you should see at the beginning of each line the rank and PID of the process involved in the event, in the form [rank:PID]. When the rank is -1, the actual processes have not yet been started; those messages relate to the smpd itself.

On your Windows* 7 log, there are no processes started within the smpd. On your Windows* Server 2008 log, only one process is started, and it is incorrectly numbered (C numbering: the first process should be 00, not 01). With the PID changing in the Windows* Server 2008 log, it appears that the processes are not starting as a group, but as individual processes started simultaneously while expecting a group.

Additionally, the Windows* Server 2008 log appears to be corrupted (see lines 371, 401, 404, and 416 for some examples).

Can you run the following commands on each computer?

smpd -V
smpd -version
mpiexec -V

I'm hoping to find a version mismatch that could be the cause of the problem, though it seems very odd that a standard MPI program would run and a coarray (using MPI) would not.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Attachments:

Download host_log.txt (49.13 KB)
Download target_log.txt (454.9 KB)

Hi James,

Thank you for your analysis of the log files. It is really strange: as I wrote above, the coarray program runs and does its work but does not close.

On WinSer2008R2:

1) smpd -V
Intel MPI Library for Windows* OS, Version 4.0 Update 2 Build 4/28/2011 6:04:28 PM
Copyright (C) 2007-2011, Intel Corporation. All rights reserved.

2) smpd -version
3.1

3) mpiexec -V
Intel MPI Library for Windows* OS, Version 4.0 Update 2 Build 4/28/2011 6:04:28 PM
Copyright (C) 2007-2011, Intel Corporation. All rights reserved.

On Win7:

1) smpd -V
Intel MPI Library for Windows* OS, Version 4.0 Update 2 Build 4/28/2011 6:04:28 PM
Copyright (C) 2007-2011, Intel Corporation. All rights reserved.

2) smpd -version
3.1

3) mpiexec -V
Intel MPI Library for Windows* OS, Version 4.0 Update 2 Build 4/28/2011 6:04:28 PM
Copyright (C) 2007-2011, Intel Corporation. All rights reserved.

Any ideas?

Thanks

Hi obmeninfor,

Everything appears to match up between the two systems. Let's try one more experiment. Modify hello_image.f90 to this:

      program hello_image

      ! Report which node each coarray image is actually running on.
      character*256 hostname
      character*512 outstr
      integer length, status

      ! COMPUTERNAME is the standard Windows host-name variable.
      call get_environment_variable("COMPUTERNAME",hostname,length,status,.true.)

      write(outstr,*) "Image ",this_image()," of ", num_images()," on ",trim(hostname)
      write(*,*) trim(outstr)

      end program hello_image

Try running this program and let me know what happens. As I said earlier, it looks like the processes are not being properly distributed, and this will show where each process is really running.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

Your code prints 8 lines and hangs:

Image 5 of 8 on WinSer2008R2
Image 3 of 8 on WinSer2008R2
Image 7 of 8 on WinSer2008R2
Image 1 of 8 on WinSer2008R2
Image 8 of 8 on Win7
Image 2 of 8 on Win7
Image 6 of 8 on Win7
Image 4 of 8 on Win7

Ctrl+C prints additional lines:

mpiexec aborting job...
job aborted:
rank: node: exit code[: error message]
0: WinSer2008R2.mynet.dom: 123: mpiexec aborting job
1: Win7: 123
2: WinSer2008R2.mynet.dom: 123
3: Win7: 123
4: WinSer2008R2.mynet.dom: 123
5: Win7: 123
6: WinSer2008R2.mynet.dom: 123
7: Win7: 123
Thanks.

Hi obmeninfor,

Would it be possible to try running the executable from a non-shared location? Use the same path on each computer, and have a copy of the executable in each folder.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

I followed your advice (set the project Output File option to C:\$(ProjectName).exe, and copied coarray_samples.exe into C:\ on Win7); the result is the same.

Thanks

Hi --

I told you these guys were awesome; I'm glad to see they got a straight-MPI program running for you.

Anyway - since the problem now seems to be strictly coarray-related, I'm back.

What version of Intel Fortran do you have? From the command window, would you issue the following command, and let me know what comes back?

ifort -what

This should return the full logo-banner, and our internal edit number, something similar to this:

F:\tests>ifort -what end.f -c

Intel Visual Fortran Compiler XE for applications running on IA-32, Version 12.0.2.154 Build 20110112

Copyright (C) 1985-2011 Intel Corporation. All rights reserved.

Intel Visual Fortran 12.0-1259

We found some reports of coarrays hanging in our internal database, but they were fixed/available about a year ago. The "Build" date of the full logo-banner would let us know that we weren't chasing known problems.

There are a couple of other things we can try after we're sure it's not one of these older problems.

Again -- thank you for using the Windows DCAF!

-- Lorri

Hi Lorri,

I have installed w_ics_2012.0.033.exe.

The command ifort -what hello_image.f90 -c prints:

Intel Visual Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.0.233 Build 20110811
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.

Intel Fortran 12.1-2054

Thanks.
