Introduction : This article describes a method to compile and run a distributed memory coarray program using Intel® Fortran Compiler XE 12.0. An example using Linux* is presented.
Version : Intel® Fortran Compiler XE 12.0
Application Notes : To compile for distributed memory coarrays, use compiler option -coarray=distributed (Linux* OS) or /Qcoarray:distributed (Windows* OS). This requires an Intel® Cluster Toolkit license. To compile for shared memory coarrays, use compiler option -coarray=shared (Linux* OS) or /Qcoarray:shared (Windows* OS). Compiling for shared memory coarrays does not require an Intel® Cluster Toolkit license.
Obtaining Source Code : The coarray example from the Composer XE 'coarray_samples' directory can be used.
Prerequisites : An Intel® Cluster Toolkit license is required for compilation, and the Intel® MPI Library must be installed on the cluster nodes.
Configuration Set Up : The key to running a distributed memory coarray program with process pinning on specific nodes is the compiler option -coarray-config-file=filename (Linux* OS) or /Qcoarray-config-file:filename (Windows* OS). This lets you take full advantage of Intel® MPI Library features in the coarray environment, in the same way that 'mpiexec -config filename' allows mpiexec to take its commands from 'filename'.
The contents of the configuration file for this example:
-host host1 -env I_MPI_PIN_PROCESSOR_LIST 0,2,4 -n 3 <path to executable>/coarry_dist_host.x : -host host2 -env I_MPI_PIN_PROCESSOR_LIST 1,3,5 -n 3 <path to executable>/coarry_dist_host.x
This says to execute six coarray images 'coarry_dist_host.x' on nodes host1 and host2, using processors 0,2,4 on host1, and processors 1,3,5 on host2. The I_MPI_PIN_PROCESSOR_LIST environment variable is used to achieve the process pinning on the indicated nodes.
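The configuration file can also be generated from a script, which makes it easier to keep the host names, processor lists, and image counts in step. The following is a minimal sketch, not part of the original example: EXE_DIR and CONFIG_FILE are hypothetical variable names, with EXE_DIR standing in for '<path to executable>' and defaulting to the current directory. The host names and processor lists are the ones used in this article.

```shell
# Hypothetical helper: write the two-host configuration described above.
# EXE_DIR is an assumption standing in for "<path to executable>".
EXE_DIR="${EXE_DIR:-$PWD}"
CONFIG_FILE="${CONFIG_FILE:-coarray_config.txt}"

# One line, two argument sets separated by ':' (three images per host).
printf '%s : %s\n' \
  "-host host1 -env I_MPI_PIN_PROCESSOR_LIST 0,2,4 -n 3 ${EXE_DIR}/coarry_dist_host.x" \
  "-host host2 -env I_MPI_PIN_PROCESSOR_LIST 1,3,5 -n 3 ${EXE_DIR}/coarry_dist_host.x" \
  > "$CONFIG_FILE"

# Sanity check: add up the values that follow each -n option.
total=$(tr ' ' '\n' < "$CONFIG_FILE" |
  awk 'prev == "-n" { sum += $1 } { prev = $1 } END { print sum + 0 }')
echo "images requested: $total"
```

For the configuration above the check should report six images, matching the 'out of 6' lines in the sample run.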
Source Code Changes : See Verifying Correctness
Building the Application : Compile for distributed coarrays, create one coarray image, and specify the coarray configuration file:
ifort -coarray=distributed -coarray-num-images=1 -coarray-config-file=coarray_config.txt coarry_dist_host.f90 -o coarry_dist_host.x
Running the Application : Simply specify the name of the executable:
> <path to executable>/coarry_dist_host.x
Hello from image 1 out of 6
total images, and running on host: host1
Hello from image 2 out of 6
total images, and running on host: host1
Hello from image 3 out of 6
total images, and running on host: host1
Hello from image 5 out of 6
total images, and running on host: host2
Hello from image 4 out of 6
total images, and running on host: host2
Hello from image 6 out of 6
total images, and running on host: host2
>
Verifying Correctness : Embed 'call hostnm(hostname)' in your coarray program, then print 'hostname' to verify the images are executed on the correct nodes/processors.
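Once each image prints its host name, the placement check can be scripted instead of read by eye. The sketch below assumes the output format of the sample run in this article, where each image prints a line containing 'running on host: <name>'. The function name count_images_per_host and the log file name run.log are hypothetical.

```shell
# Count how many images reported each host, reading program output on stdin.
# Assumes each image prints a line containing "running on host: <name>",
# as in the sample run; adjust the pattern if your program prints differently.
count_images_per_host() {
  sed -n 's/.*running on host:[[:space:]]*\([^[:space:]]*\).*/\1/p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (run.log is a hypothetical capture of the program's output):
#   <path to executable>/coarry_dist_host.x > run.log
#   count_images_per_host < run.log
```

Each output line is a host name followed by the number of images that ran there, which can be compared against the -n values in the configuration file. This confirms node placement only; it does not show which processors the images were pinned to.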
Benefits : This method enables coarray image pinning on specific nodes and on specific processors within a node. It can improve load balance across cluster nodes and makes it easy to confine a run to a subset of nodes.
Known Issues or Limitations :
-- Some users have reported MPI environment issues when trying to run the executable in a standalone fashion. These issues are under investigation; as a workaround, try using mpiexec to launch the executable.
-- Distributed memory coarrays only work with the Intel® MPI Library; other implementations of MPI are not supported.
