Problem with distributed memory coarray program

Hi, I am having problems running a simple "Hello from this image" coarray program on distributed memory.
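For reference, test_images.f90 is essentially a one-line hello along these lines (a minimal sketch; the exact source may differ slightly):

program test_images
  implicit none
  ! this_image() and num_images() are Fortran 2008 coarray intrinsics;
  ! every image executes the program and prints its own index.
  print *, 'Hello from image', this_image(), 'out of', num_images(), ' total images'
end program test_images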

ifort -coarray=distributed -coarray-config-file=~/caf_config test_images.f90
mpdboot --file=mpd.hosts -n 8
mpiexec_??????.??????.???: mpd_uncaught_except_tb handling:
  exceptions.IOError: [Errno 2] No such file or directory: '/home/sliska/caf_config./a.out'
    /home/sliska/intel/impi/  480  mpiexec
        configFile = open(sys.argv[2],'r',0)
    /home/sliska/intel/impi/  3303  ?

I replaced some private information above with "?". The caf_config file contains "-nolocal -envall -n 8 ./a.out". Note that by copying the "caf_config" file to "/home/sliska/caf_config./a.out" the program runs as expected:

mkdir caf_config.
cp caf_config caf_config./a.out
 Hello from image            1 out of            8  total images
 Hello from image            2 out of            8  total images
 Hello from image            3 out of            8  total images
 Hello from image            4 out of            8  total images
 Hello from image            8 out of            8  total images
 Hello from image            5 out of            8  total images
 Hello from image            6 out of            8  total images
 Hello from image            7 out of            8  total images

I am using Intel Fortran Composer XE for Linux 2011 (6.233) and the Intel MPI Library for Linux. I have been following the "Distributed Memory Coarray Fortran with the Intel Fortran Compiler for Linux: Essential Guide" article.

Sebastian


That is really odd.

Your test_images.f90 is in your home directory? /home/sliska?

And the caf_config is in /home/sliska too, right? It looks like it is.

And you are using bash shell?
echo $0

will show this.

And on your cluster, /home/sliska is mounted on all the nodes in the system via NFS?


Hi Ron,
The test file (test_images.f90) and the config file (caf_config) are in my home directory. I'm using the bash shell, and my home directory is mounted on all the nodes. The cluster I am using runs CentOS release 5.6 (Final) with NFS v3 on both the server and the clients. I have the same installation on my desktop (running Ubuntu 11.10) and got the same error message as I originally reported when attempting to run any coarray program using distributed memory. I should mention that all the MPI programs I have tried (using Intel MPI for Linux and compiled with mpiifort) have worked on both the cluster and my desktop.

Having gone through the Intel coarray documentation and articles, my understanding is that when I run the executable (./a.out in my example) it calls or does something like 'mpiexec ...'. I have no problems running MPI programs with mpiexec. My guess is that the executable wants to do something like 'mpiexec -configfile caf_config', but instead it is doing 'mpiexec -configfile caf_config./a.out'. As I mentioned in my original message, coarray programs execute correctly if I copy 'caf_config' to 'caf_config./a.out'; the motivation for trying that follows from what I think is happening when mpiexec gets called.

Thanks again,
Sebastian
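P.S. Putting my guess and the traceback side by side (the first line is only my guess; the second is what sys.argv[2] in the traceback shows):

mpiexec ... -configfile /home/sliska/caf_config          (expected: config path as its own argument)
mpiexec ... -configfile /home/sliska/caf_config./a.out   (observed: './a.out' glued onto the config path)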

I got my problem to go away.

The short summary is that in my .bashrc (on both my desktop and cluster accounts) I source various Intel scripts. The environment scripts for VTune and Inspector always print some copyright information (like "Copyright (C) 2009-2011 Intel Corporation. All rights reserved. ..."). To avoid having this header appear every time I open a terminal, I redirected the output of the 'source ...' commands to /dev/null. To my surprise, making this change allowed me to run the coarray programs on distributed memory without any problems. When I remove the "&>/dev/null" at the end of the source commands (so the headers get printed to the screen again), my original problem and error come back.

I should also mention that this change fixed some problems I had running MPI jobs through the queue system (Torque) on the cluster I am using, even though I was able to run the MPI jobs outside the queue system without any problems.

Thanks,
Sebastian
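P.S. For anyone else who hits this, the fixed lines in my .bashrc look roughly like the following (the install paths here are illustrative, not necessarily yours):

source /opt/intel/vtune_amplifier_xe_2011/amplxe-vars.sh &> /dev/null
source /opt/intel/inspector_xe_2011/inspxe-vars.sh &> /dev/null

The "&> /dev/null" sends both stdout and stderr to /dev/null, so the copyright banner never reaches the terminal or anything else reading the shell's output.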

Glad to hear it's working - I was not able to reproduce what you were seeing, and now we know why.

Great to hear you're testing CAF. Just an FYI - we have been working on core functionality for CAF features. We have done zero work on performance and optimizations. So I would expect your CAF code will not scale well at this time. We hope to do work on performance in the coming years.


I appreciate your help and the information about coarrays. I realize that the implementation of coarrays is still in its early stages. The reason I am using coarrays right now is to become familiar with them, so that I can more quickly develop new parallel codes, and convert serial codes into parallel ones, in the coming years.

I have read that it might be possible to use MPI calls in CAF programs. Is this possible with the Intel compiler and libraries? This might be necessary in order to get reasonable scaling for some collective operations like global reductions and all-to-all.

Thanks,
Sebastian
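P.S. Concretely, I have in mind something like the sketch below: a coarray program calling MPI_Allreduce directly. It assumes the distributed CAF runtime has already initialized MPI underneath and that images map one-to-one onto MPI ranks; I do not know whether either assumption holds for the Intel implementation.

program caf_mpi_mix
  use mpi
  implicit none
  integer :: ierr
  logical :: mpi_up
  real :: local_val, global_sum

  ! Assumption: the CAF runtime sits on top of Intel MPI and has
  ! already called MPI_Init by the time user code starts running.
  call MPI_Initialized(mpi_up, ierr)
  if (.not. mpi_up) call MPI_Init(ierr)

  local_val = real(this_image())

  ! The kind of collective I am after: a global reduction over all images.
  call MPI_Allreduce(local_val, global_sum, 1, MPI_REAL, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)

  if (this_image() == 1) print *, 'sum over all images =', global_sum
end program caf_mpi_mix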

We don't support making explicit MPI calls with CAF, but it may work.

Steve - Intel Developer Support

I have exactly the same problem, but I can't find a way to overcome this issue.
MPI works correctly. I have the simple "Hello world" for distributed memory, and a configuration file ("config") with 4 nodes:

-n 1 -host nodo0 ./hello : \
-n 1 -host nodo1 ./hello : \
-n 2 -host nodo2 ./hello : \
-n 3 -host nodo3 ./hello

I compiled with

ifort hello.f90 -coarray=distributed -coarray-config-file=config -o hello

in a directory mounted on all the nodes in the cluster. Then I run ./hello, and it tells me:

"file config./hello" not found

Is there any solution?
