coarray distributed memory library

I've successfully used a coarray library with
shared memory and ifort 12. Now I've got a
licence for distributed memory and ifort 14,
but my coarray library doesn't work anymore.

I can compile and run standalone coarray code
with distributed memory. However, I'm not clear
how to use the coarray-config-file with the library.

A typical scenario: I have coarray library code
under ~/project/lib and coarray code using the
library under ~/project/tests.
I compile under ~/project/lib with

ifort -c -coarray=distributed -debug full -free -fPIC -warn all

I put the resulting module files under ~/project/modules.
I put the resulting object files into a unix archive and under

Then under ~/project/tests I build and link the
coarray code using the library code like this:

ifort -c z.f90 -coarray=distributed -I~/project/modules -coarray-config-file=ca.conf -debug full -warn all
ifort -o z.x z.o -coarray=distribued -L~/project/libs -l

I get an executable, but when I run it with this ca.conf:
-envall -n 64 ./z.x 4 4

and with an appropriate PBS script, I get various runtime
errors, e.g.:

rank 0 in job 1 node32-034_45144 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

I will investigate the code, of course, but just wanted
to check that I'm using the logic of -coarray-config-file correctly
for building/linking coarray library code.




In your description, the "link" step has "distributed" spelled incorrectly; I assume that's a transcription error.

That aside - your basic syntax looks fine.

The configuration file is an MPI configuration file; it's not a secret that we use MPI as the underlying transport mechanism.  If you suspect the problem is in the underlying transport, you can use the I_MPI_DEBUG environment variable to help get more info.
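
As a rough sketch of how I_MPI_DEBUG might be set for a single run: the helper name run_with_mpi_debug is hypothetical, and the level 5 is only a commonly used value, not something taken from this thread. Since ca.conf already contains -envall, the variable should be passed on to the ranks.

```shell
#!/bin/sh
# Sketch only: run a command with Intel MPI debug output enabled.
# run_with_mpi_debug is a hypothetical helper name, not an Intel tool.
# Usage: run_with_mpi_debug LEVEL COMMAND [ARGS...]
run_with_mpi_debug() {
    level=$1
    shift
    # env sets I_MPI_DEBUG for this one command only.
    env I_MPI_DEBUG="$level" "$@"
}

# Example: launch the coarray executable from the thread, if it is present.
# The level 5 is an assumption; higher levels print more detail.
if [ -x ./z.x ]; then
    run_with_mpi_debug 5 ./z.x 4 4
fi
```

Setting the variable with `export I_MPI_DEBUG=5` in the PBS script before the run would have the same effect for every command that follows.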

This configuration file does not look like it "distributes".  That is, I might have expected -hostX options.
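
A minimal sketch of what a "distributing" configuration file could look like, generated inside the PBS job. It assumes that the launcher accepts a per-line -host option (as Intel MPI's mpiexec does) and that PBS lists one hostname per allocated core in $PBS_NODEFILE. The helper name make_ca_conf is hypothetical.

```shell
#!/bin/sh
# Sketch only: write one configuration line per host in a node file.
# make_ca_conf is a hypothetical helper name.
# Usage: make_ca_conf NODEFILE COMMAND [ARGS...]
# Prints "-envall -host HOST -n COUNT COMMAND ARGS" for each distinct host,
# where COUNT is how many times HOST appears in NODEFILE.
make_ca_conf() {
    nodefile=$1
    shift
    sort "$nodefile" | uniq -c | while read -r count host; do
        printf '%s\n' "-envall -host $host -n $count $*"
    done
}

# Inside a PBS job: regenerate ca.conf from the nodes actually allocated.
# Skipped when PBS_NODEFILE is not set, e.g. on a login node.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    make_ca_conf "$PBS_NODEFILE" ./z.x 4 4 > ca.conf
fi
```

For a node file naming two hosts 32 times each, this would write two lines with -n 32 each, in place of the single "-envall -n 64 ./z.x 4 4" line. Whether the resulting placement matches what the queue expects depends on the site's PBS and MPI setup.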

What happens if you use the configuration file, but shared memory?  Does your program still fail?


Thank you for the confirmation.
I think this is the same problem as in my PR:

What happens is that because remote reads take
so long, the program exceeds the queue allocation.
The error message then reflects that.

I have no problems with shared memory runs.
In fact I only got access to the distributed memory
licence this week, so these are my first attempts
to move from shared to distributed memory, and
from ifort 12 to 14.

I'll try I_MPI_DEBUG though.


