I would like to parallelise certain parts of my Fortran project and, to start with, changed the Enable Coarrays parameter to For Shared Memory (/Qcoarray:shared).  The program recompiled and linked successfully, but when I run it, it fails dramatically at a read from a disk file.  It seems that the program is trying to write to the file in parallel, although I have not requested this, as I get a number of severe(47) "write to read-only file" errors.

I suppose I am being naive but I hoped that without adding any coarray commands the program would simply run as before?

Thanks,  ACAR.


That's not how it works. If you enable the /Qcoarray option, then your program is run in parallel, and file activity will interfere. Coarray applications typically use a test such as IF (THIS_IMAGE() == 1) to conditionalize file I/O.
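As a minimal sketch of the THIS_IMAGE() test mentioned above, the following lets only image 1 touch the file and then shares the value with the other images. The file name input.dat, the program name and the variable names are hypothetical, chosen only for illustration.

```fortran
program guarded_io
  implicit none
  integer :: unit_no, ios
  real :: value[*]          ! coarray: every image has its own copy

  value = 0.0

  ! Only image 1 opens and reads the file (hypothetical name input.dat),
  ! so the other images never compete for it.
  if (this_image() == 1) then
    open(newunit=unit_no, file='input.dat', status='old', &
         action='read', iostat=ios)
    if (ios == 0) then
      read(unit_no, *, iostat=ios) value
      close(unit_no)
    end if
    if (ios /= 0) value = 0.0   ! fall back to a default if the read failed
  end if

  sync all                      ! wait until image 1 has finished reading

  ! Every other image copies the value from image 1.
  if (this_image() /= 1) value = value[1]

  print *, 'image', this_image(), 'of', num_images(), 'has value', value
end program guarded_io
```

Built with /Qcoarray:shared, each image should print the same value, while only image 1 has performed any file I/O.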

Given your needs, you would be better off using OpenMP, or even autoparallelism. I'd suggest starting with the latter and see if that gets you the performance boost you want. With OpenMP, you have to explicitly specify that a loop is to be parallelized.
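To illustrate what "explicitly specify" means for OpenMP, here is a small sketch of a loop marked for parallel execution. The program, array names and loop body are made up for the example, and it assumes the iterations are independent of one another.

```fortran
program omp_loop
  implicit none
  integer, parameter :: n = 1000000
  real, allocatable :: a(:), b(:)
  integer :: i

  allocate(a(n), b(n))
  b = 1.0

  ! The directive asks the compiler to split the iterations across
  ! threads. This is only safe because each iteration writes its own a(i).
  !$omp parallel do
  do i = 1, n
    a(i) = 2.0 * b(i) + 1.0
  end do
  !$omp end parallel do

  print *, 'a(1) =', a(1), ' a(n) =', a(n)
end program omp_loop
```

The directives are treated as ordinary comments unless OpenMP is enabled (/Qopenmp with Intel Fortran on Windows), so the same source still builds and runs serially.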

Retired 12/31/2016

Do you have any WRITE or PRINT statements in your program at all?

When you add the /Qcoarray switch the program becomes configured to run itself multiple times in parallel.

Without any coarray references, these images are quite likely to be at very different places in the execution and it's possible that one is doing a READ while another is doing a WRITE to that same file.

Thank you both for your comments/suggestions.  I assumed, incorrectly as it turns out, that even with coarrays enabled via the /Qcoarray option, only the bits of code where I used coarrays would become parallel.  Question: if I wanted to pursue coarrays, would the answer be to put these parts in a separate file and create a DLL?  In the meantime I will take a look at autoparallelism.

I have changed the compiler options to enable autoparallelism (I think):

/nologo /debug:full /MP /O3 /Qparallel /Qopt-prefetch=2 /assume:buffered_io /heap-arrays0 /Qip /fpp /I"C:\Program Files\gino\v7.5\modules" /I"C:\RMA\Programs\EFE_V1.0\ansys" /I"C:\Program Files\ANSYS Inc\v110\ANSYS\custom\include" /I"C:\Program Files\Intel\Composer XE 2013 SP1\mkl\include" /I"C:\Program Files\Intel\Composer XE 2013 SP1\compiler\include" /recursive /reentrancy:threaded /debug-parameters:all /warn:declarations /warn:unused /warn:ignore_loc /warn:noalignments /warn:uncalled /warn:interfaces /Qguide:1 /fp:fast=2 /module:"Debug/" /object:"Debug/" /Fd"Debug\vc90.pdb" /traceback /check:bounds /check:uninit /libs:static /threads /dbglibs /winapp /Qmkl:parallel /c

On compilation I got the following message:

Number of advice-messages emitted for this compilation session: 0.

So no advice given?

Right - no advice. You could turn on optimization reports to look at what the compiler did, or profile the program in VTune Amplifier XE to see how efficient it is.


Belay my last.  I've found out how to do this.  Seems I just need to pursue PURE routines and use FORALL loops...

FORALL doesn't usually offer as much facility for auto-parallelization as DO CONCURRENT, in cases where the latter is valid.
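For comparison, here is a rough sketch of the same kind of loop written with FORALL and with DO CONCURRENT. The function scale_and_shift and the array names are hypothetical. Whether the compiler actually parallelises either form still depends on /Qparallel and its own cost analysis.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y(n), z(n)
  integer :: i

  x = 1.0

  ! FORALL is an array assignment construct: the right-hand side is
  ! evaluated for all i before any element of y is assigned.
  forall (i = 1:n) y(i) = scale_and_shift(x(i))

  ! DO CONCURRENT is a loop whose iterations the programmer asserts
  ! are independent, so they may run in any order.
  do concurrent (i = 1:n)
    z(i) = scale_and_shift(x(i))
  end do

  print *, 'y(1) =', y(1), ' z(n) =', z(n)

contains

  ! Procedures referenced inside FORALL or DO CONCURRENT must be PURE.
  pure function scale_and_shift(v) result(r)
    real, intent(in) :: v
    real :: r
    r = 2.0 * v + 1.0
  end function scale_and_shift

end program do_concurrent_demo
```

DO CONCURRENT is only valid when no iteration depends on a value written by another iteration, so that needs checking for each loop before converting it.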

Thanks, Tim. I'll take a look at DO CONCURRENT.

I have implemented DO CONCURRENT for critical work loops in my code and it does indeed speed execution significantly.  Many thanks.
