I'd appreciate comments, suggestions, and/or criticisms on the results of some 'playing' with coarrays in Intel Visual Fortran Composer XE 2013 SP1 Update 3 integrated with MS VS Pro 2013. All of the results were obtained on an i7-860 2.8 GHz CPU with 8 GB RAM, running 64-bit Windows 7.
The two attached *.f90 files summarize the timing results in comments at the tops of the files, particularly "compare_to_coarray.f90". The Fortran command lines from the project properties are in comments at the ends of the files. Note that the times cover only the actual multiplications, excluding initialization, distribution, etc.
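To show what "timing only the multiplication" means, here is a minimal sketch. It is not taken from the attached files: the program name, the matrix size, and the use of the standard system_clock intrinsic (rather than dsecnd) are all assumptions made for illustration.

```fortran
program time_matmul_only
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 500      ! assumed size; the post uses 2000
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  integer(i8) :: t0, t1, rate

  ! Setup is deliberately kept outside the timed region.
  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)

  ! Timed region: the multiplication only.
  call system_clock(t0, rate)
  c = matmul(a, b)
  call system_clock(t1)

  print '(a,f8.3,a)', 'matmul time: ', real(t1 - t0, dp) / real(rate, dp), ' s'
  ! Print one element so the result is actually used.
  print *, 'c(1,1) = ', c(1,1)
end program time_matmul_only
```

Keeping allocation and initialization outside the two system_clock calls is what makes timings from different programs comparable.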
Note that lines 24 and 28 of "test coarray matmul.f90" define variables without codimensions. I've been unable to find anything in the documentation about doing that. When I give all of the variables (except dsecnd) the codimension [*], the program still runs correctly, but the time is more than double that without those codimensions (8.2 vs 3.8 secs for 2000x2000 matrices).
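For reference, under the Fortran 2008 rules a variable declared without a codimension in a coarray program is an ordinary variable, private to each image. Only variables with a codimension can be addressed from other images. A minimal sketch of the difference follows; it is not taken from the attached files, and all names in it are made up for illustration.

```fortran
program codimension_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4        ! assumed small size for illustration
  ! No codimension: each image has its own private copy,
  ! and no other image can reference it.
  real(dp) :: local_block(n, n)
  ! Codimension [*]: each image still has its own copy, but other
  ! images can read or write it using the [image] syntax.
  real(dp) :: shared_block(n, n)[*]
  integer :: me

  me = this_image()
  local_block = real(me, dp)
  shared_block = local_block

  ! Make sure every image has filled shared_block before any remote read.
  sync all

  ! Image 1 fetches image 2's copy; only the coarray allows this.
  if (me == 1 .and. num_images() > 1) then
    local_block = shared_block(:, :)[2]
  end if

  print '(a,i0,a,f4.1)', 'image ', me, ' local_block(1,1) = ', local_block(1,1)
end program codimension_sketch
```

With Intel Fortran on Windows this needs coarray support enabled (the /Qcoarray option). Variables that are only ever used within their own image, such as loop counters or scratch arrays, don't need a codimension at all.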
All of the times given here are for 'Release' 'x64' configurations. Win32 configurations run nearly 2x faster for both the coarray and non-coarray programs. Times vary slightly from run to run and are probably meaningful only to within ~0.3 secs.
Unless I've done something dumb (a distinct possibility), I don't understand why anyone would use coarrays. The only argument that I've seen for them is the 'ease' of programming for those accustomed to Fortran. I certainly didn't experience 'ease', at least in this application. I found OpenMP much easier to learn, much more versatile than coarrays, and much better documented both online and in books. In this simple app, even Intel's auto-parallelism is much more impressive than coarrays.