I am testing co-arrays using /Qcoarray:shared on my Core 2 Duo PC. It looks like Intel's co-array implementation is just a wrapper around mpiexec.
For example, if I run a co-array application, 3 processes are launched: presumably one master controlling process and 2 worker processes. From what I can tell, co-arrays do not use multithreading but are process based. If I press Ctrl-C in the middle of the run, it aborts 2 processes and prints a message saying that the mpiexec abort is done.
So, am I right that Intel's co-array support (with /Qcoarray:shared) is a process-based MPI wrapper?
Based on the above observation, I expect co-array code to be much slower than multithreaded OpenMP code.
I actually tested some of the small codes in "A Comparison of Co-Array Fortran and OpenMP Fortran for SPMD Programming" (http://www7300.nrlssc.navy.mil/global_nlom/globalnlom/pubs-data/wallcraf...). Indeed, the co-array version of the pi calculation is about 30-40% slower than the OpenMP version.
A stunning and disappointing finding.
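
For context, a co-array pi calculation of the kind mentioned above might look roughly like the sketch below. This is not the paper's listing: it assumes the usual midpoint-rule integration of 4/(1+x^2) over [0,1], and the program name, the interval count n, and the variable names are all made up for illustration. Each image (a separate process, if the observation above is right) sums its own share of the intervals, and image 1 then reads the partial sums from the other images.

```fortran
program pi_coarray
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  integer, parameter :: n = 10000000   ! number of intervals (assumed value)
  real(dp) :: partial[*]               ! one copy of the partial sum per image
  real(dp) :: h, x, total
  integer :: i, me, np

  me = this_image()                    ! index of this image, starting at 1
  np = num_images()                    ! total number of images
  h = 1.0_dp / n                       ! width of one interval

  ! Each image handles intervals me, me+np, me+2*np, ...
  partial = 0.0_dp
  do i = me, n, np
     x = h * (i - 0.5_dp)              ! midpoint of interval i
     partial = partial + 4.0_dp / (1.0_dp + x*x)
  end do
  partial = partial * h

  sync all                             ! wait until every image has its partial sum

  ! Image 1 gathers the partial sums from all images and prints the result
  if (me == 1) then
     total = 0.0_dp
     do i = 1, np
        total = total + partial[i]     ! remote read from image i
     end do
     print *, 'pi is approximately', total, 'using', np, 'images'
  end if
end program pi_coarray
```

In a kernel like this the images only communicate once, in the final gather on image 1, so a timing difference against OpenMP on a small problem may largely reflect process launch and synchronization cost rather than the loop itself.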