ifort 13.0, 14.0 coarray extremely slow read/write between nodes


This is my test code: $ cat ca_check.f90 program z implicit none integer :: x(10)[*], img, nimgs, i real :: time1, time2 img = this_image() nimgs = num_images() x = img if (img .eq. 1) then do i=1,nimgs call cpu_time(time1) x = x(:)[i] call cpu_time(time2) write (*,"(a,f)") "Remote read took, s : ", time2-time1 call cpu_time(time1) x(:)[i] = x call cpu_time(time2) write (*,"(a,f)") "Remote write took, s : ", time2-time1 write (*,"(99999(i0,tr1))") x end do end if sync all write (*,"(a,i0,a,i0,a)") "Image: ", img, " out of ", nimgs, "completed ok" end program z $ Compiled with: ifort -o ca_check.xcack ca_check.f90 -coarray=distributed -coarray-config-file=ca.conf -debug full -warn all $ cat ca.conf -envall -n 64 ./ca_check.xcack $ $ cat zpbs #!/bin/sh #PBS -l walltime=00:01:00,nodes=4:ppn=16 #PBS -j oe #PBS -m abe cd $HOME/nobackup/cgpack/branches/coarray/tests echo "LD_LIBRARY_PATH: " $LD_LIBRARY_PATH > zzz echo "which mpirun: " `which mpirun` >> zzz export I_MPI_DAPL_PROVIDER=ofa-v2-ib0 mpdboot --rsh=ssh --file=$PBS_NODEFILE -n 4 mpdtrace -l >> zzz cm-launcher ./ca_check.xcack >> zzz mpdallexit $ $ cat zzz LD_LIBRARY_PATH: /cm/shared/apps/torque/4.2.4.1/lib:/cm/shared/apps/moab/7.2.2/lib:/cm/shared/tools/subversion-1.8.4/lib:/cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib/intel64:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib/intel64 which mpirun: /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/bin/mpirun node32-035_47536 (10.131.0.179) node33-002_50475 (10.131.0.98) node33-003_55287 (10.131.0.99) node34-006_42324 (10.131.0.54) Remote read took, s : 0.0010000 Remote write took, s : 0.0000000 1 1 1 1 1 1 1 1 1 1 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 0 0 0 0 0 0 0 0 0 0 Remote read took, s 
: 0.0000000 Remote write took, s : 0.0000000 3 3 3 3 3 3 3 3 3 3 Remote read took, s : 0.0000000 Remote write took, s : 0.0010000 0 0 0 0 0 0 0 0 0 0 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 5 5 5 5 5 5 5 5 5 5 Remote read took, s : 0.0000000 Remote write took, s : 0.0010000 0 0 0 0 0 0 0 0 0 0 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 7 7 7 7 7 7 7 7 7 7 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 0 0 0 0 0 0 0 0 0 0 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 9 9 9 9 9 9 9 9 9 9 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 10 10 10 10 10 10 10 10 10 10 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 11 11 11 11 11 11 11 11 11 11 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 12 12 12 12 12 12 12 12 12 12 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 13 13 13 13 13 13 13 13 13 13 Remote read took, s : 0.0009990 Remote write took, s : 0.0000000 14 14 14 14 14 14 14 14 14 14 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 15 15 15 15 15 15 15 15 15 15 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 16 16 16 16 16 16 16 16 16 16 Remote read took, s : 0.0000000 Remote write took, s : 0.0000000 17 17 17 17 17 17 17 17 17 17 Remote read took, s : 13.3259735 Remote write took, s : 12.9360342 18 18 18 18 18 18 18 18 18 18 Remote read took, s : 13.8728924 Remote write took, s : 12.5950813 19 19 19 19 19 19 19 19 19 19 Remote read took, s : 14.5117950 Remote write took, s : 12.9060364 20 20 20 20 20 20 20 20 20 20 $ Note that: - values read from processors 2,4,6,8 are just wrong. They are all zero, but must be equal to the processor number. - There are 16 cores in a node. Read/write to/from the first 16 processors are very fast, <1us. 
Read/write to/from processor 17, which probably is the first processor in another node, is still fast, but every other processor beyond that takes over 10 seconds for read or write. I've checked with both 13.0 and 14.0. I'm happy to provide further details of MPI setup. Thanks Anton


The format is all wrong. I'll try again.

The problem: remote read or write operations
across node boundaries take over 10 sec!

The code:

$ cat z.f90
program z
implicit none
integer :: x(10)[*], img, nimgs, i
real :: time1, time2
img = this_image()
nimgs = num_images()
x = img
if (img .eq. 1) then
do i=1,nimgs
call cpu_time(time1)
x = x(:)[i]
call cpu_time(time2)
write (*,"(a,f)") "Remote read took, s : ", time2-time1
call cpu_time(time1)
x(:)[i] = x
call cpu_time(time2)
write (*,"(a,f)") "Remote write took, s : ", time2-time1
write (*,"(99999(i0,tr1))") x
end do
end if
sync all
write (*,"(a,i0,a,i0,a)") "Image: ", img, " out of ", nimgs, "completed ok"
end program z
$

The purpose of the code is to time remote read/write,
and to check the correctness of the answer.

I have 16 cores per node. I'm running on 4 nodes, 64 cores.

Compilation and linking:
ifort -o z.x z.f90 -coarray=distributed -coarray-config-file=ca.conf -debug full -warn all

$ cat ca.conf
-envall -n 64 ./z.x
$

PBS job submission script:
$ cat zpbs
#!/bin/sh
#PBS -l walltime=00:01:00,nodes=4:ppn=16
#PBS -j oe
#PBS -m abe

echo "LD_LIBRARY_PATH: " $LD_LIBRARY_PATH > zzz
echo "which mpirun: " `which mpirun` >> zzz
export I_MPI_DAPL_PROVIDER=ofa-v2-ib0

mpdboot --rsh=ssh --file=$PBS_NODEFILE -n 4
mpdtrace -l >> zzz
cm-launcher ./z.x >> zzz
mpdallexit
$

And this is the result.
Note that read/write time goes from under 1us to over 10s.
Also note the results from images 2,4,6,8 are wrong.
These are all zeros, but should have been equal
to image number.

$ cat zzz
LD_LIBRARY_PATH: /cm/shared/apps/torque/4.2.4.1/lib:/cm/shared/apps/moab/7.2.2/lib:/cm/shared/tools/subversion-1.8.4/lib:/cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib/intel64:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib/intel64
which mpirun: /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/bin/mpirun
node32-035_47536 (10.131.0.179)
node33-002_50475 (10.131.0.98)
node33-003_55287 (10.131.0.99)
node34-006_42324 (10.131.0.54)
Remote read took, s : 0.0010000
Remote write took, s : 0.0000000
1 1 1 1 1 1 1 1 1 1
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
3 3 3 3 3 3 3 3 3 3
Remote read took, s : 0.0000000
Remote write took, s : 0.0010000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
5 5 5 5 5 5 5 5 5 5
Remote read took, s : 0.0000000
Remote write took, s : 0.0010000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
7 7 7 7 7 7 7 7 7 7
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
0 0 0 0 0 0 0 0 0 0
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
9 9 9 9 9 9 9 9 9 9
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
10 10 10 10 10 10 10 10 10 10
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
11 11 11 11 11 11 11 11 11 11
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
12 12 12 12 12 12 12 12 12 12
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
13 13 13 13 13 13 13 13 13 13
Remote read took, s : 0.0009990
Remote write took, s : 0.0000000
14 14 14 14 14 14 14 14 14 14
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
15 15 15 15 15 15 15 15 15 15
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
16 16 16 16 16 16 16 16 16 16
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
17 17 17 17 17 17 17 17 17 17
Remote read took, s : 13.3259735
Remote write took, s : 12.9360342
18 18 18 18 18 18 18 18 18 18
Remote read took, s : 13.8728924
Remote write took, s : 12.5950813
19 19 19 19 19 19 19 19 19 19
Remote read took, s : 14.5117950
Remote write took, s : 12.9060364
20 20 20 20 20 20 20 20 20 20

Here the allocated time for the job finished,
otherwise the remaining 44 images would
presumably do their read/write too, but it
takes too long to wait.

Thanks

Anton

Insert a SYNC ALL after x=img and see what happens.

Jim Dempsey

www.quickthreadprogramming.com

No, this makes no difference.
The modified code:

program z
implicit none
integer :: x(10)[*], img, nimgs, i
real :: time1, time2
img = this_image()
nimgs = num_images()
x = img
sync all
if (img .eq. 1) then
do i=1,nimgs
call cpu_time(time1)
x = x(:)[i]
call cpu_time(time2)
write (*,"(a,f)") "Remote read took, s : ", time2-time1
call cpu_time(time1)
x(:)[i] = x
call cpu_time(time2)
write (*,"(a,f)") "Remote write took, s : ", time2-time1
write (*,"(99999(i0,tr1))") x
end do
end if
sync all
write (*,"(a,i0,a,i0,a)") "Image: ", img, " out of ", nimgs, "completed ok"
end program z

At runtime I still get enormous times for remote
read and write calls:

Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
12 12 12 12 12 12 12 12 12 12
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
13 13 13 13 13 13 13 13 13 13
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
14 14 14 14 14 14 14 14 14 14
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
15 15 15 15 15 15 15 15 15 15
Remote read took, s : 0.0000000
Remote write took, s : 0.0000000
16 16 16 16 16 16 16 16 16 16
Remote read took, s : 0.0000000
Remote write took, s : 0.0010000
17 17 17 17 17 17 17 17 17 17
Remote read took, s : 16.4365025
Remote write took, s : 13.6949177
18 18 18 18 18 18 18 18 18 18
Remote read took, s : 15.1436958
Remote write took, s : 13.7209167
19 19 19 19 19 19 19 19 19 19
Remote read took, s : 16.4264984
Remote write took, s : 13.6939240
20 20 20 20 20 20 20 20 20 20
Remote read took, s : 15.7575989
Remote write took, s : 13.6139297
21 21 21 21 21 21 21 21 21 21
Remote read took, s : 13.9138794
Remote write took, s : 13.7969055
22 22 22 22 22 22 22 22 22 22

Perhaps something is wrong with MPI setup?
Anything else I could check?

Thanks

Anton,

I'll take a look at this.  What compiler version are you using?

ron

I tried
13.0.1 20121010
14.0.0 20130728

Thank you

Ron, any progress?

Thanks

Anton

Our mutual friend Stephen reminded me to revisit this post.

First, a little status on Intel's CAF implementation:  Our initial goal was to get a functional CAF implementation that conforms strictly to the Standard.  Performance has not been fully addressed at this time and may take a while to get to acceptable levels for production purposes - particularly for distributed memory systems.

But next, I see some errors in the test and question what it is you're timing.  In particular, let's visit correctness first.  Removing the timing and writes you have this code on Image 1:

do i=1,imgs

  x = x(:)[i]

  x(:)[i] = x

end do

The problem here: CAF remote reads/writes are inherently asynchronous, or one-sided. So the read into X may not have completed by the time you use X on the RHS of the assignment in the next statement, and the results are unpredictable. And a minor point: for image 1, do you want to test self-read and self-write? (The do loop goes from 1 to imgs, but do we care about image 1 reading/writing itself in shared memory?) What I think you want is something like a neighbor exchange, maybe something like this for the read:

do i = 2,imgs
  sync all
  if ( img == 1 ) then
     !...start timer here
     x = x(:)[i]
     sync images(i) !...wait for remote read to complete
     !...finish timer here, print result?
  else if ( img == i ) then
     sync images(1) !...sync point with image 1
  end if
end do

Remember that image control statements (like SYNC IMAGES) imply a SYNC MEMORY.  Maybe I should have used SYNC MEMORY instead, but so it goes.

So the next question is, what do you want to time?  Do you want to find the time for the data transfer as we're doing above?  Or do you want to time how long the statement takes, to see if it's truly asynchronous or synchronous or just darn inefficient?   In the above we're also capturing the time for the 1-1 synchronization, so it's not a good measure of throughput.   Also, note I had a SYNC ALL at the top of the loop to make sure all the images execute the I iterations in lock step.  Thinking of this, I believe it would be OK to remove that.  Then each remote image would quickly drop into the SYNC IMAGES(1) and be waiting for image 1's SYNC IMAGES(img).  That would be faster, obviously.
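
A minimal sketch of one way to separate the two quantities discussed above: the time spent in the remote-read statement itself and the time spent in the pairwise synchronization that follows it. It assumes the SYNC ALL at the top of the loop has been dropped, as suggested. The program name time_split, the buffer buf and the counters c0, c1, c2 are hypothetical and are not part of the original test.

```fortran
program time_split
  use iso_fortran_env, only: int64, real64
  implicit none
  integer :: x(10)[*], buf(10), img, nimgs, i
  integer(int64) :: c0, c1, c2, rate

  img   = this_image()
  nimgs = num_images()
  x     = img
  sync all                         ! x is defined on every image before any remote access

  do i = 2, nimgs
    if (img == 1) then
      call system_clock(c0, rate)
      buf = x(:)[i]                ! remote read statement only
      call system_clock(c1)
      sync images(i)               ! pairwise handshake with image i
      call system_clock(c2)
      write (*,"(a,i0,2(a,es12.4))") "image ", i, &
        "  read, s: ", real(c1-c0, real64)/real(rate, real64), &
        "  sync, s: ", real(c2-c1, real64)/real(rate, real64)
    else if (img == i) then
      sync images(1)               ! matches the SYNC IMAGES(i) executed on image 1
    end if
  end do
end program time_split
```

If the first interval is small and the second is large, the cost sits in the synchronization rather than in the assignment, and the reverse would point at the transfer itself. This only reports what one particular runtime does; it says nothing about what the Standard requires.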

Tricky stuff.  I might suggest rethinking this experiment to see if we can derive a better test.  ALSO, don't use cpu_time as it gathers the sum of thread times for the process, which with threads running in background to do the IO might give too much time.  I use a wall-clock instead like this contained procedure mytime() :

program foo
use ISO_FORTRAN_ENV
implicit none
integer, parameter :: dp = REAL64
real (kind=dp) :: tstart, tstop, ttime

!... ready to time a block of code
tstart = mytime()
!...do something
tstop = mytime()
ttime = tstop - tstart

contains
  function mytime()  result (tseconds)
    real (dp)       :: tseconds
    integer (INT64) ::  count, count_rate, count_max
    real (dp)       :: tsec, rate

    CALL SYSTEM_CLOCK(count, count_rate, count_max)

    tsec = count
    rate = count_rate
    tseconds = tsec / rate
  end function mytime 

end program foo
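
To see how far the two clocks can differ on a given system, a small sketch like the following prints both for the same block of work. The program name clocks and the dummy loop are assumptions made for illustration only. In a serial run the two numbers should be close; with runtime helper threads active, cpu_time may report more than the elapsed wall-clock time.

```fortran
program clocks
  use iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: c0, c1, rate
  real           :: t0, t1
  real(real64)   :: s
  integer        :: k

  call cpu_time(t0)                ! processor time used by the process
  call system_clock(c0, rate)      ! wall-clock counter and its tick rate

  ! some dummy work to time
  s = 0.0_real64
  do k = 1, 10000000
    s = s + sqrt(real(k, real64))
  end do

  call system_clock(c1)
  call cpu_time(t1)

  write (*,"(a,es12.4)") "checksum        : ", s
  write (*,"(a,es12.4)") "cpu_time, s     : ", t1 - t0
  write (*,"(a,es12.4)") "system_clock, s : ", real(c1 - c0, real64) / real(rate, real64)
end program clocks
```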

 

 

Ron

Sorry for the delay.

I bothered several people here in Bristol,
including Jim Cownie, but got nowhere,
and then have given up on this issue.
Hence I missed your reply.
So thank you for your help.

1. Stephen who?

2. I disagree with your statement that the result of this code is unpredictable:

do i=1,imgs
x = x(:)[i]
x(:)[i] = x
end do

If you bear in mind that this fragment is executing only on one image,
and the fragment is within a single segment, it must
be executed in order. I confirmed this with Dan Nagle
on comp.lang.fortran. So really the fragment must include:

sync all

if ( img .eq. 1 ) then
do i = 2, nimgs
x = x(:)[i]
x(:)[i] = x
end do
end if

sync all

Do you agree?
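
The ordering claim can also be checked directly instead of by inspecting printed values. The sketch below is an illustration only: the program name order_check and the flag ok are hypothetical. Image 1 tests its local x immediately after each remote read, and every other image checks its own x after the final SYNC ALL.

```fortran
program order_check
  implicit none
  integer :: x(10)[*], img, nimgs, i
  logical :: ok

  img   = this_image()
  nimgs = num_images()
  x     = img
  sync all

  if (img == 1) then
    ok = .true.
    do i = 2, nimgs
      x = x(:)[i]                       ! remote read into the local x
      if (any(x /= i)) ok = .false.     ! local x should already hold image i's values
      x(:)[i] = x                       ! remote write of the same values back
    end do
    write (*,"(a,l1)") "reads seen in order on image 1: ", ok
  end if

  sync all

  ! image 1 ends up holding the values read from the last image,
  ! so only images 2..nimgs are expected to still hold their own number
  if (img > 1) then
    if (any(x /= img)) write (*,"(a,i0)") "unexpected values on image ", img
  end if
end program order_check
```

If statements within a segment on one image execute in order, image 1 should print T and no image should report unexpected values.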

3. I'm happy with your timer, so my complete program is now:

program z
use iso_fortran_env
implicit none
integer, parameter :: dp = real64
real( kind=dp) :: time1, time2

integer :: x(10)[*], img, nimgs, i
img = this_image()
nimgs = num_images()
x = img

sync all

if ( img .eq. 1 ) then
do i = 2, nimgs
time1 = mytime()
x = x(:)[i]
time2 = mytime()
write (*,"(a,g)") "Remote read took, s : ", time2-time1
time1 = mytime()
x(:)[i] = x
time2 = mytime()
write (*,"(a,g)") "Remote write took, s : ", time2-time1
write (*,"(a,i0,a,10(i0))") "img: ", i, "x:", x(:)[i]
end do
end if

sync all

write (*,"(a,i0,a,i0,a)") "Image: ", img, " out of ", nimgs, "completed ok"

contains

function mytime() result (tseconds)
real( dp ) :: tseconds
integer( INT64 ) :: tsec, trate
CALL SYSTEM_CLOCK( count=tsec, count_rate=trate )
tseconds = real(tsec,kind=dp) / real(trate,kind=dp)
end function mytime

end program z

However, the performance is still the same:

LD_LIBRARY_PATH: /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/mpirt/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/ipp/../compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/ipp/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/mkl/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/tbb/lib/intel64/gcc4.4:/cm/shared/apps/ParaView-4.0.1/ParaView-4.0.1-Linux-64bit/lib:/cm/shared/apps/torque/4.2.4.1/lib:/cm/shared/apps/moab/7.2.2/lib:/cm/shared/tools/subversion-1.8.4/lib:/cm/shared/languages/Intel-Compiler-XE-14/compiler/lib:/cm/shared/languages/Intel-Compiler-XE-14/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/lib:/cm/shared/languages/Intel-Compiler-XE-14/lib/intel64:/cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib/intel64:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib/intel64
which mpirun: /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/bin/mpirun
node43-037_33944 (10.131.1.97)
node43-038_56186 (10.131.1.98)
node43-039_35896 (10.131.1.99)
node43-040_49240 (10.131.1.100)
Remote read took, s : .2717971801757812E-04
Remote write took, s : .7009506225585938E-04
img: 2x:2222222222
Remote read took, s : .9059906005859375E-05
Remote write took, s : .1382827758789062E-04
img: 3x:3333333333
Remote read took, s : .6914138793945312E-05
Remote write took, s : .1406669616699219E-04
img: 4x:4444444444
Remote read took, s : .5960464477539062E-05
Remote write took, s : .1192092895507812E-04
img: 5x:5555555555
Remote read took, s : .8106231689453125E-05
Remote write took, s : .1406669616699219E-04
img: 6x:6666666666
Remote read took, s : .8106231689453125E-05
Remote write took, s : .1406669616699219E-04
img: 7x:7777777777
Remote read took, s : .8106231689453125E-05
Remote write took, s : .1287460327148438E-04
img: 8x:8888888888
Remote read took, s : .8106231689453125E-05
Remote write took, s : .2098083496093750E-04
img: 9x:9999999999
Remote read took, s : .9059906005859375E-05
Remote write took, s : .2098083496093750E-04
img: 10x:10101010101010101010
Remote read took, s : .9059906005859375E-05
Remote write took, s : .2217292785644531E-04
img: 11x:11111111111111111111
Remote read took, s : .9059906005859375E-05
Remote write took, s : .2217292785644531E-04
img: 12x:12121212121212121212
Remote read took, s : .8821487426757812E-05
Remote write took, s : .2002716064453125E-04
img: 13x:13131313131313131313
Remote read took, s : .9059906005859375E-05
Remote write took, s : .2193450927734375E-04
img: 14x:14141414141414141414
Remote read took, s : .8106231689453125E-05
Remote write took, s : .2002716064453125E-04
img: 15x:15151515151515151515
Remote read took, s : .9059906005859375E-05
Remote write took, s : .2002716064453125E-04
img: 16x:16161616161616161616
Remote read took, s : .2694129943847656E-04
Remote write took, s : .1330375671386719E-03
img: 17x:17171717171717171717
Remote read took, s : 4.086822986602783
Remote write took, s : 13.64371013641357
img: 18x:18181818181818181818
Remote read took, s : 4.057008028030396
Remote write took, s : 13.62605905532837
img: 19x:19191919191919191919
Remote read took, s : 4.033457994461060
Remote write took, s : 13.60342288017273
img: 20x:20202020202020202020
Remote read took, s : 3.867400169372559
Remote write took, s : 13.55423808097839
img: 21x:21212121212121212121
Remote read took, s : 2.599767923355103
Remote write took, s : 13.53067493438721
img: 22x:22222222222222222222
Remote read took, s : 3.370637893676758
Remote write took, s : 13.62505698204041
img: 23x:23232323232323232323
Remote read took, s : 4.130011081695557
Remote write took, s : 13.79351282119751
img: 24x:24242424242424242424
Remote read took, s : 3.336811780929565
Remote write took, s : 13.72070097923279
img: 25x:25252525252525252525
Remote read took, s : 3.968912124633789
Remote write took, s : 13.58001303672791
img: 26x:26262626262626262626
Remote read took, s : 2.945718050003052
Remote write took, s : 13.59926700592041
img: 27x:27272727272727272727
Remote read took, s : 3.360033988952637
Remote write took, s : 13.64630603790283
img: 28x:28282828282828282828
Remote read took, s : 3.888566970825195
Remote write took, s : 13.63198804855347
img: 29x:29292929292929292929
Remote read took, s : 2.508543968200684
Remote write took, s : 13.61940813064575
img: 30x:30303030303030303030
Remote read took, s : 4.009042024612427
Remote write took, s : 13.61328911781311
img: 31x:31313131313131313131
Remote read took, s : 3.974460840225220
Remote write took, s : 13.60890007019043
img: 32x:32323232323232323232
Remote read took, s : .5388259887695312E-04
Remote write took, s : .1411437988281250E-03
img: 33x:33333333333333333333
Remote read took, s : .5396170616149902
Remote write took, s : 13.62038612365723
img: 34x:34343434343434343434
Remote read took, s : 3.770447015762329

At which point my 5 min, allocated to the job,
which should have been more than enough for
such a simple program, has run out.

This was on 4 16-core nodes, i.e. 64 images in total.

Let me know what you think.

I'll now redo the timing test with your suggested
modification, in case remote read/writes are indeed
out of order, even though they are on the same image
and within the same segment.

Many thanks

Anton

ok, maybe you are right. With your modification, the times are reasonable:

LD_LIBRARY_PATH:  /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/mpirt/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/ipp/../compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/ipp/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/mkl/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/composer_xe_2013_sp1.0.080/tbb/lib/intel64/gcc4.4:/cm/shared/apps/ParaView-4.0.1/ParaView-4.0.1-Linux-64bit/lib:/cm/shared/apps/torque/4.2.4.1/lib:/cm/shared/apps/moab/7.2.2/lib:/cm/shared/tools/subversion-1.8.4/lib:/cm/shared/languages/Intel-Compiler-XE-14/compiler/lib:/cm/shared/languages/Intel-Compiler-XE-14/compiler/lib/intel64:/cm/shared/languages/Intel-Compiler-XE-14/lib:/cm/shared/languages/Intel-Compiler-XE-14/lib/intel64:/cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/compiler/lib/intel64:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib:/cm/shared/apps/intel-cluster-studio/composer_xe_2013.1.117/lib/intel64
which mpirun:  /cm/shared/apps/intel-cluster-studio/impi/4.1.0.024/intel64/bin/mpirun
node46-009_46215 (10.131.1.33)
node46-012_41759 (10.131.1.36)
node46-010_52398 (10.131.1.34)
node46-011_56689 (10.131.1.35)
Remote read took, s :     .1471042633056641E-03
img: 2x:2222222222
Remote write took, s :     .2789497375488281E-04
img: 2x:2222222222
Remote read took, s :     .7867813110351562E-05
img: 3x:3333333333
Remote write took, s :     .1621246337890625E-04
img: 3x:3333333333
Remote read took, s :     .9059906005859375E-05
img: 4x:4444444444
Remote write took, s :     .1502037048339844E-04
img: 4x:4444444444
Remote read took, s :     .9059906005859375E-05
img: 5x:5555555555
Remote write took, s :     .1478195190429688E-04
img: 5x:5555555555
Remote read took, s :     .9059906005859375E-05
img: 6x:6666666666
Remote write took, s :     .1502037048339844E-04
img: 6x:6666666666
Remote read took, s :     .8106231689453125E-05
img: 7x:7777777777
Remote write took, s :     .1502037048339844E-04
img: 7x:7777777777
Remote read took, s :     .9059906005859375E-05
img: 8x:8888888888
Remote write took, s :     .1502037048339844E-04
img: 8x:8888888888
Remote read took, s :     .1001358032226562E-04
img: 9x:9999999999
Remote write took, s :     .2384185791015625E-04
img: 9x:9999999999
Remote read took, s :     .1192092895507812E-04
img: 10x:10101010101010101010
Remote write took, s :     .2217292785644531E-04
img: 10x:10101010101010101010
Remote read took, s :     .1001358032226562E-04
img: 11x:11111111111111111111
Remote write took, s :     .2384185791015625E-04
img: 11x:11111111111111111111
Remote read took, s :     .1096725463867188E-04
img: 12x:12121212121212121212
Remote write took, s :     .2288818359375000E-04
img: 12x:12121212121212121212
Remote read took, s :     .1001358032226562E-04
img: 13x:13131313131313131313
Remote write took, s :     .2408027648925781E-04
img: 13x:13131313131313131313
Remote read took, s :     .8821487426757812E-05
img: 14x:14141414141414141414
Remote write took, s :     .2193450927734375E-04
img: 14x:14141414141414141414
Remote read took, s :     .9059906005859375E-05
img: 15x:15151515151515151515
Remote write took, s :     .2312660217285156E-04
img: 15x:15151515151515151515
Remote read took, s :     .1001358032226562E-04
img: 16x:16161616161616161616
Remote write took, s :     .2193450927734375E-04
img: 16x:16161616161616161616
Remote read took, s :     .1902410984039307
img: 17x:17171717171717171717
Remote write took, s :     .1580715179443359E-03
img: 17x:17171717171717171717
Remote read took, s :     .4100799560546875E-04
img: 18x:18181818181818181818
Remote write took, s :     .1480579376220703E-03
img: 18x:18181818181818181818
Remote read took, s :     .4005432128906250E-04
img: 19x:19191919191919191919
Remote write took, s :     .1471042633056641E-03
img: 19x:19191919191919191919
Remote read took, s :     .3600120544433594E-04
img: 20x:20202020202020202020
Remote write took, s :     .1480579376220703E-03
img: 20x:20202020202020202020
Remote read took, s :     .4196166992187500E-04
img: 21x:21212121212121212121
Remote write took, s :     .1480579376220703E-03
img: 21x:21212121212121212121
Remote read took, s :     .4315376281738281E-04
img: 22x:22222222222222222222
Remote write took, s :     .1480579376220703E-03
img: 22x:22222222222222222222
Remote read took, s :     .3886222839355469E-04
img: 23x:23232323232323232323
Remote write took, s :     .1478195190429688E-03
img: 23x:23232323232323232323
Remote read took, s :     .4196166992187500E-04
img: 24x:24242424242424242424
Remote write took, s :     .1480579376220703E-03
img: 24x:24242424242424242424
Remote read took, s :     .3504753112792969E-04
img: 25x:25252525252525252525
Remote write took, s :     .1192092895507812E-03
img: 25x:25252525252525252525
Remote read took, s :     .3695487976074219E-04
img: 26x:26262626262626262626
Remote write took, s :     .1280307769775391E-03
img: 26x:26262626262626262626
Remote read took, s :     .4291534423828125E-04
img: 27x:27272727272727272727
Remote write took, s :     .1170635223388672E-03
img: 27x:27272727272727272727
Remote read took, s :     .4315376281738281E-04
img: 28x:28282828282828282828
Remote write took, s :     .1189708709716797E-03
img: 28x:28282828282828282828
Remote read took, s :     .4291534423828125E-04
img: 29x:29292929292929292929
Remote write took, s :     .1189708709716797E-03
img: 29x:29292929292929292929
Remote read took, s :     .4291534423828125E-04
img: 30x:30303030303030303030
Remote write took, s :     .1161098480224609E-03
img: 30x:30303030303030303030
Remote read took, s :     .4196166992187500E-04
img: 31x:31313131313131313131
Remote write took, s :     .1189708709716797E-03
img: 31x:31313131313131313131
Remote read took, s :     .5912780761718750E-04
img: 32x:32323232323232323232
Remote write took, s :     .1139640808105469E-03
img: 32x:32323232323232323232
Remote read took, s :     .5888938903808594E-04
img: 33x:33333333333333333333
Remote write took, s :     .1428127288818359E-03
img: 33x:33333333333333333333
Remote read took, s :     .4220008850097656E-04
img: 34x:34343434343434343434
Remote write took, s :     .1418590545654297E-03
img: 34x:34343434343434343434
Remote read took, s :     .4601478576660156E-04
img: 35x:35353535353535353535
Remote write took, s :     .1418590545654297E-03
img: 35x:35353535353535353535
Remote read took, s :     .4506111145019531E-04
img: 36x:36363636363636363636
Remote write took, s :     .1428127288818359E-03
img: 36x:36363636363636363636
Remote read took, s :     .4386901855468750E-04
img: 37x:37373737373737373737
Remote write took, s :     .1420974731445312E-03
img: 37x:37373737373737373737
Remote read took, s :     .4506111145019531E-04
img: 38x:38383838383838383838
Remote write took, s :     .1409053802490234E-03
img: 38x:38383838383838383838
Remote read took, s :     .4792213439941406E-04
img: 39x:39393939393939393939
Remote write took, s :     .1471042633056641E-03
img: 39x:39393939393939393939
Remote read took, s :     .4196166992187500E-04
img: 40x:40404040404040404040
Remote write took, s :     .1418590545654297E-03
img: 40x:40404040404040404040
Remote read took, s :     .4887580871582031E-04
img: 41x:41414141414141414141
Remote write took, s :     .1149177551269531E-03
img: 41x:41414141414141414141
Remote read took, s :     .3910064697265625E-04
img: 42x:42424242424242424242
Remote write took, s :     .1330375671386719E-03
img: 42x:42424242424242424242
Remote read took, s :     .3790855407714844E-04
img: 43x:43434343434343434343
Remote write took, s :     .1099109649658203E-03
img: 43x:43434343434343434343
Remote read took, s :     .3910064697265625E-04
img: 44x:44444444444444444444
Remote write took, s :     .1099109649658203E-03
img: 44x:44444444444444444444
Remote read took, s :     .4220008850097656E-04
img: 45x:45454545454545454545
Remote write took, s :     .1099109649658203E-03
img: 45x:45454545454545454545
Remote read took, s :     .4291534423828125E-04
img: 46x:46464646464646464646
Remote write took, s :     .1101493835449219E-03
img: 46x:46464646464646464646
Remote read took, s :     .4100799560546875E-04
img: 47x:47474747474747474747
Remote write took, s :     .1111030578613281E-03
img: 47x:47474747474747474747
Remote read took, s :     .4386901855468750E-04
img: 48x:48484848484848484848
Remote write took, s :     .1118183135986328E-03
img: 48x:48484848484848484848
Remote read took, s :     .5507469177246094E-04
img: 49x:49494949494949494949
Remote write took, s :     .1418590545654297E-03
img: 49x:49494949494949494949
Remote read took, s :     .4196166992187500E-04
img: 50x:50505050505050505050
Remote write took, s :     .1440048217773438E-03
img: 50x:50505050505050505050
Remote read took, s :     .4816055297851562E-04
img: 51x:51515151515151515151
Remote write took, s :     .1430511474609375E-03
img: 51x:51515151515151515151
Remote read took, s :     .4506111145019531E-04
img: 52x:52525252525252525252
Remote write took, s :     .1418590545654297E-03
img: 52x:52525252525252525252
Remote read took, s :     .4506111145019531E-04
img: 53x:53535353535353535353
Remote write took, s :     .1420974731445312E-03
img: 53x:53535353535353535353
Remote read took, s :     .4386901855468750E-04
img: 54x:54545454545454545454
Remote write took, s :     .1428127288818359E-03
img: 54x:54545454545454545454
Remote read took, s :     .4506111145019531E-04
img: 55x:55555555555555555555
Remote write took, s :     .1440048217773438E-03
img: 55x:55555555555555555555
Remote read took, s :     .4696846008300781E-04
img: 56x:56565656565656565656
Remote write took, s :     .1420974731445312E-03
img: 56x:56565656565656565656
Remote read took, s :     .4506111145019531E-04
img: 57x:57575757575757575757
Remote write took, s :     .1139640808105469E-03
img: 57x:57575757575757575757
Remote read took, s :     .4506111145019531E-04
img: 58x:58585858585858585858
Remote write took, s :     .1301765441894531E-03
img: 58x:58585858585858585858
Remote read took, s :     .4196166992187500E-04
img: 59x:59595959595959595959
Remote write took, s :     .1120567321777344E-03
img: 59x:59595959595959595959
Remote read took, s :     .3790855407714844E-04
img: 60x:60606060606060606060
Remote write took, s :     .1120567321777344E-03
img: 60x:60606060606060606060
Remote read took, s :     .3886222839355469E-04
img: 61x:61616161616161616161
Remote write took, s :     .1130104064941406E-03
img: 61x:61616161616161616161
Remote read took, s :     .4100799560546875E-04
img: 62x:62626262626262626262
Remote write took, s :     .1130104064941406E-03
img: 62x:62626262626262626262
Remote read took, s :     .4100799560546875E-04
img: 63x:63636363636363636363
Remote write took, s :     .1120567321777344E-03
img: 63x:63636363636363636363
Remote read took, s :     .4196166992187500E-04
img: 64x:64646464646464646464
Remote write took, s :     .1130104064941406E-03
img: 64x:64646464646464646464

 

The fragment in question was this:

 

sync all

do i = 2, nimgs
  if ( img .eq. 1 ) then
    time1 = mytime()
    x = x(:)[i]
    sync images ( i )
    time2 = mytime()
    write (*,"(a,g)") "Remote read took, s : ", time2-time1
    write (*,"(a,i0,a,10(i0))") "img: ", i, "x:", x(:)[i]
  else if ( img .eq. i ) then
    sync images( 1 )
  end if

  if ( img .eq. 1 ) then
    time1 = mytime()
    x(:)[i] = x
    sync images ( i )
    time2 = mytime()
    write (*,"(a,g)") "Remote write took, s : ", time2-time1
    write (*,"(a,i0,a,10(i0))") "img: ", i, "x:", x(:)[i]
  else if ( img .eq. i ) then
    sync images( 1 )
  end if
end do

sync all

 

Thanks

Anton

 

 

 

 

Several people, who are all on the
Fortran standardisation committee,
confirmed in comp.lang.fortran and
comp-fortran-90@jiscmail.ac.uk
that Ron's interpretation of one-sided
read/write is incorrect.

I therefore think the problem described
in my report is a compiler bug.
I think a bug report should be opened
on this issue, but I don't know how to do this.

Many thanks

Anton

Ron is out of the office today - I've asked him to revisit this when he returns. I saw Bill Long's explanation.

Steve
