I recently got a machine with 2x Xeon Phi and I am running some simple tests to better understand the features I will need in order to use them in our production code. Basically, in the code below I just want to measure how many exponential functions the Phis can calculate per second. I first create an OpenMP thread for each Phi device and then open a target section inside each of these threads. Inside the target section, the code is further parallelized using OpenMP directives.
The strange thing in the code below is that DevNo never seems to make it to the device correctly. The DevNo value written by the print statement appears to be a random, uninitialized value. In the version below NoDevices=1, and everything seems to work regardless of this issue, but if you set NoDevices=2 instead, it doesn't. It seems to me that the code attempts to do both offloads to the same device, which sometimes results in half the performance and sometimes in a crash.
Is this not how you are supposed to do multi-device offloading with OpenMP 4.0? I use the latest version of Parallel Studio XE on Windows.
Thank you in advance,
real*8 :: TimeBegin,TimeEnd,GOps
real*8,allocatable,Dimension(:) :: ExpIn,ExpOut
!DEC$ATTRIBUTES ALIGN: 64 :: ExpIn,ExpOut
integer :: NumThreads,NInner,NOuter,i,j,DevNo,PhiNo,NoDevices=1,NExps=165189!NExps=1651898
!First, fill a vector with random values to calculate exp for
!Now we do the actual benchmark calculation of exp's in parallel using openMP distributed over multiple phis
!Outer OMP parallel region - performing the same calculation on multiple phis in parallel
!$OMP PARALLEL NUM_THREADS(NoDevices) DEFAULT(SHARED) PRIVATE(DevNo)
print *,'Entered OMP parallel region for device', DevNo
!Initialize each target phi
!$OMP TARGET DATA DEVICE(DEVNO) MAP(to:NExps,DevNo,ExpIn(1:NExps),NumThreads)
!Somehow DevNo is not correct when we get here ...
print *,'Running on Xeon Phi device ',DevNo,'using',NumThreads,'threads'
!Run parallel benchmark on each phi
!$OMP PARALLEL SHARED(ExpIn,NOuter,NExps) PRIVATE(I,J,ExpOut) NUM_THREADS(NumThreads)
!$OMP DO SCHEDULE(DYNAMIC)
!$OMP END DO
!$OMP END PARALLEL
print *,'Result:',GOps,'G exponential functions/s'
!$OMP END TARGET
!$OMP END TARGET DATA
!$OMP END PARALLEL
end program Source1
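
For comparison, here is a minimal sketch of the one-host-thread-per-device pattern described above. It is not the program above: the names multi_device_sketch, n, xin, xout, ndev and nhost are made up for illustration, and it assumes that device numbers run from 0 to omp_get_num_devices()-1 and can therefore be taken from the host thread number. The point it illustrates is that a variable listed in PRIVATE is undefined on entry to the parallel region, so the device number has to be assigned inside the region before it is used in a DEVICE clause.

```fortran
program multi_device_sketch
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 100000           ! hypothetical problem size
  integer :: ndev, nhost, devno, i
  real(dp), allocatable :: xin(:), xout(:,:)

  ndev  = omp_get_num_devices()              ! 0 if no offload device is visible
  nhost = max(ndev, 1)                       ! one host thread per device, at least one
  allocate(xin(n), xout(n, nhost))
  call random_number(xin)

  !$omp parallel num_threads(nhost) default(shared) private(devno, i)
  ! A PRIVATE variable has no defined value on entry, so set it here,
  ! inside the region, before it appears in the DEVICE clause.
  devno = omp_get_thread_num()

  ! IF(ndev > 0) makes the region run on the host when no device exists.
  ! Each host thread maps back only its own column of xout.
  !$omp target device(devno) if(ndev > 0) map(to: xin) map(from: xout(:, devno+1))
  !$omp parallel do
  do i = 1, n
     xout(i, devno+1) = exp(xin(i))
  end do
  !$omp end parallel do
  !$omp end target
  !$omp end parallel

  ! Compare the first column against exp() evaluated on the host.
  print *, 'host threads used:', nhost
  print *, 'max abs difference:', maxval(abs(xout(:, 1) - exp(xin)))
end program multi_device_sketch
```

In this sketch each host thread writes to a separate column of xout, so the two offloads never map the same output storage. Whether the device numbering assumption holds for a given compiler and driver setup should be checked against its documentation.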