Hi all,
I have been puzzled by the threading behavior of my program. In the example below, when the parameters and data arrays are small, the CPU usage is usually consistent with the number of threads I specify in nThread. When I stress-tested it with much larger data structures (10-20 GB of memory, which is what I eventually need), the CPU usage dropped dramatically to about 15-16% on an 8-core computer. In this example, I have to allocate and deallocate arrays inside the loop because of their massive size. Do these allocations/deallocations cause the problem? If so, why was this not noticeable in the small case, but only in the large one?
Any suggestion would be much appreciated.
!$OMP PARALLEL PRIVATE(iLooper) FIRSTPRIVATE(pSize) NUM_THREADS(nThread)
!$OMP DO SCHEDULE(DYNAMIC)
DO iLooper = 1, UniqCT1-1
   ! Per-iteration work arrays, allocated inside the loop because of their size
   ALLOCATE(Ejd(iLooper)%unit(Noofarcs))
   ALLOCATE(Pred(iLooper)%unit(Noofarcs))
   ALLOCATE(pathtmp(iLooper)%unit(maxnu_pa))
   CALL RETRIEVE_VEH_PATH(Arg_OriginSet(iLooper), &
                          Arg_DestSet(iLooper),   &
                          Arg_TimeSet(iLooper),   &
                          iLooper, 1, pSize)
   ! Release the large work arrays before the next iteration
   DEALLOCATE(Ejd(iLooper)%unit)
   DEALLOCATE(Pred(iLooper)%unit)
   ! Note: pathtmp(iLooper)%unit is not deallocated inside the loop
ENDDO
!$OMP END DO
!$OMP END PARALLEL
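To check whether the ALLOCATE/DEALLOCATE pair itself is what hurts scaling, one option is a stripped-down micro-benchmark that times the same dynamic-schedule loop twice: once allocating a scratch array on every iteration (as in the code above) and once allocating it only once per thread. This is a hypothetical sketch, not part of my real program; the program name, `n_iter`, `n_elem` and the single `work` array are assumptions chosen only to make the comparison self-contained.

```fortran
! Hypothetical micro-benchmark: per-iteration vs. per-thread allocation.
! n_iter and n_elem are assumed values, not taken from the real code.
program alloc_in_loop_test
  use omp_lib
  implicit none
  integer, parameter :: n_iter = 64         ! assumed number of loop trips
  integer, parameter :: n_elem = 2000000    ! assumed scratch size (~16 MB of real(8))
  real(8), allocatable :: work(:)
  real(8) :: t0, t_inner, t_outer, total_inner, total_outer
  integer :: i

  ! Variant 1: allocate and deallocate on every iteration
  total_inner = 0d0
  t0 = omp_get_wtime()
  !$omp parallel do schedule(dynamic) private(work) reduction(+:total_inner)
  do i = 1, n_iter
     allocate(work(n_elem))
     work = real(i, 8)                  ! touch every element so the pages are really used
     total_inner = total_inner + work(n_elem)
     deallocate(work)
  end do
  !$omp end parallel do
  t_inner = omp_get_wtime() - t0

  ! Variant 2: allocate once per thread, reuse the buffer across iterations
  total_outer = 0d0
  t0 = omp_get_wtime()
  !$omp parallel private(work) reduction(+:total_outer)
  allocate(work(n_elem))
  !$omp do schedule(dynamic)
  do i = 1, n_iter
     work = real(i, 8)
     total_outer = total_outer + work(n_elem)
  end do
  !$omp end do
  deallocate(work)
  !$omp end parallel
  t_outer = omp_get_wtime() - t0

  print '(a,f8.3,a)', 'allocate per iteration: ', t_inner, ' s'
  print '(a,f8.3,a)', 'allocate per thread:    ', t_outer, ' s'
  ! Both variants sum the same integer-valued terms, so the totals must match
  if (total_inner /= total_outer) error stop 'results differ'
end program alloc_in_loop_test
```

Compiled with OpenMP enabled, a large gap between the two timings (growing with `n_elem`) would point at allocation and page-faulting cost rather than at the computation itself.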
CalmagC
CPU not fully utilized under different memory usage situations