>>I added the commands and I see that all processors are working, but the program is much slower. How can I detect the bottleneck?
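This is symptomatic of the inner OpenMP DO loop not being work-shared, so that each thread runs the whole loop serially. Place the "!$OMP DO ..." (or "C$OMP DO ...") directive at the left margin (column 1). In fixed-form source an indented sentinel is treated as an ordinary comment.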
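For reference, a minimal sketch of the placement rule follows. The subroutine scale_array and module sentinel_demo are hypothetical names for illustration. The text is laid out so the same lines compile as either fixed-form or free-form source: statements start in column 7 and every sentinel starts in column 1. Compile with OpenMP enabled (for example gfortran -fopenmp); without it the directives are ignored and the loop simply runs serially.

```fortran
      module sentinel_demo
      implicit none
      contains
      subroutine scale_array(a, b, n)
!     Hypothetical example routine: a(i) = 2*b(i).
!     Statements start in column 7 and the sentinels in column 1,
!     so this text is valid fixed-form and free-form source.
      integer, intent(in) :: n
      real, intent(in) :: b(n)
      real, intent(out) :: a(n)
      integer :: i
!$OMP PARALLEL
!     Sentinel in column 1: the iterations are split among threads.
!     If this line were indented in a fixed-form file it would be a
!     plain comment, and every thread would run all n iterations.
!$OMP DO
      do i = 1, n
         a(i) = 2.0*b(i)
      end do
!$OMP END DO
!$OMP END PARALLEL
      end subroutine scale_array
      end module sentinel_demo
```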
Should this not improve matters, then try:
!$omp parallel do private(i,j)
      do i=nx1+1,nx2-2
         do j=ny1+2,ny2-2
!           STUFF1 (loop body)
         enddo
      enddo
!$omp end parallel do
!$omp parallel do private(i,j)
      do i=nx1+2,nx2-2
         do j=ny1+1,ny2-2
!           STUFF2 (loop body)
         enddo
      enddo
!$omp end parallel do
Note: the above is not inside an !$OMP PARALLEL region; each !$OMP PARALLEL DO opens its own.
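The purpose of coding it the first way (both loops inside a single parallel region) was to let the threads that finish the STUFF1 loop first begin processing the STUFF2 loop before the remaining threads working on STUFF1 have finished. The version above gives up that overlap, because each !$OMP END PARALLEL DO ends with an implicit barrier.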
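A minimal sketch of that first layout, assuming it placed both loops in one !$OMP PARALLEL region with NOWAIT on the first !$OMP DO (the overlap described above requires NOWAIT, since END DO otherwise has an implicit barrier). The module nowait_demo, subroutine two_loops and arrays u and v are hypothetical stand-ins for the data STUFF1 and STUFF2 touch. NOWAIT is only safe if STUFF2 never reads values that STUFF1 writes; the sketch assumes that independence.

```fortran
      module nowait_demo
      implicit none
      contains
      subroutine two_loops(u, v, nx1, nx2, ny1, ny2)
!     Hypothetical arrays u and v stand in for the data touched by
!     STUFF1 and STUFF2; the two loops must be independent.
      integer, intent(in) :: nx1, nx2, ny1, ny2
      real, intent(inout) :: u(nx1:nx2, ny1:ny2)
      real, intent(inout) :: v(nx1:nx2, ny1:ny2)
      integer :: i, j
!$OMP PARALLEL PRIVATE(i, j)
!     STUFF1 placeholder. NOWAIT drops the barrier at END DO, so a
!     thread that finishes its share moves straight on to STUFF2.
!$OMP DO
      do i = nx1+1, nx2-2
         do j = ny1+2, ny2-2
            u(i, j) = u(i, j) + 1.0
         end do
      end do
!$OMP END DO NOWAIT
!     STUFF2 placeholder; the barrier here is kept.
!$OMP DO
      do i = nx1+2, nx2-2
         do j = ny1+1, ny2-2
            v(i, j) = 2.0*v(i, j)
         end do
      end do
!$OMP END DO
!$OMP END PARALLEL
      end subroutine two_loops
      end module nowait_demo
```

The loop bounds mirror the two loops above; only the bodies are invented.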
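If this does not improve performance either, then the code in STUFF1 and STUFF2 most likely consists of memory copy statements rather than computational statements. Such loops are limited by memory bandwidth, and once that bandwidth is saturated, adding threads cannot speed them up.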
Jim Dempsey