Hi,
I am developing a time-domain simulation tool for power systems. The heart of the simulator is this code:
do while (.not. converged)
   ! CODE THAT HAS TO BE EXECUTED IN SERIAL (prepare data)
   do i = 1, "several thousands"
      CALL DGETRS(data of i)
   enddo
   ! CODE THAT HAS TO BE EXECUTED IN SERIAL (check convergence)
enddo

The middle loop has no data dependencies and accounts for 35% of the total CPU time in the serial version of the program (measured with Intel VTune). DGETRS is a routine from mkl_lapack95. My first parallelisation attempt was to add a !$omp parallel do directive right before the middle do-loop and to tune the scheduling, the number of threads, etc. The speed-up was marginal. The CPU time spent in DGETRS is evenly distributed between the threads, but a large amount of CPU time now shows up in libomp5.so.
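
To make the first attempt concrete, here is a minimal self-contained sketch of it. The names (nsys, n, a, b, ipiv, info) and the sizes are made up for illustration and are not taken from my real code; the small diagonally dominant random matrices only stand in for my data. It assumes the F77-style LAPACK calls DGETRF/DGETRS and needs OpenMP plus a LAPACK library at link time (for example something like gfortran -fopenmp sketch.f90 -llapack, or MKL).

```fortran
program first_attempt
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nsys = 2000   ! hypothetical number of independent systems
  integer, parameter :: n = 8         ! hypothetical size of each system
  real(dp), allocatable :: a(:,:,:), b(:,:)
  integer, allocatable :: ipiv(:,:), info(:)
  integer :: i, j
  external :: dgetrf, dgetrs

  allocate(a(n,n,nsys), b(n,nsys), ipiv(n,nsys), info(nsys))

  ! Serial set-up (assumption): build stand-in matrices and factorise them once.
  call random_number(a)
  do i = 1, nsys
     do j = 1, n
        a(j,j,i) = a(j,j,i) + real(n, dp)   ! make each matrix diagonally dominant
     end do
     call dgetrf(n, n, a(:,:,i), n, ipiv(:,i), info(i))
  end do
  if (any(info /= 0)) stop 'factorisation failed'
  b = 1.0_dp

  ! The middle loop: each iteration touches only slice i of every array,
  ! so there are no data dependencies between iterations.
  !$omp parallel do schedule(static) default(shared) private(i)
  do i = 1, nsys
     call dgetrs('N', n, 1, a(:,:,i), n, ipiv(:,i), b(:,i), n, info(i))
  end do
  !$omp end parallel do

  print *, 'max |info| =', maxval(abs(info)), ' max threads =', omp_get_max_threads()
end program first_attempt
```

In the real program this parallel do sits inside the do while loop, so the parallel region is entered once per outer iteration.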

I assumed this overhead came from the threads being created and destroyed on every iteration of the do while loop. So my second approach was this:
!$omp parallel
do while (.not. converged)
   !$omp single
   ! CODE THAT HAS TO BE EXECUTED IN SERIAL (prepare data)
   !$omp end single
   !$omp do
   do i = 1, "several thousands"
      CALL DGETRS(data of i)
   enddo
   !$omp end do
   !$omp single
   ! CODE THAT HAS TO BE EXECUTED IN SERIAL (check convergence)
   !$omp end single
enddo
!$omp end parallel
This way I thought the threads would stay alive for the whole loop and give a better speed-up. Instead, all the numbers got worse: more elapsed time, more CPU time and less concurrency. The time attributed to libomp5.so is now halved, but a lot of time is spent on the two !$omp end single directives.
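
For anyone who wants to reproduce the structure of this second version, here is a small runnable sketch. Everything specific in it is an assumption for illustration only: the sizes, the tolerance, the fixed-point update standing in for "prepare data", and the max-difference test standing in for "check convergence". As far as I understand, each end single and end do carries an implicit barrier (no nowait is given), so the other threads wait there while one thread runs the serial section.

```fortran
program second_attempt
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nsys = 2000, n = 8      ! hypothetical sizes
  integer, parameter :: max_iter = 100          ! hypothetical safety limit
  real(dp), parameter :: tol = 1.0e-10_dp       ! hypothetical tolerance
  real(dp), allocatable :: a(:,:,:), x(:,:), xold(:,:)
  integer, allocatable :: ipiv(:,:), info(:)
  integer :: i, j, iter
  logical :: converged
  external :: dgetrf, dgetrs

  allocate(a(n,n,nsys), x(n,nsys), xold(n,nsys), ipiv(n,nsys), info(nsys))

  ! Serial set-up (assumption): stand-in matrices, factorised once.
  call random_number(a)
  do i = 1, nsys
     do j = 1, n
        a(j,j,i) = a(j,j,i) + real(n, dp)       ! diagonally dominant
     end do
     call dgetrf(n, n, a(:,:,i), n, ipiv(:,i), info(i))
  end do
  if (any(info /= 0)) stop 'factorisation failed'

  x = 0.0_dp
  iter = 0
  converged = .false.                           ! shared flag read by all threads

  !$omp parallel default(shared) private(i)
  do while (.not. converged)
     !$omp single
     ! Stand-in for "prepare data": build the new right-hand sides.
     xold = x
     x = 1.0_dp + 0.5_dp * xold
     iter = iter + 1
     !$omp end single
     ! (implicit barrier: all threads wait until the data is prepared)

     !$omp do schedule(static)
     do i = 1, nsys
        call dgetrs('N', n, 1, a(:,:,i), n, ipiv(:,i), x(:,i), n, info(i))
     end do
     !$omp end do
     ! (implicit barrier: all solves are finished before the check)

     !$omp single
     ! Stand-in for "check convergence".
     converged = (maxval(abs(x - xold)) < tol) .or. (iter >= max_iter)
     !$omp end single
     ! (implicit barrier: every thread sees the updated flag)
  end do
  !$omp end parallel

  print *, 'iterations =', iter, ' max threads =', omp_get_max_threads()
end program second_attempt
```

The flag converged is only written inside a single block and only read after the barrier that follows it, so all threads leave the do while loop on the same iteration.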

I can provide screenshots and other information from VTune, or run any profiling you need. I use Fortran 95 with the latest Intel compiler on a Linux (Ubuntu) machine.
Any comments (on this problem or in general) on how to optimise the parallelisation are welcome! The incentive for parallelising is that the middle do-loop will soon become more expensive as I add more detailed models; I expect the work done there to exceed 50% of the total.
Thanks in advance,
Petros



