Idefix
July 8, 2009 1:00 PM PDT
MKL Solvers Parallel Performance
Dear all,

we are using MKL to solve transient water flow and solute transport in porous media. Depending on the problem category, the system matrices are either symmetric or nonsymmetric, so we use both the conjugate gradient (CG) and the dfgmres solvers, with ILU(0) preconditioning in both cases. The codes run well, and we observe a significant speedup compared with the unparallelized solvers we used before. Great!

There is, however, something that took us by surprise. The speedup on a dual-core machine was about 2, and the speedup on a quad-core machine was also about 2, although we expected it to be higher. In both cases CPU usage is above 90%. We have also experimented with the environment variable MKL_NUM_THREADS. On the quad-core machine, setting it to 1 gave faster execution than leaving it unset and letting MKL choose the number of threads dynamically, and CPU usage was still above 90%.

We would be glad if someone could explain these results and suggest how to get a further speedup on the quad-core machine.

Thank you very much in advance

Idefix
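Comparing thread counts is easiest with a small harness that launches the solver as a fresh process for each MKL_NUM_THREADS value and records wall-clock time. Since CPU usage above 90% can include threads that are only spin-waiting, wall-clock time per thread count is the more informative number. This is a minimal sketch that assumes the solver is a command-line executable and does not override the thread count itself (for example via mkl_set_num_threads); the function names and the binary in the commented example are hypothetical.

```python
import os
import subprocess
import time


def sweep_thread_counts(cmd, thread_counts, repeats=3):
    """Return {thread_count: best wall-clock seconds} for running `cmd`.

    Each run is a separate process whose environment sets MKL_NUM_THREADS,
    so MKL in the child sees the requested thread count (unless the program
    overrides it in code).
    """
    results = {}
    for n in thread_counts:
        env = dict(os.environ, MKL_NUM_THREADS=str(n))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            subprocess.run(cmd, env=env, check=True)  # raise if the solver fails
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results


def speedups(timings):
    """Speedup of each thread count relative to the smallest one measured."""
    base = timings[min(timings)]
    return {n: base / t for n, t in sorted(timings.items())}


# Example with a hypothetical solver binary and input file:
# times = sweep_thread_counts(["./flow_solver", "input.dat"], [1, 2, 4])
# for n, s in speedups(times).items():
#     print(f"MKL_NUM_THREADS={n}: {times[n]:.2f} s, speedup {s:.2f}x")
```

Taking the best of several repeats damps run-to-run noise; MKL also honors OMP_NUM_THREADS, but MKL_NUM_THREADS takes precedence for MKL calls, which is why the harness sets it explicitly.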
Alexander Kalinkin (Intel)
July 8, 2009 9:19 PM PDT
Hi Idefix,
As I understand it, your program is fairly complex and uses several different parts of MKL. Could you measure how much time your code spends computing the preconditioner (ILU(0)), in matrix-vector multiplication, and in the CG subroutines on each machine? With that data we could understand the situation in more depth.
With best regards,
Alexander
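The measurement Alexander asks for amounts to wrapping each phase of the solve in a timer. The sketch below is not MKL code and makes no claim about MKL's internals: it is a hypothetical pure-Python ILU(0)-preconditioned CG for the symmetric case, written only to show where the timers go. It assumes a symmetric positive definite matrix in CSR form with 0-based, sorted column indices and a stored diagonal in every row; the names phase, ilu0, ilu0_apply and pcg_ilu0_timed are illustrative. In an MKL program the same boundaries would wrap the dcsrilu0 call, the matrix-vector products and preconditioner solves requested by the reverse-communication dcg loop, and the remaining solver work.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager


@contextmanager
def phase(timings, name):
    """Add the wall-clock time spent inside the `with` block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] += time.perf_counter() - start


def csr_matvec(n, indptr, indices, data, x):
    """Return A @ x for an n-by-n CSR matrix (indptr, indices, data)."""
    return [sum(data[p] * x[indices[p]] for p in range(indptr[i], indptr[i + 1]))
            for i in range(n)]


def ilu0(n, indptr, indices, data):
    """ILU(0): incomplete LU restricted to the sparsity pattern of A (IKJ order).

    Returns packed L and U values (L has an implied unit diagonal) and the
    position of each diagonal entry.
    """
    lu = list(data)
    diag = [indptr[i] + list(indices[indptr[i]:indptr[i + 1]]).index(i) for i in range(n)]
    for i in range(n):
        pos = {indices[q]: q for q in range(indptr[i], indptr[i + 1])}
        for pk in range(indptr[i], diag[i]):             # entries l_ik with k < i
            k = indices[pk]
            lu[pk] /= lu[diag[k]]
            for q in range(diag[k] + 1, indptr[k + 1]):  # u_kj with j > k
                j = indices[q]
                if j in pos:                             # fill-in outside the pattern is dropped
                    lu[pos[j]] -= lu[pk] * lu[q]
    return lu, diag


def ilu0_apply(n, indptr, indices, lu, diag, r):
    """Return z with (L U) z = r: forward then backward substitution."""
    y = [0.0] * n
    for i in range(n):                                   # L y = r (unit diagonal)
        y[i] = r[i] - sum(lu[p] * y[indices[p]] for p in range(indptr[i], diag[i]))
    z = [0.0] * n
    for i in reversed(range(n)):                         # U z = y
        s = sum(lu[p] * z[indices[p]] for p in range(diag[i] + 1, indptr[i + 1]))
        z[i] = (y[i] - s) / lu[diag[i]]
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_ilu0_timed(n, indptr, indices, data, b, tol=1e-10, maxiter=1000):
    """ILU(0)-preconditioned CG; returns (x, iterations, seconds per phase)."""
    t = defaultdict(float)
    x = [0.0] * n
    bnorm = math.sqrt(dot(b, b))
    if bnorm == 0.0:
        return x, 0, dict(t)
    with phase(t, "ilu0_setup"):
        lu, diag = ilu0(n, indptr, indices, data)
    r = list(b)                                          # r = b - A x with x = 0
    with phase(t, "precond_apply"):
        z = ilu0_apply(n, indptr, indices, lu, diag, r)
    d = list(z)                                          # search direction
    rz = dot(r, z)
    for it in range(1, maxiter + 1):
        with phase(t, "matvec"):
            ad = csr_matvec(n, indptr, indices, data, d)
        with phase(t, "vector_ops"):
            alpha = rz / dot(d, ad)
            x = [xi + alpha * di for xi, di in zip(x, d)]
            r = [ri - alpha * adi for ri, adi in zip(r, ad)]
            converged = math.sqrt(dot(r, r)) <= tol * bnorm
        if converged:
            return x, it, dict(t)
        with phase(t, "precond_apply"):
            z = ilu0_apply(n, indptr, indices, lu, diag, r)
        with phase(t, "vector_ops"):
            rz_new = dot(r, z)
            beta = rz_new / rz
            d = [zi + beta * di for zi, di in zip(z, d)]
            rz = rz_new
    return x, maxiter, dict(t)
```

Repeating such a breakdown for each MKL_NUM_THREADS setting shows which phase stops shrinking as threads are added. The forward and backward substitutions carry a row-to-row dependency, so the preconditioner application is one candidate to look at first, but that is a hypothesis to check against measured numbers, not a conclusion.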




