Loading...
You are not logged-in Login/Register





  • Posts   Search Threads
  • IdefixJuly 8, 2009 1:00 PM PDT   
    MKL Solvers Parallel Performance

    Dear all,
    we are using MKL for solving transient water flow and solute transport in porous media. We apply both conjugate gradient and dfgmres solvers because the matrices are symmetric and asymmetric for the different problem categories. We are using ILU(0) preconditioning in both cases. The codes are running well and we observe a significant speedup in comparison to the unparallelized solvers we used before. Great! There is, however, something that struck us by surprise. The speedup observed on a dualcore machine was about 2 and the speedup on a quadcore machine was 2 as well, although we expected it to be higher. In both cases, CPU usage amounts to more than 90%. We have also played around with the environment variable mkl_num_threads. On the quadcore machine, setting this to one resulted in a quicker execution compared to not specifying at explicitly and letting MKL assign it dynamically! CPU usage was still more than 90%. We would be glad if someone could explain these results and could come up with some suggestions for a further speedup on the quadcore machine.

    Thank you very much in advance

    Idefix


    Alexander Kalinkin (Intel)July 8, 2009 9:19 PM PDT
    Rate
     
    Re: MKL Solvers Parallel Performance

    Quoting - Idefix
    Dear all,
    we are using MKL for solving transient water flow and solute transport in porous media. We apply both conjugate gradient and dfgmres solvers because the matrices are symmetric and asymmetric for the different problem categories. We are using ILU(0) preconditioning in both cases. The codes are running well and we observe a significant speedup in comparison to the unparallelized solvers we used before. Great! There is, however, something that struck us by surprise. The speedup observed on a dualcore machine was about 2 and the speedup on a quadcore machine was 2 as well, although we expected it to be higher. In both cases, CPU usage amounts to more than 90%. We have also played around with the environment variable mkl_num_threads. On the quadcore machine, setting this to one resulted in a quicker execution compared to not specifying at explicitly and letting MKL assign it dynamically! CPU usage was still more than 90%. We would be glad if someone could explain these results and could come up with some suggestions for a further speedup on the quadcore machine.

    Thank you very much in advance

    Idefix

    Hi Idefix,
    As I understand your program is enough complicated and uses many different parts of MKL. Could you measured what time your code spent on computing precondition (ILU(0)), on matrix multiplication, and calling CG subroutines on different machines. With these data we could understand situation more deeply.
    With best regards,
    Alexander



Forum jump:  

Intel Software Network Forums Statistics

16,369 users have contributed to 46,341 threads and 163,954 posts to date.

In the past 24 hours, we have 18 new thread(s) 102 new posts(s), and 67 new user(s).

In the past 3 days, the most popular thread for everyone has been Formula for the intersection of straight lines The most posts were made to Take a look at John Burkhard&# The post with the most views is \"-check none\" generates error

Please welcome our newest member bikerepair8


For more complete information about compiler optimizations, see our Optimization Notice.