Fast Small Dense Matrix Solver

I have a general square dense matrix A (not symmetric), formed as A = P^T B P, where B is held in a compressed storage scheme and P is a rectangular matrix. The size of A ranges from 10x10 to 500x500, while B is sparse and can be as large as 150,000x150,000.

What would be the best way to solve for x given b in the system of linear equations

Ax = b  =>  x = A^-1 b

Right now I am just using LAPACK DGESV linked against MKL (so assume I am using their solver). Is there any benefit to switching to an iterative solver, or are there any recommendations on how to solve this system of equations as fast as possible?

Thanks for any comments
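
For reference, DGESV solves the system by LU factorization with partial pivoting followed by forward and back substitution. The sketch below shows that approach in plain Python, purely as an illustration of the algorithm; it is not MKL code, and the function names `lu_factor` and `lu_solve` are made up for this example.

```python
def lu_factor(a):
    """Factor a copy of the square matrix a as P*A = L*U (partial pivoting).

    Returns (lu, piv): lu holds L below the diagonal (unit diagonal implied)
    and U on and above it; piv[k] is the row swapped with row k at step k.
    """
    n = len(a)
    lu = [row[:] for row in a]  # leave the caller's matrix untouched
    piv = list(range(n))
    for k in range(n):
        # Choose the entry of largest magnitude in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
        # Eliminate column k below the diagonal, storing the multipliers.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    x = list(b)
    # Apply the recorded row interchanges to the right-hand side.
    for k in range(n):
        p = piv[k]
        if p != k:
            x[k], x[p] = x[p], x[k]
    # Forward substitution with the unit lower triangle L.
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    # Back substitution with the upper triangle U.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


a = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
b = [5.0, -2.0, 9.0]
lu, piv = lu_factor(a)
x = lu_solve(lu, piv, b)
```

Splitting the work into a factor step and a solve step mirrors the LAPACK pair DGETRF and DGETRS: when several right-hand sides share the same A, the factorization only has to be done once. Note that x is obtained without ever forming A^-1 explicitly.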

Scott, I have a generic question.

>>...The size of A ... 500x500, where B can be 150,000x150,000...

How long does it take to solve it on your computer? Thanks in advance.

Note: I see that there are two threads already, one in the MKL forum and another in the Intel Visual Fortran forum...

After I posted on the Intel Fortran forum, someone suggested that I post my question here as well, since I am using the MKL library for the LAPACK routines.

It only takes a few seconds, but each solution of A creates a new version of B, which is then multiplied by P to build a new version of A, which in turn needs a new solution. I would like to speed up solving the system of equations, even by a fraction of a second. There is of course also a slowdown due to forming A = P^T B P, but I am unsure whether there is anything faster than using DGEMM.

It is a particular program where time is important, even for a few extra milliseconds.

Thanks for the details!

>>...It only takes a few seconds...

Is it for B when it has dimensions 150000x150000?

Note 1: In single precision, about 84 GB of memory is needed for B if it is stored as a full dense matrix
Note 2: In double precision, about 168 GB of memory is needed for B if it is stored as a full dense matrix

PS: Of course it is possible if a Cray-like supercomputer is used...
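
The two figures in the notes follow from simple arithmetic, assuming B is stored as a full dense 150000x150000 array and that sizes are counted in units of 2^30 bytes. A small check, with all variable names chosen for this example only:

```python
n = 150_000
elements = n * n   # number of entries in a full dense n x n matrix
GIB = 2**30        # bytes per GiB


def dense_gib(bytes_per_element):
    """Memory for the full dense matrix, in GiB."""
    return elements * bytes_per_element / GIB


single = dense_gib(4)  # 4 bytes per single-precision value
double = dense_gib(8)  # 8 bytes per double-precision value
print(f"{single:.1f} GiB, {double:.1f} GiB")  # prints "83.8 GiB, 167.6 GiB"
```

In decimal units the same totals are 90 GB and 180 GB. None of this applies once B is kept in a sparse or banded format, where only the nonzero entries are stored.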

B is formed as a result of finite differences, so it is stored in a band-like structure (as vectors) to minimize storage, and is then transformed by the pre- and post-multiplication with P. What I will actually post another time is how best to multiply out P^T B P.
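
One common way to form A = P^T B P with a banded B is to compute C = B P first, touching only the stored diagonals, and then A = P^T C as an ordinary dense product. The sketch below illustrates that ordering in plain Python. The storage layout (a dict mapping each diagonal offset to a length-n list) and the function names are assumptions made for this example; they are not the layout used in the program discussed here.

```python
def band_times_dense(diags, n, p):
    """Return C = B * P for an n x n banded B and a dense n x m matrix P.

    diags maps a diagonal offset to a list of length n, where values[i]
    holds B[i][i + offset]; entries falling outside the matrix are ignored.
    (Hypothetical layout, chosen only for this sketch.)
    """
    m = len(p[0])
    c = [[0.0] * m for _ in range(n)]
    for offset, values in diags.items():
        # Only rows i with a valid column index i + offset contribute.
        for i in range(max(0, -offset), min(n, n - offset)):
            bij = values[i]
            if bij != 0.0:
                row_p = p[i + offset]
                row_c = c[i]
                for k in range(m):
                    row_c[k] += bij * row_p[k]
    return c


def pt_times_dense(p, c):
    """Return A = P^T * C for P (n x m) and C (n x m); A is m x m."""
    n, m = len(p), len(p[0])
    a = [[0.0] * m for _ in range(m)]
    for i in range(n):
        for r in range(m):
            pir = p[i][r]
            if pir != 0.0:
                for k in range(m):
                    a[r][k] += pir * c[i][k]
    return a


# Tridiagonal B = [[2,-1,0],[-1,2,-1],[0,-1,2]] and a 3 x 2 matrix P.
diags = {0: [2.0, 2.0, 2.0], -1: [0.0, -1.0, -1.0], 1: [-1.0, -1.0, 0.0]}
p = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
a = pt_times_dense(p, band_times_dense(diags, 3, p))
```

With d stored diagonals, the first step costs roughly d*n*m multiply-adds instead of the n*n*m a dense product would need, while the second step costs about n*m*m and is the part a dense routine such as DGEMM is suited to.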
