I'm using Intel MKL 10.3 update 1.When I cmpare time to solve Ax=b with Pardiso, I get slower time with multiple right hand side than in single rhs.I made many tests with different matrix size and number of non zero values and I get always same behavior.Matrix is symmetricalMatrix and vector B are fill with complex numberMatrix size : 10000x10000Number of non zero value per row : 3All non zero values are on the diagonal and near the diagonalVector B is fill with 1+j0 at each index. (to simplify the test)Reordoring and Factorisation are runned only 1 time.Time to solve 10000 times the system with 1 rhs : 4.485 sec.Time to solve 10000/8 times the system with 8 rhs: 15.89 sec.Time to solve 10000/32 times the system with 32 rhs: 7.672 sec.Time to solve 10000/128 times the system with 128 rhs: 6.89 sec.Is it normal that the faster result is always wit 1rhs no matter the matrix size, the number of non zero and values in B vector ?I'm on windows XP, visual studio 2005 and 2010Intel Core 2 Quad CPU Q9550 @ 2.83GHz 3.25 GB RAM

# multiple right hand side

## multiple right hand side

For more complete information about compiler optimizations, see our Optimization Notice.

Hello,

From the tests, the result seems fine. The test repeated on the same data set. The memory access performance is important on the solver stage. If you only have one right hand date, this data will be easier to keep into the cache after the first run, which will have better performance, than it has more multiple right hands data. I am checking with the function owner for more details on this.

Thanks,

Chao

SergeyS.

non zero values per line = number of non zero values in diagonal and upper triangle per lineMatrix size 5000 x 5000 with 2 non zero values per line :rhs = 1 => 1.172 sec.rhs = 2 => 10.515sec.rhs = 4 => 5.718sec.rhs = 8 => 3.000sec.rhs = 16 => 2.188sec.rhs = 32 => 1.593sec.rhs = 64 => 1.469sec.rhs = 128 => 1.563sec.Conclusion : Slower with multiple rhs than single rhs no matter number of rhsMatrix size 5000 x 5000 with 10 non zero values per line :rhs = 1 => 9.734 sec.rhs = 2 => 33.140 sec.rhs = 4 => 19.734 sec.rhs = 8 => 15.687 sec.rhs = 16 => 13.203 sec.rhs = 32 => 12.406 sec.rhs = 64 => 12.531 sec.rhs = 128 => 15.734 sec.Conclusion : Slower with multiple rhs than single rhsno matter number of rhsMatrix size 5000 x 5000 with 100 non zero values per line :rhs = 1 => 116.248sec.rhs = 2 => 126.904sec.rhs = 4 => 89.264sec.rhs = 8 => 72.514sec.rhs = 16 => 67.468sec.rhs = 32 => 60.061sec.rhs = 64 => 58.186sec.rhs = 128 => 90.030sec.rhs = 256 => 117.185sec.Conclusion : Faster with multiple rhs than single rhs, but multiple rhs must be <= 64 and != 2Matrix size 20000 x 20000 with 50 non zero values per line :

rhs = 1 => 1143.416sec.rhs = 2 => 1451.363sec.rhs = 4 => 955.747sec.rhs = 8 => 789.063sec.rhs = 16 => 695.283sec.rhs = 32 => 632.784sec.rhs = 64 => 626.066sec.rhs = 128 => 831.969sec.Conclusion : Faster with multiple rhs than single rhs, but multiple rhs must be <= 64and != 2The more nnz values per line are in matrix, faster is multiple rhs solving.Unfortunately, in my case all my matrices are highly sparse. (Between 3 and 10 non zero values per line no matter the matrix size) and I have to solve it with many different B, so I need good performance with multiple right hand side with small number of nnz per line.ThanksMarcB

Could you provide us with info about you? We will create request and add this info into this request.

It would be nice if you provide us with your test case. We will add it in request as well.

Sergey Solovev.

Name: Marc BelleteteCompany: CymeEmail: marc.belletete@cyme.comEnvioronment: Visual Studio 2010I attached a benchmark .cpp file.Here is another subject :All our matrices are highly sparse (between 3 and 10 nnz values per line) and almost all nnz values are on diagonal or very close of diagonal.With this kind of matrix, I saw that KLU solver ( non symmetrical ) is, maybe, 1.5x faster than intel mkl pardiso (symmetrical).However, KLU have problems that Pardiso don't have (problem to solve singular matrix).I would like your opinion about that.Note: KLU is a free solver.ThanksMarcB

## Attachments:

Attachment | Size |
---|---|

Download intelMKLbenchmark.cpp | 8.94 KB |

Which comparative resultsyou have withthese solvers (KLU vs Pardiso)?

Thanks for detailed info!

We added it into our request.

Hello Marc,

I did investigation of the problem and found much better instrument for this kind of matrices.

The reason of relatively poor performance of multiple RHS solve is simple.

We have two types of parallelism inside PARDISO solving step. The first is used for problems with 1 RHS (the work for the only RHS is split between cores) and the second is used for systems with many RHS (many RHS are solved in parallel, but the sequential algorithm is used for each of them). There is no mixed version for the problems with few RHS. For instance, if we have 2 right hand sides and 4 cores, the problem will be solved in parallel using 2 cores, but 2 cores will remain idle.

The second reason of poor performance for such matrices is that amount of work for each RHS is too small. We dont even compensate the threading overhead. Indeed, the sequential version of PARDISO is faster than parallel for such problems (set OMP_NUM_THREADS=1 to check it). If we increase the number of RHS or nnz per line, the parallel version shows better performance as you mentioned.

In fact, matrices of size 10000x10000 are considered to be very small for PARDISO. If you always have all non-zero elements very close to diagonal, I would recommend you to use MKL LAPACK solver with band matrix storage scheme (zgbtrf and zgbtrs stay for factorization and solve correspondingly). Unfortunately, there is no storage scheme in LAPACK that takes advantage of both banded structure and symmetry (the latter has to be sacrificed), but the memory consumption seems not to be critical here. I rewrote the benchmark so that it does exactly the same computations, but uses these two functions instead of PARDISO (see attachment). It demonstrates much higher performance (at least few times faster, but sometimes even an order of magnitude faster). In fact there is the same issue with 2 RHS on 4 cores, but overall results situation is much better.

Best regards,

Andrey Kuzmin.

Here is the attachment.

## Attachments:

Attachment | Size |
---|---|

Download lapack_slv_mrhs.cpp | 5.38 KB |

Marc,

I see your question andwill investigate it soon.

Sincerely,

Andrey.

Marc,

>4- In my case, I only need some values in x solution vector (position

that I need change for each solve). I can >use PARDISO partial solution,

but it will be slower than solve all values in x vector because I need

to re-factorize >the matrix for each solve if I use partial solution.

You should try to use the release 4.1.3 from www.pardiso-project.org.

We have recently implemented a feature that exactly is a good solution for (4).

Olaf