Observed performance depends on many factors. Are you linking against parallel or sequential MKL? Which version of MKL are you using? What are your OS and CPU? How many threads did you use? Did you align your data, and if so, how? Did you call the FORTRAN interface or the CBLAS interface? Please provide these details and we can help.

MATLAB on Intel architectures actually uses MKL internally for many linear algebra functions. This may explain why you didn't see a larger speedup.

To multiply a dense matrix by a sparse matrix, you can try mkl_dcsrmm, mkl_dcscmm, or mkl_dcoomm, depending on the storage format of your sparse matrix (CSR, CSC, or COO). Search the MKL reference manual (http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/...) using these function names as keywords to get detailed information.
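To make the CSR case concrete, here is a minimal plain-C reference sketch of what a dense-times-sparse product computes. It does not call MKL and is not how MKL implements it. The function name `dense_times_csr` and its argument layout are made up for illustration, and the row-major dense storage and zero-based CSR indices are assumptions. Also note that, as far as I recall, mkl_dcsrmm takes the sparse matrix as the left operand, so for a dense-times-sparse product you may need to work with the transposed problem; please check the reference manual for the exact operation and index conventions.

```c
#include <stddef.h>

/* Reference (non-MKL) sketch: C = D * S.
 * D: m x k dense matrix, row-major (assumed layout).
 * S: k x n sparse matrix in CSR with zero-based indices (assumed);
 *    row_ptr has k + 1 entries, col_idx/val hold the nonzeros.
 * C: m x n dense matrix, row-major, overwritten with the product.
 * The name and signature are hypothetical, for illustration only. */
void dense_times_csr(int m, int k, int n,
                     const double *D,
                     const int *row_ptr, const int *col_idx, const double *val,
                     double *C)
{
    /* Start from an all-zero result. */
    for (size_t i = 0; i < (size_t)m * (size_t)n; ++i)
        C[i] = 0.0;

    for (int i = 0; i < m; ++i) {              /* row of D and C */
        for (int p = 0; p < k; ++p) {          /* row of S       */
            double d = D[(size_t)i * k + p];
            /* Each nonzero S[p][col_idx[j]] contributes to C[i][col_idx[j]]. */
            for (int j = row_ptr[p]; j < row_ptr[p + 1]; ++j)
                C[(size_t)i * n + col_idx[j]] += d * val[j];
        }
    }
}
```

A loop like this is mainly useful for checking results on a small matrix before switching to the MKL routine.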

## Problems with Matrix Inversion dgetri

I use two functions, dgetrf and dgetri, to invert a large matrix. The first computes the LU decomposition and the second computes the inverse from that LU decomposition. When comparing the results with MATLAB, I ran into some problems:

1. For the same big matrix (I tried 10000 by 10000), the speedup of MKL is not very large. For example, MATLAB took about 236 seconds while MKL took about 220 seconds to finish. Is this speedup reasonable, or can MKL get much better performance?

2. I found that two parameters of dgetri are very important for performance: work and lwork. At first I set work = 8*N, where N is the size of the matrix, and MKL was slower than MATLAB. Then I changed it to work = N*N, after which MKL achieved the speedup I mentioned. Is there anything else I need to do to get better performance?

3. In my code I also want to multiply a dense matrix by a sparse matrix. Is there a specific function in MKL I can use for this, or should I just use cblas_dgemm for the multiplication?

Thanks for the help.

C.J.