I've been running pardiso_64 on several problems. When I increase MKL_NUM_THREADS, the elapsed time of the factorization phase clearly goes down.
However, the back-substitution time doesn't seem to be affected by the number of threads. Since I am solving for a very large number of right-hand-side vectors, one at a time, I would like the back-substitution to be parallelized as well.
Do I need to do anything special to get speedup in this phase?