DTRSM is the Level 3 BLAS routine that I use to solve triangular linear systems with multiple right-hand sides.
I am perfectly happy with the performance and parallel scalability of the multi-threaded variant when the number of right-hand sides n is sufficiently large.
My question concerns the special case of a single right-hand side. Here I see no evidence that the function has been parallelized: the run-time is independent of the number of threads. The run-time does increase quadratically with the dimension m of the matrix, exactly as predicted by the raw flop count. Moreover, the run-time of DTRSM is approximately twice that of DTRSV. I likewise find no evidence that DTRSV has been parallelized, as its run-time is also independent of the number of threads.
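To make the flop count concrete, here is a minimal sketch of a sequential triangular solve with a single right-hand side. It is plain Python written for illustration only, not the reference BLAS code, and the names forward_substitution and flops are hypothetical. It counts one multiply and one subtract per inner iteration and one divide per row, which gives exactly m^2 floating-point operations.

```python
def forward_substitution(L, b):
    """Solve L x = b for a lower-triangular matrix L (list of rows).

    Returns the solution x and the number of floating-point
    operations performed. Illustrative sketch only.
    """
    m = len(b)
    x = [0.0] * m
    flops = 0
    for i in range(m):
        s = b[i]
        for j in range(i):
            s -= L[i][j] * x[j]  # one multiply and one subtract
            flops += 2
        x[i] = s / L[i][i]       # one divide
        flops += 1
    return x, flops


if __name__ == "__main__":
    for m in (100, 200, 400):
        # Simple well-conditioned test system: 2 on the diagonal,
        # 1/m below it (an arbitrary choice for this sketch).
        L = [[2.0 if i == j else (1.0 / m if j < i else 0.0)
              for j in range(m)] for i in range(m)]
        b = [1.0] * m
        _, flops = forward_substitution(L, b)
        print(m, flops)
```

Row i costs 2i + 1 operations (counting rows from 0), so the total is m^2 and doubling m quadruples the work. This matches the quadratic growth in run-time I observe.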
My specific questions are:
1) Is it correct that DTRSM defaults to a sequential code in the case of a single RHS?
2) Is it correct that DTRSV is a sequential code?
My motivation is the following: In LAPACK the function DLATRS can be used to solve triangular linear systems in a manner that eliminates the possibility of floating-point overflow. This is a sequential code. My colleagues and I at Umeaa University are developing a parallel version of DLATRS. We need to make a fair comparison against the standard solvers DTRSM and DTRSV when the systems do not require overflow protection.