We use only a select few routines from Intel MKL, such as dgemm, daxpy, ddot, and dgemv. For all other BLAS/LAPACK routines, we compile and link in the actual (non-MKL) LAPACK/BLAS routines. My questions are as follows:
(i) Are dgemm, daxpy, ddot, or dgemv parallelized?
(ii) Do I need to use the full suite of Intel MKL routines to take advantage of parallelism in a select few routines?
By the way, we are using Intel MKL 7.2.