In my code, I multiply a square matrix of doubles by a vector whose length equals the matrix dimension. Only the lower triangle of the matrix, including the main diagonal, takes part in the operation; the diagonal is sometimes all ones and sometimes not.
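Written out, with $a_{ij}$ the matrix entries, $x$ the input vector and $y$ the result, only entries with $j \le i$ contribute. The unit-diagonal case is the same sum with $a_{ii}$ taken as 1:

```latex
y_i = \sum_{j=1}^{i} a_{ij}\, x_j, \qquad
y_i^{\text{unit}} = x_i + \sum_{j=1}^{i-1} a_{ij}\, x_j, \qquad i = 1, \dots, n
```

So $n(n+1)/2$ of the $n^2$ entries are used, or $n(n-1)/2$ when the diagonal is all ones.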

So far, I have used the general MKL routine dgemv to multiply the full matrix by the vector, with the upper-triangle entries set to zero so they have no effect on the result vector.
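To make the two approaches concrete, here is a plain-C reference sketch of what each computes. It does not use MKL and has no blocking or vectorization, so it says nothing about how MKL's optimized kernels actually perform. `gemv_full` and `trmv_lower` are hypothetical names. The in-place overwrite of `x` and the unit-diagonal flag mirror the documented BLAS `trmv` semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Naive y = A*x over the full n-by-n row-major matrix, as a general gemv
 * sees it: every entry, including the zeroed upper triangle, costs one
 * multiply-add. Returns the number of multiply-adds performed. */
static size_t gemv_full(size_t n, const double *a, const double *x, double *y)
{
    size_t ops = 0;
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < n; ++j) {
            sum += a[i * n + j] * x[j];
            ++ops;
        }
        y[i] = sum;
    }
    return ops;
}

/* Naive lower-triangular product x := L*x, in place like BLAS trmv.
 * Only entries with j <= i are read (the upper triangle may hold anything).
 * If unit_diag is nonzero the stored diagonal is ignored and treated as 1.
 * Returns the number of multiply-adds performed. */
static size_t trmv_lower(size_t n, const double *a, double *x, int unit_diag)
{
    size_t ops = 0;
    /* Walk rows bottom-up so x[j] for j < k still holds the original input. */
    for (size_t k = n; k-- > 0;) {
        double sum = unit_diag ? x[k] : a[k * n + k] * x[k];
        if (!unit_diag)
            ++ops;
        for (size_t j = 0; j < k; ++j) {
            sum += a[k * n + j] * x[j];
            ++ops;
        }
        x[k] = sum;
    }
    return ops;
}
```

In this naive version the full product always does n*n multiply-adds, while the triangular one does n(n+1)/2 (or n(n-1)/2 with a unit diagonal). Note also that the gemv approach needs the ones physically stored on the diagonal, whereas trmv can ignore the stored diagonal. In MKL's CBLAS interface the corresponding calls would be `cblas_dgemv(CblasRowMajor, CblasNoTrans, n, n, 1.0, A, n, x, 1, 0.0, y, 1)` and `cblas_dtrmv(CblasRowMajor, CblasLower, CblasNoTrans, CblasUnit /* or CblasNonUnit */, n, A, n, x, 1)`. Since dtrmv overwrites `x`, copy it first if the input is still needed.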

Behind the scenes, what is the difference between using dgemv as described above and using dtrmv for this operation? Is dtrmv faster? Without using parallelism, which MKL function is the fastest for my operation?

Thanks.