Using BLAS Functions for Copy, Swap and Fast Matrix Multiplication

Availability of Functions that Will Copy a Matrix and Swap a Matrix
The functions in the BLAS (and in much of the Intel® Math Kernel Library) support solving systems of equations. Only the functions needed for solving dense systems of equations are provided, and because matrix copying is not required for that purpose, no dedicated matrix copy routine is available. However, matrices can be copied and transposed using the Level 1 BLAS routine xCOPY: a contiguously stored matrix can be copied as one long vector, and a transpose can be built with one strided copy per row or column. Similarly, there is a Level 1 routine xSWAP. See the Intel Math Kernel Library manual for details on [S,C,D,Z]COPY and [S,C,D,Z]SWAP. These copying and swapping routines are optimized and use the available memory bandwidth effectively.
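
The following is a minimal sketch of the copy, transpose, and swap techniques described above. It assumes Python with NumPy and SciPy, and calls the double-precision routines through SciPy's scipy.linalg.blas wrappers rather than through Intel MKL directly; whether those wrappers are backed by Intel MKL depends on how SciPy was built. The helper names copy_matrix, transpose_matrix, and swap_matrices are hypothetical and are used only for illustration.

```python
import numpy as np
from scipy.linalg import blas  # assumption: SciPy's wrappers stand in for direct BLAS calls


def copy_matrix(a):
    """Copy a matrix by treating its contiguous storage as one vector (DCOPY)."""
    src = np.ascontiguousarray(a, dtype=np.float64).ravel()
    dst = np.empty_like(src)
    dst = blas.dcopy(src, dst)  # use the returned array as the result
    return dst.reshape(a.shape)


def transpose_matrix(a):
    """Transpose a row-major m-by-n matrix with one strided DCOPY per column."""
    m, n = a.shape
    src = np.ascontiguousarray(a, dtype=np.float64).ravel()
    dst = np.zeros(m * n, dtype=np.float64)
    for j in range(n):
        # Column j of the source starts at offset j with stride n (m elements).
        # It becomes row j of the result, stored contiguously at offset j * m.
        dst = blas.dcopy(src, dst, n=m, offx=j, incx=n, offy=j * m, incy=1)
    return dst.reshape(n, m)


def swap_matrices(a, b):
    """Exchange the contents of two equally shaped matrices (DSWAP)."""
    # np.array makes private copies, so the caller's arrays are left untouched.
    x = np.array(a, dtype=np.float64).ravel()
    y = np.array(b, dtype=np.float64).ravel()
    x, y = blas.dswap(x, y)
    return x.reshape(a.shape), y.reshape(b.shape)
```

The transpose loop issues one call per column, so for large matrices the strided reads may limit how much of the memory bandwidth is actually used.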

Using the Intel® Math Kernel Library for Fast Matrix Multiplication
The most commonly used function of the BLAS (and of the Intel® Math Kernel Library) is xgemm, where x stands for s (single precision), d (double precision), c (single-precision complex), or z (double-precision complex). The name gemm stands for GEneral Matrix-Matrix; it is the matrix multiplication routine. A great deal of effort has gone into optimizing this routine for performance. It is also threaded so that it can take advantage of all available processors. For more information on using Intel MKL with threaded applications, see Threading Issues.
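
As an illustration, xgemm computes C := alpha*op(A)*op(B) + beta*C, where op() optionally transposes its argument. The sketch below calls the double-precision variant through SciPy's scipy.linalg.blas.dgemm wrapper. This is an assumption made for the example: the wrapper uses whichever BLAS SciPy was built against, which may or may not be Intel MKL. The helper name gemm is hypothetical.

```python
import numpy as np
from scipy.linalg import blas  # assumption: SciPy's wrapper stands in for a direct DGEMM call


def gemm(alpha, a, b, beta=0.0, c=None):
    """Return alpha * A @ B + beta * C, computed by DGEMM."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if c is None:
        # With no C supplied, the result is simply alpha * A @ B.
        return blas.dgemm(alpha, a, b)
    c = np.asarray(c, dtype=np.float64)
    # overwrite_c defaults to 0, so the caller's C is not modified.
    return blas.dgemm(alpha, a, b, beta=beta, c=c)
```

A call such as gemm(1.0, A, B) returns the plain product of A and B, while passing beta and c accumulates the product into a scaled copy of an existing matrix.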

