Writing parallel algorithms for fast matrix multiplication is a non-trivial task because of the large number of quadratic operations: it is necessary to minimize the allocation of additional memory without sacrificing multiplication speed. My recent results in this field, for three square 16000 × 16000 matrices held in memory and processed according to the formula C = C + A * B: 129 seconds for my implementation versus 186 seconds for Intel MKL dgemm (OS: XP x64; processor: i7 860; memory: 8 GB, 1333 MHz). The positive effect of parallelization starts to show for matrices of at least 1500 × 1500. Intel MKL dgemm is used as the base multiplication routine at the leaves of the recursion tree. I have also created a fast multiplication algorithm that allocates no additional memory; the gain there is more modest, about 8/7 relative to the speed of Intel MKL dgemm on large matrices. There is also a positive effect when one of the matrices is non-square. I use it to speed up many problems in linear algebra, from solving systems of linear equations to singular value analysis.
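The post does not name the fast algorithm or show any code. The recursion tree and the 8/7 figure are consistent with a Strassen-type scheme, where one level of recursion replaces 8 half-size block multiplications with 7, so the sketch below assumes Strassen's algorithm. It is only an illustration of that recursion for C = C + A * B. Every name in it is hypothetical, `leaf_multiply` is a plain triple loop standing in for a dgemm call at the leaves, the code is sequential, and it allocates temporaries freely, so it does not reflect the memory-saving or parallel aspects described above.

```python
# Sketch only: Strassen-style recursion for C = C + A * B on square matrices
# whose size is a power of two. All names are hypothetical; the original
# post does not state which algorithm it uses.

LEAF = 2  # hypothetical cutoff: at or below this size, use the plain kernel


def leaf_multiply(A, B):
    """Plain triple-loop product; stands in for an optimized dgemm call."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def add(X, Y):
    """Elementwise X + Y (one of the quadratic-cost operations)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Elementwise X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(X):
    """Return the four quadrants X11, X12, X21, X22."""
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])


def strassen(A, B):
    """Return A * B using seven recursive block products per level."""
    if len(A) <= LEAF:
        return leaf_multiply(A, B)
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    # Stitch the quadrants back into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def gemm_accumulate(C, A, B):
    """Return C + A * B as a new matrix."""
    return add(C, strassen(A, B))
```

Each call to `add` or `sub` here creates a new temporary matrix, which is exactly the kind of extra allocation a practical implementation would try to avoid. In a real setting the leaf cutoff would be far larger than 2, and the seven products could be dispatched to separate threads.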