Related to one of my earlier posts on MATMUL, I am looking for guidance on how to speed up the multiplication of two matrices, A and B. A is of size 1,000,000 x 100 and B is of size 100 x 10. I tested a couple of combinations, such as MATMUL(A,B) and MATMUL(B',A') with the transposes formed beforehand, and was surprised to see the difference in speed. In addition, the compiler options /fast, /Qopt-matmul, and /Qipo (tried one at a time) each led to an even longer run time. I also see that the MKL libraries are available in my package.
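
A minimal sketch of the kind of comparison described above, to make the two variants concrete. The program name, the double-precision kind, the random test data, the timing via SYSTEM_CLOCK, and the reduced row count (100,000 instead of 1,000,000, to keep memory use modest) are all assumptions for illustration and not the exact code that was timed.

```fortran
program matmul_variants
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: dp = kind(1.0d0)      ! assumed precision
  ! Assumption: n is scaled down from 1,000,000 to limit memory use
  integer, parameter :: n = 100000, k = 100, m = 10
  real(dp), allocatable :: A(:,:), B(:,:), At(:,:), Bt(:,:), C(:,:), Ct(:,:)
  integer(int64) :: t0, t1, rate

  allocate(A(n,k), B(k,m), At(k,n), Bt(m,k), C(n,m), Ct(m,n))
  call random_number(A)
  call random_number(B)

  ! Transposes are formed before timing, so only the products are measured
  At = transpose(A)
  Bt = transpose(B)

  ! Variant 1: C = A*B, result has shape (n,m)
  call system_clock(t0, rate)
  C = matmul(A, B)
  call system_clock(t1)
  print '(a,f10.4,a)', 'matmul(A,B):   ', real(t1 - t0, dp) / real(rate, dp), ' s'

  ! Variant 2: C' = B'*A', result has shape (m,n)
  call system_clock(t0)
  Ct = matmul(Bt, At)
  call system_clock(t1)
  print '(a,f10.4,a)', 'matmul(B'',A''): ', real(t1 - t0, dp) / real(rate, dp), ' s'

  ! Both variants compute the same product and should agree up to rounding
  print '(a,es10.3)', 'max |C - transpose(Ct)| = ', maxval(abs(C - transpose(Ct)))
end program matmul_variants
```

Fortran stores arrays column by column, so the two variants walk through memory in different orders even though the arithmetic is the same; that is one plausible source of the timing difference.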

Is there any recommendation on the format or layout that A and B should have for better performance, or would an alternative such as OpenMP do a better job? Please comment.

Thanks, Mohan.