I am trying to implement a recursive matrix multiplication on Xeon Phi, and I have two implementations. The first is my own implementation of Strassen, and it works fine: when I call it with more than one level of recursion, the time decreases as I increase the recursion level. In the second, to boost my algorithm, I call the MKL function cblas_dgemm from the Strassen algorithm for the submatrix multiplications. The problem is that here the time increases when I increase the level of recursion. What is the problem?
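For reference, here is a minimal sketch of the structure described above. It assumes square, row-major matrices whose size is divisible by 2^levels, with a `levels` parameter controlling recursion depth. `strassen`, `add` and `leaf_gemm` are hypothetical names, not the poster's code. `leaf_gemm` is a naive triple loop so the example builds without MKL; in the setup described in the question, that is where `cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0, A, lda, B, ldb, 0.0, C, ldc)` would be called instead.

```c
#include <stdlib.h>

/* Hypothetical leaf kernel: C = A * B for n x n row-major blocks with leading
   dimensions lda, ldb, ldc. With MKL, cblas_dgemm would be called here. */
static void leaf_gemm(int n, const double *A, int lda, const double *B, int ldb,
                      double *C, int ldc) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * lda + k] * B[k * ldb + j];
            C[i * ldc + j] = s;
        }
}

/* Z = X + sign * Y on n x n blocks (sign is +1 or -1). */
static void add(int n, const double *X, int ldx, const double *Y, int ldy,
                double sign, double *Z, int ldz) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            Z[i * ldz + j] = X[i * ldx + j] + sign * Y[i * ldy + j];
}

/* C = A * B using `levels` levels of Strassen recursion, then the leaf kernel. */
void strassen(int n, const double *A, int lda, const double *B, int ldb,
              double *C, int ldc, int levels) {
    if (levels == 0 || n % 2 != 0) {
        leaf_gemm(n, A, lda, B, ldb, C, ldc);
        return;
    }
    int h = n / 2;
    /* Quadrant views into the parent matrices (no copying). */
    const double *A11 = A, *A12 = A + h, *A21 = A + h * lda, *A22 = A + h * lda + h;
    const double *B11 = B, *B12 = B + h, *B21 = B + h * ldb, *B22 = B + h * ldb + h;
    double *C11 = C, *C12 = C + h, *C21 = C + h * ldc, *C22 = C + h * ldc + h;

    /* Two temporaries for operand sums plus seven products M1..M7. */
    size_t sz = (size_t)h * h;
    double *buf = malloc(9 * sz * sizeof(double));
    if (!buf) { /* allocation failed: fall back to the leaf kernel */
        leaf_gemm(n, A, lda, B, ldb, C, ldc);
        return;
    }
    double *T1 = buf, *T2 = buf + sz, *M[7];
    for (int i = 0; i < 7; i++) M[i] = buf + (2 + i) * sz;

    add(h, A11, lda, A22, lda, 1, T1, h); add(h, B11, ldb, B22, ldb, 1, T2, h);
    strassen(h, T1, h, T2, h, M[0], h, levels - 1);   /* M1 = (A11+A22)(B11+B22) */
    add(h, A21, lda, A22, lda, 1, T1, h);
    strassen(h, T1, h, B11, ldb, M[1], h, levels - 1); /* M2 = (A21+A22)B11 */
    add(h, B12, ldb, B22, ldb, -1, T2, h);
    strassen(h, A11, lda, T2, h, M[2], h, levels - 1); /* M3 = A11(B12-B22) */
    add(h, B21, ldb, B11, ldb, -1, T2, h);
    strassen(h, A22, lda, T2, h, M[3], h, levels - 1); /* M4 = A22(B21-B11) */
    add(h, A11, lda, A12, lda, 1, T1, h);
    strassen(h, T1, h, B22, ldb, M[4], h, levels - 1); /* M5 = (A11+A12)B22 */
    add(h, A21, lda, A11, lda, -1, T1, h); add(h, B11, ldb, B12, ldb, 1, T2, h);
    strassen(h, T1, h, T2, h, M[5], h, levels - 1);    /* M6 = (A21-A11)(B11+B12) */
    add(h, A12, lda, A22, lda, -1, T1, h); add(h, B21, ldb, B22, ldb, 1, T2, h);
    strassen(h, T1, h, T2, h, M[6], h, levels - 1);    /* M7 = (A12-A22)(B21+B22) */

    /* Combine the seven products into the four quadrants of C. */
    for (int i = 0; i < h; i++)
        for (int j = 0; j < h; j++) {
            size_t k = (size_t)i * h + j;
            C11[i * ldc + j] = M[0][k] + M[3][k] - M[4][k] + M[6][k];
            C12[i * ldc + j] = M[2][k] + M[4][k];
            C21[i * ldc + j] = M[1][k] + M[3][k];
            C22[i * ldc + j] = M[0][k] - M[1][k] + M[2][k] + M[5][k];
        }
    free(buf);
}
```

Each additional level trades one n×n multiplication for seven (n/2)×(n/2) multiplications plus 18 block additions/subtractions and extra temporary buffers. Timing `strassen` for `levels = 0, 1, 2, …` with the leaf swapped for `cblas_dgemm` corresponds to the experiment described in the question.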