We have some articles discussing the performance test on MKL. for example, Tips to measure the performance of Intel® MKL with small matrix sizes.
SG have several interesting questions regarding the MKL performance and GFLOPS of processors under the article. I copy the comments here so more developer can involve.
The performance reported here seems to be half as good as it should be. Here's what I mean: graph on page http://software.intel.com/en-us/articles/parallelism-in-the-intel-math-kernel-library is for comparable CPU without AVX. Graph on this page is with AVX. But both graphs report similar performance -- but one would expect chip with AVX to perform twice as better than chip without AVX.
Here's my guess about the discrepancy: For the Graph on this page, the "F" in "GFlops" referrs to double-precision floating point, but for the Graph on the other page, the "F" is single-precision floating point. One can confirm the interpretation of "F" for this page based on the equation provided in this page. For the intrepretation of "F" on the other page, note that the other page says Peak Performance of 3.2 GHz SSE3 chip using 8 threads is 102.4 GFlops; dividing 102.4 by 3.2 GHz gives 32 floating point operations per clock per 8 threads -- or 4 floating point operations per clock per core; 4 floating point operations per clock per core on SSE3 chip means single precision floating point operations. But this para is just my guess -- would be nice if someone from Intel confirmed it or clarified the apparent missing performance on using AVX.