I'm evaluating MKL 8.1 under Linux.
As a first step, we wanted to see whether it is possible to reach the theoretical CPU performance using MKL. We developed a very simple example using the SGEMM routine. This example multiplies two 4000x4000 matrices and adds the result to a third matrix.
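To make clear which operation is being timed, here is a minimal plain-C sketch of the same computation (C = A*B + C on square, row-major, single-precision matrices). It is only a reference for what is computed, not our actual test program and not how MKL implements SGEMM; the function name `sgemm_reference` is made up for illustration.

```c
#include <stddef.h>

/* Naive reference for C = A*B + C on n x n row-major float matrices.
 * Returns the number of multiply-add steps performed, which is n^3.
 * Hypothetical helper for illustration; far slower than a tuned SGEMM. */
unsigned long long sgemm_reference(size_t n, const float *a,
                                   const float *b, float *c)
{
    unsigned long long madds = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = c[i * n + j];          /* start from existing C */
            for (size_t k = 0; k < n; ++k) {
                acc += a[i * n + k] * b[k * n + j];
                ++madds;                       /* one multiply plus one add */
            }
            c[i * n + j] = acc;
        }
    }
    return madds;
}
```

Each inner step is one multiplication and one addition, so for n = 4000 the loop body runs 4000^3 times.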
This example needs 64*10^9 (4000^3) computations. Its execution time is about 35 seconds on a 2.0 GHz Pentium M CPU with 2 MB of cache.
From this we concluded that the highest achievable performance of this CPU is about 2 GFLOPS (64*10^9 / 35 s is roughly 1.8*10^9 per second).
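The arithmetic behind that figure can be checked with a small C sketch. The helper names are made up for illustration. One assumption matters here: the count of 64*10^9 treats each multiply-add as a single computation. If a multiply and an add are counted as two separate floating-point operations, as is common when quoting GEMM rates, the same 35-second run corresponds to roughly twice the rate.

```c
/* Multiply-add steps in an n x n by n x n matrix product: n^3.
 * Hypothetical helper for illustration. */
double gemm_multiply_adds(double n)
{
    return n * n * n;
}

/* Rate in units of 10^9 operations per second for a measured run time. */
double giga_ops_per_second(double ops, double seconds)
{
    return ops / seconds / 1e9;
}
```

With n = 4000 and 35 seconds, `giga_ops_per_second(gemm_multiply_adds(4000.0), 35.0)` gives about 1.83; passing twice the operation count (one multiply and one add per step) gives about 3.66.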
Is this correct?
Do we need any special additional optimizations? We compiled the application with GCC under Debian. (We did not use the Intel compiler; would it improve the results?)