One of the major new features in Intel MKL 11.2 is greatly improved performance for small problem sizes.
When measuring performance, exclude the time taken by the first Intel® MKL call. That call carries one-time overhead for buffer allocation and thread initialization, so leaving it out gives more consistent timings for small problems.
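The measurement advice above can be sketched as a small timing harness in C. This is only an illustration under stated assumptions: small_kernel is a hypothetical stand-in (a naive matrix multiply) for whichever Intel MKL routine is being measured, and now_seconds and time_after_warmup are helper names invented for this sketch, not part of Intel MKL. The harness makes one untimed warm-up call, then averages the wall-clock time of the following calls.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the library call under test: a naive n x n
   matrix multiply (c = a * b, row-major). Replace it with the real
   Intel MKL routine you want to measure. */
void small_kernel(const double *a, const double *b, double *c, int n)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Wall-clock time in seconds, using the C11 timespec_get function. */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average seconds per call over `reps` timed calls. One warm-up call is
   made first and excluded from the measurement, so that one-time setup
   cost (buffer allocation, thread initialization) does not skew it. */
double time_after_warmup(const double *a, const double *b, double *c,
                         int n, int reps)
{
    assert(reps > 0);

    small_kernel(a, b, c, n);            /* warm-up call: not timed */

    double start = now_seconds();
    for (int r = 0; r < reps; ++r)
        small_kernel(a, b, c, n);        /* timed calls */
    double elapsed = now_seconds() - start;

    return elapsed / reps;
}
```

For small problems a single call may be shorter than the timer resolution, so choosing a large reps value and reporting the average is usually more reliable than timing one call.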
For more complete information about compiler optimizations, see our Optimization Notice.