I have been running an MKL DGEMM benchmark in native mode on a KNC card and have noticed strange behavior: the performance is inconsistent and varies quite a bit. I tried multiple thread affinity settings and saw the same behavior across different numbers of threads and threads per core. The test calls DGEMM on a set of 6,000 by 6,000 matrices a total of 1,000 times so that I can compare individual calls. During my testing the performance varied by as much as 100 GFLOPS. A Google search did not reveal much, although I did find a Dr. Dobb's article that noted the unusual behavior and attributed it to OS jitter without elaborating. Has anyone else noticed this behavior?
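As a point of reference, here is a minimal sketch of how the per-call GFLOPS figures and their spread could be computed from a list of wall-clock timings. It assumes the conventional 2·n³ floating-point operation count for a square n-by-n DGEMM. The function names and the example timings are hypothetical placeholders, not measured data from the card.

```python
import statistics


def dgemm_gflops(n, seconds):
    """GFLOPS for one square n x n DGEMM call.

    Assumes the conventional count of 2 * n**3 floating-point
    operations (one multiply and one add per inner-product term).
    """
    return 2.0 * n**3 / seconds / 1e9


def summarize(n, timings):
    """Summarize per-call GFLOPS for a list of per-call timings (seconds)."""
    rates = [dgemm_gflops(n, t) for t in timings]
    return {
        "min": min(rates),
        "max": max(rates),
        "mean": statistics.mean(rates),
        # Difference between the fastest and slowest call.
        "spread": max(rates) - min(rates),
    }


if __name__ == "__main__":
    # Hypothetical per-call timings in seconds, for illustration only.
    example_timings = [0.54, 0.60, 0.72]
    print(summarize(6000, example_timings))
```

Recording one timing per call, rather than a single total for all 1,000 calls, is what makes it possible to see the spread instead of only the average.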