I've got a multi-threaded program that calls sequential dgemm() from multiple threads. If I run this program on a Sandy Bridge processor, using the latest MKL (from C++ Composer 2011.2.137), I get subtly different numerical results each time I run the program. Not wrong answers - just small differences in the low-order bits.

What I've observed so far:

- On an earlier processor (e.g., i7-920), the same program gives the exact same numerical result each time I run it.
- On Sandy Bridge with only one thread, I get the exact same numerical result each time.
- On Sandy Bridge with an older MKL (e.g., 10.2.6.038), with no change to my program other than linking against the different MKL version, I get the exact same numerical result each time (but slower, of course, since it doesn't use AVX).

It seems like there's some sort of thread-safety problem in the AVX code inside MKL. Are there any known issues here?
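For anyone trying to reproduce this, here is a minimal sketch of one way to check two runs' outputs for low-order-bit differences. It compares raw bit patterns instead of using `==` on doubles. The names `bits_of` and `count_bit_mismatches` are hypothetical helpers, not part of MKL, and this is an illustration only, not the code from the program described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: return the raw 64-bit pattern of a double.
// memcpy is used so the reinterpretation is well-defined in C++.
static std::uint64_t bits_of(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Hypothetical helper: count the elements whose bit patterns differ
// between two result buffers (e.g., the C matrix saved from two runs).
// Assumes both buffers hold the same number of elements.
std::size_t count_bit_mismatches(const std::vector<double>& a,
                                 const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (bits_of(a[i]) != bits_of(b[i])) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A result of zero means the two runs are bitwise identical; any nonzero count means at least one element differs, even if only in the last bit. Note that this check is stricter than numerical equality: it also treats `0.0` and `-0.0` as different.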