I have been developing an iterative algorithm where most of the computation involves matrix-matrix multiplication (MMM), matrix-vector multiplication (MVM), forward and backward solves, as well as several BLAS and LAPACK functions available in MKL.
For big problem sizes I get diverging results on two different CPUs. The software is exactly the same on both:
- OS Linux Ubuntu 11.10 kernel version 3.0.0-22-generic
- Intel parallel_studio_xe_2011_sp1_update2_intel64.tgz (MKL 10.2)
- Intel l_mkl_10.3.10.319_intel64.tgz update
- icc (ICC) 12.1.3 20120212
The two systems I have:
- Intel Core 2 Duo T9900 on a 17'' MacBook Pro, Mid 2009 (dual boot, Ubuntu 11.10 kernel 3.0.0-22-generic)
- Intel i7 3930K C2 stepping Desktop on an ASUS Rampage Extreme IV (Ubuntu 11.10 kernel 3.0.0-22-generic)
Basically, the Intel Core 2 Duo MBP produces correct results, whereas on the Intel i7 3930K the results differ greatly (final result, number of iterations, etc.). To rule out other causes I started downgrading the icc settings, e.g. I removed -no-prec-div. This improved the situation for the n=2000 problem size, but for larger problem sizes it still fails to converge correctly. I then switched to g++ instead of icpc and the non-reproducibility problem persists. Hence, all signs point to MKL behaving differently depending on the processor.
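To illustrate the kind of effect I suspect, here is a minimal sketch in C. It is my own toy example, not MKL code: the function name sum_lanes, the MAX_LANES limit and the demo array are all made up for illustration. It assumes that a kernel tuned for a wider SIMD unit keeps more independent partial sums (e.g. 2 doubles per register with SSE versus 4 with AVX), which changes the order of the floating-point additions. Since floating-point addition is not associative, the same input can then give a different sum.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LANES 8  /* hypothetical upper bound, for illustration only */

/* Sum x[0..n-1] using `lanes` independent partial sums, the way a SIMD
   kernel holding that many doubles per register might, then combine
   the partial sums in order. lanes == 1 is a plain sequential sum. */
double sum_lanes(const double *x, size_t n, size_t lanes)
{
    double partial[MAX_LANES] = {0.0};
    double total = 0.0;
    size_t i;

    assert(lanes >= 1 && lanes <= MAX_LANES);

    /* Element i goes to accumulator i % lanes (strided accumulation). */
    for (i = 0; i < n; ++i)
        partial[i % lanes] += x[i];

    /* Final reduction of the partial sums. */
    for (i = 0; i < lanes; ++i)
        total += partial[i];

    return total;
}

/* Toy input whose exact sum is 6: the large values cancel, but only
   after some of the 1.0 terms have already been absorbed by rounding. */
static const double demo[8] = { 1e16, 1.0, 1.0, 1.0, -1e16, 1.0, 1.0, 1.0 };
```

With strict IEEE 754 double arithmetic (no fast-math style reassociation by the compiler), sum_lanes(demo, 8, 1) gives 3, sum_lanes(demo, 8, 2) gives 5 and sum_lanes(demo, 8, 4) gives 6. In an iterative algorithm, small differences of this kind could plausibly accumulate and change the number of iterations or the final result.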
I came across the article below on Conditional Bitwise Reproducibility (CBWR). Is this the solution to my problem, or is there a way to ensure reproducibility using the current MKL release? http://software.intel.com/en-us/articles/intro-to-CBWR-in-intel-mkl/