matrix inversion precision issues on two different processors

Vineet Y.'s picture

Hi

I am seeing serious precision issues when using Intel MKL LAPACK for matrix inversion:

Steps:

(1) I inverted a matrix using Matlab/Octave

(2) I used dgetrf and dgetri to invert the same matrix on two processors: (a) an Intel(R) Core(TM) i7-2600 CPU 3.40GHz with 16GB of RAM on a Windows machine, using Intel Parallel Composer XE 2013, and (b) an Intel(R) Xeon(R) CPU X5660 2.80GHz on a Linux machine, using Composer XE 2011

(3) The problem is that the inverse obtained with dgetrf and dgetri differs from the one obtained with Matlab/Octave. That differences exist is not an issue in itself, but the fact that they depend on the processor is causing problems in large simulations. The results obtained on the Intel Xeon machine with Intel Composer XE 2011 are more accurate than those obtained by running the same code on the Windows machine

At this point I think I am overlooking something, i.e. making a big mistake. Any advice on solving this issue would be greatly appreciated. I have attached sample code to highlight the issue, but I was not able to upload the input binary files to the forum (the upload was taking a very long time)
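
One way to make "more accurate" measurable without treating Matlab/Octave as ground truth is to compute the residual of A * inv(A) - I for each inverse. The sketch below is not the attached source1.f90; it is a small pure-Python illustration in which the hypothetical function `invert` (Gauss-Jordan with partial pivoting) merely stands in for the dgetrf/dgetri pair, and the 3x3 matrix is made-up sample data.

```python
def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination
    with partial pivoting. Stand-in for dgetrf + dgetri; illustration only."""
    n = len(a)
    # Build the augmented matrix [A | I]
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: choose the row with the largest entry in this column
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    # The right half of the augmented matrix now holds the inverse
    return [row[n:] for row in m]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def residual(a, a_inv):
    """Largest absolute entry of A * A_inv - I."""
    n = len(a)
    prod = matmul(a, a_inv)
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


# Made-up, well-conditioned sample matrix (determinant 9)
a = [[4.0, 7.0, 2.0], [3.0, 6.0, 1.0], [2.0, 5.0, 3.0]]
a_inv = invert(a)
print("residual:", residual(a, a_inv))
```

Applying the same residual check to the inverse produced on each machine, and to the Matlab/Octave inverse, gives a processor-independent yardstick. If the matrix is ill-conditioned, the inverses can be expected to disagree in many digits even when every library is working correctly.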

Many thanks

Vineet

 

 

Attachment: source1.f90 (6.91 KB)
Sergey Kostrov's picture

Hi Vineet,

>>...The answers received by using Intel Xeon Processors and Intel Composer XE 2011 Machine are more accurate [ SK: On Linux ]
>>than what is obtained by using the same code on windows machine...

Please post the command lines for both cases ( sorry, I don't want to make any suggestions before I see all the options used ). Note that I'll be able to verify calculations only on Windows 7 Professional with Intel Parallel Studio XE 2013 Update 2.

Also, would you be able to execute a couple of simple C/C++ tests ( I'll provide portable C/C++ codes ) to verify precision control functionality on both systems?

Vineet Y.'s picture

 Here are the command lines you requested. Send me the C/C++ files and I will execute them to verify precision control

 For linux (Intel Xeon processor)

ifort  source1.f90 -heap-arrays  -openmp -L /share/apps/intel/composer_xe_2011_sp1.7.256/mkl/lib/intel64/ -I /share/apps/intel/composer_xe_2011_sp1.7.256/mkl/include/ -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -o source1.exe

For Windows

Compiling with Intel(R) Visual Fortran Compiler XE 13.1.0.149 [Intel(R) 64]...

ifort /nologo /debug:full /O2 /I"C:\Program Files (x86)\Intel\Composer XE 2013\mkl\include" /warn:interfaces /module:"x64\Debug\\" /object:"x64\Debug\\" /Fd"x64\Debug\vc100.pdb" /traceback /check:none /libs:static /threads /dbglibs /Qmkl:parallel /c -heap-arrays /Qvc10 /Qlocation,link,"C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\\bin\amd64" "C:\Users\Vineet Work\Documents\Visual Studio 2010\Projects\source1.f90\source1.f90\Source1.f90"

Linking...

Link /OUT:"x64\Debug\source1.f90.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\Program Files (x86)\Intel\Composer XE 2013\mkl\lib\intel64" /MANIFEST /MANIFESTFILE:"C:\Users\Vineet Work\Documents\Visual Studio 2010\Projects\source1.f90\source1.f90\x64\Debug\source1.f90.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"C:\Users\Vineet Work\Documents\Visual Studio 2010\Projects\source1.f90\source1.f90\x64\Debug\source1.f90.pdb" /SUBSYSTEM:CONSOLE /IMPLIB:"C:\Users\Vineet Work\Documents\Visual Studio 2010\Projects\source1.f90\source1.f90\x64\Debug\source1.f90.lib" mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib mkl_lapack95_lp64.lib "x64\Debug\Source1.obj" "x64\Debug\Source2.obj"

Sergey Kostrov's picture

>>...Send me the C/C++ files and I will execute them to verify precision control...

Here they are: the archive contains two solutions ( VS 2008 ), one for the Intel C++ compiler and one for the Microsoft C++ compiler.

Note: the /Qlong-double and /Qpc80 options are used for the Intel C++ compiler

Attachment: fputestapp.zip (12.17 KB)
Sergey Kostrov's picture

Outputs for Reference:

[ Intel C++ compiler ( 16-byte long double data type is used (!) ) ]

32-bit Windows platform - Configuration: RELEASE
Test-Case 1
Size of [ long double ] is: 16
Test-Case 2
_CW_DEFAULT & ALLBITSON: 0x9001F
_PC_24 & _MCW_PC : 0xA001F
_PC_53 & _MCW_PC : 0x9001F
_PC_64 & _MCW_PC : 0x8001F
Test-Case 3.1
Accuracy _CW_DEFAULT - long double - Result: 1.0000000000079181
Sub-Test 3.2
Accuracy _PC_24 - long double - Result: 1.0090389251708984
Test-Case 3.3
Accuracy _PC_53 - long double - Result: 1.0000000000079181
Test-Case 3.4
Accuracy _PC_64 - long double - Result: 1.0000000000000109

Test-Case 4

Matrix A
101.0 201.0 301.0 401.0 501.0 601.0 701.0 801.0
901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

Matrix B
101.0 201.0 301.0 401.0 501.0 601.0 701.0 801.0
901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

MFPT Used

Matrix C - Result
13826808.0 14187608.0 14548408.0 14909208.0 15270008.0 15630808.0 15991608.0 16352408.0
32393208.0 33394008.0 34394808.0 35395608.0 36396408.0 37397208.0 38398008.0 39398808.0
50959608.0 52600408.0 54241208.0 55882008.0 57522808.0 59163608.0 60804408.0 62445208.0
69526008.0 71806808.0 74087608.0 76368408.0 78649208.0 80930008.0 83210808.0 85491608.0
88092408.0 91013208.0 93934008.0 96854808.0 99775608.0 102696408.0 105617208.0 108538008.0
106658808.0 110219608.0 113780408.0 117341208.0 120902008.0 124462800.0 128023600.0 131584400.0
125225200.0 129426000.0 133626800.0 137827600.0 142028400.0 146229200.0 150430000.0 154630800.0
143791600.0 148632400.0 153473200.0 158314000.0 163154800.0 167995600.0 172836400.0 177677200.0

Press ESC to Exit...

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

[ Microsoft C++ compiler ]

32-bit Windows platform - Configuration: RELEASE
Test-Case 1
Size of [ long double ] is: 8
Test-Case 2
_CW_DEFAULT & ALLBITSON: 0x9001F
_PC_24 & _MCW_PC : 0xA001F
_PC_53 & _MCW_PC : 0x9001F
_PC_64 & _MCW_PC : 0x8001F
Test-Case 3.1
Accuracy _CW_DEFAULT - long double - Result: 1.0000000000079181
Sub-Test 3.2
Accuracy _PC_24 - long double - Result: 1.0090389251708984
Test-Case 3.3
Accuracy _PC_53 - long double - Result: 1.0000000000079181
Test-Case 3.4
Accuracy _PC_64 - long double - Result: 1.0000000000079181

Test-Case 4

Matrix A
101.0 201.0 301.0 401.0 501.0 601.0 701.0 801.0
901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

Matrix B
101.0 201.0 301.0 401.0 501.0 601.0 701.0 801.0
901.0 1001.0 1101.0 1201.0 1301.0 1401.0 1501.0 1601.0
1701.0 1801.0 1901.0 2001.0 2101.0 2201.0 2301.0 2401.0
2501.0 2601.0 2701.0 2801.0 2901.0 3001.0 3101.0 3201.0
3301.0 3401.0 3501.0 3601.0 3701.0 3801.0 3901.0 4001.0
4101.0 4201.0 4301.0 4401.0 4501.0 4601.0 4701.0 4801.0
4901.0 5001.0 5101.0 5201.0 5301.0 5401.0 5501.0 5601.0
5701.0 5801.0 5901.0 6001.0 6101.0 6201.0 6301.0 6401.0

MFPT Used

Matrix C - Result
13826808.0 14187608.0 14548408.0 14909208.0 15270008.0 15630808.0 15991608.0 16352408.0
32393208.0 33394008.0 34394808.0 35395608.0 36396408.0 37397208.0 38398008.0 39398808.0
50959608.0 52600408.0 54241208.0 55882008.0 57522808.0 59163608.0 60804408.0 62445208.0
69526008.0 71806808.0 74087608.0 76368408.0 78649208.0 80930008.0 83210808.0 85491608.0
88092408.0 91013208.0 93934008.0 96854808.0 99775608.0 102696408.0 105617208.0 108538008.0
106658808.0 110219608.0 113780408.0 117341208.0 120902008.0 124462808.0 128023608.0 131584408.0
125225208.0 129426008.0 133626808.0 137827616.0 142028416.0 146229216.0 150430016.0 154630816.0
143791616.0 148632416.0 153473216.0 158314016.0 163154816.0 167995616.0 172836416.0 177677216.0

Press ESC to Exit...

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
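
The two Matrix C results above differ only in their larger entries. A minimal sketch, assuming the test stores the product in single precision (the test source is not shown in this thread, so this is an inference from the printed numbers): rounding the exact integer product once to binary32 reproduces Microsoft entries such as 137827616.0. Intel entries such as 124462800.0 are not the nearest binary32 values to the exact product, which suggests, but does not prove, that rounding error accumulated in a different order of single-precision operations. The helper name `to_float32` is hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Matrices A and B from Test-Case 4: entries 101, 201, ..., 6401 in row-major order
n = 8
a = [[101 + 100 * (n * i + j) for j in range(n)] for i in range(n)]
b = [row[:] for row in a]

# Exact product using Python integers, so no rounding occurs at all
c_exact = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]

# Each exact entry rounded a single time to single precision
c_f32 = [[to_float32(float(v)) for v in row] for row in c_exact]

# Row 7, column 4: the exact value lies halfway between two binary32 values
print(c_exact[6][3], c_f32[6][3])   # 137827608 137827616.0
# Row 6, column 6: the exact value is representable in binary32
print(c_exact[5][5], c_f32[5][5])   # 124462808 124462808.0
```

Above 2^27 = 134217728 adjacent binary32 values are 16 apart, so an exact result ending in ...08 cannot be stored and is rounded; this alone accounts for the ...16 endings in the Microsoft output.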

Sergey Kostrov's picture

Hi Vineet,

Here are a couple of notes. Overall, try to use the same set of command-line options on both platforms ( the options below are for Windows ):

- Use the same Instruction set, for example SSE2 ( /QxSSE2 ), or SSE4.2 ( /QxSSE4.2 )

- Use /fp:precise, /Qprec, /Qpc:64 or /Qpc:80 with /Qlong-double ( it enables 80-bit 'long double' data type when Intel C++ compiler is used )

- OpenMP is used on the Linux platform ( -openmp ), but I don't see the /Qopenmp switch on the Windows platform

- Verify an OpenMP report with /Qopenmp-report{ 0| 1| 2 } ( it controls the OpenMP parallelizer diagnostic level )
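
The reason matching the instruction set and the threading setup matters is that floating-point addition is not associative: a different vector width or thread count can change the order in which partial sums are combined, and therefore the last bits of the result. The sketch below only illustrates that effect with made-up values; it does not reproduce what MKL does internally.

```python
import math

# Made-up values chosen so that the order of additions changes the result
values = [1.0e16, 1.0, -1.0e16, 1.0]

# Strict left-to-right summation, as a simple serial loop would do
left_to_right = ((values[0] + values[1]) + values[2]) + values[3]

# A different grouping, as a vectorized or multi-threaded reduction might use
pairwise = (values[0] + values[2]) + (values[1] + values[3])

print(left_to_right)        # 1.0 (the first 1.0 is lost when added to 1.0e16)
print(pairwise)             # 2.0
print(math.fsum(values))    # 2.0, the correctly rounded sum
```

Both groupings are valid double-precision computations, yet they disagree. In a long chain of operations such as an LU factorization followed by inversion, small differences of this kind can grow, especially when the matrix is ill-conditioned.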
