Performance Evaluation of Matrix Identity algorithms

[ Computer System used for performance evaluations ]

** Dell Precision Mobile M4700 **

Intel Core i7-3840QM ( 2.80 GHz )
Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846
32GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit SP1
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )
Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
Display resolution: 1366 x 768

Matrix Identity Algorithm ( 32-bit ): 1024 x 1024

[ Tests Set 1 ( 32-bit ) - Matrix Size: 1024 x 1024 ]

[ Microsoft C++ compiler ]

Matrix Size: 1024 x 1024
Processing...
Identity - Pass 01 - Completed: 5.81250 ticks
Identity - Pass 02 - Completed: 5.87500 ticks
Identity - Pass 03 - Completed: 5.87500 ticks
Identity - Pass 04 - Completed: 5.81250 ticks
Identity - Pass 05 - Completed: 5.87500 ticks
Identity - Passed

[ Borland C++ compiler ]

Matrix Size: 1024 x 1024
Processing...
Identity - Pass 01 - Completed: 4.87500 ticks
Identity - Pass 02 - Completed: 4.87500 ticks
Identity - Pass 03 - Completed: 5.87500 ticks
Identity - Pass 04 - Completed: 5.87500 ticks
Identity - Pass 05 - Completed: 5.87500 ticks
Identity - Passed

[ Intel C++ compiler ]

Matrix Size: 1024 x 1024
Processing...
Identity - Pass 01 - Completed: 1.93750 ticks
Identity - Pass 02 - Completed: 2.00000 ticks
Identity - Pass 03 - Completed: 1.93750 ticks
Identity - Pass 04 - Completed: 1.93750 ticks
Identity - Pass 05 - Completed: 2.93750 ticks
Identity - Passed

[ MinGW C++ compiler ]

Matrix Size: 1024 x 1024
Processing...
Identity - Pass 01 - Completed: 5.87500 ticks
Identity - Pass 02 - Completed: 5.87500 ticks
Identity - Pass 03 - Completed: 6.81250 ticks
Identity - Pass 04 - Completed: 5.87500 ticks
Identity - Pass 05 - Completed: 5.81250 ticks
Identity - Passed

[ Watcom C++ compiler ]

Matrix Size: 1024 x 1024
Processing...
Identity - Pass 01 - Completed: 5.81250 ticks
Identity - Pass 02 - Completed: 5.87500 ticks
Identity - Pass 03 - Completed: 4.87500 ticks
Identity - Pass 04 - Completed: 5.87500 ticks
Identity - Pass 05 - Completed: 5.87500 ticks
Identity - Passed

Note: 1 sec = 1000 ticks

Matrix Identity Algorithm - Code Analysis

[ Microsoft C++ compiler ]

...
template < class T >
_RTINLINE RTvoid _MatrixIdentityProcessingCRv2A( T * _RTRESTRICT ptS, RTssize_t iRows, RTssize_t iCols, RTint iNumOfThreads )
{
...
0024355C xor eax, eax
0024355E rep stos dword ptr es:[edi]
...
}
...

[ Borland C++ compiler ]

...
template < class T >
_RTINLINE RTvoid _MatrixIdentityProcessingCRv2A( T * _RTRESTRICT ptS, RTssize_t iRows, RTssize_t iCols, RTint iNumOfThreads )
{
...
00403069 xor ecx, ecx
0040306B inc edx
0040306C mov dword ptr [eax], ecx
0040306E add eax, 4
00403071 cmp edi, edx
00403073 jg 00403069
...
}
...

[ Intel C++ compiler ]

...
template < class T >
_RTINLINE RTvoid _MatrixIdentityProcessingCRv2A( T * _RTRESTRICT ptS, RTssize_t iRows, RTssize_t iCols, RTint iNumOfThreads )
{
...
00401096 pxor xmm0, xmm0
0040109A movntps xmmword ptr [ebx+ecx*4], xmm0
0040109E movntps xmmword ptr [ebx+ecx*4+10h], xmm0
004010A3 movntps xmmword ptr [ebx+ecx*4+20h], xmm0
004010A8 movntps xmmword ptr [ebx+ecx*4+30h], xmm0
004010AD movntps xmmword ptr [ebx+ecx*4+40h], xmm0
004010B2 movntps xmmword ptr [ebx+ecx*4+50h], xmm0
004010B7 movntps xmmword ptr [ebx+ecx*4+60h], xmm0
004010BC movntps xmmword ptr [ebx+ecx*4+70h], xmm0
004010C1 add ecx,20h
004010C4 cmp ecx,eax
004010C6 jb 0040109A
...
}
...

[ MinGW C++ compiler ]

...
template < class T >
_RTINLINE RTvoid _MatrixIdentityProcessingCRv2A( T * _RTRESTRICT ptS, RTssize_t iRows, RTssize_t iCols, RTint iNumOfThreads )
{
...
0040A5AE call _memset( 0042FD28h )
...
}
...

[ Watcom C++ compiler ]

...
template < class T >
_RTINLINE RTvoid _MatrixIdentityProcessingCRv2A( T * _RTRESTRICT ptS, RTssize_t iRows, RTssize_t iCols, RTint iNumOfThreads )
{
...
0040C9C7 cmp eax, ebp
0040C9C9 jge 0040C9ED
0040C9CB mov edx, eax
0040C9CD mov dword ptr [ebx+edx*4], 0
0040C9D4 inc eax
0040C9D5 jmp 0040C9C7
...
}
...
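
The five disassembly fragments above all appear to be the zero-fill part of the routine, written 4 bytes at a time in every case except the Intel one. A minimal C++ sketch of what such a routine could look like at source level is given here, assuming a row-major matrix and a two-stage scheme (zero-fill, then ones on the main diagonal). The name `matrix_identity_sketch` is hypothetical, the real body of `_MatrixIdentityProcessingCRv2A` is elided in the listings, and the thread-count parameter is left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the routine shown in the listings; this is a
// sketch under the assumptions stated above, not the author's code.
template <class T>
void matrix_identity_sketch(T* m, std::size_t rows, std::size_t cols)
{
    assert(m != nullptr);

    // Stage 1: zero-fill the whole matrix. This is the loop each compiler
    // turned into rep stos, a scalar store loop, movntps stores, or memset.
    const std::size_t total = rows * cols;
    for (std::size_t i = 0; i < total; ++i)
        m[i] = T(0);

    // Stage 2: write ones on the main diagonal (row-major indexing).
    const std::size_t n = rows < cols ? rows : cols;
    for (std::size_t i = 0; i < n; ++i)
        m[i * cols + i] = T(1);
}
```

Stage 1 touches every element while stage 2 touches only one element per row, so the code generated for the fill loop is what dominates the timings.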

Matrix Identity Algorithm ( 32-bit ): 2048 x 2048

[ Tests Set 2 ( 32-bit ) - Matrix Size: 2048 x 2048 ]

[ Microsoft C++ compiler ]

Matrix Size: 2048 x 2048
Processing...
Identity - Pass 01 - Completed: 32.25000 ticks
Identity - Pass 02 - Completed: 32.18750 ticks
Identity - Pass 03 - Completed: 32.25000 ticks
Identity - Pass 04 - Completed: 32.18750 ticks
Identity - Pass 05 - Completed: 33.25000 ticks
Identity - Passed

[ Borland C++ compiler ]

Matrix Size: 2048 x 2048
Processing...
Identity - Pass 01 - Completed: 31.25000 ticks
Identity - Pass 02 - Completed: 32.12500 ticks
Identity - Pass 03 - Completed: 33.18750 ticks
Identity - Pass 04 - Completed: 32.25000 ticks
Identity - Pass 05 - Completed: 32.25000 ticks
Identity - Passed

[ Intel C++ compiler ]

Matrix Size: 2048 x 2048
Processing...
Identity - Pass 01 - Completed: 12.68750 ticks
Identity - Pass 02 - Completed: 11.75000 ticks
Identity - Pass 03 - Completed: 12.68750 ticks
Identity - Pass 04 - Completed: 12.68750 ticks
Identity - Pass 05 - Completed: 12.68750 ticks
Identity - Passed

[ MinGW C++ compiler ]

Matrix Size: 2048 x 2048
Processing...
Identity - Pass 01 - Completed: 32.25000 ticks
Identity - Pass 02 - Completed: 33.18750 ticks
Identity - Pass 03 - Completed: 32.25000 ticks
Identity - Pass 04 - Completed: 32.18750 ticks
Identity - Pass 05 - Completed: 33.25000 ticks
Identity - Passed

[ Watcom C++ compiler ]

Matrix Size: 2048 x 2048
Processing...
Identity - Pass 01 - Completed: 31.25000 ticks
Identity - Pass 02 - Completed: 31.25000 ticks
Identity - Pass 03 - Completed: 30.31250 ticks
Identity - Pass 04 - Completed: 31.25000 ticks
Identity - Pass 05 - Completed: 31.25000 ticks
Identity - Passed

Note: 1 sec = 1000 ticks

Matrix Identity Algorithm ( 32-bit ): 4096 x 4096

[ Tests Set 3 ( 32-bit ) - Matrix Size: 4096 x 4096 ]

[ Microsoft C++ compiler ]

Matrix Size: 4096 x 4096
Processing...
Identity - Pass 01 - Completed: 128.93750 ticks
Identity - Pass 02 - Completed: 127.93750 ticks
Identity - Pass 03 - Completed: 127.93750 ticks
Identity - Pass 04 - Completed: 126.93750 ticks
Identity - Pass 05 - Completed: 128.93750 ticks
Identity - Passed

[ Borland C++ compiler ]

Matrix Size: 4096 x 4096
Processing...
Identity - Pass 01 - Completed: 128.93750 ticks
Identity - Pass 02 - Completed: 126.93750 ticks
Identity - Pass 03 - Completed: 125.00000 ticks
Identity - Pass 04 - Completed: 125.00000 ticks
Identity - Pass 05 - Completed: 124.00000 ticks
Identity - Passed

[ Intel C++ compiler ]

Matrix Size: 4096 x 4096
Processing...
Identity - Pass 01 - Completed: 48.81250 ticks
Identity - Pass 02 - Completed: 47.87500 ticks
Identity - Pass 03 - Completed: 48.81250 ticks
Identity - Pass 04 - Completed: 48.81250 ticks
Identity - Pass 05 - Completed: 48.87500 ticks
Identity - Passed

[ MinGW C++ compiler ]

Matrix Size: 4096 x 4096
Processing...
Identity - Pass 01 - Completed: 127.93750 ticks
Identity - Pass 02 - Completed: 126.93750 ticks
Identity - Pass 03 - Completed: 127.93750 ticks
Identity - Pass 04 - Completed: 127.93750 ticks
Identity - Pass 05 - Completed: 126.93750 ticks
Identity - Passed

[ Watcom C++ compiler ]

Matrix Size: 4096 x 4096
Processing...
Identity - Pass 01 - Completed: 124.00000 ticks
Identity - Pass 02 - Completed: 124.00000 ticks
Identity - Pass 03 - Completed: 124.06250 ticks
Identity - Pass 04 - Completed: 124.00000 ticks
Identity - Pass 05 - Completed: 125.00000 ticks
Identity - Passed

Note: 1 sec = 1000 ticks

Matrix Identity Algorithm ( 32-bit ): 8192 x 8192

[ Tests Set 4 ( 32-bit ) - Matrix Size: 8192 x 8192 ]

[ Microsoft C++ compiler ]

Matrix Size: 8192 x 8192
Processing...
Identity - Pass 01 - Completed: 362.81250 ticks
Identity - Pass 02 - Completed: 363.87500 ticks
Identity - Pass 03 - Completed: 362.81250 ticks
Identity - Pass 04 - Completed: 362.87500 ticks
Identity - Pass 05 - Completed: 362.81250 ticks
Identity - Passed

[ Borland C++ compiler ]

Matrix Size: 8192 x 8192
Processing...
Identity - Pass 01 - Completed: 361.37500 ticks
Identity - Pass 02 - Completed: 361.31250 ticks
Identity - Pass 03 - Completed: 361.31250 ticks
Identity - Pass 04 - Completed: 361.31250 ticks
Identity - Pass 05 - Completed: 360.37500 ticks
Identity - Passed

[ Intel C++ compiler ]

Matrix Size: 8192 x 8192
Processing...
Identity - Pass 01 - Completed: 134.75000 ticks
Identity - Pass 02 - Completed: 133.81250 ticks
Identity - Pass 03 - Completed: 134.75000 ticks
Identity - Pass 04 - Completed: 133.81250 ticks
Identity - Pass 05 - Completed: 133.75000 ticks
Identity - Passed

[ MinGW C++ compiler ]

Matrix Size: 8192 x 8192
Processing...
Identity - Pass 01 - Completed: 356.50000 ticks
Identity - Pass 02 - Completed: 357.37500 ticks
Identity - Pass 03 - Completed: 357.43750 ticks
Identity - Pass 04 - Completed: 356.43750 ticks
Identity - Pass 05 - Completed: 356.43750 ticks
Identity - Passed

[ Watcom C++ compiler ]

Matrix Size: 8192 x 8192
Processing...
Identity - Pass 01 - Completed: 345.68750 ticks
Identity - Pass 02 - Completed: 346.68750 ticks
Identity - Pass 03 - Completed: 346.68750 ticks
Identity - Pass 04 - Completed: 345.68750 ticks
Identity - Pass 05 - Completed: 346.68750 ticks
Identity - Passed

Note: 1 sec = 1000 ticks

Matrix Identity Algorithm ( 64-bit ): 16384 x 16384

[ Tests Set 5 ( 64-bit ) - Matrix Size: 16384 x 16384 ]

[ Microsoft C++ compiler ]

Matrix Size: 16384 x 16384
Processing...
Identity - Pass 01 - Completed: 62.00000 ticks
Identity - Pass 02 - Completed: 62.00000 ticks
Identity - Pass 03 - Completed: 47.00000 ticks
Identity - Pass 04 - Completed: 63.00000 ticks
Identity - Pass 05 - Completed: 46.00000 ticks
Identity - Passed

[ Intel C++ compiler ]

Matrix Size: 16384 x 16384
Processing...
Identity - Pass 01 - Completed: 63.00000 ticks
Identity - Pass 02 - Completed: 47.00000 ticks
Identity - Pass 03 - Completed: 62.00000 ticks
Identity - Pass 04 - Completed: 47.00000 ticks
Identity - Pass 05 - Completed: 62.00000 ticks
Identity - Passed

[ MinGW C++ compiler ]

Matrix Size: 16384 x 16384
Processing...
Identity - Pass 01 - Completed: 62.00000 ticks
Identity - Pass 02 - Completed: 47.00000 ticks
Identity - Pass 03 - Completed: 63.00000 ticks
Identity - Pass 04 - Completed: 46.00000 ticks
Identity - Pass 05 - Completed: 63.00000 ticks
Identity - Passed

Note: 1 sec = 1000 ticks

Matrix Identity Algorithm ( 64-bit ): 32768 x 32768

[ Tests Set 6 ( 64-bit ) - Matrix Size: 32768 x 32768 ]

[ Microsoft C++ compiler ]

Matrix Size: 32768 x 32768
Processing...
Identity - Pass 01 - Completed: 218.00000 ticks
Identity - Pass 02 - Completed: 218.00000 ticks
Identity - Pass 03 - Completed: 219.00000 ticks
Identity - Pass 04 - Completed: 218.00000 ticks
Identity - Pass 05 - Completed: 219.00000 ticks
Identity - Passed

[ Intel C++ compiler ]

Matrix Size: 32768 x 32768
Processing...
Identity - Pass 01 - Completed: 219.00000 ticks
Identity - Pass 02 - Completed: 218.00000 ticks
Identity - Pass 03 - Completed: 219.00000 ticks
Identity - Pass 04 - Completed: 218.00000 ticks
Identity - Pass 05 - Completed: 218.00000 ticks
Identity - Passed

[ MinGW C++ compiler ]

Matrix Size: 32768 x 32768
Processing...
Identity - Pass 01 - Completed: 218.00000 ticks
Identity - Pass 02 - Completed: 219.00000 ticks
Identity - Pass 03 - Completed: 218.00000 ticks
Identity - Pass 04 - Completed: 219.00000 ticks
Identity - Pass 05 - Completed: 218.00000 ticks
Identity - Passed

Note: 1 sec = 1000 ticks

Matrix Identity Algorithm ( 64-bit ): 65536 x 65536

[ Tests Set 7 ( 64-bit ) - Matrix Size: 65536 x 65536 ]

[ Microsoft C++ compiler ]

Matrix Size: 65536 x 65536
Processing...
Identity - Pass 01 - Completed: 873.00000 ticks
Identity - Pass 02 - Completed: 890.00000 ticks
Identity - Pass 03 - Completed: 873.00000 ticks
Identity - Pass 04 - Completed: 874.00000 ticks
Identity - Pass 05 - Completed: 874.00000 ticks
Identity - Passed

[ Intel C++ compiler ]

Matrix Size: 65536 x 65536
Processing...
Identity - Pass 01 - Completed: 874.00000 ticks
Identity - Pass 02 - Completed: 874.00000 ticks
Identity - Pass 03 - Completed: 873.00000 ticks
Identity - Pass 04 - Completed: 874.00000 ticks
Identity - Pass 05 - Completed: 873.00000 ticks
Identity - Passed

[ MinGW C++ compiler ]

Matrix Size: 65536 x 65536
Processing...
Identity - Pass 01 - Completed: 874.00000 ticks
Identity - Pass 02 - Completed: 873.00000 ticks
Identity - Pass 03 - Completed: 874.00000 ticks
Identity - Pass 04 - Completed: 873.00000 ticks
Identity - Pass 05 - Completed: 874.00000 ticks
Identity - Passed

Note: 1 sec = 1000 ticks

Matrix Identity Algorithm ( 64-bit ): 81920 x 81920

[ Tests Set 8 ( 64-bit ) - Matrix Size: 81920 x 81920 ]

[ Microsoft C++ compiler ]
Not Tested

[ Intel C++ compiler ]
Not Tested

[ MinGW C++ compiler ]
Not Tested

Note: 1 sec = 1000 ticks

Matrix Identity Algorithm ( 64-bit ): 131072 x 131072

[ Tests Set 9 ( 64-bit ) - Matrix Size: 131072 x 131072 ]

[ Microsoft C++ compiler ]
Not Tested

[ Intel C++ compiler ]
Not Tested

[ MinGW C++ compiler ]
Not Tested

Note: 1 sec = 1000 ticks

Hi Sergey,

Nice set of tests.

Why do you test the Watcom compiler if it lacks support for the SIMD SSE architecture extensions?

>>...Why do you test the Watcom compiler if it lacks support for the SIMD SSE architecture extensions?

There is nothing wrong with it because I always test all major C++ compilers ( there are 6 of them ) supported on the project I've been working on.

Next, take a look at
...
Matrix Identity Algorithm ( 32-bit ): 4096 x 4096

[ Tests Set 3 ( 32-bit ) - Matrix Size: 4096 x 4096 ]
...
test cases and you will see that the Watcom and Borland C++ compilers did a good job compared to the Microsoft and MinGW C++ compilers, but the Intel C++ compiler outperformed all of them by more than a factor of two.
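
The size of that gap can be checked from the posted numbers. The sketch below averages the five passes of Tests Set 3 for each compiler and divides by the Intel average. The helper names are hypothetical; the timings are copied from the results earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>

// Arithmetic mean of the per-pass timings, in ticks (1 sec = 1000 ticks).
inline double mean_ticks(const double* t, std::size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += t[i];
    return sum / static_cast<double>(n);
}

// Tests Set 3 ( 32-bit ) - Matrix Size: 4096 x 4096, five passes each.
const double kMicrosoft[5] = {128.93750, 127.93750, 127.93750, 126.93750, 128.93750};
const double kBorland[5]   = {128.93750, 126.93750, 125.00000, 125.00000, 124.00000};
const double kIntel[5]     = { 48.81250,  47.87500,  48.81250,  48.81250,  48.87500};
const double kMinGW[5]     = {127.93750, 126.93750, 127.93750, 127.93750, 126.93750};
const double kWatcom[5]    = {124.00000, 124.00000, 124.06250, 124.00000, 125.00000};

// Ratio of another compiler's mean time to the Intel mean time.
inline double speedup_of_intel_over(const double* other)
{
    return mean_ticks(other, 5) / mean_ticks(kIntel, 5);
}
```

With these figures the ratio comes out at roughly 2.55 for Watcom and roughly 2.63 for Microsoft, with Borland and MinGW in between.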

Another thing is that these tests clearly demonstrated that very good quality of binary code generation is Not enough to be competitive in modern times, and this is the case with the Watcom and Borland C++ compilers.

PS: Turbo C++ compiler ( 16-bit ) is Not used because it plays a different role as an Overall Source Codes Verifier.

>>Another thing is that these tests clearly demonstrated that very good quality of binary code generation is Not
>>enough to be competitive in modern times, and this is the case with the Watcom and Borland C++ compilers...

Here are a couple of notes:

- The Borland C++ compiler is No longer supported and will never support the latest Intel ISAs ( Instruction Set Architectures );

- The Watcom C++ compiler could support them, but unfortunately I don't see any progress in that direction, and it is Not clear to me what the Open Watcom C++ compiler team is currently doing. I recently integrated version 2.0; it is Not much different from version 1.9, and SSE is still Not supported.

Take a look at the 3rd post of the thread, Matrix Identity Algorithm - Code Analysis, and you will see why the Intel C++ compiler outperformed all the other C++ compilers on the 32-bit tests. This is because it uses Non-Temporal moves ( movntps ) for the 1st stage of the algorithm and unrolls processing with a 1-to-32 Loop Unrolling Schema ( LPS ): each iteration issues 8 stores of 16 bytes, which covers 32 four-byte elements.
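
A rough source-level equivalent of that Intel-generated loop can be written with SSE intrinsics. This is only a sketch: it assumes the element type is `float`, that the buffer is 16-byte aligned, and that the element count is a multiple of 32. None of these is stated in the thread, and the function name `zero_fill_stream` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE: _mm_setzero_ps, _mm_stream_ps, _mm_sfence

// Zero-fill `count` floats with non-temporal stores, 32 floats per iteration.
// Assumptions (not from the thread): dst is 16-byte aligned and count is a
// multiple of 32.
inline void zero_fill_stream(float* dst, std::size_t count)
{
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(count % 32 == 0);

    const __m128 zero = _mm_setzero_ps();
    for (std::size_t i = 0; i < count; i += 32) {
        // 8 stores x 4 floats = 32 elements, mirroring the 8 movntps
        // instructions followed by "add ecx,20h" in the listing.
        _mm_stream_ps(dst + i +  0, zero);
        _mm_stream_ps(dst + i +  4, zero);
        _mm_stream_ps(dst + i +  8, zero);
        _mm_stream_ps(dst + i + 12, zero);
        _mm_stream_ps(dst + i + 16, zero);
        _mm_stream_ps(dst + i + 20, zero);
        _mm_stream_ps(dst + i + 24, zero);
        _mm_stream_ps(dst + i + 28, zero);
    }
    // Make the streaming stores globally visible before later reads.
    _mm_sfence();
}
```

Non-temporal stores write around the cache hierarchy, which tends to help when the matrix is much larger than the 8MB L3 cache, as it is for the bigger 32-bit test cases.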

You can also see that on the 64-bit tests all three C++ compilers, that is, Intel, Microsoft and MinGW, showed nearly identical performance.
