ICC -O3 generates wrong code

I used ICC (icc version 14.0.3 (gcc version 4.8.1 compatibility)) on OS X 10.9.4 to compile the following code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>	/* uint32_t, uint8_t */

#define LOOPS 100
#define TEST_LEN (2*1024*1024)

static uint32_t g_Crc32Table[256];

void GetCrc32Table() 
{
	for (int i = 0; i < 256; i++) {
		uint32_t Crc = i;
		for (int j = 0; j < 8; j++) {
			if (Crc & 1) {
				Crc = (Crc >> 1) ^ 0x82F63B78;
			} else {
				Crc >>= 1;
			}
		}
		g_Crc32Table[i] = Crc;
	}
}

uint32_t GetCrc32(const uint8_t* InStr, size_t len)
{
	uint32_t crc = 0xffffffff;
	for (size_t i = 0; i < len; i++) {
		uint32_t tmp = InStr[i];
		crc = (crc >> 8) ^ g_Crc32Table[(crc ^ tmp) & 0xFF];
	}
	return crc ^ 0xBAADBEEF;
}
	
int main(int argc, char** argv)
{
	GetCrc32Table();
	srand(1234);
	uint8_t *b = (uint8_t*) malloc(TEST_LEN);
	for (int i = 0; i < TEST_LEN; i++) b[i] = (uint8_t) rand();
	
	uint32_t crc32 = 0;
	for (int i = 0; i < LOOPS; i++) {
		crc32 = GetCrc32(b, TEST_LEN);
	}
	printf("%x\n", (uint32_t) crc32);
	free(b);
}

with the following compile command line:

 /opt/intel/bin/icc -O3 -std=c99 -o t t.c

The compilation succeeded and it gave the result:

47b54328

As you can see, my code just computes the CRC32 of a generated buffer 100 times (as defined by LOOPS). Every call starts from the same initial value, so the result should not depend on LOOPS. But after I changed LOOPS to 99 and used the same compile command line, it gave a different result:

f8521437

There must be something wrong. So I changed the command line to use -O2 and LOOPS to 100:

/opt/intel/bin/icc -O2 -std=c99 -o t t.c

This time it gave the correct result (f8521437). I verified with clang (version 5.1, which came with Xcode 5):

clang -O3 -std=c99 -o t t.c
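
A check that does not rely on a second compiler is also possible. The sketch below reuses the table construction and update loop from the code above, then tests two properties: the result must not depend on how many times the call is repeated, and, once the custom final XOR (0xBAADBEEF) is swapped for the usual 0xFFFFFFFF, the routine should reproduce the published CRC-32C check value 0xE3069283 for the string "123456789". The names check_build_table, check_crc, check_crc_repeated and crc_self_check are hypothetical helpers introduced here, not part of the original program. Because the input is only 9 bytes, this may well not trigger the -O3 problem itself; it only helps rule out a bug in the CRC routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Same table construction as GetCrc32Table() in the post
   (reflected CRC-32C polynomial 0x82F63B78). */
static uint32_t check_table[256];

static void check_build_table(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t crc = i;
		for (int j = 0; j < 8; j++) {
			crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
		}
		check_table[i] = crc;
	}
}

/* Same update loop and final XOR as GetCrc32() in the post. */
static uint32_t check_crc(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	for (size_t i = 0; i < len; i++) {
		crc = (crc >> 8) ^ check_table[(crc ^ p[i]) & 0xFF];
	}
	return crc ^ 0xBAADBEEFu;
}

/* Hypothetical helper: repeat the call `loops` times, as main() does,
   and return the value from the last call. */
static uint32_t check_crc_repeated(const uint8_t *p, size_t len, int loops)
{
	uint32_t crc = 0;
	for (int i = 0; i < loops; i++) {
		crc = check_crc(p, len);
	}
	return crc;
}

/* Hypothetical helper: returns 1 if both properties hold, 0 otherwise. */
static int crc_self_check(void)
{
	static const uint8_t msg[] = "123456789";
	const size_t len = 9;	/* exclude the terminating NUL */

	check_build_table();
	uint32_t once = check_crc(msg, len);

	/* Undo the post's final XOR and apply the standard one, so the
	   value can be compared with the CRC-32C check value. */
	uint32_t standard = once ^ 0xBAADBEEFu ^ 0xFFFFFFFFu;
	if (standard != 0xE3069283u) return 0;

	/* The repeat count must not change the result. */
	if (check_crc_repeated(msg, len, 99) != once) return 0;
	if (check_crc_repeated(msg, len, 100) != once) return 0;
	return 1;
}
```

Calling crc_self_check() at the start of main() and printing its return value is enough to use it. To exercise the failing case more directly, check_crc_repeated could be run on the 2 MiB random buffer with 99 and 100 iterations and the two results compared.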

 


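I just did some investigation with a disassembler.

When LOOPS=100 (the result is incorrect), ICC generates vectorized SSE code (note the XMM registers; this is not MMX) as below. The printed value is taken from xmm1 rather than from the scalar register r9d: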

                mov     edx, 0FFFFFFFFh
                xor     cl, cl
                mov     rax, 0FFFFFFFFh
                movd    xmm0, edx
                pshufd  xmm0, xmm0, 0

loc_100000C45:                          ; CODE XREF: _main+2B6
                mov     r9d, edx
                xor     r8d, r8d
                movdqa  xmm1, xmm0
                mov     rsi, rax

loc_100000C52:                          ; CODE XREF: _main+2AE
                movzx   r10d, byte ptr [r8+r15]
                xor     rsi, r10
                movzx   esi, sil
                inc     r8
                shr     r9d, 8
                psrld   xmm1, 8
                xor     r9d, [r14+rsi*4]
                mov     esi, r9d
                xor     r10, rsi
                movzx   r11d, r10b
                cmp     r8, 200000h
                movd    xmm2, dword ptr [r14+r11*4]
                pshufd  xmm3, xmm2, 0
                pxor    xmm1, xmm3
                jb      short loc_100000C52
                add     cl, 4
                cmp     cl, 64h
                jb      short loc_100000C45
                psrldq  xmm1, 0Ch
                lea     rdi, asc_100004CE0 ; "%x\n"
                movd    esi, xmm1
                xor     eax, eax
                xor     esi, 0BAADBEEFh
                call    _printf

But with LOOPS=99, ICC generates plain scalar x86_64 code (even when I add the -xHost option):

                xor     dl, dl
                mov     eax, 0FFFFFFFFh

loc_100000C52:                          ; CODE XREF: _main+295
                mov     esi, eax
                xor     ecx, ecx

loc_100000C56:                          ; CODE XREF: _main+28E
                mov     r10d, esi
                movzx   r8d, byte ptr [r15+rcx*2]
                xor     rsi, r8
                movzx   r9d, sil
                shr     r10d, 8
                movzx   r11d, byte ptr [r15+rcx*2+1]
                inc     rcx
                xor     r10d, [rbx+r9*4]
                mov     esi, r10d
                xor     r10, r11
                movzx   r8d, r10b
                shr     esi, 8
                xor     esi, [rbx+r8*4]
                cmp     rcx, 100000h
                jb      short loc_100000C56
                inc     dl
                cmp     dl, 3
                jb      short loc_100000C52
                xor     esi, 0BAADBEEFh
                lea     rdi, asc_100004CE0 ; "%x\n"
                xor     eax, eax
                call    _printf

 

Hi,

I tested your code and I see the same issue when I use the same command line you used.

However, the error does not show up if I compile with -O2 instead of -O3, or if I use the 15.0 beta compiler. Can you try one of those options?

Thanks,
Alex

I mentioned in the first post that the result is correct if I use the -O2 option.

I am not in the 15.0 beta programme. :(

It would be great if you could let me know whether this is a known bug and, if not, whether its cause has been located.

Thank you.
