Intel C++ 13 Difference with -O2: Unsigned Integer Constant Propagation

karma_kid

We are seeing a change/regression when moving from version 12 to version 13 of the Intel C++ compiler on a Linux platform. It occurs at the -O2 level of optimization (it disappears at -O1) and appears on both Intel and AMD processors. Code and makefile are appended. Can someone verify this (apparently broken) behavior? Should we report this as a bug, or is this code straying into an area not defined by the standard?

The reproducer declares an instance of C, initializes it with a value set via a preprocessor macro, then tests it, increments it, and tests it again. Intel 12.1.5 (correctly) seems to do a thorough job of constant propagation and eliminates the conditionals in favor of just printing two values and a final "Test: PASSED" message.

Intel 13 doesn't seem to do the same level of constant propagation and instead generates code that actually carries out the operations expressed in the reproducer. A disassembly of the relevant section of the "big" executable shows:

400947: 66 0f 6f 05 41 05 00 movdqa 0x541(%rip),%xmm0 # 400e90
40094e: 00
40094f: 48 8b 35 3a 05 00 00 mov 0x53a(%rip),%rsi # 400e90
400956: 0f ae 14 24 ldmxcsr (%rsp)
40095a: 66 0f 7f 44 24 10 movdqa %xmm0,0x10(%rsp)
400960: e8 83 fe ff ff callq 4007e8 <std::basic_ostream<char, std::char_traits<char> >::operator<
400965: 48 89 c7 mov %rax,%rdi
400968: be 08 08 40 00 mov $0x400808,%esi
40096d: e8 86 fe ff ff callq 4007f8 <std::basic_ostream<char, std::char_traits<char> >::operator<
400972: 48 8b 74 24 10 mov 0x10(%rsp),%rsi
400977: 41 bd 01 00 00 00 mov $0x1,%r13d
40097d: 33 db xor %ebx,%ebx
40097f: 48 81 fe ff ff ff 7f cmp $0x7fffffff,%rsi

The first movdqa instruction seems to load data from 0x400e90, in the .rodata section; sure enough, the value at that address is 0x000000007fffffff, the comparison at 0x40097f is true, and the "big" executable produces the expected output:

% ./big
7fffffff
80000000
Test: PASSED

A disassembly of the "bigger" executable looks similar:

400947: 66 0f 6f 05 51 05 00 movdqa 0x551(%rip),%xmm0 # 400ea0
40094e: 00
40094f: 48 8b 35 4a 05 00 00 mov 0x54a(%rip),%rsi # 400ea0
400956: 0f ae 14 24 ldmxcsr (%rsp)
40095a: 66 0f 7f 44 24 10 movdqa %xmm0,0x10(%rsp)
400960: e8 83 fe ff ff callq 4007e8 <std::basic_ostream<char, std::char_traits<char> >::operator<
400965: 48 89 c7 mov %rax,%rdi
400968: be 08 08 40 00 mov $0x400808,%esi
40096d: e8 86 fe ff ff callq 4007f8 <std::basic_ostream<char, std::char_traits<char> >::operator<
400972: 48 bb 00 00 00 80 00 mov $0x80000000,%rbx
400979: 00 00 00
40097c: 41 bc 01 00 00 00 mov $0x1,%r12d
400982: 48 8b 74 24 10 mov 0x10(%rsp),%rsi
400987: 48 3b de cmp %rsi,%rbx

but this time the value at 0x400ea0 is 0xffffffff80000000, so the cmp at 0x400987 ends up comparing 0x80000000 (in %rbx) with 0xffffffff80000000 (in %rsi). The output of "bigger" is

% ./bigger
ffffffff80000000
ffffffff80000001
Test: FAILED

It seems to fail only with values larger than 0x7fffffff.

---- reproducer.cc ----
#include <ios>
#include <iostream>

struct C
{
    C(unsigned long d1)
    {
        data[0] = d1;
        data[1] = 0;
        data[2] = 0;
        data[3] = 0;
    }

    unsigned long data[4];
};

using namespace std;

int main(void)
{
    int failures = 0;
    C val( VAL );

    cout << hex << val.data[0] << endl;
    if (val.data[0] != VAL ) ++failures;

    ++val.data[0];

    cout << val.data[0] << endl;
    if (val.data[0] != VAL + 1) ++failures;

    if (failures > 0) cout << "Test: FAILED" << endl;
    else cout << "Test: PASSED" << endl;

    return 0;
}

---- Makefile ----
CXXFLAGS := -g -O2 -Wall

.PHONY: all clean

all: big bigger

big: CXXFLAGS += -DVAL=0x7fffffffUL
big: reproducer.cc
	$(CXX) $(CXXFLAGS) $^ -o $@

bigger: CXXFLAGS += -DVAL=0x80000000UL
bigger: reproducer.cc
	$(CXX) $(CXXFLAGS) $^ -o $@

clean:
	-$(RM) big bigger *.o

Sergey Kostrov

Thanks for the description and test-case!

PS: Guys, please note when compiling the test case that VAL is a macro defined on the command line (!):
...
big: CXXFLAGS += -DVAL=0x7fffffffUL
...

Georg Zitzlsberger (Intel)

Hello,

I can confirm this problem and have forwarded it to engineering (DPD200240531). As soon as it is fixed I'll let you know.
It might also be related to this: http://software.intel.com/en-us/forums/topic/358200

Best regards,

Georg Zitzlsberger

karma_kid

Thank you, Georg.

Rob Cunningham
