ICC 12 auto-vectorization issue Intel Linux32

Hi, I'm hitting crashes (SEGV) with auto-vectorized code built with ICC 12. Auto-vectorization kicks in at -O2 and higher optimization levels. There are documented flags to disable it (-no-vec / -no-simd), but strangely ICC does not seem to honour them. The SIMD code is typically generated for array / structure initializations, for example: char xfilePath[KPUXAXFILEPATHLENMAX] = {0};
It shows up mostly around memset/memcpy and initializations. Below is the assembly snippet:

(gdb) disass 0x402d1edd
Dump of assembler code for function kpuxaGlbClientAttrsInit:
0x402d1ec0 : xchg %ax,%ax
0x402d1ec2 : push %ebp
0x402d1ec3 : mov %esp,%ebp
0x402d1ec5 : sub $0x2a8,%esp
0x402d1ecb : mov $0x200,%eax
0x402d1ed0 : mov %ebx,-0xc(%ebp)
0x402d1ed3 : pxor %xmm0,%xmm0
0x402d1ed7 : mov %edi,-0x8(%ebp)
0x402d1eda : mov %esi,-0x4(%ebp)
=> 0x402d1edd : movaps %xmm0,-0x248(%ebp,%eax,1)
0x402d1ee5 : movaps %xmm0,-0x258(%ebp,%eax,1)
0x402d1eed : movaps %xmm0,-0x268(%ebp,%eax,1)
0x402d1ef5 : movaps %xmm0,-0x278(%ebp,%eax,1)
0x402d1efd : sub $0x40,%eax

movaps %xmm0,-0x248(%ebp,%eax,1) --> store to address ebp + (eax*1) - 0x248

(gdb) p $ebp+$eax
$5 = (void *) 0x42b41e8c
(gdb) p 0x42b41e8c-0x248
$6 = 1119099972
(gdb) p/x 0x42b41e8c-0x248
$7 = 0x42b41c44

MOVAPS moves a double quadword containing 4 packed single-precision FP values from the source operand to the destination. When the source or destination operand is a memory location, it must be aligned on a 16-byte boundary; otherwise a general protection exception (#GP) is generated.

The address 0x42b41c44 is not 16-byte aligned, so a general protection exception is raised. The problem may be that ICC 12 thinks the stack frame at ebp is 16-byte aligned, whereas it is only 4-byte aligned as per the Intel 32-bit ABI.

The crash doesn't seem to reproduce in the following situations:
1. On 64-bit Linux
2. With older versions of the ICC compiler
3. With optimization level -O1 and below

$ icc -V
Intel C Compiler XE for applications running on IA-32, Version 12.0 Build 20111014
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.

Appreciate any of your comments or suggestions.
Thanks.

Hello,

could you provide a small reproducer (single C/C++ source file)?

Thank you & best regards,

Georg Zitzlsberger

Hi, it's hard to get a test case, since I don't see it reproducing in a small stand-alone program. I'm still trying to come up with a small reproducible test case. In the meantime, if there is any way to get further information from the current environment, I can certainly provide the details. Appreciate your help. Thanks.

Hello,

did you try with the latest compiler from Intel Composer XE 2011 Update 11? If it does not occur there, we most likely fixed it in one of the previous releases. It should be simpler to first test with the latest compiler than to reproduce the problem.

If it still fails you might provide me a more complex reproducer (via private reply). I can shrink it down to something engineering can handle. So far I failed to reproduce such a GP fault myself.

Unless you enforced the alignment and the access to it manually (e.g. via pragma simd), I'm pretty sure it's a compiler optimization bug, and we're highly interested in reproducing it on our side, too.

Regarding -no-vec vs. -no-simd:

  • -no-vec tells the HPO phase of the compiler to not vectorize loops. It, however, can still make use of SIMD instructions if applicable.
  • -no-simd turns off use of SIMD instructions; at least for instructions the compiler creates directly. This might not affect in-lined SIMD code from somewhere else [IPO] or use of intrinsics. (see edit below)

Best regards,

Georg Zitzlsberger

Edit: Contrary to what I've written above, -no-simd does not disable SIMD instructions in general. In particular, it only disables the SIMD pragmas/directives. If you want to get rid of SIMD instructions altogether, the only option is to compile for IA32 legacy (-mia32 [Linux*] or /arch:IA32 [Windows*]). However, this does not work for OS X* because the earliest processors available there already had SSE. It is also not possible for 64-bit processors, for the same reason.
I'm taking care to update the compiler documentation to correct this.

Hello,

I think I'm seeing something quite similar with a reproducer from this thread:
http://software.intel.com/en-us/forums/showthread.php?t=105284

It also results in unaligned access with movaps instruction. Anyways, that's just a guess; a reproducer from you would clarify things.
In case you cannot provide one I'll let you know once the similar issue is fixed. You might try with the latest compiler then.

Best regards,

Georg Zitzlsberger

Georg, in my case we have a 3rd party runtime which calls into our code. Our code is compiled with ICC 12, so I assume the stack is 16-byte aligned. But I think the 3rd party libs are compiled with an older compiler (gcc?) that doesn't maintain 16-byte stack alignment, which eventually leads to a crash when execution hits a movaps instruction with an unaligned operand. I hear ICC 12.0 recently changed to match GCC's ABI, where all 32-bit entry points assume 16-byte aligned stacks. To fix this, we have to either compile all the code with 12.0, or compile the 12.0 parts with -falign-stack=assume-4-byte. But we don't have control over the 3rd party libs. So I'm going to try the above flag for our code, since I see that the 3rd party libs are indeed 4-byte aligned and not 16-byte aligned. Thanks.

Hello,

there were indeed ABI changes because of GCC (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13685).
In that case the solution would be to use the option you proposed.

Could you please verify your solution and let us know? Without having a reproducer to debug it's hard to tell whether the above movaps instruction accesses data from/allocated by a 3rd party library. It might still be related to the problem of the other thread. Let's see...

Thank you & best regards,

Georg Zitzlsberger

Hello,

I just got informed that a fix for the above problem is part of Intel(R) Composer XE 2013 SP1 (and higher).

Best regards,

Georg Zitzlsberger
