"Illegal instruction" using -ipo

I was working on code that uses SSE4.1 instructions. When compiled with "icc -O3 -msse4.1", everything worked just fine. However, if I add -ipo to the compilation, the code still compiles, but it crashes with an "Illegal instruction" error.

Using valgrind, I found that the offending instruction was the following:

vex amd64->IR: unhandled instruction bytes: 0x66 0x45 0xF 0x3A 0x40 0xD9

I don't know what that means.

Similarly, if I use -fast, I get other errors, such as

Fatal Error: This program was not built to run on the processor in your system.
The allowed processors are: Intel processors with SSE4.2 and POPCNT instructions support.

My computer has a quad-core Xeon E5520 CPU running 64-bit Ubuntu Linux 10.04. /proc/cpuinfo shows the following:

  • processor : 0
  • vendor_id : GenuineIntel
  • cpu family : 6
  • model : 26
  • model name : Intel Xeon CPU E5520 @ 2.27GHz
  • stepping : 5
  • cpu MHz : 2266.785
  • cache size : 8192 KB
  • physical id : 0
  • siblings : 4
  • core id : 0
  • cpu cores : 4
  • apicid : 0
  • initial apicid : 0
  • fpu : yes
  • fpu_exception : yes
  • cpuid level : 11
  • wp : yes
  • flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
  • bogomips : 4533.57
  • clflush size : 64
  • cache_alignment : 64
  • address sizes : 40 bits physical, 48 bits virtual
  • power management:

It looks like sse4_1, sse4_2 and popcnt are all supported.
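The same feature flags can also be read from inside a program through CPUID leaf 1, where ECX bit 19 is SSE4.1, bit 20 is SSE4.2 and bit 23 is POPCNT. A minimal sketch, assuming GCC or Clang (the <cpuid.h> header and its __get_cpuid function are compiler-specific); cpu_has_ecx_features and the ECX_* macros are hypothetical names:

```c
#include <assert.h>
#include <cpuid.h>   /* GCC/Clang-specific: provides __get_cpuid */

/* CPUID leaf 1, ECX feature bits (per the Intel SDM). */
#define ECX_SSE4_1 (1u << 19)
#define ECX_SSE4_2 (1u << 20)
#define ECX_POPCNT (1u << 23)

/* Returns 1 if every bit in mask is set in CPUID.1:ECX, 0 otherwise. */
static int cpu_has_ecx_features(unsigned int mask)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                 /* leaf 1 not reported by this CPU */
    return (ecx & mask) == mask;
}
```

On a CPU whose flags line lists sse4_1, sse4_2 and popcnt, as the E5520 above does, cpu_has_ecx_features(ECX_SSE4_1 | ECX_SSE4_2 | ECX_POPCNT) should return 1.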

Can anyone tell me what is going on and whether there is a workaround? The icc version is "icc (ICC) 12.0.0 20101006".

Thanks in advance!

Qianqian


It would help a lot if you could isolate and post a small C/C++ code extract which, when compiled with -O3 -msse4.1 -ipo, produces the offending instruction sequence.

That valgrind error usually indicates a valid machine instruction which is not yet supported by valgrind.

If you aren't using the latest valgrind (3.6.0, released 21 October 2010), give it a try.
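For reference, the six bytes in that message can be decoded by hand: 0x66 is the operand-size prefix, 0x45 is a REX prefix with REX.R and REX.B set, 0F 3A 40 is the opcode of the SSE4.1 DPPS instruction (the dot product behind the _mm_dp_ps intrinsic), and 0xD9 is a ModRM byte with mod = 11, i.e. both operands are registers. The imm8 that DPPS takes is not among the printed bytes. A minimal sketch of the arithmetic; decode_dpps and xmm_pair are hypothetical names, and only the register-register form is handled:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int reg; int rm; } xmm_pair;   /* xmm register numbers */

/* Returns 1 and fills *out if b[] starts with 66 REX 0F 3A 40 /r
   (DPPS) using a register-register ModRM byte; returns 0 otherwise. */
static int decode_dpps(const unsigned char *b, size_t n, xmm_pair *out)
{
    if (n < 6 || b[0] != 0x66 || (b[1] & 0xF0) != 0x40)
        return 0;                      /* need 66 prefix, then a REX byte (0x40-0x4F) */
    if (b[2] != 0x0F || b[3] != 0x3A || b[4] != 0x40)
        return 0;                      /* 0F 3A 40 = DPPS */
    unsigned char rex = b[1], modrm = b[5];
    if ((modrm >> 6) != 3)
        return 0;                      /* memory forms not handled in this sketch */
    out->reg = (((rex >> 2) & 1) << 3) | ((modrm >> 3) & 7);  /* REX.R : ModRM.reg */
    out->rm  = ((rex & 1) << 3) | (modrm & 7);                /* REX.B : ModRM.rm  */
    return 1;
}
```

For the bytes 66 45 0F 3A 40 D9 it reports reg = 11 and rm = 9, i.e. dpps xmm11, xmm9, imm8.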

Yes, we'll need more information to do any meaningful investigation.
A few things to try:
1. Try IDB (the Intel Debugger).
2. The issue may not be IPO itself; it may be other optimizations that only kick in after inlining,
so lowering the optimization level might help work around the problem.

This issue will likely take a long time to isolate. Please file a ticket with Intel Premier Support (https://premier.intel.com/) to get more hands-on help.

thanks,
Jennifer

Hi,
I don't know if this helps, but a Prescott processor shows the same problem; it is not related to the code.
It is strange that the same code is accepted by the GNU compiler with -march=core2 but rejected by icc. I think the problem is the firmware configuration
(IRQ) (separate, old).
Regards

Hi Jennifer,

Thanks for the advice. IDB is a very nice tool; I am glad you pointed it out to me.

Using IDB, I found that the code crashed in the following loop:

for(i=0;i<4;i++)
    if(havelsse4(plucker->m+eid*12+i*3,pout,&bary,o,d,int_coef)){
        ...
    }

where havelsse4 is defined as an inline function:

inline int havelsse4(float4 *vecN, float4 *pout, float4 *bary, const __m128 o, const __m128 d, const __m128 int_coef){
    ...
    if(...){
        _mm_store_ps(&pout->x, _mm_mul_ps(detp, _mm_shuffle_ps(inv_det, inv_det, 0))); /* crashed here */
        return 1;
    }
    return 0;
}

The error occurred at "movaps xmmword ptr [r8], xmm8", which corresponds to the marked line: [r8] is the address &pout->x, and xmm8 holds the result of _mm_mul_ps().

If I comment out this line, both -ipo and -fast work perfectly. pout is a pointer to a float4 (a struct of four floats), and it was allocated. Do you think icc did something fishy when expanding this inline function?
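The marked store can be pulled out of havelsse4 into a small standalone function, the kind of isolated test case requested in this thread. A minimal sketch, assuming detp and inv_det are __m128 values computed earlier in havelsse4 (their computation is elided in the post); store_scaled is a hypothetical name:

```c
#include <assert.h>
#include <xmmintrin.h>

typedef struct Float4 {
    float x, y, z, w;
} float4 __attribute__ ((aligned(16)));

/* Standalone version of the marked store: broadcast lane 0 of inv_det,
   multiply it into detp, and store the four products to *pout. */
static void store_scaled(float4 *pout, __m128 detp, __m128 inv_det)
{
    __m128 scale = _mm_shuffle_ps(inv_det, inv_det, 0);  /* (d0, d0, d0, d0) */
    _mm_store_ps(&pout->x, _mm_mul_ps(detp, scale));     /* movaps: &pout->x must be 16-byte aligned */
}
```

With detp = (1, 2, 3, 4) and lane 0 of inv_det equal to 0.5, it stores (0.5, 1, 1.5, 2). Wrapped in a small main and built with the same icc flags, a driver like this can show whether the store misbehaves on its own or only inside the surrounding code.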

How about splitting the nested intrinsic calls into separate statements:

__m128 aa = _mm_shuffle_ps(inv_det, inv_det, 0);
__m128 bb = _mm_mul_ps(detp, aa);
_mm_store_ps(&pout->x, bb);

Can you give it a try?

Hi Jennifer,

I tried, but got the same error.

>>the error occurred at "movaps xmmword ptr [r8], xmm8" which corresponds to the marked line: ptr [r8] points to address "&pout->x"

Was r8 a multiple of 16?

movaps == move aligned packed single precision (the memory address must be a multiple of 16)
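One way to answer that at run time is to test the low four bits of the pointer just before the store. A minimal sketch; is_aligned16 and store4 are hypothetical helpers, not part of the original code:

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* 1 if p meets the 16-byte alignment that movaps (_mm_store_ps) requires. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Store v at dst, using the aligned form only when dst is 16-byte aligned. */
static void store4(float *dst, __m128 v)
{
    if (is_aligned16(dst))
        _mm_store_ps(dst, v);    /* movaps: faults (#GP) on a misaligned address */
    else
        _mm_storeu_ps(dst, v);   /* movups: no alignment requirement */
}
```

Temporarily routing the suspect store through a helper like store4 (or asserting is_aligned16(&pout->x)) is a quick way to rule misalignment in or out while debugging.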

Jim Dempsey

Yes, [r8] is 16-byte aligned. pout points to a float4 struct, defined as:

typedef struct Float4{
    float x,y,z,w;
} float4 __attribute__ ((aligned(16)));
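Since pout is heap-allocated, one way to confirm that the type and every allocated element really are 16-byte aligned is a compile-time check plus an explicitly aligned allocator. A minimal sketch, assuming a C11 compiler (for _Static_assert and _Alignof) and the _mm_malloc/_mm_free functions that icc and GCC provide alongside the SSE intrinsic headers; alloc_float4 is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* also declares _mm_malloc / _mm_free */

typedef struct Float4 {
    float x, y, z, w;
} float4 __attribute__ ((aligned(16)));

/* Compile-time checks (C11): four packed floats, 16-byte alignment. */
_Static_assert(sizeof(float4) == 16, "float4 should be four packed floats");
_Static_assert(_Alignof(float4) == 16, "float4 should be 16-byte aligned");

/* Allocate n float4 values starting on a 16-byte boundary; free with _mm_free. */
static float4 *alloc_float4(size_t n)
{
    return (float4 *)_mm_malloc(n * sizeof(float4), 16);
}
```

Because sizeof(float4) is 16, every element of an array allocated this way starts on a 16-byte boundary.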

Could you file a ticket with Intel Premier Support? We will need a test case for it. Alternatively, you can attach the test case here (marked private if preferred).

By the way, make sure you have the latest compiler update.

Thanks,
Jennifer
