IPP causes invalid opcode exception at h9_ippsFFTGetSize_C_32fc

Beni F.'s picture

We are using IPP version 7.1.1.119 on a 4th generation (Haswell) Core i7 processor under the INtime (5) operating system.

We are using static linkage (#include <ipp_h9.h> before #include <ipp.h>).

A call to ippsFFTInitAlloc_C_32fc causes an invalid opcode exception. This occurs inside the h9_ippsFFTGetSize_C_32fc function when trying to execute the les esp,edx instruction.

Note: When configuring IPP for AVX rather than AVX2 (using ipp_g9.h instead of ipp_h9.h), everything works correctly. It so happens that g9_ippsFFTGetSize_C_32fc does not contain that les instruction.

We verified that our processor supports AVX2 (ran the piece of code suggested by Intel for checking this).

Please advise.

Thanks,

Beni Falk

Beni F.'s picture

http://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions says:

"The new instructions are encoded using what Intel calls a VEX prefix, which is a two- or three-byte prefix designed to clean up the complexity of current and future x86/x64 instruction encoding. The two new VEX prefixes are formed from two obsolete 32-bit instructions-Load Pointer Using DS (LDS-0xC4, 3-byte form) and Load Pointer Using ES (LES-0xC5, two-byte form)-which load the DS and ES segment registers in 32-bit mode. In 64-bit mode, opcodes LDS and LES generate an invalid-opcode exception, but under Intel® AVX, these opcodes are repurposed for encoding new instruction prefixes. As a result, the VEX instructions can only be used when running in 64-bit mode. The prefixes allow encoding more registers than previous x86 instructions and are required for accessing the new 256-bit SIMD registers or using the three- and four-operand syntax. As a user, you do not need to worry about this (unless you're writing assemblers or disassemblers)."

I have the following questions:

1. I am currently compiling and running my code in 32-bit mode. Does it mean that I cannot profitably use IPP on AVX and AVX2?

2. As I wrote, when I configured IPP for AVX (rather than AVX2) the problem did not occur and everything seemed to work correctly. Given the above statement on Intel's site, how could it work? Or does IPP somehow switch the processor to 64-bit mode before performing the operation and switch it back afterwards? Please excuse me in advance if this is a dumb question.

Thanks,

 

Beni Falk

 

bronxzv's picture

Quote:

Beni F. wrote:As a result, the VEX instructions can only be used when running in 64-bit mode. 

This paper is wrong about that (*). You can use AVX and AVX2 in both 32-bit and 64-bit modes; it's working in front of me as I type this text.

* I reported it here http://software.intel.com/en-us/forums/topic/279901 18 months ago, but for some reason it's still not fixed.
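To illustrate the point about 32-bit mode: as far as I understand the encoding rules, a CPU running 32-bit code tells a VEX prefix apart from a legacy LES/LDS by the top two bits of the byte that follows 0xC4/0xC5. LES and LDS only accept a memory operand, so a ModRM byte with mod == 11b was never a valid legacy encoding, and that combination is what VEX reuses. Below is a minimal sketch in C. The function name `is_vex_in_32bit_mode` is made up for this example, and the check says nothing about whether the CPU actually supports the instruction behind the prefix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if, in 32-bit mode, the bytes at p start with a VEX prefix
   rather than a legacy LES (0xC4) or LDS (0xC5) instruction.
   Hypothetical helper, for illustration only. */
bool is_vex_in_32bit_mode(const uint8_t *p, size_t len)
{
    if (len < 2)
        return false;
    if (p[0] != 0xC4 && p[0] != 0xC5)
        return false;
    /* Legacy LES/LDS need a memory operand, so ModRM.mod == 11b is not a
       valid legacy form; those top two bits set means "VEX". */
    return (p[1] & 0xC0) == 0xC0;
}
```

For example, `C5 F8 77` (vzeroupper) is classified as VEX, while `C4 06` (les eax,[esi]) is classified as a legacy instruction.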

iliyapolak's picture

It is very strange that this error was not corrected.

iliyapolak's picture

As bronxzv said, you can use both the AVX and AVX2 instruction sets in protected mode and in long mode. Bear in mind that in 64-bit mode you have 8 additional YMMn registers and 8 more 64-bit general-purpose registers at your disposal.

Beni F.'s picture

My problem is that I am using AVX2 via IPP (rather than via manually crafted assembly code) and IPP crashes (at least while I am working in 32-bit mode).

Is there a way to work around this issue? Does Intel plan to issue a fix for IPP to address it, or does IPP in AVX2 mode mandate 64-bit mode (now and forever)?

Note: our problem occurred when trying to use FFT functions in IPP. I presume that some AVX2 instructions are available in 32-bit mode and some (the ones using VEX prefix) aren't. It is also logical to suppose that there are performance benefits to using some VEX instructions in conjunction with FFT (or else IPP wouldn't use them). Is it such a significant performance boost that Intel would not support using IPP FFT functions in 32-bit mode?

In my opinion the best approach would be for Intel to support both kinds of usage. Just my two cents.

 

iliyapolak's picture

Can you somehow identify that instruction? Maybe with the help of a debugger.

bronxzv's picture

Quote:

Beni F. wrote:I presume that some AVX2 instructions are available in 32-bit mode and some (the ones using VEX prefix) aren't.

This is an erroneous assumption. As already explained, the paper at your link is plain wrong about that and unfortunately, as you prove here, very confusing for newcomers to AVX.

By the way, what you describe looks much like a potential bug in IPP; I'd suggest reporting it on the dedicated forum.

Beni F.'s picture

@iliyapolak: as I wrote in my original post, the debugger shows the offending instruction as: les esp,edx

@bronxzv:

1. I also approached TenAsys (the vendor of the INtime operating system) with my problem and they wrote to me that AVX2 instructions that use the VEX prefix cannot execute in 32-bit mode. Seems that I am not the only one who got confused.

2. If the VEX instructions can in fact execute in 32-bit mode, why do I get an invalid opcode exception when hitting such an instruction in h9_ippsFFTGetSize_C_32fc?

3. If, as you say, this is a bug in IPP, where can I report it?

Thanks,

Beni Falk

bronxzv's picture

Quote:

Beni F. wrote:3. If, as you say, this is a bug in IPP, where can I report it?

I said that it looks like a potential bug, you can report it here: http://software.intel.com/en-us/forums/intel-integrated-performance-primitives

Beni F.'s picture

OK, thanks.

iliyapolak's picture

>>>@iliyapolak: as I wrote in my original post, the debugger shows the offending instruction as: les esp,edx>>>

Sorry, I had not seen that.

iliyapolak's picture

>>>les esp,edx>>>

AFAIK the les instruction was used to set up far pointers.

Beni F.'s picture

iliyapolak - please see my post second from the top of this thread. I quoted there from an Intel site where they explain the VEX prefix instructions.

bronxzv's picture

Quote:

Beni F. wrote:@iliyapolak: as I wrote in my original post, the debugger shows the offending instruction as: les esp,edx

Are you sure that your debugger has proper support for AVX2 instructions? It may be a legitimate crash caused by an instruction that your CPU doesn't support (case in point: TSX instructions on K-series CPUs), with the debugger wrongly reporting it as a legacy LES.

EDIT: can you tell us the value of the byte right after the leading 0xc5 of this "LES"?

 

iliyapolak's picture

Yes, I have read your post. I cannot tell whether the invalid opcode exception was raised by the processor when a genuine les esp,edx sequence was decoded, or while decoding some AVX instruction whose VEX prefix reuses the les opcode byte. In the first case the compiler could be responsible for the fault.

iliyapolak's picture

>>>are you sure that your debugger supports AVX2 instructions?>>>

Interesting question. IIRC the invalid opcode vector is 0x6, and the CPU should prepare a trap frame in which it saves the address of the faulting instruction.

Itzhak B.'s picture

Quote:

bronxzv wrote:

EDIT: can you tell us the value of the byte right after the leading 0xc5 of this "LES" ?

 

Quote:

iliyapolak wrote:

Yes, I have read your post. I cannot tell whether the invalid opcode exception was raised by the processor when a genuine les esp,edx sequence was decoded, or while decoding some AVX instruction whose VEX prefix reuses the les opcode byte. In the first case the compiler could be responsible for the fault.

This is a capture of the screen when the exception occurred.

iliyapolak's picture

Quote:

Itzhak B. wrote:

Quote:

bronxzv wrote:

EDIT: can you tell us the value of the byte right after the leading 0xc5 of this "LES" ?

 

Quote:

iliyapolak wrote:

Yes, I have read your post. I cannot tell whether the invalid opcode exception was raised by the processor when a genuine les esp,edx sequence was decoded, or while decoding some AVX instruction whose VEX prefix reuses the les opcode byte. In the first case the compiler could be responsible for the fault.

This is a capture of the screen when the exception occurred.

Sorry, I cannot see any attached screenshot.

Beni F.'s picture

We are using Microsoft Visual Studio 2008 SP1. It is very likely not aware of the VEX instructions.

Instruction stream bytes where the invalid opcode exception occurred are the following: c4 e2 51 f7 d0 8d 0c d5 47 00 00 00 ... (I don't know where this instruction ends).

About the possibility of our CPU not supporting AVX2 instructions: we ran the detection code given in section 2.2.3 of

http://download-software.intel.com/sites/default/files/319433-014.pdf and it reported AVX2 as supported.

Thanks,

Beni

 

bronxzv's picture

Quote:

Beni F. wrote:Instruction stream bytes where the invalid opcode exception occurred are the following: c4 e2 51 f7 d0 8d 0c d5 47 00 00 00 ... (I don't know where this instruction ends).

Thanks, this looks like an instruction encoded with a 3-byte VEX prefix. I'll try to work out which one it is.

By the way, I see that the LES opcode is 0xc4, not 0xc5 as mentioned in the paper. One more error in this damned paper.
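A possible explanation for the debugger output, sketched in C: if a disassembler without VEX support reads `c4 e2` as a legacy LES and decodes `e2` as a ModRM byte, the fields come out as mod = 11b, reg = 100b (esp), rm = 010b (edx), which matches the "les esp,edx" reported above. The type and function names below are invented for this illustration, and whether Visual Studio 2008 decodes it exactly this way is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What a pre-AVX disassembler might show for C4 followed by a
   register-form ModRM byte. Hypothetical type, for illustration only. */
typedef struct {
    bool valid;        /* false unless the bytes are C4 with ModRM.mod == 11b */
    const char *dest;  /* register named by ModRM.reg */
    const char *src;   /* register named by ModRM.rm  */
} LesRegisterForm;

static const char *const REG32_NAMES[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

LesRegisterForm decode_as_legacy_les(const uint8_t *p, size_t len)
{
    LesRegisterForm r = { false, "", "" };
    if (len < 2 || p[0] != 0xC4 || (p[1] & 0xC0) != 0xC0)
        return r;
    r.valid = true;
    r.dest  = REG32_NAMES[(p[1] >> 3) & 7];  /* bits 5..3 */
    r.src   = REG32_NAMES[p[1] & 7];         /* bits 2..0 */
    return r;
}
```

Feeding it the first two bytes from the post above (`c4 e2`) yields dest "esp" and src "edx". Since a register operand is not legal for a real LES, that output is itself a hint that the bytes are a VEX prefix.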

bronxzv's picture

Quote:

Beni F. wrote:Instruction stream bytes where the invalid opcode exception occurred are the following: c4 e2 51 f7 d0 8d 0c d5 47 00 00 00 ... (I don't know where this instruction ends).

From the 2nd byte this is the 0x0F38 opcode group, and from the 3rd byte there is an additional 0xf3 prefix, so it looks like a 32-bit (since W==0) SARX instruction (*), but I may be wrong. By far the best option would be to use an up-to-date debugger.

SARX is a BMI2 instruction with a different cpuid feature flag from the AVX2 instructions (AFAIK, please verify), so you must also ensure that your targets (CPU and OS + BIOS) have proper support for BMI2 before running this code path, and run a fallback path otherwise.

* see "SARX r32a,r/m32,r32b" in the Intel Architecture Instruction Set Extensions Programming Reference, it's at page B-21 in the August 2012 edition I have

Mark Charney (Intel)'s picture

Close. It is actually HSW's SHLX instruction in the BMI2 extension. The embedded prefix is VEX.pp=1, which we denote as "66".
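The decoding discussed above can be sketched as follows. This is a rough C illustration of how the fields of a 3-byte VEX prefix are laid out, applied to the bytes `c4 e2 51 f7 d0` from the thread. The struct and function names are invented for the example, and it only handles the 0xC4 form followed directly by an opcode and a ModRM byte. The mnemonic table assumes opcode 0xF7 in the 0F38 map, where the implied prefix selects BEXTR (none), SHLX (66), SARX (F3) or SHRX (F2).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a 3-byte VEX prefix plus the opcode and ModRM bytes after it.
   Hypothetical type, for illustration only. */
typedef struct {
    bool valid;         /* false if the input is too short or not 0xC4 */
    unsigned map;       /* mmmmm: 1 = 0F, 2 = 0F38, 3 = 0F3A */
    unsigned w;         /* VEX.W */
    unsigned vvvv;      /* extra register operand, already un-inverted */
    unsigned l;         /* VEX.L */
    unsigned pp;        /* implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
    unsigned opcode;
    unsigned mod, reg, rm;  /* ModRM fields */
} Vex3Insn;

Vex3Insn decode_vex3(const uint8_t *p, size_t len)
{
    Vex3Insn d = { false, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    if (len < 5 || p[0] != 0xC4)
        return d;
    d.valid  = true;
    d.map    = p[1] & 0x1Fu;                  /* low 5 bits of byte 1 */
    d.w      = (p[2] >> 7) & 1u;
    d.vvvv   = ~((unsigned)p[2] >> 3) & 0xFu; /* stored inverted */
    d.l      = (p[2] >> 2) & 1u;
    d.pp     = p[2] & 3u;
    d.opcode = p[3];
    d.mod    = (p[4] >> 6) & 3u;
    d.reg    = (p[4] >> 3) & 7u;
    d.rm     = p[4] & 7u;
    return d;
}

/* Mnemonic for opcode F7 in the 0F38 map, selected by VEX.pp. */
const char *map2_f7_mnemonic(unsigned pp)
{
    static const char *const names[4] = { "bextr", "shlx", "sarx", "shrx" };
    return names[pp & 3u];
}
```

For `c4 e2 51 f7 d0` this gives map = 2 (0F38), W = 0, vvvv = 5, L = 0, pp = 1 and opcode 0xF7, which selects "shlx". With ModRM = d0 (reg = 2, rm = 0) the instruction would read roughly as shlx edx, eax, ebp, if I have the operand order right.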

bronxzv's picture

Quote:

Mark Charney (Intel) wrote:Close. It is actually HSW's SHLX instruction in the BMI2 extension. The embedded prefix is VEX.pp=1, which we denote as "66".

Thanks, I stand corrected. It looks like I confused it with VEX.pp=2 (0xF3).

sirrida's picture

Have you checked whether the processor has BMI2 enabled? AVX2 is a different feature set with its own CPUID flag. As far as I know, not all Haswell processors support BMI2 (which is a pity).
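A check along these lines could look like the sketch below. It relies on the CPUID leaf 7 (sub-leaf 0) EBX layout, where bit 3 is BMI1, bit 5 is AVX2 and bit 8 is BMI2. The helper names are made up for this example. The `__get_cpuid_count` (GCC/Clang `<cpuid.h>`) and `__cpuidex` (MSVC `<intrin.h>`) paths are assumptions about the toolchain, and older compilers may not provide them. The sketch also does not check that the OS has enabled YMM state, which AVX2 code needs in addition to the CPU flag.

```c
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_MSVC_CPUID 1
#endif

/* Feature bits in CPUID.(EAX=7,ECX=0):EBX */
#define LEAF7_EBX_BMI1 (1u << 3)
#define LEAF7_EBX_AVX2 (1u << 5)
#define LEAF7_EBX_BMI2 (1u << 8)

/* Returns 1 if every bit in mask is set in ebx, else 0. */
int leaf7_has(unsigned int ebx, unsigned int mask)
{
    return (ebx & mask) == mask;
}

/* Reads CPUID leaf 7, sub-leaf 0, and returns EBX.
   Returns 0 when the leaf or the intrinsic is unavailable. */
unsigned int read_leaf7_ebx(void)
{
#if defined(HAVE_GNU_CPUID)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return ebx;
    return 0;
#elif defined(HAVE_MSVC_CPUID)
    int regs[4];
    __cpuid(regs, 0);            /* regs[0] = highest supported leaf */
    if (regs[0] < 7)
        return 0;
    __cpuidex(regs, 7, 0);
    return (unsigned int)regs[1];
#else
    return 0;
#endif
}
```

A caller would then gate the AVX2 code path on something like `leaf7_has(read_leaf7_ebx(), LEAF7_EBX_AVX2 | LEAF7_EBX_BMI2)` and fall back to the AVX path otherwise. That the h9 library needs both flags is an inference from the SHLX instruction found above, not something stated by Intel in this thread.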

Beni F.'s picture

I am going to check it - thanks.

Brijender Bharti (Intel)'s picture

Hi Beni,

One quick question: how did you get this HSW system? Did you purchase it, or did you get it through Intel's development program?

I am trying to find out if you have early systems or not.

Beni F.'s picture

We got a COM Express board from Advantech (a SOM-5894).

Thanks,

Beni

 

Brijender Bharti (Intel)'s picture

Hi Beni,

The board model does not tell us which processor you are using. Did you get the processor from Advantech too?

http://www.advantech.com/products/SOM-5894/mod_0E302D80-4F19-406B-B540-733C6924C3A6.aspx

The board supports Embedded Intel® Core™ i7/i5/i3/Celeron® processors.

 

 

Beni F.'s picture

Yes, we did. We got the processor with the board. It is a Haswell Core i7. According to the datasheet on the site (http://downloadt.advantech.com/ProductFile/PIS/SOM-5894/Product%20-%20Datasheet/SOM-5894_DS(06.21.13)20130624090857.pdf) it must be a Core i7-4700EQ. There is no marking on the processor (at least not one that I can see) and the BIOS does not say much.

 
