Illegal instruction from custom 64 bit DLL

Hi,

I built a custom 64-bit DLL, using an export.def file for the function exports.

The DLL code comes directly from the Intel code samples for building custom IPP DLLs. I use ippStaticInit(), not ippStaticInitCPU(id), so there should not be a problem there.

My system is an i5-2500K, Windows 7, "x64-based PC".

The crash is on the vxorps instruction, on the first call to ippsZero_32f:

e9_ippsZero_32f:
[...]
000007FEE52284F6  jg          e9_ippsZero_32f+1Fh (7FEE52284FFh) 
000007FEE52284F8  call        e9_ownsZero_8u_E9 (7FEE52565C0h)

e9_ownsZero_8u_E9:
000007FEE52565C0 push rsi
000007FEE52565C1 push rdi
000007FEE52565C2 mov rdi,rcx
000007FEE52565C5 mov rsi,rdx
000007FEE52565C8 mov rax,rdi
000007FEE52565CB movsxd rsi,esi
000007FEE52565CE vxorps ymm0,ymm0,ymm0 ; illegal instruction 
000007FEE52565D2 xor rdx,rdx
000007FEE52565D5 cmp rsi,100h

Seems like this is something to do with AVX, but why would that be illegal and what should I do?

Hi,

could you run ippCpuInfo (available in the IPP samples, which include pre-built executables) and post its output here? 2nd generation Core supports AVX, so maybe something is wrong with OS support.

regards, Igor

****************
The decoded data
****************

==================
Signature
Stepping ID 7
Model 10
Model + Ext. 42
Family 6
Family + Ext. 6
Type 0

BrandName
=================================================
Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
=================================================

==============================================================
IPP would recommend using cpu_p8(y8) code for this processor

================
Feature Flags
================
Cores 4 - Number of cores per physical package
CMP / HTT 1 - Multi-Cores and/or Multi-Threading

MOVBE 0 - MOVBE instruction. For the first time in Atom(TM)
MMX 1 - Intel(R) Architecture MMX(TM) technology is supported
SSE 1 - Streaming SIMD Extensions is supported
SSE2 1 - Streaming SIMD Extensions 2 is supported
SSE3 1 - Streaming SIMD Extensions 3 is supported
SSSE3 1 - Supplemental Streaming SIMD Extensions 3 is supported
SSE41 1 - Streaming SIMD Extensions 4 (SSE4.1) is supported
SSE42 1 - Streaming SIMD Extensions 4 (SSE4.2) is supported
STTNI 0 - STTNI Instructions
EM64T 1 - Intel(R) Extended Memory 64 Technology is supported
AVX 1 - CPU supports Intel(R) Advanced Vector Extensions instruction set
AVX_OS 0 - OS supports Intel(R) AVX
AES 1 - AES instruction is supported
CLMUL 1 - PCLMULQDQ instruction is supported

So Windows 7 doesn't support it... :(

This will be a very common customer issue, so is there an easy way to prevent AVX instructions? Maybe I should init the dll with an older cpu id if I see AVX_OS = 0?

Thanks 

What is interesting is why it selects cpu_e9 code by default when ippCpuInfo says "IPP would recommend using cpu_p8(y8) code for this processor".

You probably need to install SP1 for your Windows 7.

Regards,
Sergey 

I understand; that would work for me as the developer, but what can I do to support an unpatched Windows 7?

I'm not saying I need AVX for unpatched Win 7, just an option that doesn't cause illegal instructions.

Thanks

While waiting for a new IPP that selects y8 instead of e9, you'd have to call ippGetCpuFeatures and then ippInitCpu with your selected CPU, where the selected CPU is the one just below AVX if the OS does not support AVX.

Here is some of my code that selects an IPP cpu depending on features (32-bit case):

    lib_enum lib;
    Ipp64u pFeaturesMask;
    Ipp32u pCpuidInfoRegs[4];
    IppStatus status;

    status= ippInit();                    // init local ippCore
    if( status == ippStsNoErr )
        status= ippGetCpuFeatures( &pFeaturesMask, pCpuidInfoRegs );
    if( status != ippStsNoErr )            // error getting features
        lib= LIB_W7;                    // lowest supported is W7 = SSE2
    else if( (pFeaturesMask & (Ipp64u)(ippCPUID_AVX2)) &&  (pFeaturesMask & (Ipp64u)(ippAVX_ENABLEDBYOS)) )
        lib= LIB_H9;                    // AVX2
    else if( (pFeaturesMask & (Ipp64u)(ippCPUID_AVX)) &&  (pFeaturesMask & (Ipp64u)(ippAVX_ENABLEDBYOS)) )
        lib= LIB_G9;                    // AVX
    else if( pFeaturesMask & (Ipp64u)(ippCPUID_SSE42) )
        lib= LIB_P8;                    // SSE42
    else if( pFeaturesMask & (Ipp64u)(ippCPUID_SSSE3) ) {
        if( pFeaturesMask & (Ipp64u)(ippCpuBonnell) )
          lib= LIB_S8;                    // SSSE3 Atom optimized
        else
          lib= LIB_V8;                    // SSSE3
    } else
        lib= LIB_W7;

Thanks for that Thomas.

Hello, 

Which version of IPP are you using now? The ippInit() function is supposed to check both the OS support and the supported CPU features.

Regards
Chao 

7.0.205

>>...7.0.205

I have that version of IPP library and I could verify ippsZero_32f function on Ivy Bridge ( i7 ). Let me know if that test case looks right as a reproducer:

#include "ipps.h"

int main( void )
{
    Ipp32f fData[ 256 ];

    IppStatus st = ::ippsZero_32f( &fData[0], 256 );

    return ( int )1;
}

>>>>...7.0.205
>>
>>I have that version of IPP library and I could verify ippsZero_32f function on Ivy Bridge ( i7 ).

Daven, there are two pieces of news:

The good news: I didn't have any issues or problems on an Ivy Bridge system with IPP version 7.1.

The bad news: unfortunately, I don't have a set of 64-bit IPP DLLs for version 7.0.205.

Here are all results of my verification:

// Verification for DSP domain DLL ( AVX / e9 ) is needed
/*
List of IPP DLLs used:

24/09/2012 11:25 PM 144,864 ippcore-7.1.dll
24/09/2012 11:25 PM 240,608 ipps-7.1.dll
25/09/2012 01:21 AM 5,499,360 ippse9-7.1.dll
*/

#include "stdio.h"
#include "ipps.h"

int main( void )
{
    Ipp32f fData[ 256 ];

    printf( "Test Started\n" );

    IppStatus st = ::ippsZero_32f( &fData[0], 256 );

    printf( "Test Completed\n" );

    return ( int )1;
}

[ Output ]

Test Started
Test Completed

Let me know if you have any questions.

Here are some additional technical details:

Dell Precision Mobile M4700
Intel Core i7-3840QM ( Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/compare/70846 )

and the test case is attached.

Attachment: test19.cpp (541 bytes)

Quote:

Sergey Kostrov wrote:
>>...7.0.205

I have that version of IPP library and I could verify ippsZero_32f function on Ivy Bridge ( i7 ). Let me know if that test case looks right as a reproducer:

#include "ipps.h"

int main( void )
{
    Ipp32f fData[ 256 ];

    IppStatus st = ::ippsZero_32f( &fData[0], 256 );

    return ( int )1;
}

Yes, that reproduced the illegal instruction error. 

>>...Yes, that reproduced the illegal instruction error...

Please run MsInfo32.exe and post the complete OS information.

This is a short follow-up: I'd like to note that the ippsZero_xxx functions are not in the list of IPP functions optimized to benefit from Haswell's new instructions. Take a look at: http://software.intel.com/en-us/articles/haswell-support-in-intel-ipp

Sergey,

this list is not fully precise: it contains only functions that have hand-developed optimizations. It doesn't take into account functions that make nested calls to hand-optimized functions (for example, convolution uses ippzero, etc.). One more thing: the whole library is built with icc/icl with the corresponding optimization switch, so new instructions can be inserted by the compiler in ANY function.

regards, Igor

>>...this list is not fully precise - this list contains only functions that have got hand-developed optimization. It doesn't take
>>into account functions that have nested calls to hand-optimized functions (for example convolution uses ippzero, etc.)...

Thanks for the information. It would be nice to have a comment about this in the article; please consider this a feature request of sorts.

Either way, surely the disassembly shows that ymm* registers are being used, and to my knowledge they are AVX registers. 

I did more tests:

ippInit(), ippInitCpu(ippCpuSSE42),  and ippInitCpu(ippCpuSSE41) choose the e9_ippsZero_32f code, and crash with the illegal instruction error
ippInitCpu(ippCpuSSE3) chooses the m7_ippsZero_32f code and doesn't crash

I should repeat this is only for 64 bit; 32 bit seems to choose the right code with just ippInit().

So, according to the ippCpuInfo app, I should be selecting cpu_y8 code for my condition (AVX cpu but no AVX os), though this isn't an option from the above tests. 

I guess the only thing to do is update the IPP license...

>>...ippInitCpu( ippCpuSSE3 ) chooses the m7_ippsZero_32f code and doesn't crash

...
m7 - Optimized for processors with Intel SSE3
...
y8 - Optimized for 64-bit applications on processors with Intel SSE4.1
...

>>...So, according to the ippCpuInfo app, I should be selecting cpu_y8 code for my condition (AVX cpu but no AVX os)...

Using the highest available Intel instruction set is the right decision (as a workaround) in your situation.

Could you attach your DLL and a reproducer so we can understand what is wrong and how you managed to bypass the OS-support check for AVX? The IPP dispatcher checks both the AVX bit from CPUID and that AVX is supported by the OS, and dispatches AVX code only if both conditions are true.

regards, Igor

Attached the dll+reproducer. The dll code itself is doing nothing fancy:

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <ipp.h>

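BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
    switch(fdwReason)
    {
    case DLL_PROCESS_ATTACH:
        {
            // Let IPP pick the CPU-specific code; fail the load on error.
            if(ippInit() != ippStsNoErr) return false;
        }
        // falls through to default

    default:
        // Reference the unused parameters to silence compiler warnings.
        hinstDLL;
        lpvReserved;
        break;
    }

    return true;
}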

Attachment: test64.zip (948.78 KB)

>>...Attached the dll+reproducer...

This is simply to let you know that test application crashed on my Ivy Bridge system ( Intel Core i7-3840QM ( 2.80 GHz ) / Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/compare/70846 ).

>>...
>>IppStatus st = ::ippsZero_32f( &fData[0], 256 );
>>...

Are there more IPP functions with similar problems, that is, with AVX related crashes?

Hi, I tried this code on the machine with SP1:

TID0: INS 0x000007fee7801348             BASE     or rdi, rax                          | rdi = 0x1ff, rflags = 0x206
TID0: Read 0x306c1 = *(UINT32*)00000000002FEE30
TID0: INS 0x000007fee780134b             BASE     mov r13d, dword ptr [rsp+0x20]       | r13 = 0x306c1
TID0: INS 0x000007fee7801350             BASE     cmp edx, 0x18000000                  | rflags = 0x246
TID0: INS 0x000007fee7801356             BASE     jnz 0x7fee7801363
TID0: INS 0x000007fee7801358             BASE     call 0x7fee7b30126                   | rsp = 0x2fee08
TID0: Write *(UINT64*)00000000002FEE08 = 0x7fee780135d
TID0: INS 0x000007fee7b30126             BASE     push rbx                             | rsp = 0x2fee00
TID0: Write *(UINT64*)00000000002FEE00 = 0x1
TID0: INS 0x000007fee7b30127             BASE     mov eax, 0x1                         | rax = 0x1
TID0: INS 0x000007fee7b3012c             BASE     cpuid                                | rax = 0x306c1, rbx = 0x1100800, rcx = 0x7ffaf3ff, rdx = 0xbfebfbff
TID0: INS 0x000007fee7b3012e             BASE     xor eax, eax                         | rax = 0, rflags = 0x246
TID0: INS 0x000007fee7b30130             BASE     and ecx, 0x18000000                  | rcx = 0x18000000, rflags = 0x206
TID0: INS 0x000007fee7b30136             BASE     cmp ecx, 0x18000000                  | rflags = 0x246
TID0: INS 0x000007fee7b3013c             BASE     jnz 0x7fee7b30154
TID0: INS 0x000007fee7b3013e             BASE     xor ecx, ecx                         | rcx = 0, rflags = 0x246
TID0: INS 0x000007fee7b30140             XSAVE    xgetbv                               | rdx = 0, rax = 0x7
TID0: INS 0x000007fee7b30143             BASE     mov ecx, eax                         | rcx = 0x7
TID0: INS 0x000007fee7b30145             BASE     xor eax, eax                         | rax = 0, rflags = 0x246
TID0: INS 0x000007fee7b30147             BASE     and ecx, 0x6                         | rcx = 0x6, rflags = 0x206
TID0: INS 0x000007fee7b3014a             BASE     cmp ecx, 0x6                         | rflags = 0x246
TID0: INS 0x000007fee7b3014d             BASE     jnz 0x7fee7b30154
TID0: INS 0x000007fee7b3014f             BASE     mov eax, 0x1                         | rax = 0x1
TID0: Read 0x1 = *(UINT64*)00000000002FEE00
TID0: INS 0x000007fee7b30154             BASE     pop rbx                              | rbx = 0x1, rsp = 0x2fee08
TID0: Read 0x7fee780135d = *(UINT64*)00000000002FEE08
TID0: INS 0x000007fee7b30155             BASE     ret                                  | rsp = 0x2fee10
TID0: INS 0x000007fee780135d             BASE     shl eax, 0x9                         | rax = 0x200, rflags = 0x206
TID0: INS 0x000007fee7801360             BASE     or rdi, rax                          | rdi = 0x3ff, rflags = 0x206

The xgetbv instruction shows that the OS is checked for AVX support. I'm currently waiting for an AVX machine with Windows 7 and without SP1, so I'll post an update on my findings after that.

regards, Igor
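
For readers following the trace, the check it performs can be sketched in a few lines of C. This is only an illustration of the two masks visible in the trace (0x18000000 applied to the CPUID leaf 1 ECX value, then 0x6 applied to the XGETBV result), not IPP's actual source. The function name avx_usable and the macro names are made up for this example, and the register values are passed in as plain integers so the logic can be read without any compiler intrinsics.

```c
#include <stdint.h>

/* CPUID leaf 1, ECX: bit 27 = OSXSAVE, bit 28 = AVX. */
#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX     (1u << 28)

/* XCR0 (read with XGETBV, ECX = 0): bit 1 = XMM state, bit 2 = YMM state. */
#define XCR0_XMM_STATE (1u << 1)
#define XCR0_YMM_STATE (1u << 2)

/* Hypothetical helper: returns 1 only when the CPU reports AVX and the OS
 * has enabled saving/restoring of the XMM and YMM register state. */
int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t cpu_bits = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX; /* 0x18000000 */
    const uint64_t os_bits  = XCR0_XMM_STATE | XCR0_YMM_STATE;     /* 0x6 */

    /* XGETBV may only be trusted (or even executed) when OSXSAVE is set. */
    if ((cpuid1_ecx & cpu_bits) != cpu_bits)
        return 0;

    return (xcr0 & os_bits) == os_bits;
}
```

With the values from the trace (ECX = 0x7ffaf3ff, XCR0 = 0x7) the helper returns 1, matching the path that executes mov eax, 0x1. If either the OSXSAVE bit or the YMM state bit is missing, it returns 0, which is the case where AVX code must not be dispatched.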

>>>>...
>>>>IppStatus st = ::ippsZero_32f( &fData[0], 256 );
>>>>...
>>
>>Are there more IPP functions with similar problems, that is, with AVX related crashes?

If the ippsZero_32f function is used in some production software, then I would suggest a workaround based on a call to the CRT function memset.
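
A minimal sketch of that memset-based workaround, assuming an IEEE 754 float representation (where all-zero bytes are 0.0f). The name zero_32f_fallback and its 0 / -1 return convention are invented for this example and are not part of IPP.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ippsZero_32f: zero len floats starting at dst.
 * Returns 0 on success and -1 on invalid arguments (illustrative
 * convention, not IPP status codes). */
int zero_32f_fallback(float *dst, int len)
{
    if (dst == NULL || len <= 0)
        return -1;

    /* All-zero bytes represent 0.0f under IEEE 754. */
    memset(dst, 0, (size_t)len * sizeof(float));
    return 0;
}
```

This only sidesteps the one function that crashes; any other function dispatched to the same AVX code path would still need attention.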

Hi daven-hughes,

it is your bug: I've investigated the exe and dll you provided. You took the ipps library from one IPP 7.0 update and the ippcore library from another. This is the main issue: they are incompatible from the dispatching point of view. The initial version of 7.0 didn't have w7/m7 code/libraries, so it supported only 5 cpu-specific libraries, while the later 7.0.x updates restored the w7/m7 code, which means 6 cpu-specific libraries. In your case the ippcore function ippInit detects the correct set of supported features and that your OS doesn't support AVX, so it dispatches the cpu one step below AVX ("AVX-1", index=4), but for the "old" ipps library this index corresponds to the last cpu, that is, to AVX code. You can easily check this by calling ippGetLibVersion (for ippcore) and ippsGetLibVersion (for ippSP): these versions MUST be the same.

regards, Igor
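
The index mismatch described above can be illustrated with a toy model. The table contents below are assumptions picked only to match the description (an older table with 5 entries, a newer one with 6 because m7 was restored); the real internal tables, their order, and the function names here are not taken from IPP.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dispatch tables, lowest to highest CPU. Names and order are
 * assumptions for illustration only. */
static const char *const OLD_IPPS_TABLE[] = { "mx", "u8", "n8", "y8", "e9" };
static const char *const NEW_IPPS_TABLE[] = { "mx", "m7", "u8", "n8", "y8", "e9" };

#define OLD_COUNT (sizeof OLD_IPPS_TABLE / sizeof OLD_IPPS_TABLE[0])
#define NEW_COUNT (sizeof NEW_IPPS_TABLE / sizeof NEW_IPPS_TABLE[0])

/* Index a newer ippcore would hand out in this model: the top entry (AVX)
 * when the OS supports AVX, otherwise the entry just below it. */
size_t core_index(int avx_enabled_by_os)
{
    return avx_enabled_by_os ? NEW_COUNT - 1 : NEW_COUNT - 2;
}

/* Look up the code path each table would select for a given index. */
const char *new_ipps_code(size_t index)
{
    return index < NEW_COUNT ? NEW_IPPS_TABLE[index] : NULL;
}

const char *old_ipps_code(size_t index)
{
    return index < OLD_COUNT ? OLD_IPPS_TABLE[index] : NULL;
}

/* Returns 1 when both tables resolve the index to the same code path. */
int same_code_path(size_t index)
{
    const char *a = old_ipps_code(index);
    const char *b = new_ipps_code(index);
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}
```

In this model, core_index(0) yields 4 when the OS lacks AVX support. The newer table maps index 4 to y8, but the older, shorter table maps the same index to its last entry, e9, which is how a correct decision in ippcore can still end in AVX code.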

My mistake then, thanks Igor and everyone else for your help. 

Well, that said, I checked programmatically using ippsGetLibVersion() / ippGetLibVersion() that both ippcore_l.lib and ipps_l.lib were from build 7.0.205.40, and I have only installed the 64-bit libs once, so that is what I would expect.
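
A programmatic check of this kind might look like the sketch below. It uses a stand-in struct rather than the real IPP type so that it compiles without the IPP headers. The field names are modeled on IppLibraryVersion as an assumption (check ippdefs.h for the real layout), and versions_match is a made-up helper name.

```c
/* Stand-in for the numeric fields of a library version record. Field names
 * are an assumption modeled on IppLibraryVersion, not copied from it. */
typedef struct {
    int major;
    int minor;
    int majorBuild;
    int build;
} lib_version;

/* Returns 1 when every numeric version field is identical. */
int versions_match(const lib_version *core, const lib_version *domain)
{
    return core->major      == domain->major
        && core->minor      == domain->minor
        && core->majorBuild == domain->majorBuild
        && core->build      == domain->build;
}
```

In real code the two records would be filled from ippGetLibVersion() and ippsGetLibVersion(). Note that a matching pair does not by itself rule out an older library being picked up earlier in the link order, which is what the next post turns out to describe.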

Ahh! I just got it: somehow I had IPP 6's ippsemergedem64t.lib linked. God knows why; maybe because I thought a couple of functions had been removed from v7.0. Changing the link order so that ipps_l.lib came first fixed it.
