SIGILL on AVX instruction

Hi,

I have a very simple test program that I'm using to experiment with the AVX instruction set. It works perfectly on my MacBook Pro, but the same code raises SIGILL on my Linux workstation. I check cpuid before invoking the instructions, and /proc/cpuinfo also has the AVX flag set. I'm using clang with the -mavx command-line switch. Any of the _mm256_xxx intrinsics triggers the exception. I'm not using the FMA instructions. /proc/cpuinfo says I have 12 cores of Intel(R) Xeon(R) CPU E5-1660 0 @ 3.30GHz.

TIA,

Irfan.

Is the "xsave" cpuid bit set as well?
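For anyone following along: the xsave question matters because the AVX flag in cpuid only says the CPU has the instructions, not that the OS saves the YMM registers. Below is a rough sketch of the fuller check, assuming GCC or clang on x86-64. The helper names `read_xcr0` and `avx_usable` are made up for illustration and are not from the original program.

```cpp
#include <cpuid.h>   // __get_cpuid, provided by GCC and clang on x86
#include <cstdint>

// Read extended control register 0 with xgetbv.
// Only legal to execute when CPUID reports OSXSAVE.
static std::uint64_t read_xcr0() {
    std::uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Hypothetical helper: true only if the CPU has AVX *and* the OS has
// enabled saving of XMM and YMM state.
bool avx_usable() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    const bool osxsave = (ecx >> 27) & 1u;  // CPUID.1:ECX bit 27
    const bool avx     = (ecx >> 28) & 1u;  // CPUID.1:ECX bit 28
    if (!osxsave || !avx) return false;
    // XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (YMM) state.
    return (read_xcr0() & 0x6u) == 0x6u;
}
```

If `avx_usable()` returns true and a 256-bit instruction still raises SIGILL, the cause is more likely the specific instruction (for example one that needs AVX2) than missing OS support.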

Hi Irfan

Can you post a disassembly of the code at the point where the SIGILL is raised?

I suspect a compiler error, possibly related to the opcodes it emits.

@Iliya,

The following is the smallest piece of code that will throw, and it throws at the vpxor instruction, not that I'm

    .file    "asm.cpp"
    .section    .text.startup,"ax",@progbits
    .align    16, 0x90
    .type    __cxx_global_var_init,@function
__cxx_global_var_init:                  # @__cxx_global_var_init
    .cfi_startproc
# BB#0:
    pushq    %rbp
.Ltmp2:
    .cfi_def_cfa_offset 16
.Ltmp3:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
.Ltmp4:
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    leaq    _ZStL8__ioinit, %rdi
    callq    _ZNSt8ios_base4InitC1Ev
    leaq    _ZNSt8ios_base4InitD1Ev, %rdi
    leaq    _ZStL8__ioinit, %rsi
    leaq    __dso_handle, %rdx
    callq    __cxa_atexit
    movl    %eax, -4(%rbp)          # 4-byte Spill
    addq    $16, %rsp
    popq    %rbp
    ret
.Ltmp5:
    .size    __cxx_global_var_init, .Ltmp5-__cxx_global_var_init
    .cfi_endproc

    .text
    .globl    main
    .align    16, 0x90
    .type    main,@function
main:                                   # @main
    .cfi_startproc
# BB#0:
    pushq    %rbp
.Ltmp8:
    .cfi_def_cfa_offset 16
.Ltmp9:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
.Ltmp10:
    .cfi_def_cfa_register %rbp
    andq    $-32, %rsp
    subq    $224, %rsp
    movl    $0, %eax
    movl    $0, 92(%rsp)
    vmovaps    32(%rsp), %ymm0
    vmovaps    %ymm0, 128(%rsp)
    vmovaps    %ymm0, 96(%rsp)
    leaq    (%rsp), %rcx
    vmovaps    96(%rsp), %ymm0
    vmovaps    128(%rsp), %ymm1
    vpxor    %ymm0, %ymm1, %ymm0
    vmovaps    %ymm0, 32(%rsp)
    movq    %rcx, 200(%rsp)
    vmovaps    %ymm0, 160(%rsp)
    movq    200(%rsp), %rcx
    vmovups    %ymm0, (%rcx)
    movq    %rbp, %rsp
    popq    %rbp
    vzeroupper
    ret
.Ltmp11:
    .size    main, .Ltmp11-main
    .cfi_endproc

    .section    .text.startup,"ax",@progbits
    .align    16, 0x90
    .type    _GLOBAL__I_a,@function
_GLOBAL__I_a:                           # @_GLOBAL__I_a
    .cfi_startproc
# BB#0:
    pushq    %rbp
.Ltmp14:
    .cfi_def_cfa_offset 16
.Ltmp15:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
.Ltmp16:
    .cfi_def_cfa_register %rbp
    callq    __cxx_global_var_init
    popq    %rbp
    ret
.Ltmp17:
    .size    _GLOBAL__I_a, .Ltmp17-_GLOBAL__I_a
    .cfi_endproc

    .type    _ZStL8__ioinit,@object  # @_ZStL8__ioinit
    .local    _ZStL8__ioinit
    .comm    _ZStL8__ioinit,1,1
    .section    .ctors,"aw",@progbits
    .align    8
    .quad    _GLOBAL__I_a

    .section    ".note.GNU-stack","",@progbits

 

@Mark,

Yes, the xsave bit is set on the box that throws SIGILL. I'm reading up on it now in the reference manual.

Irfan.

@Iliya,

I've posted the disassembly of the smallest program that causes the SIGILL, but it has gone into moderation. It seems like this instruction is where things go wrong:

    vmovaps    32(%rsp), %ymm0

Either that, or the instruction further down, which is vpxor    %ymm0, %ymm1, %ymm0

I cannot tell precisely which one, since I do not have access to a debugger on this machine right now.
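Without a debugger, one way to narrow this down is to let the program report the faulting address itself. Here is a minimal sketch, assuming Linux/POSIX signals. The names `format_hex`, `on_sigill` and `install_sigill_reporter` are hypothetical, and the handler sticks to `write` and `_exit` because those are async-signal-safe.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cstdint>

// Write v as "0x" + 16 hex digits + '\n' into out (19 bytes, no terminator).
void format_hex(std::uint64_t v, char* out) {
    static const char digits[] = "0123456789abcdef";
    out[0] = '0';
    out[1] = 'x';
    for (int i = 0; i < 16; ++i)
        out[2 + i] = digits[(v >> (60 - 4 * i)) & 0xF];
    out[18] = '\n';
}

// For SIGILL, si_addr holds the address of the offending instruction.
extern "C" void on_sigill(int, siginfo_t* info, void*) {
    char line[19];
    format_hex(reinterpret_cast<std::uintptr_t>(info->si_addr), line);
    ssize_t n = write(STDERR_FILENO, line, sizeof line);
    (void)n;
    _exit(1);
}

// Call once at the start of main(); returns false if installation failed.
bool install_sigill_reporter() {
    struct sigaction sa = {};
    sa.sa_sigaction = on_sigill;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGILL, &sa, nullptr) == 0;
}
```

The printed address can then be matched against the output of `objdump -d` on the binary. For a position-independent executable the load base has to be subtracted first.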

The Intel Intrinsics Guide says that _mm256_xor_si256 (which translates to vpxor, as in your disassembly) is an AVX2 instruction; it is not available in AVX. AVX does have _mm256_setzero_si256 (which should also translate to vpxor, but with the same register as both source operands). I suspect there is a hardware check that both source operands are the same in AVX, and this causes the SIGILL.

 

Quote:

Irfan H. wrote:

@Iliya,

I've posted the disassembly of the smallest program that causes the SIGILL, but it has gone into moderation. It seems like this instruction is where things go wrong:

    vmovaps    32(%rsp), %ymm0

Either that, or the instruction further down, which is vpxor    %ymm0, %ymm1, %ymm0

I cannot tell precisely which one, since I do not have access to a debugger on this machine right now.

 

It looks like it could be related to stack alignment; rsp might not be 32-byte aligned for some reason.

You could try the clang arguments "-mstackrealign -mstack-alignment=16", which generate code for 16-byte alignment.

I think you would receive SIGSEGV in the case of an alignment violation.
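To rule alignment in or out, it may help to check the addresses directly. vmovaps with a ymm operand needs a 32-byte-aligned memory address, and a misaligned one raises a general-protection fault, which Linux reports as SIGSEGV rather than SIGILL. Below is a small sketch. `is_aligned32` and `Vec8f` are illustrative names, not part of the original code.

```cpp
#include <cstdint>

// True if p could be used as the memory operand of a 256-bit vmovaps.
bool is_aligned32(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 31u) == 0;
}

// Eight floats forced onto a 32-byte boundary, wherever the object lives.
struct alignas(32) Vec8f {
    float v[8];
};
```

A local `Vec8f` on the stack should pass `is_aligned32`, which is what the `andq $-32, %rsp` in the posted prologue is there to guarantee.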

 

Quote:

andysem wrote:
The Intel Intrinsics Guide says that _mm256_xor_si256 (which translates to vpxor, as in your disassembly) is an AVX2 instruction; it is not available in AVX.

This is clearly the correct explanation, since the Xeon(R) CPU E5-1660 lacks AVX2 support. Most probably the OP's MacBook has AVX2. Is that right, Irfan?

Quote:

andysem wrote:
There is _mm256_setzero_si256 in AVX

when targeting AVX (/QxAVX) the Intel compiler outputs code such as: vxorps ymm0, ymm0, ymm0

but vpxor ymm0, ymm0, ymm0 when targeting AVX2  (/QxCORE-AVX2)
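To illustrate the vxorps route on an AVX-only CPU: integer data can be XORed through the float-domain intrinsic, since the casts generate no instructions and the 256-bit load, store and cast intrinsics are all plain AVX. This is only a sketch, assuming GCC or clang. `xor8` and `xor8_avx` are made-up names, and the `target("avx")` attribute is used so the snippet builds even without -mavx.

```cpp
#include <immintrin.h>
#include <cstdint>

// AVX-only path: the XOR goes through _mm256_xor_ps, which a compiler
// targeting plain AVX is expected to lower to vxorps rather than vpxor.
__attribute__((target("avx")))
static void xor8_avx(const std::uint32_t* a, const std::uint32_t* b,
                     std::uint32_t* out) {
    const __m256 va = _mm256_castsi256_ps(
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a)));
    const __m256 vb = _mm256_castsi256_ps(
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b)));
    const __m256 vr = _mm256_xor_ps(va, vb);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out),
                        _mm256_castps_si256(vr));
}

// XOR eight 32-bit values; falls back to scalar code when AVX is unavailable.
void xor8(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t* out) {
    if (__builtin_cpu_supports("avx")) {
        xor8_avx(a, b, out);
        return;
    }
    for (int i = 0; i < 8; ++i) out[i] = a[i] ^ b[i];
}
```

Disassembling the result is still the only way to be sure which opcode a given compiler version actually emitted.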

It seems that question has been answered already.

Thank you everyone! I will try these. However, one point I'd like to make is that I used _mm256_xor_ps( ) which in the intrinsics guide is definitely listed as AVX and not AVX2. In addition, my cpuid call on my MacBook (where the exact same code works) clearly states that it does not support AVX2. Could this be due to a compiler issue with clang++?

Thanks,

Irfan.

Quote:

bronxzv wrote:

Quote:

andysem wrote: There is _mm256_setzero_si256 in AVX

when targeting AVX (/QxAVX) the Intel compiler outputs code such as: vxorps ymm0, ymm0, ymm0

but vpxor ymm0, ymm0, ymm0 when targeting AVX2  (/QxCORE-AVX2)

That explains it, thanks. I forgot about vxorps.

Quote:

Irfan H. wrote:

Thank you everyone! I will try these. However, one point I'd like to make is that I used _mm256_xor_ps( ) which in the intrinsics guide is definitely listed as AVX and not AVX2. In addition, my cpuid call on my MacBook (where the exact same code works) clearly states that it does not support AVX2. Could this be due to a compiler issue with clang++?


 

Yes, _mm256_xor_ps is an AVX intrinsic. You can disassemble the binary you're running on the MacBook to check that it translates to vxorps. If the compiler translates the intrinsic to vpxor, then this is definitely a compiler issue.

Also, make sure you're not using compiler flags that enable AVX2, such as -mavx2 or -march=core-avx2. -march=native is also discouraged, since it enables every extension supported by the build machine's CPU.
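One quick way to confirm which flags actually took effect is to look at the macros the compiler predefines. GCC and clang define `__AVX__` and `__AVX2__` when the corresponding code generation is enabled. A small sketch follows, with `BuildIsa` and `build_isa` as illustrative names only:

```cpp
// Which vector ISA the translation unit was compiled for.
enum class BuildIsa { Baseline, Avx, Avx2 };

constexpr BuildIsa build_isa() {
#if defined(__AVX2__)
    return BuildIsa::Avx2;      // e.g. -mavx2, -march=core-avx2
#elif defined(__AVX__)
    return BuildIsa::Avx;       // e.g. -mavx
#else
    return BuildIsa::Baseline;  // no AVX code generation enabled
#endif
}
```

For a binary that must run on the E5-1660, a line such as `static_assert(build_isa() != BuildIsa::Avx2, "AVX2 code generation is enabled");` would turn an accidental -mavx2 or -march=native into a build error.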

 
