Wrong mask generation using _mm512_mask_extpackstorelo_epi32

Hi,

I have reduced my problem to this test:

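#include <stdio.h>
#include <immintrin.h>

int main()
{
    int i;
    float tmp[16];

    /* Fill the destination with a known value and print it. */
    for(i=0; i<16; i++){
        tmp[i] = 5.0f;
        printf("%f ", tmp[i]);
    }
    printf("\n");

    __m512 __vtmp = _mm512_set1_ps(10.0f);
    __mmask16 mask = 0x0040;   /* only bit 6 is set */

    _mm512_mask_extpackstorelo_ps(tmp, mask, __vtmp, _MM_DOWNCONV_PS_NONE, 0);

    /* Print the destination again after the masked pack-store. */
    for(i=0; i<16; i++){
        printf("%f ", tmp[i]);
    }
    printf("\n");
    return 0;
}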

According to the description in the ISA manual, with the mask 0x0040 the first position of 'tmp' shouldn't be written. However, the output of this code is:

5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
10.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000

Looking at the assembly, for some reason the 0x0040 is being translated to $1:

        stmxcsr   64(%rsp)                                      #4.1 c1
        movl      $1, %eax                                      #11.23 c2
        vprefetche0 (%rsp)                                      #10.9 c2
        orl       $32832, 64(%rsp)                              #4.1 c6
        kmov      %eax, %k1                                     #11.23 c6
        ldmxcsr   64(%rsp)                                      #4.1 c10
        vbroadcastsd .L_2il0floatpacket.1(%rip), %zmm0{%k1}     #11.23 c11
        xorl      %ecx, %ecx                                    #8.5 c15
        movl      $1084227584, %edx                             #10.9 c15
        xorl      %r12d, %r12d                                  #8.5 c19
        vpackstorelpd %zmm0, 72(%rsp){%k1}                      #11.23 c19
        movl      %edx, %ebx                                    #11.23 c23
        movq      %rcx, %r15 

Am I missing something?

I'm using icc (ICC) 14.0.2 20140120

Thank you

Barcelona Supercomputing Center

Sorry, I misunderstood the meaning of the instruction described in the Intrinsic Guide.
It's much clearer in the Reference Manual.
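
For anyone who lands here with the same confusion: as far as I understand it now, a pack-store does not write each selected lane to its own position. It writes the selected lanes contiguously, starting at the destination address. So with mask 0x0040, lane 6 of the vector ends up in tmp[0], which matches the output above. Below is a rough scalar sketch of that behaviour in plain C. The name packstore_model is my own, not an Intel API, and the sketch ignores the 64-byte-boundary cutoff that, as I read the manual, separates the "lo" variant from the "hi" one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked pack-store (not an Intel intrinsic).
   Walks the 16 lanes from low to high and writes each lane selected by the
   mask to the next free slot in dst, so the selected lanes are packed
   together at the start of dst. Returns the number of elements written.
   Assumption: the 64-byte-boundary cutoff of the "lo" variant is ignored. */
static size_t packstore_model(float *dst, uint16_t mask, const float src[16])
{
    size_t out = 0;
    for (int lane = 0; lane < 16; lane++) {
        if (mask & (1u << lane)) {
            dst[out++] = src[lane];   /* packed: goes to dst[out], not dst[lane] */
        }
    }
    return out;
}
```

Calling packstore_model(tmp, 0x0040, src) with src filled with 10.0f and tmp filled with 5.0f writes exactly one element, into tmp[0], and leaves tmp[6] untouched.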

Could someone delete this post, please?

Barcelona Supercomputing Center