256-bit split load/store issues

TITLE: Split 256-bit load/store finder

ISSUE_NAME: SPLIT_LOAD_STORE_256_BIT (sub-issue: LOAD or STORE)

DESCRIPTION: 

As shown in the example below, if the code performs a 128-bit load and then inserts a second 128-bit value into the upper 128 bits of the same 256-bit register, it executes an extra instruction instead of using a single full 256-bit load. The same applies to stores that write the low 128 bits directly and the high 128 bits through an extract. Code generators should avoid this pattern.

EXAMPLE: 

LOAD:

vmovupd xmm3, xmmword ptr [rax+r8*1]

vbroadcastsd ymm5, qword ptr [rsi+r13*8]

vmovupd xmm11, xmmword ptr [rax+r10*1]

vbroadcastsd ymm13, qword ptr [rbx+r13*8]

mov r15, qword ptr [rsp+0x3a0]

vinsertf128 ymm4, ymm3, xmmword ptr [rax+r8*1+0x10], 0x1
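At the intrinsics level, the LOAD pattern above roughly corresponds to the first function in this sketch (a 128-bit load of [src], then the next 16 bytes inserted into the upper half), and the recommended alternative to the second. This is a minimal sketch assuming an x86-64 CPU with AVX and GCC or Clang; the __attribute__((target("avx"))) annotation is there so it builds without -mavx, and the function names load4_split and load4_full are hypothetical. Whether the compiler keeps the split or merges it into one 256-bit load depends on the compiler and its tuning, so check the generated assembly.

```c
#include <immintrin.h>  /* AVX intrinsics; x86 only */

/* Split pattern (hypothetical helper): two 128-bit halves glued together.
   Typically compiled to vmovupd xmm + vinsertf128 ymm, like the listing. */
__attribute__((target("avx")))
void load4_split(const double *src, double *out)
{
    __m128d lo = _mm_loadu_pd(src);      /* bytes 0..15 of src            */
    __m128d hi = _mm_loadu_pd(src + 2);  /* bytes 16..31, i.e. src+0x10   */
    /* Widen lo to 256 bits (upper half undefined), then fill the upper half. */
    __m256d v = _mm256_insertf128_pd(_mm256_castpd128_pd256(lo), hi, 1);
    _mm256_storeu_pd(out, v);
}

/* Recommended (hypothetical helper): one unaligned 256-bit load. */
__attribute__((target("avx")))
void load4_full(const double *src, double *out)
{
    __m256d v = _mm256_loadu_pd(src);    /* typically a single vmovupd ymm */
    _mm256_storeu_pd(out, v);
}
```

Both functions produce the same four doubles; the difference is only in how many instructions reach the hardware.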

STORE:

vmovupd xmmword ptr [r8], xmm2

vextractf128 xmmword ptr [r8+0x10], ymm2, 0x1
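The STORE pattern maps to intrinsics in a similar way: the low half is written with a 128-bit store and the high half with an extract to memory at offset 0x10. The sketch below contrasts it with a single 256-bit store. It assumes an x86-64 CPU with AVX and GCC or Clang (for the target attribute); store4_split and store4_full are hypothetical names, and the exact instructions emitted are up to the compiler.

```c
#include <immintrin.h>  /* AVX intrinsics; x86 only */

/* Split pattern (hypothetical helper): like the STORE listing. */
__attribute__((target("avx")))
void store4_split(const double *in, double *dst)
{
    __m256d v = _mm256_loadu_pd(in);
    _mm_storeu_pd(dst, _mm256_castpd256_pd128(v));        /* low 128 bits to dst       */
    _mm_storeu_pd(dst + 2, _mm256_extractf128_pd(v, 1));  /* high 128 bits to dst+0x10 */
}

/* Recommended (hypothetical helper): one unaligned 256-bit store. */
__attribute__((target("avx")))
void store4_full(const double *in, double *dst)
{
    __m256d v = _mm256_loadu_pd(in);
    _mm256_storeu_pd(dst, v);  /* typically a single vmovupd [dst], ymm */
}
```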

SOLUTION:

This mainly needs to be fixed in code generators; the latest Intel compiler aggressively avoids this pattern. When writing intrinsics by hand, use full 256-bit loads and stores (for example, _mm256_loadu_pd and _mm256_storeu_pd) instead of 128-bit loads and stores combined with inserts and extracts.
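As an illustration of that advice, here is a hypothetical daxpy_avx loop (y[i] += a * x[i]) that accesses memory only through full 256-bit unaligned loads and stores, with a scalar loop for leftover elements when n is not a multiple of 4. It is a sketch, not a tuned kernel, and assumes an x86-64 CPU with AVX and GCC or Clang for the target attribute.

```c
#include <stddef.h>
#include <immintrin.h>  /* AVX intrinsics; x86 only */

/* Hypothetical helper: y[i] += a * x[i] with 256-bit loads/stores only. */
__attribute__((target("avx")))
void daxpy_avx(size_t n, double a, const double *x, double *y)
{
    __m256d va = _mm256_set1_pd(a);            /* a in all four lanes      */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);   /* one load, no insert      */
        __m256d vy = _mm256_loadu_pd(y + i);
        vy = _mm256_add_pd(vy, _mm256_mul_pd(va, vx));
        _mm256_storeu_pd(y + i, vy);           /* one store, no extract    */
    }
    for (; i < n; i++)                         /* scalar tail, n % 4 != 0  */
        y[i] += a * x[i];
}
```

Using unaligned loads and stores (loadu/storeu) keeps the sketch correct for any pointer; if the data is known to be 32-byte aligned, the aligned variants are an option.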
