Why is movdqu slower than data swizzling?

Dear all,

I have this data swizzling code:

__asm mov eax, DWORD PTR [edi+ebx*8]; // int* data

__asm mov esi, [edx]; // offsets[0]
__asm mov edi, [edx+4]; // offsets[1]

__asm movss xmm1, DWORD PTR [eax+esi*4]; // 0 0 0 data[offsets[0]]
__asm movss xmm5, DWORD PTR [eax+edi*4]; // 0 0 0 data[offsets[1]]
__asm mov edi, [edx+8]; // offsets[2]
__asm movss xmm2, DWORD PTR [eax+edi*4]; // 0 0 0 data[offsets[2]]
__asm mov edi, [edx+12]; // offsets[3]
__asm movss xmm3, DWORD PTR [eax+edi*4]; // 0 0 0 data[offsets[3]]
__asm movlhps xmm1, xmm2; // 0 data2 0 data0
__asm shufps xmm5, xmm3, 00010001b; // data3 0 data1 0
__asm xorps xmm1, xmm5; // data3 data2 data1 data0

Now, if

offsets[1] = offsets[0] + 1
offsets[2] = offsets[0] + 2
offsets[3] = offsets[0] + 3

i.e., the data are contiguous, then the code is equivalent to:

__asm mov eax, DWORD PTR [edi+ebx*8]; // int* data
__asm mov esi, [edx]; // offsets[0]
__asm movdqu xmm1, [eax+esi*4]; // unaligned 128-bit load: data3 data2 data1 data0
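
For reference, here is a minimal sketch of the same two variants written with SSE2 intrinsics in C, which may make the claimed equivalence easier to check. It is not the code I timed: the function names gather4, load4_contiguous and same_bits are made up for illustration, and the scalar loads use _mm_cvtsi32_si128 (movd) in place of movss so that no int-to-float pointer cast is needed.

```c
#include <emmintrin.h> /* SSE2 intrinsics (also pulls in the SSE ones) */

/* Swizzling variant: four scalar loads combined with movlhps/shufps/xorps.
   Lane comments list elements from high to low, as in the assembly. */
static __m128i gather4(const int *data, const int offsets[4])
{
    /* Each scalar load fills the low lane and zeroes the other three. */
    __m128 d0 = _mm_castsi128_ps(_mm_cvtsi32_si128(data[offsets[0]]));
    __m128 d1 = _mm_castsi128_ps(_mm_cvtsi32_si128(data[offsets[1]]));
    __m128 d2 = _mm_castsi128_ps(_mm_cvtsi32_si128(data[offsets[2]]));
    __m128 d3 = _mm_castsi128_ps(_mm_cvtsi32_si128(data[offsets[3]]));

    __m128 even = _mm_movelh_ps(d0, d2);        /* 0 data2 0 data0 */
    __m128 odd  = _mm_shuffle_ps(d1, d3, 0x11); /* data3 0 data1 0 */

    /* The zero lanes do not overlap, so XOR simply merges the two halves. */
    return _mm_castps_si128(_mm_xor_ps(even, odd)); /* data3 data2 data1 data0 */
}

/* Contiguous variant: one unaligned 128-bit load (movdqu),
   valid only when offsets[i] == offsets[0] + i. */
static __m128i load4_contiguous(const int *data, const int offsets[4])
{
    return _mm_loadu_si128((const __m128i *)&data[offsets[0]]);
}

/* Returns 1 when all four 32-bit lanes of a and b are identical. */
static int same_bits(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi32(a, b)) == 0xFFFF;
}
```

With contiguous offsets such as {2, 3, 4, 5}, same_bits(gather4(data, offsets), load4_contiguous(data, offsets)) should return 1, so both variants produce the same register contents and only the timing differs.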

However, when I compare the timings, the data swizzling version appears to be faster.
Is this normal? And if so, why?

Thank you.
