I implemented an image processing algorithm using the new SSE2 instructions (on an IA32/P4). I mainly use the XMM registers for the calculations, but for one step I have to use a general-purpose register. I found that the execution time drops from 38 ms to 25 ms if I add a shift instruction that does nothing, e.g. shl edx,0.
With the help of VTune I found that there are fewer "64K Aliasing Conflicts" when the shl edx,0 instruction is added. But there are still a lot of aliasing conflicts.
What's the reason for this?
Does anybody have an idea?
How can I reduce the Aliasing Conflicts?
This is a section of the source code:
movd esi,xmm4 ; copy the low 32 bits of xmm4 into esi
shl esi,16 ; shift out the upper 16 bits (bits 0-15 are now zero)
shl edx,0 ; shift by 0, edx is not modified
shr esi,5 ; logical shift right by 5, i.e. divide the shifted value by 32
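To make clear what the esi instructions compute, here is a rough C model of the two shifts. It is only a sketch: the function name scale_low_word is made up for illustration and is not part of my routine, and the shl edx,0 is left out because it does not change any value.

```c
#include <stdint.h>

/* Hypothetical helper (name invented for this post): models
   shl esi,16 followed by shr esi,5 on a 32-bit register. */
static uint32_t scale_low_word(uint32_t esi)
{
    esi <<= 16;  /* shl esi,16: the upper 16 bits are shifted out */
    esi >>= 5;   /* shr esi,5: logical shift, zeros enter from the left */
    return esi;  /* same as (original & 0xFFFF) << 11 */
}
```

So the net effect is to keep only the low 16 bits of the value from xmm4 and multiply them by 2048.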
Thanks a lot!