cross-typed usage penalties

david.livshin@dalsoft.com

Hi,

The Intel 64 and IA-32 Architectures Optimization Reference Manual states (see section 5.1) that

"Code sequences containing cross-typed usage produce the same result across
different implementations but incur a significant performance penalty. Using
SSE/SSE2/SSE3/SSSE3 instructions to operate on type-mismatched SIMD data
in the XMM register is strongly discouraged" (underline is mine).

Is there exact data on the performance penalties?

Specifically, what would be the penalty of mixing movhlps (a single-precision instruction) with addsd (a double-precision instruction), e.g.:

movhlps %xmm1,%xmm2
addsd %xmm2,%xmm3

How much more efficient would it be to use the following instead:

unpckhpd %xmm1,%xmm1
addsd %xmm1,%xmm3

or

shufpd $1,%xmm1,%xmm1
addsd %xmm1,%xmm3

Thank you,

David Livshin

http://www.dalsoft.com


Intel Software Network Support

Our engineering contacts responded:



Cross-type (mostly) means mixing vector integer and vector floating point. Mixing packed single and packed double is OK. One can write a code sequence to measure this for specific examples. When you violate the rules, the penalty depends on which microarchitecture you are using, and it may increase in future machines. It is at least 1 clock cycle on Intel Core 2 Duo processors, and in some cases you can pay it more than once.


==


Lexi S.


Intel Software Network Support


http://www.intel.com/software

