cross-typed usage penalties


The Intel 64 and IA-32 Architectures Optimization Reference Manual states (see 5.1):

"Code sequences containing cross-typed usage produce the same result across
different implementations but incur a significant performance penalty. Using
SSE/SSE2/SSE3/SSSE3 instructions to operate on type-mismatched SIMD data
in the XMM register is strongly discouraged." (The underline is mine.)

Is there exact data on the performance penalties?

Specifically, what would be the penalty for mixing movhlps (a single-precision instruction) with addsd (a double-precision instruction), e.g.

movhlps %xmm1,%xmm2
addsd %xmm2,%xmm3

How much more efficient would it be to use either of the following instead?

unpckhpd %xmm1,%xmm1
addsd %xmm1,%xmm3

or

shufpd $1,%xmm1,%xmm1
addsd %xmm1,%xmm3

Thank you,

David Livshin


Our engineering contacts responded:

Cross-type (mostly) means mixing vector integer and vector floating-point instructions. Mixing packed single and packed double is OK. One can write a code sequence to measure this for specific examples. When you violate the rules, the penalty depends on which micro-architecture you are using, and it may increase in future machines. It is at least 1 clock on Intel Core 2 Duo processors, and in some cases you can pay it more than once.
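
As a rough illustration of what such a measurement could look like, here is a minimal sketch in C with SSE2 intrinsics. It is not from the engineering reply. The names `run_chain`, `seconds_for`, and the `variant` enum are made up for this example. Each iteration extracts the high double of the accumulator with one of the three instructions from the question and feeds it to an `addsd`, so the shuffle and the add form one dependency chain. This assumes the compiler emits the instruction each intrinsic maps to, which is not guaranteed, so check the disassembly (or use inline assembly) before trusting any timing.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */
#include <time.h>

/* Hypothetical names for the three sequences discussed in the question. */
enum variant { VAR_MOVHLPS, VAR_UNPCKHPD, VAR_SHUFPD };

/* One link of the chain: move the high double of acc into the low lane
   of a temporary, then add it to the low lane of acc (addsd). */
static inline __m128d step(enum variant which, __m128d acc)
{
    __m128d hi;
    switch (which) {
    case VAR_MOVHLPS:   /* single-precision move feeding a double add */
        hi = _mm_castps_pd(_mm_movehl_ps(_mm_castpd_ps(acc),
                                         _mm_castpd_ps(acc)));
        break;
    case VAR_UNPCKHPD:  /* double-precision unpack */
        hi = _mm_unpackhi_pd(acc, acc);
        break;
    default:            /* VAR_SHUFPD: double-precision shuffle, imm = 1 */
        hi = _mm_shuffle_pd(acc, acc, 1);
        break;
    }
    return _mm_add_sd(acc, hi);  /* low += hi.low, high lane unchanged */
}

/* Run iters dependent steps starting from low = 0.0, high = 1.0.
   The low lane grows by 1.0 per step, so the result equals iters
   for every variant (exactly, while iters stays below 2^53). */
double run_chain(enum variant which, long iters)
{
    assert(iters >= 0);
    __m128d acc = _mm_set_pd(1.0, 0.0);  /* arguments are (high, low) */
    for (long i = 0; i < iters; ++i)
        acc = step(which, acc);
    return _mm_cvtsd_f64(acc);
}

/* CPU time in seconds for one run; the chain result is stored in
   *result so the work cannot be discarded as dead code. */
double seconds_for(enum variant which, long iters, double *result)
{
    clock_t start = clock();
    *result = run_chain(which, iters);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `seconds_for` for each variant with a large iteration count (for example 100 million) and dividing the differences by the iteration count and the clock period gives a rough per-iteration cost in cycles. `clock()` is coarse, so repeat the runs and compare the minimum times; a cycle counter would be more precise.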


Lexi S.

Intel Software Network Support
