SSE3 critique

Deleted user


I am writing a paper about interval arithmetic using SSE2 instructions, which is part of my library for exact real number computations, and while doing so I realized that SSE3 could have been quite helpful had it been designed slightly differently.

My exact question is: why did Intel prefer to include an addsub instruction rather than a multiplication with one of the arguments negated, i.e. something like

mulpnpd xmm1,xmm2

giving xmm1.1 * xmm2.1, (-xmm1.0) * xmm2.0

With this, the addsubpd instruction would not be needed to compute complex multiplications and divisions.
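
To make the intended semantics concrete, here is a minimal sketch in C that emulates the proposed instruction with two existing SSE2 operations. The name mulpn_pd_emulated is hypothetical and only for illustration; no such instruction or intrinsic exists.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper emulating the proposed "mulpnpd":
   element 1 (high) = a1 * b1, element 0 (low) = (-a0) * b0. */
static __m128d mulpn_pd_emulated(__m128d a, __m128d b)
{
    /* -0.0 has only the sign bit set, so XOR with it flips the sign of a0.
       _mm_set_pd takes the high element first. */
    const __m128d sign_lo = _mm_set_pd(0.0, -0.0);
    return _mm_mul_pd(_mm_xor_pd(a, sign_lo), b);
}
```

The sign flip is exact, so the single multiplication is the only rounding step; the cost of the emulation is the extra XOR and the constant it needs.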

What I believe to be more important, however, is the behavior of Intel's sample SSE3 code for complex multiplication when the rounding mode is set to something other than round-to-nearest. More specifically, the SSE3 complex multiplication code would not compute upper bounds for the product when rounding toward +inf, nor lower bounds when rounding toward -inf, because the product that is subsequently subtracted would be rounded in the wrong direction.

This would not be the case if a mulpn instruction were available instead of addsub, because the result of the multiplication would be rounded the correct way. A mulpn would also be very useful for single or double precision interval arithmetic using the SIMD registers.
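
The rounding problem can be reproduced with scalar SSE2 operations. The sketch below is not Intel's sample code; real_part_round_up is a hypothetical helper that evaluates only the real part a*c - b*d with rounding toward +inf, once by subtracting a rounded-up product and once by adding the rounded-up product of a negated operand.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <xmmintrin.h>  /* _MM_SET_ROUNDING_MODE, _MM_ROUND_UP */

/* Hypothetical helper: real part a*c - b*d of (a + bi)(c + di),
   evaluated with rounding toward +inf in two ways.
   out[0]: round b*d up, then subtract it (addsub-style data flow).
   out[1]: round (-b)*d up, then add it (mulpn-style data flow). */
static void real_part_round_up(double a, double b, double c, double d,
                               double out[2])
{
    /* volatile is meant to keep the compiler from folding the arithmetic
       or moving it across the rounding-mode changes. */
    volatile double va = a, vb = b, vc = c, vd = d;
    volatile double sub_form, neg_form;
    unsigned int saved = _MM_GET_ROUNDING_MODE();

    _MM_SET_ROUNDING_MODE(_MM_ROUND_UP);
    __m128d A = _mm_set_sd(va), B = _mm_set_sd(vb);
    __m128d C = _mm_set_sd(vc), D = _mm_set_sd(vd);
    __m128d ac = _mm_mul_sd(A, C);

    /* b*d is rounded up, so subtracting it pushes the result down. */
    sub_form = _mm_cvtsd_f64(_mm_sub_sd(ac, _mm_mul_sd(B, D)));

    /* Flip the sign of b first, so that (-b)*d itself is rounded up. */
    __m128d neg_b = _mm_xor_pd(B, _mm_set_sd(-0.0));
    neg_form = _mm_cvtsd_f64(_mm_add_sd(ac, _mm_mul_sd(neg_b, D)));
    _MM_SET_ROUNDING_MODE(saved);

    out[0] = sub_form;
    out[1] = neg_form;
}
```

For a = c = 1 and b = d = 1 + 2^-30 the exact real part is -(2^-29 + 2^-60). The subtracting form returns -(2^-29 + 2^-52), which lies below the exact value and is therefore not an upper bound, while the negated form returns -2^-29, which is one.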

Does anyone know why Intel preferred addsub to this?

Intel Software Network Support

Greetings from Intel Software Network Support. We will check with our engineering contacts and let you know what we find out.


Lexi S.
Intel Software Network Support

Intel Software Network Support

Our engineering contacts responded:

The addsub instruction was added for complex arithmetic. It seemed more natural to handle the "-" with an add type of instruction, rather than a mul as described above. Interval arithmetic was not a factor at all in the decision to add this instruction, but significant improvements in math libraries were obtained with these instructions, confirming that they are useful.

We are always looking for new instructions and feedback to make our architectures better suited to our customers' needs. If you would like to write up your request with a bit more detail and send it to us here, we would be glad to forward the information to our architects to consider the request for future architectures. We would also need to know what you want to use it for.


Lexi S.
Intel Software Network Support

Message Edited by on 11-15-2005 11:18 PM

Deleted user

The question above came up while developing an interval arithmetic package that uses the SSE-2 extensions to achieve high efficiency. The objective of the project is to develop a library for reliable real number computations (i.e. certified accuracy of the results) with very low overhead.

By storing a double precision interval in an SSE-2 register and keeping one of the bounds of the interval negated, the SSE-2 extensions turn out to be very useful for interval arithmetic, because some meaningful interval operations can be performed in single instructions. The basic arithmetic operations addition and subtraction, for example, can be implemented in one and two processor instructions, respectively.
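
A minimal sketch of this representation, under assumptions that the post does not spell out: the lower bound is the negated one, it is stored in element 0, and the caller has already set the rounding mode toward +inf. The type and function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Assumed layout: element 0 = -lower bound, element 1 = upper bound.
   With rounding toward +inf, rounding -lo up rounds lo down, so one
   rounding mode serves both bounds. */
typedef __m128d interval;

static interval iv_make(double lo, double hi) { return _mm_set_pd(hi, -lo); }
static double iv_lo(interval x) { return -_mm_cvtsd_f64(x); }
static double iv_hi(interval x) { return _mm_cvtsd_f64(_mm_unpackhi_pd(x, x)); }

/* [a,b] + [c,d] = [a+c, b+d]: (-a) + (-c) and b + d, one addpd. */
static interval iv_add(interval x, interval y)
{
    return _mm_add_pd(x, y);
}

/* [a,b] - [c,d] = [a-d, b-c]: (-a) + d and b + (-c), i.e. swap the
   two elements of y (one shufpd) and add (one addpd). */
static interval iv_sub(interval x, interval y)
{
    return _mm_add_pd(x, _mm_shuffle_pd(y, y, 1));
}
```

Subtraction needs no explicit negation here because swapping the elements of the second operand already places each bound next to the sign it needs.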

Multiplication and division, however, are still complicated by an architectural weakness of the x86 platform: branch-free selection in the SSE-2 registers requires a combination of ANDN, AND and OR (compared to, e.g., a single VSEL in AltiVec). To overcome this weakness, we want to propose a simple new instruction that can bring the number of instructions required for an interval multiplication down to five (compared to the 15 that are currently needed).
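
For readers unfamiliar with the ANDN, AND and OR combination, a minimal sketch of branch-free selection with SSE2 intrinsics follows. The helper name select_pd is hypothetical; the mask is assumed to be all ones or all zeros per element, as produced by the SSE2 compare instructions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Per element: returns a where mask is all ones, b where mask is all zeros.
   This takes three instructions (AND, ANDN, OR). */
static __m128d select_pd(__m128d mask, __m128d a, __m128d b)
{
    __m128d from_a = _mm_and_pd(mask, a);     /* keep a where mask is set   */
    __m128d from_b = _mm_andnot_pd(mask, b);  /* keep b where mask is clear */
    return _mm_or_pd(from_a, from_b);
}
```

As a usage example, select_pd(_mm_cmplt_pd(a, b), b, a) picks the larger of each pair of elements without a branch.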

The details are described in the paper "Interval Arithmetic using SSE-2", recently submitted to an academic journal.

Message Edited by barnie_bg on 04-28-2006 06:12 AM

Igor Levicki

I know this is an old post, but I am curious to hear whether the author has updated his code. SSE4.1 has an instruction, BLENDVPD, which makes conditional selection of double precision values easier.

-- Regards, Igor Levicki. If you find my post helpful, please rate it and/or select it as a best answer where applicable. Thank you.