Weird instruction: extractps

In my assembly program, I use SSE instructions for computation.

I need to extract the 4 single-precision float values in an xmm register separately for later computation.

I wrote this:
extractps $3, %xmm0,%xmm0

The assembler (as) reported an error: Error: suffix or operands invalid for `extractps'

I looked it up in Volume 2 of the manual and found that I was wrong.

EXTRACTPS reg/m32, xmm2, imm8
The destination should be a 64-bit or 32-bit register (or a 32-bit memory location).

But 64-bit and 32-bit registers such as rax and eax are all for integers.
How can I do floating-point computation later if I put the result in an integer register?

rax and eax are called general-purpose registers, so they can hold different data types.

from http://download.intel.com/products/processor/manual/325462.pdf

Yes, you are right. General-purpose registers can hold different data types.

But if I want to do some computation, it is difficult.

I have to store the data in eax back to memory and then load it.

That is because addl can only do integer addition, while
addss, addps, addsd and addpd do floating-point addition.

fadd can also do floating-point addition, but its data is stored on the FP register stack.
I would have to move the data from eax to the FP register stack.
That needs no memory access.

So I can only use fadd if I want to reduce the number of memory accesses.

Is what I said right?

First of all, I'd suggest using intrinsics instead of ASM code.
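
On the register question itself, here is a minimal sketch with intrinsics, assuming SSE2 is available. The helper names (lane3_bits, bits_to_scalar, add_lane3) are made up for illustration. The bits of a float can sit in a general-purpose register unchanged, and movd moves 32 bits directly between a general-purpose register and an XMM register, so no store/load through memory is needed to get back to addss. The x87 fld, by contrast, only loads from memory or from another stack register. _mm_extract_ps (SSE4.1) would do the first step in one instruction; the sketch uses a shuffle plus movd so that it only needs SSE2.

```c
#include <emmintrin.h>  /* SSE2: movd intrinsics and casts; also pulls in SSE */

/* Hypothetical helper: copy element 3 of v into a general-purpose
   register as raw bits (no int<->float conversion takes place). */
static int lane3_bits(__m128 v)
{
    /* Broadcast element 3 so that it also sits in element 0. */
    __m128 top = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));
    /* Typically compiles to: movd r32, xmm */
    return _mm_cvtsi128_si32(_mm_castps_si128(top));
}

/* Hypothetical helper: move raw bits from a general-purpose register
   back into element 0 of an XMM register (upper elements are zeroed). */
static __m128 bits_to_scalar(int bits)
{
    /* Typically compiles to: movd xmm, r32 */
    return _mm_castsi128_ps(_mm_cvtsi32_si128(bits));
}

/* Hypothetical helper: add element 3 of v to x with a scalar SSE add. */
static float add_lane3(__m128 v, float x)
{
    __m128 s = bits_to_scalar(lane3_bits(v));
    return _mm_cvtss_f32(_mm_add_ss(s, _mm_set_ss(x)));  /* addss */
}
```

For example, with v = _mm_set_ps(8.0f, 4.0f, 2.0f, 1.0f), element 3 is 8.0f, so add_lane3(v, 0.5f) yields 8.5f. In practice the round trip through the general-purpose register is rarely needed, since the value can stay in an XMM register the whole time.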

It looks like what you want to do is to use scalar instructions on packed values. You can do that directly for element 0: for example, use _mm_add_ss on some data output by _mm_add_ps. To access the other elements, simply use a shuffle instruction before the scalar instruction. By the way, a shuffle is typically faster than extractps. For example, to rotate right an XMM register with 4 x FP32:


before: 3.0 | 4.5 | -2.4 | 1.1
after: 1.1 | 3.0 | 4.5 | -2.4

Doing 3 such rotate-rights in sequence gives you easy access to all the individual elements.
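
A minimal sketch of that rotate with intrinsics, assuming the before/after listing shows element 3 on the left and element 0 on the right (the post does not say which end is element 0). The names rotate_right and sum_lanes_scalar are hypothetical, and summing the elements is just one example of a scalar operation applied to each element in turn.

```c
#include <xmmintrin.h>  /* SSE: _mm_shuffle_ps, _mm_add_ss, _mm_cvtss_f32 */

/* Hypothetical helper: rotate the four elements one position toward
   element 0, so old element 1 becomes element 0 and old element 0
   wraps around to element 3. */
static __m128 rotate_right(__m128 v)
{
    /* Result elements (3,2,1,0) come from source elements (0,3,2,1). */
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 3, 2, 1));
}

/* Hypothetical helper: visit every element through element 0,
   here by accumulating them with scalar adds. */
static float sum_lanes_scalar(__m128 v)
{
    __m128 acc = v;                 /* element 0 is usable directly */
    for (int i = 0; i < 3; ++i) {
        v = rotate_right(v);        /* next element moves into element 0 */
        acc = _mm_add_ss(acc, v);   /* addss touches only element 0 */
    }
    return _mm_cvtss_f32(acc);      /* read element 0 as a float */
}
```

With v = _mm_set_ps(3.0f, 4.5f, -2.4f, 1.1f), rotate_right(v) leaves -2.4f in element 0 and moves 1.1f up to element 3, which matches the listing above under that ordering assumption.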
