Replacement of packusdw


Hi all,

I'm looking for an efficient instruction sequence with the exact same functionality as the SSE 4.1 packusdw instruction.

First I tried to use packssdw by subtracting 0x8000 from the input and then adding it back in after packing, but this doesn't work for all inputs. I've got a working implementation that uses pcmpgtd to do the saturating, but it's very long. Anyone got a better idea?

Thanks,

Nicolas

Best Reply


Use your original idea of subtracting 0x8000 before packssdw and adding it back afterwards, but also run packssdw on the original values. Arithmetic-shift that second result right by 15 bits (psraw) so each word's sign bit fills the whole word, then use it as a mask (with pandn) on the first result to zero the negative elements. Hopefully that is shorter than the pcmpgtd version.
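
For illustration, here is a minimal sketch of that sequence using SSE2 intrinsics in C instead of raw assembly. The names packusdw_sse2 and sat_u16 are made up for this example and do not come from the thread. It also assumes the 0x8000 is added back with a wrapping 16-bit add (paddw), which the description above leaves open.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: emulates packusdw (signed 32-bit -> unsigned 16-bit
 * with saturation) using only SSE2 instructions. */
static __m128i packusdw_sse2(__m128i a, __m128i b)
{
    const __m128i bias32 = _mm_set1_epi32(0x8000);
    const __m128i bias16 = _mm_set1_epi16(-32768);   /* bit pattern 0x8000 */

    /* psubd + packssdw + paddw: clamp(x - 0x8000, -32768, 32767) + 0x8000.
     * This equals clamp(x, 0, 65535) unless x - 0x8000 wraps around, which
     * happens for very negative x and yields 0xFFFF instead of 0. */
    __m128i biased = _mm_packs_epi32(_mm_sub_epi32(a, bias32),
                                     _mm_sub_epi32(b, bias32));
    biased = _mm_add_epi16(biased, bias16);

    /* packssdw on the original values keeps each element's sign; psraw by 15
     * turns that sign into 0xFFFF (negative) or 0x0000 (non-negative). */
    __m128i negative = _mm_srai_epi16(_mm_packs_epi32(a, b), 15);

    /* pandn: (~negative) & biased, forcing negative inputs to 0. */
    return _mm_andnot_si128(negative, biased);
}

/* Scalar reference for a single element, useful for checking the above. */
static uint16_t sat_u16(int32_t x)
{
    if (x < 0) return 0;
    if (x > 0xFFFF) return 0xFFFF;
    return (uint16_t)x;
}
```

To check it, load eight 32-bit values that include the awkward cases (for example INT32_MIN, -1, 0, 65535, 65536 and INT32_MAX), pack them with packusdw_sse2, and compare each 16-bit lane against sat_u16 of the matching input. That comes to two subtractions, two packs, one add, one shift and one pandn.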
