Signed Saturation of integers

Is there a way I can do signed saturating addition of 32-bit integers in an xmm register?
There are instructions for shorts and bytes (paddsw, paddsb) but not for 32-bit integers (no paddsd)!

Regards,
Prashanth NS

Hello Prashanth,

You will need to implement that sequence manually.

One way is to use the available 32-bit addition and then check the result for underflow and overflow.

Overflow can occur only if both inputs are positive but the result ends up negative, while underflow can occur only if both inputs are negative but the result ends up non-negative. Both conditions can be read off the sign bits of a, b and the wrapped sum, which gives the following algorithm in C:

int res = a + b;
int tmp = (res & ~(a | b)) < 0 ? 0x7fffffff : res;
int c = (~res & (a & b)) < 0 ? 0x80000000 : tmp;
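A side note on the scalar snippet above: in standard C, signed overflow in `a + b` is undefined behavior, so a compiler is allowed to optimize the checks away. Below is a minimal sketch of the same sign-bit test done in unsigned arithmetic, where wraparound is well defined. The function name `adds_i32` is a hypothetical one chosen for illustration, and the final cast back to `int32_t` assumes the usual two's-complement conversion that mainstream compilers provide.

```c
#include <stdint.h>

/* Hypothetical helper (name not from the original post):
   saturating signed 32-bit addition without signed overflow. */
static int32_t adds_i32(int32_t a, int32_t b)
{
    /* Add as unsigned so that wraparound is well defined. */
    uint32_t ua  = (uint32_t)a;
    uint32_t ub  = (uint32_t)b;
    uint32_t res = ua + ub;

    /* Sign bit set when both inputs are non-negative but the sum is negative. */
    uint32_t overflow  = res & ~(ua | ub);
    /* Sign bit set when both inputs are negative but the sum is non-negative. */
    uint32_t underflow = ~res & (ua & ub);

    if (overflow & 0x80000000u)
        return INT32_MAX;
    if (underflow & 0x80000000u)
        return INT32_MIN;

    /* No saturation needed; assumes two's-complement conversion. */
    return (int32_t)res;
}
```

The logic is the same as in the three-line version; only the arithmetic type changes.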

Using the SSE4.1 (or AVX) instruction BLENDVPS (VBLENDVPS), which selects each 32-bit lane based on the sign bit of a mask, gives a ~4X speedup (~5X with AVX) over the scalar code above, as measured on data resident in L1 cache on the Sandy Bridge microarchitecture:

#include <smmintrin.h>

 // requires SSE4_1 (or AVX) support for BLENDVPS (or VBLENDVPS)
__m128i __inline __mm_adds_epi32( __m128i a, __m128i b )
{
       __m128i int_min = _mm_set1_epi32( 0x80000000 );
       __m128i int_max = _mm_set1_epi32( 0x7FFFFFFF );

       __m128i res      = _mm_add_epi32( a, b );
       __m128i sign_and = _mm_and_si128( a, b );
       __m128i sign_or  = _mm_or_si128( a, b );

       __m128i min_sat_mask = _mm_andnot_si128( res, sign_and );
       __m128i max_sat_mask = _mm_andnot_si128( sign_or, res );

       __m128 res_temp = _mm_blendv_ps( _mm_castsi128_ps( res ), _mm_castsi128_ps( int_min ), _mm_castsi128_ps( min_sat_mask ) );

       return _mm_castps_si128(
               _mm_blendv_ps( res_temp, _mm_castsi128_ps( int_max ), _mm_castsi128_ps( max_sat_mask ) ) );
}
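If SSE4.1 is not available, the same idea can be sketched with SSE2 only. This is not the code from the post above but a variant: instead of BLENDVPS, each mask's sign bit is broadcast across its lane with an arithmetic right shift, and the selection is done with AND / ANDNOT / OR. The function name `adds_epi32_sse2` is hypothetical, and no performance claim is made for this version.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <limits.h>

/* Hypothetical SSE2-only variant (name not from the original post). */
static inline __m128i adds_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i int_min = _mm_set1_epi32(INT_MIN);
    const __m128i int_max = _mm_set1_epi32(INT_MAX);

    __m128i res = _mm_add_epi32(a, b);

    /* Same sign-bit tests as above; the shift by 31 turns the sign bit
       into an all-ones or all-zeros lane mask. */
    __m128i min_mask = _mm_srai_epi32(
        _mm_andnot_si128(res, _mm_and_si128(a, b)), 31);
    __m128i max_mask = _mm_srai_epi32(
        _mm_andnot_si128(_mm_or_si128(a, b), res), 31);

    /* Per-lane select: (mask & saturated) | (~mask & res). */
    res = _mm_or_si128(_mm_and_si128(min_mask, int_min),
                       _mm_andnot_si128(min_mask, res));
    res = _mm_or_si128(_mm_and_si128(max_mask, int_max),
                       _mm_andnot_si128(max_mask, res));
    return res;
}
```

The two masks can never be set in the same lane, so the order of the two selects does not matter.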

Here are some functional test results generated with the implementation above (decimal first, hexadecimal in parentheses):

2147483632 + 14 = 2147483646
        (7ffffff0 + e = 7ffffffe)

2147483632 + 15 = 2147483647
        (7ffffff0 + f = 7fffffff)

2147483632 + 16 = 2147483647
        (7ffffff0 + 10 = 7fffffff)

2147483632 + -2147483648 = -16
        (7ffffff0 + 80000000 = fffffff0)

-2147483648 + -2147483648 = -2147483648
        (80000000 + 80000000 = 80000000)

-2147483648 + 0 = -2147483648
        (80000000 + 0 = 80000000)

-2147483648 + 1 = -2147483647
        (80000000 + 1 = 80000001)

-2147483648 + -1 = -2147483648
        (80000000 + ffffffff = 80000000)

Hope this helps.

-Max

P.S. adding tag: _mm_adds_epi32 for search engines.
