Loading unsigned char values into Iu16vec8 with SSE2

Hello,

I'd like to load an image of unsigned char values into a

Oops, sorry for the previous message.

So, I'd like to load an image of unsigned char values into an array of Iu16vec8 so as to take advantage of the SSE2 instruction set.

With Buffer as the Iu16vec8 array and src as the source image, I've used something like:

for(j=0;j

Unfortunately, the expected performance is not met: the algorithm is a bit slower than the same algorithm written with MMX instructions. What should be changed or added to get better performance with SSE2 than with MMX?

Thanx

Jong

Jong,
It looks like you are trying to unpack the source bytes to word values for SSE2. The _mm_set intrinsic is not the best way to do this.

Here's a different way of coding it that is 15 times faster:

for(j=0;j Buffer[j] = (Iu16vec8)_mm_unpacklo_epi8(_mm_loadl_epi64( (__m128i const*)&src[j*8] ), _mm_setzero_si128());

If you compare the disassembly produced by the compiler, you'll see this is much more streamlined.

Regards,
Mike Stoner
Applications Engineer
Intel Software Solutions Group
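
The unpack approach from the reply above can be sketched as a standalone C function. This is only an illustration under stated assumptions, not the poster's actual code: the helper name widen_u8_to_u16, the plain uint16_t destination (instead of an Iu16vec8 array), the unaligned stores, and the scalar tail loop are all hypothetical choices made so that the example compiles on its own. Only the _mm_loadl_epi64 / _mm_unpacklo_epi8 / _mm_setzero_si128 combination comes from the thread.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): widen `count` unsigned bytes
   from `src` into 16-bit words in `dst`, eight pixels per iteration. */
static void widen_u8_to_u16(const uint8_t *src, uint16_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);

    const __m128i zero = _mm_setzero_si128();
    size_t i = 0;

    for (; i + 8 <= count; i += 8) {
        /* Load 8 bytes into the low half of an XMM register. */
        __m128i bytes = _mm_loadl_epi64((const __m128i *)(src + i));
        /* Interleave each byte with a zero byte: 8 x u8 -> 8 x u16. */
        __m128i words = _mm_unpacklo_epi8(bytes, zero);
        /* Unaligned store, since `dst` alignment is not assumed here. */
        _mm_storeu_si128((__m128i *)(dst + i), words);
    }

    /* Scalar tail for a pixel count that is not a multiple of 8. */
    for (; i < count; ++i) {
        dst[i] = src[i];
    }
}
```

Calling widen_u8_to_u16(src, dst, width * height) on a contiguous image would fill dst with the zero-extended pixel values. If the destination is known to be 16-byte aligned, as an Iu16vec8 array normally is, an aligned store could be used instead.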
