I'd like to load an image of unsigned char values into an array of Iu16vec8 so as to take advantage of the SSE2 instruction set.
With Buffer the Iu16vec8 array and src the source image, I've used something like:
for(j=0;j
Unfortunately, the expected performance is not met: the algorithm is a bit slower than the same algorithm written with MMX instructions. What should be changed or added to get better performance with SSE2 than with MMX?
Thanks,
Jong
Jong,
It looks like you are trying to unpack the source bytes to word values for SSE2. The _mm_set intrinsic is not the best way to do this.
Here's a different way of coding it that is 15 times faster:
for(j=0;j
    Buffer[j] = (Iu16vec8)_mm_unpacklo_epi8(_mm_loadl_epi64((__m128i const*)&src[j*8]), _mm_setzero_si128());
If you compare the disassembly produced by the compiler, you'll see this is much more streamlined.
Regards,
Mike Stoner
Applications Engineer
Intel Software Solutions Group
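For readers who want to try the unpack approach outside the original program, here is a minimal, self-contained sketch of the same idea. It is not the code from the thread: it uses plain __m128i and uint16_t output instead of the Iu16vec8 class, and the function name widen_u8_to_u16, its parameters, and the scalar tail loop are assumptions made for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the thread): zero-extends `count` bytes
// from src into 16-bit words in dst, eight pixels per SSE2 iteration.
void widen_u8_to_u16(const std::uint8_t* src, std::uint16_t* dst, std::size_t count)
{
    const __m128i zero = _mm_setzero_si128();
    std::size_t j = 0;

    for (; j + 8 <= count; j += 8) {
        // Load 8 source bytes into the low 64 bits of an XMM register.
        __m128i bytes = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src + j));
        // Interleave with zero so each byte becomes the low half of a 16-bit word.
        __m128i words = _mm_unpacklo_epi8(bytes, zero);
        // Unaligned store, so dst does not need 16-byte alignment.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + j), words);
    }

    // Scalar tail for the last (count % 8) bytes.
    for (; j < count; ++j) {
        dst[j] = src[j];
    }
}
```

If the destination is known to be 16-byte aligned, as an array of Iu16vec8 normally would be, _mm_store_si128 could replace the unaligned store.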