Loading unsigned char values into Iu16vec8 with SSE2

jong.mo

Hello,

I'd like to load an image of unsigned char values into a

jong.mo

Oops, sorry for the previous message.

So I'd like to load an image of unsigned char values into an array of Iu16vec8 to take advantage of the SSE2 instruction set.

With Buffer as the Iu16vec8 array and src as the source image, I've used something like:

for(j=0;j



Unfortunately, the expected performance is not met: the algorithm is a bit slower than the same algorithm written with MMX instructions. What should be changed or added to get better performance with SSE2 than with MMX?

Thanks

Jong

Michael Stoner (Intel)

Jong,
It looks like you are trying to unpack the source bytes to word values for SSE2. The _mm_set intrinsic is not the best way to do this.

Here's a different way of coding it that is 15 times faster:

for(j=0;j Buffer[j] = (Iu16vec8)_mm_unpacklo_epi8(_mm_loadl_epi64( (__m128i const*)&src[j*8] ), _mm_setzero_si128());

If you compare the disassembly produced by the compiler, you'll see this is much more streamlined.
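
For readers who want to try the unpack approach in isolation, here is a minimal sketch. The loop bound in the posted code did not survive, so the function name widen_u8_to_u16, the count parameter, and the requirement that count be a multiple of 8 are assumptions made for illustration only. The sketch stores plain __m128i values so that it does not depend on the Intel class library header; as the cast in the line above shows, an Iu16vec8 can be built from the same __m128i result.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper: widens `count` unsigned bytes from src into
// 16-bit words, eight pixels per __m128i element of dst.
// Assumption: count is a multiple of 8 and dst has count / 8 elements.
void widen_u8_to_u16(const std::uint8_t* src, __m128i* dst, std::size_t count)
{
    const __m128i zero = _mm_setzero_si128();
    for (std::size_t j = 0; j < count / 8; ++j) {
        // Load 8 source bytes into the low 64 bits; the upper 64 bits are zeroed.
        const __m128i bytes =
            _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src + j * 8));
        // Interleave each source byte with a zero byte, which zero-extends
        // the 8 bytes to 8 unsigned 16-bit words.
        dst[j] = _mm_unpacklo_epi8(bytes, zero);
    }
}
```

Interleaving with a zero register is what makes this a zero extension, so byte values above 127 stay positive once widened. The whole conversion of eight pixels is one 64-bit load plus one unpack, rather than assembling the register element by element.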

Regards,
Mike Stoner
Applications Engineer
Intel Software Solutions Group
