Loading unsigned char values into Iu16vec8 with SSE2

jong.mo

Hello,

I'd like to load an image of unsigned char values into a

jong.mo

Oops, sorry for the previous message.

So I'd like to load an image of unsigned char values into an array of Iu16vec8 to take advantage of the SSE2 instruction set.

With Buffer as the Iu16vec8 array and src as the source image, I've used something like:

for(j=0;j



Unfortunately, the expected performance is not met, and the algorithm is a bit slower than the same algorithm written with MMX instructions. What should be changed or added to get better performance with SSE2 than with MMX?

Thanx

Jong

Michael Stoner (Intel)

Jong,
It looks like you are trying to unpack the source bytes to word values for SSE2. The _mm_set intrinsic is not the best way to do this.
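
For illustration only: the loop in the question was cut off, so the following is an assumed sketch of what a _mm_set-style widening loop could look like, not the poster's actual code. The function name widen_with_set and the count parameter are hypothetical, and plain __m128i is used in place of Iu16vec8 so the sketch compiles without dvec.h.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Hypothetical sketch: widen 8 bytes per iteration with _mm_set_epi16.
// Assumes src holds at least 8 * count bytes.
void widen_with_set(const unsigned char* src, __m128i* buffer, std::size_t count) {
    for (std::size_t j = 0; j < count; ++j) {
        const unsigned char* p = src + j * 8;
        // _mm_set_epi16 takes its arguments from the highest lane down to the
        // lowest, so p[0] is passed last to land in lane 0.
        buffer[j] = _mm_set_epi16(p[7], p[6], p[5], p[4], p[3], p[2], p[1], p[0]);
    }
}
```

Each lane is passed as a separate scalar argument here, so the compiler typically has to assemble the vector from eight individual byte loads.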

Here's a different way of coding it that is 15 times faster:

for(j=0;j Buffer[j] = (Iu16vec8)_mm_unpacklo_epi8(_mm_loadl_epi64( (__m128i const*)&src[j*8] ), _mm_setzero_si128());

If you compare the disassembly produced by the compiler, you'll see this is much more streamlined.
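
A self-contained sketch of the unpack approach, for anyone who wants to try it outside the original program. The function name widen_with_unpack and the count parameter are assumptions, the source is assumed to hold at least 8 * count bytes, and plain __m128i stands in for Iu16vec8 so that only emmintrin.h is needed.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Zero-extend 8 unsigned bytes per iteration into eight 16-bit lanes.
// Assumes src holds at least 8 * count bytes; src needs no special alignment.
void widen_with_unpack(const unsigned char* src, __m128i* buffer, std::size_t count) {
    const __m128i zero = _mm_setzero_si128();
    for (std::size_t j = 0; j < count; ++j) {
        // Load 8 bytes into the low half of the register; the upper half is zeroed.
        __m128i bytes = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src + j * 8));
        // Interleave each source byte with a zero byte (b0, 0, b1, 0, ...),
        // which in little-endian order yields the words b0, b1, ..., b7.
        buffer[j] = _mm_unpacklo_epi8(bytes, zero);
    }
}
```

Because the high byte of every word comes from the zero register, values above 127 are zero-extended rather than sign-extended, which is what unsigned image data needs.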

Regards,
Mike Stoner
Applications Engineer
Intel Software Solutions Group
