instructional change __m128i

Hi, good afternoon.

I am using a __m128i to store 16 elements of 8 bits each:

__m128i s0 = _mm_set_epi8(pixelsTemp[95], pixelsTemp[94], pixelsTemp[93], pixelsTemp[92], pixelsTemp[91], pixelsTemp[90], pixelsTemp[89], pixelsTemp[88], pixelsTemp[87], pixelsTemp[86], pixelsTemp[85], pixelsTemp[84], pixelsTemp[83], pixelsTemp[82], pixelsTemp[81], pixelsTemp[224]);

__m128i s1 = _mm_set_epi8(pixelsTemp[239], pixelsTemp[238], pixelsTemp[237], pixelsTemp[236], pixelsTemp[235], pixelsTemp[234], pixelsTemp[233], pixelsTemp[232], pixelsTemp[231], pixelsTemp[230], pixelsTemp[229], pixelsTemp[228], pixelsTemp[227], pixelsTemp[226], pixelsTemp[80], (char)(175));

Then I add the two variables (s0 and s1):

__m128i sum = _mm_add_epi8 (s0, s1);

The problem is that when a sum is greater than 255, the result overflows: it wraps around modulo 256 instead of holding the full sum. I know that 8 bits can store at most 255.

My question is whether there is an instruction that stores the sums as 16-bit values rather than 8-bit values.

Or is there a better way to vectorize this?

Thank you very much.

You can convert your numbers to 16-bit values before adding them. This way, you capture all bits of the result. The cost is that you then need twice as many additions, because a 128-bit register holds only eight 16-bit integers instead of sixteen 8-bit ones.

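The conversion can be done easily with _mm_unpacklo_epi8 and _mm_unpackhi_epi8. Please note that these instructions interleave their two inputs. You therefore need to prepare an all-zero vector as the other source, which zero-extends each byte to 16 bits (the right behaviour for unsigned values such as pixels).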
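The steps above could be sketched roughly as follows. This is only a minimal illustration, assuming the bytes are unsigned; the helper name add_u8_widen_to_u16 and the output array out are hypothetical and not part of the original code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: adds the 16 unsigned 8-bit lanes of a and b and
   writes the 16 sums as unsigned 16-bit values to out[0..15]. */
static void add_u8_widen_to_u16(__m128i a, __m128i b, uint16_t out[16])
{
    const __m128i zero = _mm_setzero_si128();

    /* Interleaving with zero places a 0 byte above each data byte,
       which zero-extends every lane from 8 to 16 bits. */
    __m128i a_lo = _mm_unpacklo_epi8(a, zero);  /* lanes 0..7  */
    __m128i a_hi = _mm_unpackhi_epi8(a, zero);  /* lanes 8..15 */
    __m128i b_lo = _mm_unpacklo_epi8(b, zero);
    __m128i b_hi = _mm_unpackhi_epi8(b, zero);

    /* Two 16-bit additions replace the single 8-bit addition. */
    __m128i sum_lo = _mm_add_epi16(a_lo, b_lo);
    __m128i sum_hi = _mm_add_epi16(a_hi, b_hi);

    /* Unaligned stores, so out needs no special alignment. */
    _mm_storeu_si128((__m128i *)&out[0], sum_lo);
    _mm_storeu_si128((__m128i *)&out[8], sum_hi);
}
```

With the variables from the question, this would be called as add_u8_widen_to_u16(s0, s1, out) with a uint16_t out[16]. The largest possible sum, 255 + 255 = 510, fits comfortably in 16 bits. For signed 8-bit data, zero-extension would not be appropriate and a sign-extending conversion would be needed instead.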

 

Kind regards

Thomas

 
