errors with SSE data types

Posted by psheph

I am trying to learn my way around the SSE compiler intrinsics, but I am getting some errors I can't figure out. Hopefully someone can point me in the right direction.

I have two arrays of 16 bit integers that I am trying to multiply. I want to compare the speed of a standard for loop to a loop I write with intrinsics, so here is how I am declaring the arrays:

short *array1 = new ( _mm_malloc( sizeof(short[length1/2]), 16)) short;

where length1 is the byte length of the file I will be reading the array from. I am declaring two more arrays identically...

When it comes time to do the SSE math, here is my code:

__m128 a,b,c;

for (int i = 0; i < max_index; i=i+8)
{
a = _mm_load_si128( (__m128 *) array1 + i*2);
b = _mm_load_si128( (__m128 *) array2 + i*2);

c = _mm_mullo_epi16( a, b);

_mm_store_si128( (__m128 *) out2 + i*2, c);
}

Every time I try to read or write the value of a, b, or c, the compiler points to the variable and gives the error:

no suitable constructor exists to convert from "__m128" to "__m128"

Any suggestions?

Thanks,
Paul

Posted by lionelk

Hello psheph,
Did you #include "xmmintrin.h"? According to the information I received, you need this header to use the intrinsics (I only use SSE, though; you might need a different one for MMX). Also, see my post "usage of _mm_prefetch(...)" on page 3. Although it deals with prefetching, it has a couple of lines of SSE intrinsic code that do something similar to what you are trying to do, and I know it works.

Are you sure you should have that "i*2" at the end of the address expressions in your loads for "a" and "b"?
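
On the "i*2" question: a cast binds more tightly than "+", so an expression of the form (__m128i *) array1 + n advances in steps of 16 bytes, not in steps of one short. The sketch below only illustrates that arithmetic; the two helper names are hypothetical and not part of any library.

```cpp
#include <emmintrin.h>  // SSE2: defines __m128i (16 bytes wide)
#include <cstddef>

// Byte offset reached when the pointer is cast first and n is added afterwards.
// Each step is sizeof(__m128i) == 16 bytes.
std::ptrdiff_t offset_cast_then_add(const short* base, int n) {
    const __m128i* p = reinterpret_cast<const __m128i*>(base) + n;
    return reinterpret_cast<const char*>(p) - reinterpret_cast<const char*>(base);
}

// Byte offset reached when n shorts are added first and the result is cast.
// Each step is sizeof(short) == 2 bytes.
std::ptrdiff_t offset_add_then_cast(const short* base, int n) {
    const __m128i* p = reinterpret_cast<const __m128i*>(base + n);
    return reinterpret_cast<const char*>(p) - reinterpret_cast<const char*>(base);
}
```

With a loop counter that steps by 8 shorts, the second form lands on consecutive 16-byte blocks, while the first form with an extra "*2" would skip far ahead and eventually run past the end of the buffer.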

Posted by bronx

At first sight your code looks strange in some places. For example, the new that calls _mm_malloc: _mm_malloc isn't an allocator to use with "placement new"; it already returns a pointer. Just do something like:

short *array1 = (short *)_mm_malloc(sizeof(short)*length,16);

The prototype for "_mm_load_si128" uses "__m128i" for its parameter, so including the appropriate header and replacing __m128 with __m128i should at least solve your compilation issues.
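
Putting these suggestions together, here is a minimal sketch of the multiply loop, not a tuned implementation. It assumes the buffers come from _mm_malloc(..., 16) and that the element count is a multiple of 8; the function names multiply_scalar and multiply_sse2 are made up for this example. The SSE2 integer intrinsics used here are declared in emmintrin.h.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_load_si128, _mm_mullo_epi16, _mm_store_si128
#include <cstddef>

// Scalar reference loop: keeps the low 16 bits of each product.
void multiply_scalar(const short* a, const short* b, short* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        out[i] = static_cast<short>(a[i] * b[i]);
}

// SSE2 loop: eight 16-bit products per iteration.
// Assumption: a, b and out are 16-byte aligned and count is a multiple of 8.
void multiply_sse2(const short* a, const short* b, short* out, std::size_t count) {
    for (std::size_t i = 0; i < count; i += 8) {
        // Add the element index first, then cast: a + i is already in units of short.
        __m128i va = _mm_load_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_load_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i vc = _mm_mullo_epi16(va, vb);  // low 16 bits of each lane's product
        _mm_store_si128(reinterpret_cast<__m128i*>(out + i), vc);
    }
}
```

A buffer for this would be allocated as in the line above, for example (short *)_mm_malloc(sizeof(short) * count, 16), and released with _mm_free. If alignment cannot be guaranteed, _mm_loadu_si128 and _mm_storeu_si128 are the unaligned counterparts.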

Posted by Yaroslav Morkovnikov (Intel)

Try
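typedef union {
int in[2];    // overlays the low 64 bits of out
__m128i out;  // integer vector type that _mm_load_si128 works with
} cvt;

cvt convert_tmp;

convert_tmp.in[0] = ...
convert_tmp.in[1] = ...
a = _mm_load_si128(&convert_tmp.out);  // the intrinsic takes a pointer; a must be declared __m128i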
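
If the aim is to fill a vector from individual integers or to look at its lanes afterwards, the SSE2 set and store intrinsics can do that without a union. A small sketch under that assumption; unpack_epi16 and scale_ramp are hypothetical names used only for illustration.

```cpp
#include <emmintrin.h>  // SSE2: _mm_setr_epi16, _mm_set1_epi16, _mm_storeu_si128

// Copy the eight 16-bit lanes of v into dst; dst needs no particular alignment.
void unpack_epi16(__m128i v, short dst[8]) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
}

// Multiply the ramp 0..7 by factor, lane by lane, and write the result to dst.
void scale_ramp(short factor, short dst[8]) {
    __m128i ramp = _mm_setr_epi16(0, 1, 2, 3, 4, 5, 6, 7);  // first argument goes to lane 0
    __m128i f = _mm_set1_epi16(factor);                      // same value in all eight lanes
    unpack_epi16(_mm_mullo_epi16(ramp, f), dst);
}
```

Storing to a plain short array like this is also a convenient way to print or compare the contents of a, b, or c while debugging.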
