I declared my array this way:
__declspec(align(16)) int diff[16];
But when I load the data this way:
__m128i *d = (__m128i*) diff;
dl0 = _mm_load_si128(d);
dl3 = _mm_load_si128(d+3);
the program crashes. It only works with _mm_loadu_si128, but that function is noticeably slower than _mm_load_si128.
As a result, when I rewrite the arithmetic with SSE2, it is hard to see any advantage from SSE2.
How can I align the data correctly?
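To narrow this down, it may help to check at run time whether the pointer really sits on a 16-byte boundary before calling _mm_load_si128. The sketch below is only an illustration, not a diagnosis of the crash. The helper names is_aligned16 and sum16 are made up for this example, and it uses C11 _Alignas in place of __declspec(align(16)) so that it also builds outside MSVC. It assumes the crash comes from an address that is not 16-byte aligned, which can happen if the array is reached through a function parameter or lives in heap memory rather than being the aligned variable itself.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>     /* uintptr_t */

/* Hypothetical helper: returns 1 if p is on a 16-byte boundary, else 0. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: sums 16 ints, 4 at a time.
   Uses the aligned load only when the address allows it and
   falls back to the unaligned load otherwise. */
static int sum16(const int *diff)
{
    const __m128i *d = (const __m128i *)diff;
    __m128i acc = _mm_setzero_si128();

    for (int i = 0; i < 4; ++i) {
        __m128i v = is_aligned16(d + i)
                        ? _mm_load_si128(d + i)    /* needs 16-byte alignment */
                        : _mm_loadu_si128(d + i);  /* works for any address */
        acc = _mm_add_epi32(acc, v);
    }

    /* Store the four partial sums to an aligned buffer and add them up. */
    _Alignas(16) int lanes[4];
    _mm_store_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

If is_aligned16(diff) returns 0 for the array in the real program, the declaration that is actually in effect is not the aligned one. For heap buffers, _mm_malloc(size, 16) paired with _mm_free is one way to get aligned storage.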

