How can I align data correctly?

I declared the array like this:

__declspec(align(16)) int diff[16];

But when I load the data like this:

__m128i *d = (__m128i*) diff;
dl0 = _mm_load_si128(d);
dl3 = _mm_load_si128(d+3);

the program crashes. It only works with _mm_loadu_si128, but that is noticeably slower than _mm_load_si128.
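A minimal sketch of how the alignment could be checked before the aligned load is issued. The names ALIGN16, is_aligned16 and sum16_aligned are made up for this illustration and are not part of the program above. The sketch assumes an x86 compiler with SSE2 enabled. If the assert fires, the array is not actually on a 16-byte boundary at that point in the program.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper macro: request 16-byte alignment on MSVC or GCC/Clang. */
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))
#endif

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Sums 16 ints using aligned loads only.
   The caller must pass a pointer that is 16-byte aligned. */
static int sum16_aligned(const int *v)
{
    const __m128i *d = (const __m128i *)v;
    __m128i acc;
    ALIGN16 int out[4];

    assert(is_aligned16(v));  /* catch a misaligned pointer before the load faults */

    acc = _mm_load_si128(d);
    acc = _mm_add_epi32(acc, _mm_load_si128(d + 1));
    acc = _mm_add_epi32(acc, _mm_load_si128(d + 2));
    acc = _mm_add_epi32(acc, _mm_load_si128(d + 3));  /* d + 3 is the last vector in int[16] */

    _mm_store_si128((__m128i *)out, acc);
    return out[0] + out[1] + out[2] + out[3];
}
```

Called with an array declared as ALIGN16 int diff[16], the check should pass and the four aligned loads cover exactly the 64 bytes of the array.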

As a result, when I rewrite the algorithm with SSE2, it is hard to see any benefit from it.

