I'm sure this question has been asked dozens of times. I just can't seem to figure out how to structure a search query that finds the answer.
PROBLEM: I allocated an array of __m128 (which I expected to be aligned) using new in the constructor. I then used it to perform an operation with _mm_dp_ps() within a function. It worked fine with no optimization. With full optimization in the Intel Compiler, the data became unaligned and all sorts of bad things happened.
QUESTION: Isn't __m128 defined as being 16 byte aligned? Is this alignment not guaranteed with optimization? If so, is this a bug? Or did I just do something silly that I can't see?
(By the way, I got around this problem by dynamically allocating aligned data using "__m128 *sse_result = (__m128*) _mm_malloc(4*sizeof(__m128), 16);")
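A minimal sketch of that workaround, written so the alignment can be checked rather than assumed. The helper names is_aligned16, alloc_m128 and free_m128 are hypothetical and only for illustration; _mm_malloc and _mm_free are the real intrinsics, and memory from _mm_malloc has to be released with _mm_free, not delete[].

```cpp
#include <xmmintrin.h>  // __m128, _mm_malloc, _mm_free
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical helper: allocate `count` __m128 values on a 16-byte boundary.
// Returns a null pointer if the allocation fails.
inline __m128* alloc_m128(std::size_t count) {
    __m128* p = static_cast<__m128*>(_mm_malloc(count * sizeof(__m128), 16));
    assert(is_aligned16(p));  // a null pointer also passes this check
    return p;
}

// Hypothetical helper: release memory obtained from alloc_m128.
inline void free_m128(__m128* p) {
    if (p) _mm_free(p);
}
```

The same is_aligned16 check can be applied to the pointer returned by new in the optimized build, to see whether the allocation itself is misaligned or whether something else is going wrong.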
Here are snippets of my code:
In the class definition:
__m128 *transMat_sse; //the sepia transformation matrix
In the constructor:
transMat_sse = new __m128[4]; // transformation matrix (MS version)
*(transMat_sse+0) = _mm_set_ps(0.393f, 0.769f, 0.189f, 0.0f);
*(transMat_sse+1) = _mm_set_ps(0.349f, 0.686f, 0.168f, 0.0f);
*(transMat_sse+2) = _mm_set_ps(0.272f, 0.534f, 0.131f, 0.0f);
*(transMat_sse+3) = _mm_set_ps( 0.0f, 0.0f, 0.0f, 1.0f);
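For comparison, here is a rough sketch of the same constructor built on _mm_malloc instead of new, with a destructor that frees the matrix. The class name SepiaFilter and the members aligned() and matrix() are hypothetical, since the real class is not shown; only the member name transMat_sse and the matrix values come from the snippets above.

```cpp
#include <xmmintrin.h>  // __m128, _mm_malloc, _mm_free, _mm_set_ps
#include <cassert>
#include <cstdint>
#include <new>          // std::bad_alloc

class SepiaFilter {  // hypothetical class name
public:
    SepiaFilter()
        : transMat_sse(static_cast<__m128*>(_mm_malloc(4 * sizeof(__m128), 16))) {
        if (!transMat_sse) throw std::bad_alloc();
        assert(aligned());
        // Same values as in the original constructor.
        transMat_sse[0] = _mm_set_ps(0.393f, 0.769f, 0.189f, 0.0f);
        transMat_sse[1] = _mm_set_ps(0.349f, 0.686f, 0.168f, 0.0f);
        transMat_sse[2] = _mm_set_ps(0.272f, 0.534f, 0.131f, 0.0f);
        transMat_sse[3] = _mm_set_ps(0.0f, 0.0f, 0.0f, 1.0f);
    }

    ~SepiaFilter() {
        if (transMat_sse) _mm_free(transMat_sse);  // pairs with _mm_malloc
    }

    // The class owns a raw pointer, so copying is disabled in this sketch.
    SepiaFilter(const SepiaFilter&) = delete;
    SepiaFilter& operator=(const SepiaFilter&) = delete;

    // True if the matrix storage starts on a 16-byte boundary.
    bool aligned() const {
        return (reinterpret_cast<std::uintptr_t>(transMat_sse) & 15u) == 0;
    }

    const __m128* matrix() const { return transMat_sse; }

private:
    __m128* transMat_sse;  // the sepia transformation matrix
};
```

Note that _mm_set_ps takes its arguments from the highest element down, so the last argument ends up in element 0 of the register.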
My compiler arguments: /c /O2 /Ob2 /Oi /Qipo /I "\\include" /I ".\\Workloads" /I ".\\external\\vtune\\include" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /EHsc /MD /GS /Gy /fp:fast /Fo"Win32\\Release_Intel/" /W3 /nologo /Zi /Qwd10121 /Qopenmp /QaxSSE4.2 /QxSSE2 /Q_multisrc-