Enforcing Loop Vectorization with Array Notations

Jorge Martinis's picture

I wonder what is preventing the compiler from vectorizing the innermost loop in a function such as the following:

    template <typename T>
    inline void MatrixVectorProduct(const matrix<T>& m, const std::vector<T>& rhs, std::vector<T>& lhs)
    {
        size_t cols = m.cols();
        const T* restrict pcol = &(*rhs.begin());
        //outer loop (/Qvec-report:3): nonstandard loop is not a vectorization candidate (Fine!)
        _Cilk_for(size_t i=0; i {
            const T* prow = &(*(m.begin() + i * cols));
            //inner loop(/Qvec-report:3): modifying order of operation not allowed under given switches (?)
            lhs[i] = __sec_reduce_add( prow[0:cols] * pcol[0:cols] );
        }
    }

Under the switches /O3 /Qstd=c99 /Qopenmp /Qfp-speculation:safe /Qrestrict /arch:SSE2, this function's performance approaches that of Intel MKL's 'cblas_dgemv()'.

Cheers,
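
For readers without a compiler that supports the Cilk Plus array notation, the inner reduction in the function above can be read as an ordinary dot product of row i with the vector. Below is a minimal scalar sketch of that reading. It assumes a flat, row-major `std::vector` in place of the poster's `matrix` class, and the name `matrix_vector_product` and its signature are illustrative, not the original code.

```cpp
#include <cstddef>
#include <vector>

// Scalar reading of: lhs[i] = __sec_reduce_add(prow[0:cols] * pcol[0:cols]);
// Assumption: `m` holds rows*cols elements, contiguous, row-major.
template <typename T>
void matrix_vector_product(const std::vector<T>& m,
                           std::size_t rows, std::size_t cols,
                           const std::vector<T>& rhs,
                           std::vector<T>& lhs)
{
    const T* pcol = rhs.data();
    for (std::size_t i = 0; i < rows; ++i) {
        const T* prow = m.data() + i * cols;   // start of row i
        T sum = T();
        // Strict left-to-right accumulation; a vectorizer would need
        // permission to reorder these additions.
        for (std::size_t j = 0; j < cols; ++j) {
            sum += prow[j] * pcol[j];
        }
        lhs[i] = sum;
    }
}
```

The inner loop is a sum reduction, which is the kind of loop the "modifying order of operation not allowed" remark refers to.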

Pablo Halpern (Intel)'s picture

When I tried this, instantiated with both float and double and using both the Windows and Linux versions of the compiler, the vectorization report I get says that the inner loop is vectorized. What does your matrix class look like? Are you using the released version of the compiler, or a beta, or something else?

- Pablo

Brandon Hewitt (Intel)'s picture

I think we may have figured out the trigger here. Assuming you're building out of the IDE, is /fp:precise specified by default? Try changing to /fp:fast if it is.

The question I'm following up on is whether this behavior of the vectorizer makes sense in the context of array notations.

Brandon Hewitt Technical Consulting Engineer Tools Knowledge Base: "http://software.intel.com/en-us/articles/tools" Software Product Support info: "http://www.intel.com/software/support"
Jorge Martinis's picture

Indeed, I intentionally specify /fp:precise.

Jorge Martinis's picture

After switching to /fp:fast, the loop is vectorized. However, it crashes at runtime with thread/call stack stalled right at the loop.

Jorge Martinis's picture

My matrix class uses contiguous storage and row-major layout. I am using the Intel Composer 2011 XE Update 1 (12.1.127).

Brandon Hewitt (Intel)'s picture

Jorge,

If you turn on /W4, do you get any remarks like the following?

remark #18009: A temporary array is allocated to resolve data dependencies

If so, I think you might have a stack overflow caused by some of the array notation code. Let me know - I have an open problem report on this that I can link this thread to.

Jorge Martinis's picture

Brandon,

After turning on /W4, I found no remarks. Under /fp:fast, the MatrixVectorProduct() function from the first post builds and runs. On the other hand, the following function works under /fp:precise (without innermost loop vectorization), whereas under /fp:fast the innermost loop is vectorized but it crashes at runtime due to an unhandled access violation.

    template <typename T>
    inline void MatrixProduct(const matrix<T>& m, const matrix<T>& rhs, matrix<T>& lhs)
    {
        //assert(...) on all dimensions
        size_t mcols = m.cols();
        size_t ncols = rhs.cols();
        const T* pcol = &(*rhs.begin());//restrict pointer candidate
        _Cilk_for(size_t i=0; i {
            const T* prow = &(*(m.begin() + i * mcols));
            for(size_t j=0; j {
                lhs[i][j] = __sec_reduce_add(prow[0:mcols] * pcol[j:mcols:ncols]);//acc violation on vect
            }
        }
    }

Compiler: /c /O2 /Ob2 /Oi /Ot /Oy /Qipo /I "C:\Program Files (x86)\Intel\ComposerXE-2011\mkl\include\ia32" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /MD /GS /Gy /arch:SSE2 /fp:fast /Fo"Release/" /Fd"Release/vc90.pdb" /W4 /nologo /Zi /Qopenmp /Quse-intel-optimized-headers /Qstd=c99 /Qrestrict /Qvec-report3

Linker: mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /INCREMENTAL:NO /nologo /LIBPATH:"C:\Program Files (x86)\Intel\ComposerXE-2011\mkl\lib\ia32" /NODEFAULTLIB:"libcmt.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /DYNAMICBASE /NXCOMPAT /MACHINE:X86

Cheers,
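
An array-notation section is written start:length:stride, so `pcol[j:mcols:ncols]` selects `mcols` elements of column `j` of `rhs`, one every `ncols` elements. Below is a scalar sketch of the same access pattern. It assumes flat, row-major buffers in place of the poster's `matrix` class, and the name `matrix_product` and its signature are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// Scalar reading of:
//   lhs[i][j] = __sec_reduce_add(prow[0:mcols] * pcol[j:mcols:ncols]);
// Assumptions: m is mrows x mcols, rhs is mcols x ncols, lhs is mrows x ncols,
// all stored contiguously in row-major order.
template <typename T>
void matrix_product(const std::vector<T>& m,
                    std::size_t mrows, std::size_t mcols,
                    const std::vector<T>& rhs, std::size_t ncols,
                    std::vector<T>& lhs)
{
    const T* pcol = rhs.data();
    for (std::size_t i = 0; i < mrows; ++i) {
        const T* prow = m.data() + i * mcols;   // start of row i of m
        for (std::size_t j = 0; j < ncols; ++j) {
            T sum = T();
            for (std::size_t k = 0; k < mcols; ++k) {
                // start j, stride ncols: element k of column j of rhs
                sum += prow[k] * pcol[j + k * ncols];
            }
            lhs[i * ncols + j] = sum;
        }
    }
}
```

The last element read from `rhs` is at offset `j + (mcols - 1) * ncols`, so the strided operand is not contiguous in memory, unlike both operands in MatrixVectorProduct().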

Brandon Hewitt (Intel)'s picture

Hi Jorge,

This definitely looks like a compiler issue from what you've sent me. The vectorizer is doing something improperly, I think. I've created a problem report for our vectorizer team, and I'll update the thread as their investigation proceeds.

Jorge Martinis's picture

Brandon, I think I've found an answer to our follow-up question in a related article: http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/ The behavior makes sense in this context because the /fp:precise model allows only value-safe optimizations. Vectorizing the reduction in __sec_reduce_add() requires reassociating the sums, which is value-unsafe. The question remains why it fails under /fp:fast, though. Regards,
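
The reassociation point can be seen without any vectorizer. Here is a small sketch that sums the same floats in sequential order and in the two-lane order a vectorized reduction might use. The function names are illustrative, and the input suggested below is chosen purely to make the difference visible.

```cpp
#include <cstddef>

// Left-to-right sum: the order a value-safe model has to preserve.
float sum_sequential(const float* x, std::size_t n)
{
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        s += x[i];
    }
    return s;
}

// Two partial sums over even and odd indices, combined at the end.
// This mimics how a 2-wide vectorized reduction reassociates the additions.
float sum_two_lanes(const float* x, std::size_t n)
{
    float lane0 = 0.0f;
    float lane1 = 0.0f;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        lane0 += x[i];
        lane1 += x[i + 1];
    }
    if (i < n) {
        lane0 += x[i];   // leftover element when n is odd
    }
    return lane0 + lane1;
}
```

For the input {1e30f, 1.0f, -1e30f, 1.0f}, with IEEE-754 arithmetic and no fast-math style option, the sequential sum returns 1 (the first 1.0f is absorbed by 1e30f) while the two-lane sum returns 2. Both are legitimate roundings of the same expression in different orders, which is why a value-safe model cannot pick the second order on its own.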

Brandon Hewitt (Intel)'s picture

Hi Jorge,

Correct. Because /fp:precise is specified, the compiler can't safely vectorize the array notation reduction. However, the crash after vectorization still looks like an issue to me.

Jorge Martinis's picture

Brandon, I agree. A very important one indeed. I look forward to hearing more about it. Cheers,

Brandon Hewitt (Intel)'s picture

Hi Jorge,

We've put a fix for this issue into Update 3. Try Update 3, and let me know if you still have problems.

