Vectorization issue with ICC 12.1.3

Hello,
I am using ICC to compile C functions that perform basic picture analysis on pixels. I want to compare the sequential and parallel (OpenMP) versions of my program to measure the improvement. The problem is that I am running into several vectorization issues with my inner loops that I do not understand. For instance, I get this kind of report (with -vec-report3 enabled):
../src/OdPictureAnalysisP4A.c(648): (col. 4) remark: PARTIAL LOOP WAS VECTORIZED.
../src/OdPictureAnalysisP4A.c(674): (col. 7) remark: loop was not vectorized: unsupported loop structure.
../src/OdPictureAnalysisP4A.c(666): (col. 7) remark: loop was not vectorized: unsupported loop structure.
../src/OdPictureAnalysisP4A.c(679): (col. 54) remark: loop was not vectorized: dereference too complex.
../src/OdPictureAnalysisP4A.c(674): (col. 7) remark: LOOP WAS VECTORIZED.
../src/OdPictureAnalysisP4A.c(682): (col. 7) remark: LOOP WAS VECTORIZED.
The same report contradicts itself about the vectorization of one loop (line 674). Now, with the same source code but with OpenMP enabled at compile time (-openmp), I get this:
../src/OdPictureAnalysisP4A.c(666): (col. 7) remark: LOOP WAS VECTORIZED.
../src/OdPictureAnalysisP4A.c(674): (col. 7) remark: LOOP WAS VECTORIZED.
../src/OdPictureAnalysisP4A.c(674): (col. 7) remark: LOOP WAS VECTORIZED.
../src/OdPictureAnalysisP4A.c(682): (col. 7) remark: LOOP WAS VECTORIZED.
No problem this time.
Secondly, I have another issue: OpenMP #pragma directives influence vectorization. Here -openmp is enabled for both tests.
First case: two nested loops without an OpenMP #pragma; the inner one is vectorized.
Second case: two nested loops again, but with a #pragma; the inner one is no longer vectorized, for this reason:
../src/OdPictureAnalysisRK_P4A.c(704): (col. 7) remark: loop was not vectorized: unsupported loop structure.
I am wondering whether this issue is specific to this version of ICC. What could I try to fix it?
In all these cases I compile with the -fast option. The system is Debian Linux, and the hardware is an x86_64 Intel Xeon X5670.
Kind regards


When I see such contradictory messages about vectorization of a loop, it typically means that the compiler has created two versions of the loop with run-time selection, only one of which is optimized. The opt-report should indicate whether there is versioning. This is most annoying in the case of nested loops, where my only remedy is to check whether the optimized code is actually executed when running my real case. In order to trace it (e.g. under VTune), it is preferable to compile without interprocedural analysis (-fno-inline-functions -no-ipo rather than -fast).
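To make the idea of versioning with run-time selection concrete, here is a hand-written sketch of roughly what a multi-versioned loop amounts to. It is only an illustration under assumptions: the function name `add_offset`, the pointer-overlap test, and the return value are hypothetical, and the test the compiler really generates for your loops may be quite different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration of loop multi-versioning, written by hand.
   Returns 1 if the "fast" version ran and 0 if the fallback ran, so a
   caller can trace which version is executed. */
int add_offset(float *dst, const float *src, int n, float c)
{
    assert(n >= 0);

    /* Compare addresses as integers to decide whether the arrays overlap. */
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    uintptr_t bytes = (uintptr_t)n * sizeof(float);
    int i;

    if (d + bytes <= s || s + bytes <= d) {
        /* Version 1: no overlap, so iterations are independent and this
           copy of the loop is the one a compiler could vectorize. */
        for (i = 0; i < n; i++)
            dst[i] = src[i] + c;
        return 1;
    }

    /* Version 2: possible overlap, so this copy must keep strict
       element-by-element order and stays scalar. */
    for (i = 0; i < n; i++)
        dst[i] = src[i] + c;
    return 0;
}
```

Both copies carry the same source line number, which is consistent with one line being reported both as vectorized and as not vectorized.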
I submitted an issue on premier.intel.com for a case where the 11.1 compiler vectorized without the multiple-versioning problem, and where the 12.x compilers can recover if the vectorizable code is written in CEAN (extended array notation). I have asked about this; it was not intentional that cases which optimized before the introduction of CEAN now require CEAN for satisfactory operation.
-openmp disables many of the compiler's multi-level loop optimizations in the parallel region. You must write the loop nest optimizations yourself. As you have seen, if you do so, you may avoid the compiler doing undesirable things.
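As a rough sketch of what writing the loop nest yourself can look like: put the OpenMP directive on the outer loop only, and keep the inner loop as simple as possible (unit stride, row pointers hoisted out, no aliasing between input and output). The function `scale_picture`, the float pixel type, and the scaling operation are hypothetical and are not taken from OdPictureAnalysisP4A.c; whether ICC 12.1.3 vectorizes this particular form would still have to be confirmed with -vec-report3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: multiply every pixel of a width x height
   image by a gain. restrict promises that in and out do not overlap. */
void scale_picture(const float *restrict in, float *restrict out,
                   int width, int height, float gain)
{
    assert(in != NULL && out != NULL && width > 0 && height > 0);

    int y;
    /* Thread-level parallelism on the outer (row) loop only. If OpenMP
       is not enabled, the pragma is ignored and the rows run in order. */
    #pragma omp parallel for
    for (y = 0; y < height; y++) {
        /* Hoist the row addressing so the inner loop only sees plain
           unit-stride accesses src[x] and dst[x]. */
        const float *src = in + (size_t)y * (size_t)width;
        float *dst = out + (size_t)y * (size_t)width;
        int x;

        /* Simple counted inner loop: the candidate for vectorization. */
        for (x = 0; x < width; x++)
            dst[x] = gain * src[x];
    }
}
```

The restrict qualifier needs C99 mode; with ICC that typically means compiling with -std=c99 or -restrict.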
