Performance Essentials with OpenMP 4.0 Vectorization
Techniques for exploiting vector hardware to improve application performance through explicit vector programming with OpenMP* 4.0 in C/C++.
remainder loop was not vectorized: vectorization possible but seems inefficient
The compiler only auto-vectorizes a loop if its internal heuristics indicate that a speed-up is likely. If a speed-up seems unlikely or is too uncertain, the compiler emits the message "vectorization possible but seems inefficient" and does not vectorize the loop. Common reasons for this include:
The compiler has detected a potential backward dependency between loop iterations that could make vectorization unsafe. The compiler will not auto-vectorize a loop if there are any data values for which vectorization could lead to an incorrect result. Two common examples are:
1. reading an array element after it was written in a preceding iteration;
2. writing data through pointers that might alias other data also accessed in the loop (that is, the data might overlap).
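The two dependency cases above can be sketched in C as follows. This is a minimal illustration, not code from the compiler documentation: the function names prefix_sum and scale are hypothetical, and the use of restrict together with the OpenMP 4.0 simd directive is one possible way to assert independence, assuming the caller really does pass non-overlapping arrays.

```c
#include <stddef.h>

/* Case 1: loop-carried (backward) dependency.
   Iteration i reads a[i - 1], which iteration i - 1 has just written,
   so the iterations must run in order. A loop like this should NOT be
   marked with an OpenMP simd directive. */
void prefix_sum(float *a, const float *b, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i - 1] + b[i];
}

/* Case 2: possible pointer aliasing.
   Without further information the compiler must assume that dst and
   src may overlap. The restrict qualifiers and the simd directive are
   the programmer's promise that they do not. The promise is not
   checked: overlapping arguments would give undefined results. */
void scale(float *restrict dst, const float *restrict src,
           float k, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        dst[i] = k * src[i];
}
```

Built without OpenMP support, the directive is ignored and both functions still behave as ordinary serial loops, which makes it easy to compare results with and without vectorization.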
1. More than one exit point in the loop. A vectorizable loop must have a single entry point and a single exit point; multiple exit points can trigger this message.
2. An iteration count that is data dependent. The iteration count must be known at entry to the loop.
3. A subroutine or function call in the loop body that prevents vectorization.
4. Other complex control structures, for example, multiple GOTO statements.
Below are examples for the first three scenarios.
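As one hedged sketch of the first two scenarios, consider a search for the first negative element. The function names are hypothetical, and the rewrite shown is only one option: it assumes that scanning the whole array is acceptable, and it relies on the min reduction that OpenMP has supported for C/C++ since version 3.1.

```c
#include <stddef.h>

/* Early return: the loop can end at i == n or at the first negative
   element, so it has two exit points and its trip count depends on
   the data. Returns n if no element is negative. */
size_t first_negative_early_exit(const float *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (x[i] < 0.0f)
            return i;           /* second exit point */
    return n;
}

/* Single exit: the loop always runs n iterations and keeps the
   smallest matching index through a min reduction. The trip count is
   known on entry, at the cost of always scanning the full array. */
size_t first_negative_single_exit(const float *x, size_t n)
{
    size_t first = n;
    #pragma omp simd reduction(min:first)
    for (size_t i = 0; i < n; ++i)
        if (x[i] < 0.0f && i < first)
            first = i;
    return first;
}
```

Both functions return the same index for the same input. Whether the second form is actually faster depends on where the first match tends to occur, so it is worth measuring before adopting it.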