Developer Guide and Reference

vector

Indicates to the compiler that the loop should be vectorized according to the argument keywords.

Syntax

#pragma vector {always[assert]|aligned|unaligned|dynamic_align[(var)]|nodynamic_align|temporal|nontemporal|[no]vecremainder|[no]mask_readwrite|vectorlength(n1[, n2]...)}
#pragma vector nontemporal[(var1[, var2, ...])]
Arguments
always
Instructs the compiler to override its efficiency heuristics when deciding whether to vectorize, so that loops with non-unit strides or heavily unaligned memory accesses are also vectorized. Controls the vectorization of the subsequent loop in the program; optionally takes the keyword assert
aligned
Instructs the compiler to use aligned data movement instructions for all array references when vectorizing
unaligned
Instructs the compiler to use unaligned data movement instructions for all array references when vectorizing
dynamic_align[(var)]
Instructs the compiler to perform dynamic alignment optimization for the loop with an optionally specified variable to perform alignment on
nodynamic_align
Disables dynamic alignment optimization for the loop
multiple_gather_scatter_by_shuffles
Instructs the optimizer to disable the generation of gather/scatter and to transform gather/scatter into unit-strided loads/stores plus a set of shuffles wherever possible
nomultiple_gather_scatter_by_shuffles
Instructs the optimizer to enable the generation of gather/scatter instructions and not to transform gather/scatter into unit-strided loads/stores
nontemporal
Instructs the compiler to use non-temporal (that is, streaming) stores on systems based on all supported architectures, unless otherwise specified; optionally takes a comma-separated list of variables.
When this pragma is specified, it is your responsibility to also insert any fences as required to ensure correct memory ordering within a thread or across threads. One typical way to do this is to insert a _mm_sfence intrinsic call just after the loops (such as the initialization loop) where the compiler may insert streaming store instructions.
temporal
Instructs the compiler to use temporal (that is, non-streaming) stores on systems based on all supported architectures, unless otherwise specified
vecremainder
Instructs the compiler to vectorize the remainder loop when the original loop is vectorized
novecremainder
Instructs the compiler not to vectorize the remainder loop when the original loop is vectorized
mask_readwrite
Disables memory speculation, causing the generation of masked load and store operations within conditions
nomask_readwrite
Enables memory speculation, causing the generation of non-masked loads and stores within conditions
vectorlength(n1[, n2]...)
Instructs the vectorizer which vector length/factor to use when generating the main vector loop.
Description
The vector pragma indicates that the loop should be vectorized, if it is legal to do so, ignoring normal heuristic decisions about profitability. The vector pragma takes several argument keywords to specify the kind of loop vectorization required, including aligned, unaligned, always, temporal, and nontemporal. The compiler does not apply the vector pragma to nested loops; each nested loop needs its own preceding pragma statement. Place the pragma before the loop control statement.
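Because the pragma applies only to the loop that immediately follows it, a loop nest needs one pragma per loop that should be affected. The following is a minimal sketch, assuming a hypothetical function scale_rows that works on a row-major matrix; compilers that do not recognize the pragma are expected to ignore it.

```c
#include <stddef.h>

/* Hypothetical helper: multiplies every element of a rows x cols
   row-major matrix by factor. */
void scale_rows(float *m, size_t rows, size_t cols, float factor) {
  /* This pragma applies to the outer loop only. */
  #pragma vector always
  for (size_t r = 0; r < rows; r++) {
    /* The inner loop does not inherit the pragma above,
       so it gets its own. */
    #pragma vector always
    for (size_t c = 0; c < cols; c++)
      m[r * cols + c] *= factor;
  }
}
```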
Using the aligned/unaligned keywords
When the aligned/unaligned argument keyword is used with this pragma, it indicates that the loop should be vectorized using aligned/unaligned data movement instructions for all array references. Specify only one argument keyword: aligned or unaligned.
If you specify aligned as an argument, you must be sure that the loop is vectorizable using this pragma. Otherwise, the compiler generates incorrect code.
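One way to meet the alignment requirement is to allocate the data on a known boundary. The sketch below assumes that C11 aligned_alloc is available and that 64-byte alignment is sufficient for the target's vector instructions; alloc_aligned_floats and scale_aligned are hypothetical names used only for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: returns storage for n floats on a 64-byte
   boundary, or NULL on failure. aligned_alloc expects the size to be
   a multiple of the alignment, so the byte count is rounded up. */
float *alloc_aligned_floats(size_t n) {
  size_t bytes = (n * sizeof(float) + 63) / 64 * 64;
  if (bytes == 0)
    bytes = 64;
  return aligned_alloc(64, bytes);
}

/* The caller must guarantee that a is 64-byte aligned (assumption);
   the pragma asserts this to the compiler and is not checked. */
void scale_aligned(float *a, int n, float c) {
  #pragma vector aligned
  for (int i = 0; i < n; i++)
    a[i] *= c;
}
```

Release the buffer with free when it is no longer needed.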
Using the always keyword
When the always argument keyword is used, the pragma controls the vectorization of the subsequent loop in the program. If assert is added, the compiler generates an error-level message when its efficiency heuristics indicate that the loop cannot be vectorized.
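As an illustration of the assert form, the loop below reads with a non-unit stride, which the heuristics might otherwise reject. The spelling "vector always assert" is assumed from the syntax line above, copy_strided is a hypothetical name, and the restrict qualifiers are an added assumption that the two arrays do not overlap.

```c
/* Hypothetical helper: copies every 4th element of src into
   consecutive elements of dst. src must hold at least 4*(n-1)+1
   elements. */
void copy_strided(int *restrict dst, const int *restrict src, int n) {
  /* Request vectorization regardless of profitability, and ask for a
     compile-time error if the loop ends up not being vectorized. */
  #pragma vector always assert
  for (int i = 0; i < n; i++)
    dst[i] = src[4 * i];
}
```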
Using the dynamic_align and nodynamic_align keywords
Dynamic alignment is an optimization the compiler attempts to perform by default. It involves peeling iterations from the vector loop into a scalar loop before the vector loop so that the vector loop aligns with a particular memory reference. The dynamic_align(var) form of the directive allows the user to provide a scalar or array variable name to align on. Specifying dynamic_align with or without var does not guarantee the optimization is performed; the compiler still uses heuristics to determine feasibility of the operation.
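A brief sketch of the dynamic_align(var) form follows. The function name add_arrays is hypothetical, and choosing the destination array a as the alignment target is only an assumption about what might be profitable here.

```c
/* Hypothetical helper: adds b to a element-wise. */
void add_arrays(float *restrict a, const float *restrict b, int n) {
  /* Suggest peeling scalar iterations until accesses to a are
     aligned; the compiler may still decline. */
  #pragma vector dynamic_align(a)
  for (int i = 0; i < n; i++)
    a[i] += b[i];
}
```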
Using the multiple_gather_scatter_by_shuffles and nomultiple_gather_scatter_by_shuffles keywords
These clauses do not affect loops nested in the specified loop.
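Strided accesses to interleaved data are a typical source of gather/scatter operations. The sketch below assumes a hypothetical function deinterleave that splits interleaved pairs into two arrays; whether the compiler actually replaces gathers with unit-stride loads plus shuffles here depends on the target and is not guaranteed.

```c
/* Hypothetical helper: splits n interleaved (re, im) pairs stored in
   in[0..2n-1] into the separate arrays re and im. */
void deinterleave(const float *restrict in, float *restrict re,
                  float *restrict im, int n) {
  /* Prefer unit-stride loads combined with shuffles over gather
     instructions for the stride-2 reads below. */
  #pragma vector multiple_gather_scatter_by_shuffles
  for (int i = 0; i < n; i++) {
    re[i] = in[2 * i];
    im[i] = in[2 * i + 1];
  }
}
```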
Using the nontemporal and temporal keywords
The nontemporal and temporal argument keywords are used to control how the "stores" of register contents to storage are performed (streaming versus non-streaming) on systems based on IA-32 and Intel® 64 architectures.
By default, the compiler automatically determines whether a streaming store should be used for each variable.
Streaming stores can significantly improve performance over non-streaming stores on certain processors when large amounts of data are written. However, misusing streaming stores can significantly degrade performance.
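The sketch below combines the nontemporal keyword with the store fence recommended in the argument description. The names fill_streaming and STORE_FENCE are hypothetical; the macro falls back to a no-op when the SSE header is not expected to be usable, on the assumption that no streaming stores are generated in that case.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(_M_X64) || defined(__SSE__)
#include <xmmintrin.h> /* declares _mm_sfence */
#define STORE_FENCE() _mm_sfence()
#else
/* Assumption: without SSE support no streaming stores are emitted. */
#define STORE_FENCE() ((void)0)
#endif

/* Hypothetical helper: initializes a large buffer that is not read
   again soon, so bypassing the cache may be worthwhile. */
void fill_streaming(float *out, size_t n, float value) {
  /* Request streaming stores for writes to out only. */
  #pragma vector nontemporal(out)
  for (size_t i = 0; i < n; i++)
    out[i] = value;
  /* Order the streaming stores before any later stores. */
  STORE_FENCE();
}
```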
Using the [no]vecremainder keyword
If the vector always pragma and keyword are specified, the following occurs:
  • If the vecremainder clause is specified, the compiler vectorizes both the main and remainder loops.
  • If the novecremainder clause is specified, the compiler vectorizes the main loop, but it does not vectorize the remainder loop.
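The two cases above can be sketched as follows. Stacking two vector pragmas before one loop is an assumption made here, not something this page states, and saxpy is a hypothetical function name. A trip count that is not a multiple of the vector length, such as 19, leaves iterations for the remainder loop.

```c
/* Hypothetical helper: computes y[i] += a * x[i] for i in [0, n). */
void saxpy(float *restrict y, const float *restrict x, float a, int n) {
  /* Vectorize the main loop regardless of profitability... */
  #pragma vector always
  /* ...and also vectorize the leftover iterations. */
  #pragma vector vecremainder
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}
```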
Using the [no]mask_readwrite keyword
If the vector pragma and mask_readwrite or nomask_readwrite keyword are specified, the following occurs:
  • If the mask_readwrite clause is specified, the compiler generates masked loads and stores within all conditions in the loop.
  • If the nomask_readwrite clause is specified, the compiler generates unmasked loads and stores for increased performance.
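A loop with a conditional store shows where this choice matters. In the sketch below, clamp_negative is a hypothetical name; with mask_readwrite the store is expected to touch only the elements for which the condition holds.

```c
/* Hypothetical helper: replaces negative entries with zero. */
void clamp_negative(float *a, int n) {
  /* Request masked loads and stores inside the condition, so that
     elements failing the test are not written speculatively. */
  #pragma vector mask_readwrite
  for (int i = 0; i < n; i++) {
    if (a[i] < 0.0f)
      a[i] = 0.0f;
  }
}
```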
Using the vectorlength keyword
n is an integer power of 2; the value must be 2, 4, 8, 16, 32, or 64. If more than one value is specified, the vectorizer will choose one of the specified vector lengths based on a cost model decision.
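As a short sketch, the pragma below offers the vectorizer two candidate lengths and leaves the choice to its cost model. The function name add_one and the values 4 and 8 are arbitrary illustrations.

```c
/* Hypothetical helper: increments every element of a. */
void add_one(int *a, int n) {
  /* Allow a vector length of either 4 or 8 for the main loop. */
  #pragma vector vectorlength(4, 8)
  for (int i = 0; i < n; i++)
    a[i] += 1;
}
```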
The pragma vector {always|aligned|unaligned} should be used with care.
Overriding the efficiency heuristics of the compiler should only be done if the programmer is absolutely sure that vectorization will improve performance. Furthermore, instructing the compiler to implement all array references with aligned data movement instructions will cause a run-time exception in case some of the access patterns are actually unaligned.
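When alignment cannot be guaranteed at the call site, one defensive pattern is to test the address at run time and use the aligned keyword only on the path where the test succeeds. This is a sketch under assumptions: is_aligned and increment are hypothetical names, and 64 bytes is assumed to be a sufficient alignment for the target.

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero if p is a multiple of
   alignment bytes. */
static int is_aligned(const void *p, uintptr_t alignment) {
  return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical helper: adds 1.0f to each of the n elements of a. */
void increment(float *a, int n) {
  if (is_aligned(a, 64)) {
    /* Safe: the base address was just checked. */
    #pragma vector aligned
    for (int i = 0; i < n; i++)
      a[i] += 1.0f;
  } else {
    /* Fall back to unaligned data movement instructions. */
    #pragma vector unaligned
    for (int i = 0; i < n; i++)
      a[i] += 1.0f;
  }
}
```

The check costs one comparison per call, which is usually negligible next to the loop itself.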

Examples

In the following example, the aligned argument keyword is used to request that the loop be vectorized with aligned instructions. Note that the arrays are declared in such a way that the compiler could not normally prove this would be safe to vectorize.
Example: Using the vector aligned pragma
void vec_aligned(float *a, int m, int c) {
  int i;
  // Instruct compiler to use aligned data movement instructions;
  // the caller must guarantee that a is suitably aligned.
  #pragma vector aligned
  for (i = 0; i < m; i++)
    a[i] = a[i] * c;
  // Alignment unknown but compiler can still align.
  for (i = 0; i < 100; i++)
    a[i] = a[i] + 1.0f;
}
Example: Using the vector always pragma
void vec_always(int *a, int *b, int m) {
  // Vectorize despite the non-unit strides (32 and 99).
  #pragma vector always
  for (int i = 0; i <= m; i++)
    a[32*i] = b[99*i];
}