Developer Guide and Reference


Function Annotations and the SIMD Directive for Vectorization

This topic presents specific C++ language features that help the compiler vectorize code.
The SIMD vectorization feature is available for both Intel® microprocessors and non-Intel microprocessors. Vectorization may call library routines that can result in greater performance gains on Intel® microprocessors than on non-Intel microprocessors.
Vectorization can also be affected by certain compiler options, such as /arch (Windows*), -m (Linux* and macOS*), or [Q]x.
The __declspec(align(n)) declaration enables you to overcome hardware alignment constraints. The auto-vectorization hints address stylistic issues arising from lexical scope, data dependence, and ambiguity resolution. The SIMD feature's pragma allows you to enforce vectorization of loops.
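A minimal sketch of the alignment declaration, assuming a 32-byte target alignment (one 256-bit vector). ALIGN32, g_samples, is_aligned, and scale_samples are hypothetical names. The macro uses __declspec(align(32)) under MSVC-compatible compilers, which include the Intel compiler on Windows*, and falls back to the standard C++11 alignas(32) elsewhere; that fallback is a portable substitute, not the Intel declaration itself.

```cpp
#include <cstdint>

// ALIGN32 is a hypothetical helper macro, not part of any compiler.
// MSVC-compatible compilers accept __declspec(align(n)); elsewhere the
// standard C++11 alignas(n) requests the same alignment.
#if defined(_MSC_VER)
#define ALIGN32 __declspec(align(32))
#else
#define ALIGN32 alignas(32)
#endif

// A global buffer on a 32-byte boundary (address mod 32 == 0), so 32-byte
// vector accesses starting at g_samples[0], g_samples[8], ... are aligned.
ALIGN32 float g_samples[64];

// Returns true when the address p satisfies address mod n == 0.
bool is_aligned(const void* p, std::uintptr_t n) {
    return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}

// Scales every element in place; with the alignment known at compile time,
// the compiler may emit aligned vector loads and stores for this loop.
void scale_samples(float k) {
    for (int i = 0; i < 64; ++i) {
        g_samples[i] *= k;
    }
}
```

Declaring the alignment on the object itself, rather than asserting it later, lets every loop that touches g_samples benefit without further hints.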
You can use the __declspec(vector) (Windows*) or __attribute__((vector)) (Linux* and macOS*) declaration, or their forms with clauses, __declspec(vector(clauses)) and __attribute__((vector(clauses))), to vectorize user-defined functions and loops. For SIMD usage, the vector function is called from a loop that is being vectorized.
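A sketch of a user-defined vector function called from a loop, assuming the Intel compiler (detected through the __INTEL_COMPILER macro) for the annotation; other compilers see an ordinary scalar function, so the code still builds and runs. SIMD_ENABLED, scaled_element, and scale_array are hypothetical names, and the uniform and linear clauses chosen here are one illustrative combination of the clauses listed in the table below.

```cpp
// SIMD_ENABLED is a hypothetical macro: it expands to the vector annotation
// only under the Intel compiler and to nothing elsewhere.
#if defined(__INTEL_COMPILER) && defined(_WIN32)
#define SIMD_ENABLED __declspec(vector(uniform(a, scale), linear(i:1)))
#elif defined(__INTEL_COMPILER)
#define SIMD_ENABLED __attribute__((vector(uniform(a, scale), linear(i:1))))
#else
#define SIMD_ENABLED  // other compilers: plain scalar function
#endif

// a and scale are the same for every lane (uniform); i advances by 1 per
// lane (linear), so a[i] can become a unit-stride vector load.
SIMD_ENABLED
float scaled_element(const float* a, int i, float scale) {
    return a[i] * scale;
}

// The caller's loop is the one being vectorized; with the annotation, each
// group of iterations can call the vector version of scaled_element.
void scale_array(const float* a, float* out, int n, float scale) {
    for (int i = 0; i < n; ++i) {
        out[i] = scaled_element(a, i, scale);
    }
}
```

Marking a and scale as uniform avoids broadcasting the same value into every lane on each call, and linear(i:1) tells the compiler the per-lane indices are consecutive.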
With the C/C++ extensions for array notations, you can define map operations that provide general data-parallel semantics without expressing the implementation strategy. Using array notations, you can write the same operation regardless of the size of the problem and let the implementation choose the right constructs, combining SIMD, loops, and tasking to implement the operation. These semantics also allow a more elaborate programming style: you can express a single-dimensional operation at two levels, using task constructs and array operations together to force a preferred parallel and vector execution.
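The two-level idea can be sketched as an outer loop over chunks, where a task construct could be placed, and an array operation inside each chunk. The array section syntax out[start:len] is accepted only by Intel compilers that still support the C/C++ array notation extension, so it is guarded by a hypothetical USE_ARRAY_NOTATION macro that you would define yourself; without it, an equivalent scalar loop is compiled. The chunk size kChunk and the name add_arrays are assumptions, and the outer loop runs sequentially in this sketch.

```cpp
// kChunk is an arbitrary, assumed chunk size.
constexpr int kChunk = 256;

// out = a + b over n elements, expressed at two levels.
void add_arrays(const float* a, const float* b, float* out, int n) {
    // Outer level: independent chunks. A task construct could spawn one
    // task per chunk here; this sketch processes them sequentially.
    for (int start = 0; start < n; start += kChunk) {
        const int len = (n - start < kChunk) ? (n - start) : kChunk;
#if defined(USE_ARRAY_NOTATION)
        // Inner level: array sections [start:len] (Intel array notation).
        out[start:len] = a[start:len] + b[start:len];
#else
        // Inner level fallback: an equivalent loop the compiler can vectorize.
        for (int i = start; i < start + len; ++i) {
            out[i] = a[i] + b[i];
        }
#endif
    }
}
```

The same function handles any n: the last chunk simply gets a shorter section.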
The usage model of the vector declaration is that the code generated for the function takes a small section of the array (vectorlength elements) by value and exploits SIMD parallelism, whereas task parallelism is implemented at the call site.
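This usage model can be illustrated with a plain C++ model; it is not the code the compiler actually generates. square_lanes stands in for the vector variant and receives a vectorlength-sized section by value, while square_all plays the role of the call site that walks the array in sections. The vector length of 4 (kVL) and all names are assumptions for illustration.

```cpp
#include <array>
#include <cstddef>

// Assumed vector length (kVL): the number of lanes one call processes.
constexpr std::size_t kVL = 4;
using Lanes = std::array<float, kVL>;

// Stands in for the vector variant of a scalar "square" function: it takes
// a vectorlength-sized section by value and handles every lane in one call.
Lanes square_lanes(Lanes x) {
    for (std::size_t lane = 0; lane < kVL; ++lane) {
        x[lane] = x[lane] * x[lane];
    }
    return x;
}

// Stands in for the call site: it walks the array in kVL-sized sections.
// This outer loop is where task parallelism would be applied.
void square_all(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; i += kVL) {
        const std::size_t m = (n - i < kVL) ? n - i : kVL;  // lanes in use
        Lanes section{};  // zero-filled, so unused lanes of a short tail are defined
        for (std::size_t lane = 0; lane < m; ++lane) section[lane] = in[i + lane];
        const Lanes result = square_lanes(section);
        for (std::size_t lane = 0; lane < m; ++lane) out[i + lane] = result[lane];
    }
}
```

Keeping the per-section work inside the function and the iteration over sections at the call site is what lets the two kinds of parallelism be chosen independently.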
The following table summarizes the language features that help vectorize code.
Language Feature
  Description
__declspec(align(n))
  Directs the compiler to align the variable to an n-byte boundary. The address of the variable satisfies address mod n = 0.
__declspec(align(n, off))
  Directs the compiler to align the variable to an n-byte boundary with offset off within each n-byte boundary. The address of the variable satisfies address mod n = off.
__declspec(vector) (Windows*)
__attribute__((vector)) (Linux* and macOS*)
  Combines with the map operation at the call site to provide data-parallel semantics. When multiple instances of the vector declaration are invoked in a parallel context, their execution order is unsequenced.
__declspec(vector(clauses)) (Windows*)
__attribute__((vector(clauses))) (Linux* and macOS*)
  Combines with the map operation at the call site to provide data-parallel semantics, with the following values for clauses:
    • processor clause: processor(cpuid)
    • vector length clause: vectorlength(n)
    • linear clause: linear(param1:step1 [, param2:step2]…)
    • uniform clause: uniform(param [, param]…)
    • mask clause: [no]mask
  When multiple instances of the vector declaration are invoked in a parallel context, their execution order is unsequenced.
restrict
  Gives the disambiguator more flexibility in its alias assumptions, which enables more vectorization.
__declspec(vector_variant(clauses)) (Windows*)
__attribute__((vector_variant(clauses))) (Linux* and macOS*)
  Provides the ability to vectorize user-defined functions and loops. The clauses are as follows:
    • implements clause (required): implements(function declarator) [, simd-clauses]
    • simd-clauses (optional): one or more of the clauses allowed for the vector attribute
__assume_aligned(a, n)
  Instructs the compiler to assume that array a is aligned on an n-byte boundary; used in cases where the compiler has failed to obtain alignment information.
__assume(cond)
  Instructs the compiler to assume that the condition cond is true at the point where the keyword appears. Typically used to convey properties that the compiler can exploit to generate more efficient code, such as alignment information.
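A sketch of restrict, __assume_aligned, and __assume applied together to an axpy-style loop. It assumes the Intel compiler for __assume_aligned and __assume; on GCC and Clang it substitutes __builtin_assume_aligned, a different builtin with a similar purpose, and other compilers get the plain loop. The name axpy, the 32-byte alignment, and the multiple-of-8 trip count are assumptions for the example. These hints are promises, not checks: if the caller passes unaligned pointers or another length, the results are not guaranteed. __restrict is the compiler-extension spelling of restrict in C++.

```cpp
#include <cstddef>

// y[i] += a * x[i]. __restrict promises that x and y do not overlap.
// Caller obligations (assumed for this sketch): x and y are 32-byte
// aligned, and n is a multiple of 8.
void axpy(float* __restrict y, const float* __restrict x, float a, std::size_t n) {
#if defined(__INTEL_COMPILER)
    __assume_aligned(x, 32);  // Intel: alignment the compiler could not infer
    __assume_aligned(y, 32);
    __assume(n % 8 == 0);     // Intel: with 8-lane vectors, a remainder loop may be omitted
#elif defined(__GNUC__)
    // GCC/Clang substitute: returns the pointer with the alignment promise attached.
    x = static_cast<const float*>(__builtin_assume_aligned(x, 32));
    y = static_cast<float*>(__builtin_assume_aligned(y, 32));
#endif
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Usage: declare the buffers with 32-byte alignment, for example alignas(32) float x[8], so the promises made inside axpy hold.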
Auto-Vectorization Hints
#pragma ivdep
  Instructs the compiler to ignore assumed vector dependencies.
#pragma vector {aligned|unaligned|always|temporal|nontemporal}
  Specifies how to vectorize the loop and indicates that efficiency heuristics should be ignored. Using the assert keyword with the vector always pragma generates an error-level assertion message if the compiler efficiency heuristics indicate that the loop cannot be vectorized. Use #pragma ivdep to ignore the assumed dependencies.
#pragma novector
  Specifies that the loop should never be vectorized.
Some pragmas are available for both Intel® microprocessors and non-Intel microprocessors, but may perform more optimizations for Intel® microprocessors than for non-Intel microprocessors.
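A sketch of the three auto-vectorization hints, assuming the Intel compiler for #pragma ivdep, #pragma vector always, and #pragma novector. On GCC the example substitutes #pragma GCC ivdep, GCC's own spelling of the dependence hint, and other compilers compile ordinary loops. The function names are hypothetical. ivdep is a promise: the caller must guarantee that idx[] contains no repeated indices, otherwise a vectorized loop may compute a wrong result.

```cpp
#include <cstddef>

// a[idx[i]] += k: the compiler cannot prove idx[] holds distinct values, so
// it assumes a dependence between iterations. ivdep tells it to ignore that
// assumed dependence; the caller guarantees idx[] has no repeats.
void add_indexed(float* a, const int* idx, float k, std::size_t n) {
#if defined(__INTEL_COMPILER)
#pragma ivdep
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep
#endif
    for (std::size_t i = 0; i < n; ++i) {
        a[idx[i]] += k;
    }
}

// Non-unit-stride loads; vector always overrides the efficiency heuristics
// that might otherwise keep this loop scalar (Intel compiler only).
void take_even(const float* in, float* out, std::size_t n) {
#if defined(__INTEL_COMPILER)
#pragma vector always
#endif
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[2 * i];
    }
}

// novector keeps this loop scalar (Intel compiler only).
int sum_small(const int* v, int n) {
    int s = 0;
#if defined(__INTEL_COMPILER)
#pragma novector
#endif
    for (int i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}
```

Each pragma applies only to the loop that immediately follows it, so hints can be placed loop by loop rather than for a whole file.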
User-Mandated Pragma