Compiler Methodology for Intel® MIC Architecture
Getting Started with Intel Compiler Pragmas and Directives
Compiler options allow a user to control how source files are interpreted and control characteristics of the object files or executables. Compiler options are applied to an entire source file. So what do you do if there are particular loops, variables, functions, or procedures which you need to control? This is where compiler pragmas and directives are useful. In this chapter, we explain where to find documentation on pragmas and directives and highlight a subset of these that are most useful for performance on either Intel® Xeon® processors or the Intel® Xeon Phi™ coprocessor.
A C/C++ 'pragma' is a command interpreted by the Intel C++ compiler that helps guide the compiler's actions during compilation. For Fortran, a 'directive' is a command interpreted by the Intel Fortran compiler that serves the same purpose. The syntax differs between C++ pragmas and Fortran directives, but the actions they request from the two Intel compilers are often equivalent.
C++ Pragmas: Pragmas are directives that provide instructions to the compiler for use in specific cases. For example, you can use the novector pragma to specify that a loop should never be vectorized. The keyword #pragma is standard in the C++ language, but individual pragmas are machine-specific or operating system-specific, and vary by compiler.
Some pragmas provide the same functionality as compiler options. Pragmas override behavior specified by compiler options.
Some pragmas are available for both Intel and non-Intel processors, but they may perform more optimizations for Intel® processors than for non-Intel processors. Refer to the individual pragma page for a detailed description.
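As a minimal sketch of the novector pragma mentioned above, the C++ function below asks the compiler to leave a summation loop scalar so that the floating-point additions happen in source order. The function name ordered_sum is hypothetical and is not taken from the Intel documentation. Compilers that do not recognize the pragma will typically ignore it, possibly with a warning.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: adds the values strictly in index order.
double ordered_sum(const std::vector<double>& values) {
    double total = 0.0;
    // Request that this one loop is never vectorized, whatever the
    // vectorization settings on the command line are.
#pragma novector
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}
```

Because the pragma is attached to a single loop, the other loops in the same source file are still handled according to the compiler options in effect.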
Documentation on C++ pragmas recognized by the Intel® C/C++ Composer XE compiler: Pragmas are documented in the product documentation, the "Intel® C++ Compiler XE User and Reference Guides". If you have forgotten where to find your documentation, please review the chapter "New User Compiler Basic Usage". From the Contents of the User and Reference Guide, open chapter "Compiler Reference", "Pragmas", "Overview/Intel® C++ Compiler Pragmas".
Fortran Directives: Compiler directives are special commands that direct the action of the compilation. The directives recognized by the Intel® Fortran Composer XE are unique to this compiler. Compiler directives override any compiler options.
Documentation on directives recognized by the Intel® Fortran Composer XE compiler: Directives are documented in the product documentation, the "Intel® Fortran Compiler XE User and Reference Guides". If you have forgotten where to find your documentation, please review the chapter "New User Compiler Basic Usage". From the Contents of the User and Reference Guide, open chapter "Language Reference", "Directive Enhanced Compilation", "Directive Enhanced Compilation Overview".
The following pragmas/directives are common to C++ and Fortran and are essential to understand. Please open a browser window to the pragma/directive documentation as described above and research each of these pragmas/directives for your language. In the descriptions below, both C++ and Fortran syntax are shown in the order C++/Fortran:
- simd/SIMD - These directives give control to the programmer to force vectorization. SIMD control is a powerful tool. Please read the presentation Vectorization: Pragma/Directive SIMD for an overview. If you have been using IVDEP, consider moving to use of SIMD.
- ivdep/IVDEP - this pragma/directive instructs the compiler to ignore assumed vector dependencies in loops that are vectorization targets. To ensure correct code, the compiler normally treats an assumed dependence as a proven dependence, which prevents vectorization. This pragma overrides that decision. Use it only when you know that the assumed loop dependencies are safe to ignore. NOTE that this directive only affects loops with assumed dependencies: the compiler will ignore it if it is 100% certain that a dependency exists. For greater control, consider using pragma/directive SIMD.
- loop_count/LOOP COUNT - this pragma/directive tells the compiler how many iterations to expect for a loop. This matters because many loops have bounds that are set by variables whose values are known only at runtime. The compiler weighs the number of iterations AND the work within the loop to decide whether vectorizing it is "profitable", that is, whether the performance benefit outweighs the overhead of setting up a vectorized loop. Since the compiler often cannot determine the trip count at compile time, it may decline to vectorize such a loop; -vec-report lists these loops as "not profitable" or "may not be profitable". This directive lets the programmer supply the expected loop trip count(s) and so helps the profitability analysis.
- vector/VECTOR and novector/NOVECTOR - override compiler options on loop vectorization for an individual loop. vector requests vectorization of the loop if it is legal to do so; again, if the compiler is 100% certain there is a dependency in the loop it will ignore this directive. novector prevents vectorization of the loop and is often used for loops whose vectorization may cause issues in numerical accuracy.
- inline, forceinline, noinline/INLINE, FORCEINLINE, NOINLINE - these give fine-grained control over the compiler's inlining heuristics (examples: C++, Fortran).
- unroll/UNROLL, nounroll/NOUNROLL, unroll_and_jam/UNROLL_AND_JAM - these directives enable or disable loop unrolling (and jamming). unroll tells the compiler optimizer how many times to unroll a loop, which can aid vectorization. The unroll_and_jam directive partially unrolls one or more loops higher in the nest than the innermost loop and fuses (jams) the resulting loops back together. This transformation allows more data reuse in the loop. The unroll_and_jam directive is only active at optimization level O3.
- nofusion/NOFUSION - prevents the loop fusion optimization.
- distribute_point - Instructs the compiler to prefer loop distribution at the location indicated. When the pragma is placed inside a loop, the compiler distributes the loop at that point. All loop-carried dependencies are ignored.
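To make the ivdep and loop_count entries in the list above more concrete, here is a minimal C++ sketch. The function names scale_shifted and axpy and the trip-count values are hypothetical, chosen only for illustration; the pragma spellings follow the C++ forms named above. Both pragmas are hints, so the results are the same whether or not the compiler acts on them.

```cpp
#include <cassert>

// Hypothetical kernel: a[i] = a[i + k] * c for i in [0, m).
// k is known only at run time, so the compiler has to assume that a[i + k]
// might be an element written by an earlier iteration (true when k < 0).
void scale_shifted(int* a, int k, int c, int m) {
    // The programmer's knowledge that makes the hint safe: with k >= 0
    // the loop only ever reads elements that have not been overwritten yet.
    assert(k >= 0);
#pragma ivdep
    for (int i = 0; i < m; ++i) {
        a[i] = a[i + k] * c;
    }
}

// Hypothetical kernel whose trip count n is only known at run time.
// The numbers are illustrative assumptions about typical inputs, given to
// help the compiler's profitability analysis.
void axpy(float* y, const float* x, float alpha, int n) {
#pragma loop_count min(4), max(4096), avg(512)
    for (int i = 0; i < n; ++i) {
        y[i] += alpha * x[i];
    }
}
```

In scale_shifted the array must hold at least m + k elements. If k could be negative, the assumed dependence would be real and the ivdep hint would have to be removed.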
There are more pragmas/directives. The ones above are the key optimization pragmas/directives with which every performance-oriented programmer should be familiar. Again, take some time to become familiar with the set listed above.
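The loop-transformation entries in the list above can be sketched in the same way. The C++ example below is a minimal illustration, with hypothetical function names (scale, matvec) and arbitrarily chosen unroll factors. As noted in the list, unroll_and_jam is only active at optimization level O3, and the compiler remains free to decline either request.

```cpp
// Hypothetical helper: scales every element, asking for the loop body to be
// unrolled four times.
void scale(double* v, double factor, int n) {
#pragma unroll(4)
    for (int i = 0; i < n; ++i) {
        v[i] *= factor;
    }
}

// Hypothetical matrix-vector product y = M * x, with M stored row-major
// as rows x cols values.
void matvec(const double* m, const double* x, double* y, int rows, int cols) {
    for (int r = 0; r < rows; ++r) {
        y[r] = 0.0;
    }
    // Ask for the outer (row) loop to be unrolled by 2 and the two copies of
    // the inner loop to be fused, so each loaded x[c] can serve two rows.
#pragma unroll_and_jam(2)
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            y[r] += m[r * cols + c] * x[c];
        }
    }
}
```

The unroll factors are tuning parameters. A reasonable approach is to start without them, check the compiler's optimization report, and add a factor only where measurement shows a benefit.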
SIMD directives give programmers the most control over loop vectorization. IVDEP and VECTOR are weaker, older options that provide a suggestion or hint to the compiler, and the compiler may choose to ignore them if it is certain that there is a dependency in a loop. You should have reviewed the presentation Vectorization: Pragma/Directive SIMD before leaving this chapter: it is key to vectorization and hence performance on the Intel® Xeon Phi™ coprocessor.
It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on Intel® Xeon Phi™ coprocessors. The paths provided in this guide reflect the steps necessary to get the best possible application performance.
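As a minimal sketch of the stronger SIMD form, the C++ function below uses the simd pragma with a reduction clause on a dot product. The function name dot is hypothetical. The sketch assumes the reduction(+:sum) clause as documented for the Intel compiler's simd pragma; consult the pragma reference for the full list of clauses.

```cpp
// Hypothetical helper: dot product of two float arrays of length n.
float dot(const float* a, const float* b, int n) {
    float sum = 0.0f;
    // Unlike ivdep, this is a directive to vectorize rather than a hint.
    // The reduction clause tells the compiler that sum is accumulated
    // across iterations and may be combined from per-lane partial sums.
#pragma simd reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because the pragma forces vectorization, the programmer takes on responsibility for correctness. Here the partial sums may be added in a different order than in the scalar loop, which can change the rounding of the floating-point result.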
Go back to chapter "New User Compiler Basic Usage"