Intel® Advisor: Vectorization Advisor Glossary

Published: 10/20/2014   Last updated: 08/03/2018

Intel® Advisor provides design tools to help ensure that your Fortran, C, and C++ native/managed applications realize their full performance potential on modern processors:

  • Vectorization Advisor is a vectorization optimization tool that lets you identify high-impact, under-optimized loops, what is blocking vectorization, and where it is safe to force vectorization. It also provides code-specific how-can-I-fix-this-issue? recommendations.
  • Roofline Analysis visualizes actual performance against hardware-imposed performance ceilings (rooflines). It provides insights into where the bottlenecks are, which loops are worth optimizing, what is likely causing the bottlenecks, and which optimization steps to take next.
  • Offload Advisor (Intel® Advisor Beta only) allows you to identify high-impact opportunities to offload to GPU as well as the areas that are not advantageous to offload. It provides performance speedup projection on accelerators along with offload overhead estimation and pinpoints accelerator performance bottlenecks.
  • Threading Advisor is a fast-track threading design and prototyping tool that lets you analyze, design, tune, and check threading design options without disrupting your normal development.

For details about each tool, see the Intel Advisor User Guide.

The following is a glossary for the Vectorization Advisor. It is a work in progress.

alignment of code / code alignment: Placement of a contiguous code section (loop or function) in memory such that the address of the first byte is divisible by n, where n is a power of two. Such a code section is called n-byte aligned.

alignment of data / data alignment: Placement of contiguous data (such as a variable or C/C++ struct/class) in memory such that its starting address is divisible by n, where n is a power of two. Such data is called n-byte aligned. You may achieve better performance if data is aligned to at least its own size.
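
As an illustration of the two alignment entries above, the following minimal C sketch tests whether an address is n-byte aligned and requests a 64-byte-aligned buffer. The helper names is_aligned and alloc_floats_64 are hypothetical, and the sketch assumes a C11 library that provides aligned_alloc; it is not taken from Intel Advisor.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if ptr is n-byte aligned; n is assumed to be a power of two. */
static int is_aligned(const void *ptr, size_t n)
{
    return ((uintptr_t)ptr & (n - 1)) == 0;
}

/* Hypothetical helper: allocates room for `count` floats on a 64-byte
 * boundary. The caller releases the buffer with free(). */
static float *alloc_floats_64(size_t count)
{
    size_t bytes = count * sizeof(float);
    /* aligned_alloc expects a size that is a multiple of the alignment,
     * so round the request up to the next multiple of 64. */
    size_t rounded = (bytes + 63) / 64 * 64;
    return aligned_alloc(64, rounded);
}
```

A buffer that is 64-byte aligned is also 32-byte and 16-byte aligned, because every smaller power of two divides 64.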

call count: The number of times a loop is invoked.

CPU front-end: A part of CPU core that reads instructions from memory, decodes them, and sends them to the execution core (back-end). Under certain circumstances, the front-end may process too few instructions per clock cycle, which results in under-utilization of the back-end.

directive: A programming language construct that specifies how a compiler should process input. Same as a C/C++ pragma.
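
For example, a directive can ask the compiler to vectorize a particular loop. The sketch below uses the OpenMP simd pragma as one possible directive; the function name scale_and_add is hypothetical, and the sketch assumes the caller guarantees that x and y do not overlap. Compilers built without OpenMP SIMD support typically ignore the pragma and compile the loop as ordinary scalar code.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * Assumption: x and y do not overlap. */
void scale_and_add(size_t n, float a, const float *x, float *y)
{
    /* Directive: request SIMD code generation for the loop that follows. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```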

filling: Moving a variable from main memory to a register. Using variables in registers instead of main memory results in better performance.

FMA: Fused multiply-add instructions that improve the performance and accuracy of floating-point computations. Sample operation: A = A * B + C. These instructions are faster because the multiplication and the addition are performed as a single operation rather than in separate steps, and more accurate because the intermediate product is kept at full precision and only the final result is rounded.

ICC: Command line for invoking the Intel® C Compiler on the Linux* platform. Often used as a shorthand for referring to the compiler.

ICL: Command line for invoking the Intel® C/C++ Compiler on the Microsoft Windows* platform. Often used as a shorthand for referring to the compiler.

ICPC: Command line for invoking the Intel® C++ Compiler on the Linux* platform. Often used as a shorthand for referring to the compiler.

IFORT: Command line for invoking the Intel® Fortran Compiler on the Windows* and Linux* platforms. Often used as a shorthand for referring to the compiler.

loop body: A (usually) compiler-generated vectorized loop derived from a source loop. Same as kernel loop.

peeled loop: A small, (usually) compiler-generated loop created to align the memory accesses inside the loop body and maximize its efficiency. The compiler peels off any initial iterations containing misaligned accesses, which leaves the remaining iterations’ memory accesses optimally aligned. A peeled loop always has a trip count smaller than the vector length.

register pressure: A condition in which fewer registers are available than are needed to hold all the variables in use at a given point. High register pressure may result in spilling.

remainder loop: A (usually) compiler-generated loop created to clean up any remaining iterations that do not fit within the scope of the loop body. The compiler typically generates remainder loops when the source loop trip count is not a multiple of the vector length.
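
The relationship between the peeled loop, the loop body, and the remainder loop can be sketched by splitting a source loop by hand. This is an illustration under stated assumptions, not the code a compiler actually emits: the vector length is assumed to be 4 (four 32-bit floats in a 128-bit register), the inner lane loop stands in for a single SIMD operation, and the loop_split type and add_scalar function are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum { VL = 4 };  /* assumed vector length: four 32-bit floats per 128-bit register */

/* Counts how many passes each generated loop performed. */
typedef struct { size_t peel, body, remainder; } loop_split;

/* Adds k to every element of a[0..n), split into three loops by hand. */
loop_split add_scalar(float *a, size_t n, float k)
{
    loop_split s = {0, 0, 0};
    size_t i = 0;

    /* Peeled loop: scalar iterations until a[i] sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)&a[i] % (VL * sizeof(float))) != 0) {
        a[i++] += k;
        s.peel++;
    }

    /* Loop body (kernel loop): each pass handles VL aligned elements. */
    for (; i + VL <= n; i += VL) {
        for (size_t lane = 0; lane < VL; ++lane)
            a[i + lane] += k;
        s.body++;
    }

    /* Remainder loop: leftover iterations, always fewer than VL. */
    for (; i < n; ++i) {
        a[i] += k;
        s.remainder++;
    }
    return s;
}
```

For 13 floats starting 4 bytes past a 16-byte boundary, this split runs 3 peeled iterations, 2 passes of the loop body (8 elements), and 2 remainder iterations.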

SIMD: Single-instruction-multiple-data. A processor instruction that performs the same operation on multiple pieces of data (such as elements of an array).
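
As a concrete illustration, the sketch below adds two float arrays four elements at a time with Intel SSE intrinsics (_mm_loadu_ps, _mm_add_ps, _mm_storeu_ps). The function name add_arrays is hypothetical. The intrinsic path is compiled only where the compiler defines the __SSE__ macro; elsewhere the scalar loop handles every element.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* out[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* One SIMD add processes four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: leftover elements, or all elements without SSE. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```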

source loop: A developer-written loop as it appears in source code.

spilling: Moving a variable from a register to main memory. A spilled variable must be loaded from main memory for every read and stored back to main memory for every write, resulting in poorer performance.

trip count: The number of times the body of a loop will execute. Same as iteration count (and sometimes referred to as loop count in Intel compiler documentation).
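
The difference between trip count and call count can be seen on a nested loop. The following instrumented sketch is illustrative only; the loop_stats type and count_inner_loop function are hypothetical names. It records, for the inner loop, how many times the loop is invoked and how many iterations it executes in total.

```c
#include <stddef.h>

typedef struct { size_t call_count, total_iterations; } loop_stats;

/* Runs an inner loop of `cols` iterations once per outer iteration
 * and records both figures for the inner loop. */
loop_stats count_inner_loop(size_t rows, size_t cols)
{
    loop_stats st = {0, 0};
    for (size_t r = 0; r < rows; ++r) {
        st.call_count++;               /* the inner loop is invoked once here */
        for (size_t c = 0; c < cols; ++c)
            st.total_iterations++;     /* one inner-loop iteration */
    }
    return st;
}
```

With rows = 3 and cols = 5, the inner loop has a call count of 3 and a trip count of 5 per invocation, for 15 iterations in total.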

unroll: Optimize a loop by duplicating its body, thus reducing the branching overhead and the number of loop iterations that must execute. A complete unroll fully duplicates the loop body such that no repetition is required. A partial unroll of size n duplicates the body n times and reduces the number of iterations to 1/n of the original iteration count.
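
A partial unroll of size 4 might look like the following hand-written sketch. Compilers normally perform this transformation themselves; the function names are hypothetical, and the trailing loop covers the case where the iteration count is not a multiple of 4.

```c
#include <stddef.h>

/* Source loop: n iterations, one addition and one branch per iteration. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Partial unroll of size 4: about n/4 iterations, four additions each. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    /* Leftover iterations when n is not a multiple of 4. */
    for (; i < n; ++i)
        s += a[i];
    return s;
}
```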

vector length: Number of elements that can be processed in the same operation. Ideal vector length = vector register width in bits / data type size in bits.

vector register width: The number of bits in the processor vector registers. Intel® Streaming SIMD Extensions 2 (Intel® SSE2) instructions operate on 128-bit registers; Intel® Advanced Vector Extensions (Intel® AVX) instructions operate on 256-bit registers; Intel® Many Integrated Core Instructions (Intel® MIC Instructions) operate on 512-bit registers.
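
The ideal vector length formula can be worked through with the register widths listed above. The helper below is a hypothetical one-line sketch of that arithmetic, not an Intel Advisor API.

```c
/* Ideal vector length = vector register width in bits / data type size in bits. */
unsigned ideal_vector_length(unsigned register_bits, unsigned element_bits)
{
    return register_bits / element_bits;
}
```

For example, 32-bit floats give a vector length of 128 / 32 = 4 with 128-bit registers, 256 / 32 = 8 with 256-bit registers, and 512 / 32 = 16 with 512-bit registers, while 64-bit doubles in 256-bit registers give 256 / 64 = 4.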

vectorize: Generate code that takes advantage of processor vectorization hardware, usually by executing SIMD instructions.

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804