User Guide

Advanced Modeling Options

When you select a Target System of Intel® Xeon Phi™ or Offload to Intel Xeon Phi coprocessor, additional modeling parameters appear below the Runtime Modeling area under Intel Xeon Phi Advanced Modeling:
  • Select Consider Code Vectorization if you plan to modify your parallel code later to improve vector parallel execution. If checked, you can specify:
    • Reference CPU Vectorization Speedup: the speedup multiplier you expect to achieve for the current site by using vectorization techniques on the reference CPU. Base this estimate on the target device characteristics and on your knowledge of how much and how well this part of the code can be vectorized.
    • Intel Xeon Phi Vectorization Speedup: the speedup multiplier you expect to achieve for the current site by using vectorization techniques on an Intel® Xeon Phi™ processor. Base this estimate on the target device characteristics and on your knowledge of how much and how well this part of the code can be vectorized.
  • When you choose Offload to Intel Xeon Phi as the Target System, you can set the Offload Transfer Data Size to specify the data transfer size you expect, in KB.
  • Click Apply after modifying any of these values.
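The two speedup fields above ask for a multiplier based on how much and how well the code can be vectorized. As a rough starting point, the sketch below combines those two factors with an Amdahl-style formula. This is an assumption for illustration only, not the model the tool uses internally. The function name estimate_vector_speedup and its parameters (fraction of site time that vectorizes, number of vector lanes, and achieved efficiency) are hypothetical.

```c
/* Hypothetical helper: rough Amdahl-style estimate of a vectorization
 * speedup multiplier. Not taken from the tool; an illustrative assumption.
 *
 * fraction   - share of the site's run time that can be vectorized (0..1),
 *              i.e. "how much"
 * lanes      - elements processed per vector instruction (>= 1),
 *              e.g. 16 floats in a 64-byte vector
 * efficiency - share of the ideal lane speedup actually achieved
 *              (greater than 0, at most 1), i.e. "how well"
 */
double estimate_vector_speedup(double fraction, double lanes, double efficiency)
{
    double vector_gain = lanes * efficiency;      /* speedup of the vectorized part */
    double scalar_time = 1.0 - fraction;          /* part that stays scalar */
    double vector_time = fraction / vector_gain;  /* vectorized part, shortened */
    return 1.0 / (scalar_time + vector_time);
}
```

For example, if 80% of the site vectorizes across 16 lanes at 50% efficiency, the sketch gives roughly 3.3, which you could enter as a first estimate and refine after measuring.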
In some cases, you can restructure your code to enable more efficient vector operations. Loop vectorization lets the hardware apply one operation to several independent data elements at once, in small units (usually 64 bytes), as in operations on data arrays.
One way to enable more efficient vector operations is to split a single loop into a new outer loop and an inner loop that together cover the same iteration space. This technique, called strip-mining, lets the innermost loop use vector operations on small chunks.
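The strip-mining transformation described above can be sketched as follows. This is a minimal illustration, not code from the product. The function names, the loop body, and the strip length STRIP (16 floats, chosen to match a 64-byte vector) are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical strip length: 16 floats = 64 bytes. */
#define STRIP 16

/* Original single loop: y[i] += a * x[i] for all i. */
void saxpy_simple(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Strip-mined version: the outer loop steps through the array one strip
 * at a time, and the inner loop covers a single strip. Together the two
 * loops visit exactly the same iteration space as the original loop. */
void saxpy_strip_mined(size_t n, float a, const float *x, float *y)
{
    for (size_t start = 0; start < n; start += STRIP) {
        /* The last strip may be shorter than STRIP. */
        size_t end = (n - start < STRIP) ? n : start + STRIP;
        for (size_t i = start; i < end; i++)
            y[i] += a * x[i];
    }
}
```

The short, fixed-length inner loop is the candidate for vector instructions. Whether the compiler actually vectorizes it depends on the compiler and its options, so check the vectorization report.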
Another way to enable more efficient vector operations is to examine outermost loops where threading parallelism might already be used and consider vectorizing their innermost loops, the functions they call, or both.
Certain innermost loops may benefit from OpenMP 4 constructs. That is, under certain conditions you can use both an omp parallel for threading pragma and an omp simd (or similar) vectorization pragma (see the compiler vectorization report and the descriptions at http://openmp.org).
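As a minimal sketch of combining the two OpenMP pragmas mentioned above, the example below threads an outer loop with omp parallel for and vectorizes its innermost loop with omp simd. The function name scale_rows and the row-major matrix layout are assumptions made for illustration. The pragmas take effect only when OpenMP support is enabled in the compiler; otherwise they are ignored and the code runs serially.

```c
#include <stddef.h>

/* Hypothetical example: multiply every element of a rows x cols matrix
 * (stored row by row in m) by factor. */
void scale_rows(size_t rows, size_t cols, float factor, float *m)
{
    /* Threading: each thread handles a subset of the rows. */
    #pragma omp parallel for
    for (size_t r = 0; r < rows; r++) {
        float *row = m + r * cols;

        /* Vectorization: iterations of the innermost loop are
         * independent, so they can be executed with vector instructions. */
        #pragma omp simd
        for (size_t c = 0; c < cols; c++)
            row[c] *= factor;
    }
}
```

For a single loop with independent iterations, OpenMP 4 also offers the combined omp parallel for simd construct, which applies both threading and vectorization to the same loop.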
The processor microarchitecture determines the type of vector instructions that will be supported and thus the size of data the hardware can process efficiently.
