Developer Guide and Reference

Programming with Auto-parallelization

The auto-parallelization feature implements some concepts of OpenMP*, such as the worksharing construct (with the PARALLEL for directive). This section provides details on auto-parallelization.
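
For orientation, here is a minimal sketch of a loop the auto-parallelizer can handle. The build line is hypothetical and assumes the classic Intel® C++ Compiler's -parallel option; option names vary across compiler versions.

  // scale.cpp: a typical auto-parallelization candidate.
  // Hypothetical build line (classic Intel® C++ Compiler):
  //   icc -parallel scale.cpp
  #include <cstddef>

  void scale(float* a, const float* b, std::size_t n, float s) {
      // The trip count n is known on loop entry, and every iteration
      // writes a distinct a[i]: no loop-carried dependency. (In practice
      // the compiler must also rule out aliasing between a and b, which
      // it may do with a run-time test.)
      for (std::size_t i = 0; i < n; ++i)
          a[i] = s * b[i];
  }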

Guidelines for Effective Auto-parallelization Usage

A loop can be parallelized if it meets the following criteria:
  • The loop is countable at compile time: This means that an expression representing how many times the loop will execute (loop trip count) can be generated just before entering the loop.
  • There are no FLOW (READ after WRITE), OUTPUT (WRITE after WRITE), or ANTI (WRITE after READ) loop-carried data dependencies. A loop-carried data dependency occurs when the same memory location is referenced in different iterations of the loop. At the compiler's discretion, a loop may be parallelized if any assumed inhibiting loop-carried dependencies can be resolved by run-time dependency testing. Both cases are contrasted in the sketch after this list.
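
As a concrete illustration of these criteria, the following sketch (function names are hypothetical) contrasts a loop that meets them with one that does not:

  #include <cstddef>

  // Parallelizable: the trip count n is countable on entry, and each
  // iteration reads and writes only its own elements.
  void add(float* c, const float* a, const float* b, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i)
          c[i] = a[i] + b[i];
  }

  // Not parallelizable as written: a FLOW (READ after WRITE)
  // loop-carried dependency, because iteration i reads a[i - 1],
  // which iteration i - 1 wrote.
  void prefix_sum(float* a, std::size_t n) {
      for (std::size_t i = 1; i < n; ++i)
          a[i] = a[i] + a[i - 1];
  }
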
For loops whose parameters are not compile-time constants, the compiler may generate a run-time test to decide whether executing the loop in parallel is profitable.
Coding Guidelines
Enhance the power and effectiveness of the auto-parallelizer by following these coding guidelines:
  • Expose the trip count of loops whenever possible; use constants where the trip count is known, and save loop parameters in local variables.
  • Avoid placing structures inside loop bodies that the compiler may assume to carry dependent data, for example, procedure calls, ambiguous indirect references, or global references. Both guidelines are illustrated in the sketch below.
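
A minimal before-and-after sketch of these guidelines; the names and the indirect call are hypothetical:

  #include <cstddef>

  std::size_t g_n;  // global loop bound: opaque to the compiler's analysis

  // Harder to auto-parallelize: the trip count depends on a global that
  // the call through f might modify, and the call itself may carry a
  // dependency the compiler cannot rule out.
  void before(float* a, float (*f)(std::size_t)) {
      for (std::size_t i = 0; i < g_n; ++i)
          a[i] = f(i);
  }

  // Friendlier: the loop parameter is saved in a local variable and the
  // body contains no calls or global references.
  void after(float* a, const float* b) {
      const std::size_t n = g_n;  // expose the trip count
      for (std::size_t i = 0; i < n; ++i)
          a[i] = 2.0f * b[i];
  }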

Auto-parallelization Data Flow

For auto-parallelization processing, the compiler performs the following steps:
  1. Data flow analysis:
    Computing the flow of data through the program.
  2. Loop classification:
    Determining loop candidates for parallelization based on correctness and efficiency.
  3. Dependency analysis:
    Computing the dependency analysis for references in each loop nest.
  4. High-level parallelization:
    Analyzing the dependency graph to determine loops that can execute in parallel, and computing run-time dependency.
  5. Data partitioning:
    Examining data references and partitioning them based on the following types of access: SHARED, PRIVATE, and FIRSTPRIVATE. These classes are illustrated in the sketch after this list.
  6. Multithreaded code generation:
    Modifying loop parameters, generating entry/exit per threaded task, and generating calls to parallel run-time routines for thread creation and synchronization.
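
The access classes from step 5 can be made visible by writing the equivalent worksharing construct by hand. The sketch below uses explicit OpenMP* clauses purely for illustration (the auto-parallelizer derives an equivalent classification on its own), and the function name is hypothetical:

  #include <cstddef>

  void saxpy(float* y, const float* x, std::size_t n, float a) {
      float t;  // scratch temporary: PRIVATE, one copy per thread
      #pragma omp parallel for private(t) shared(y, x, n, a)
      for (std::size_t i = 0; i < n; ++i) {  // the index i is also PRIVATE
          t = a * x[i];     // t is written before it is read in each iteration
          y[i] = y[i] + t;  // y, x, n, and a are SHARED by all threads
      }
  }
  // FIRSTPRIVATE would apply if t's value on loop entry were read inside
  // the loop before being overwritten: each thread's copy would then be
  // initialized from the original variable.
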
Options that use OpenMP* are available for both Intel® and non-Intel microprocessors, but on Intel® microprocessors these options may perform additional optimizations beyond those performed on non-Intel microprocessors. The list of major, user-visible OpenMP* constructs and features that may perform differently on Intel® microprocessors than on non-Intel microprocessors includes: locks (internal and user visible), the SINGLE construct, barriers (explicit and implicit), parallel loop scheduling, reductions, memory allocation, and thread affinity and binding.

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804