Programming with Auto-parallelization
Guidelines for Effective Auto-parallelization Usage
A loop can be parallelized if it meets the following criteria:
- The loop is countable at compile time: the compiler can generate an expression for the number of times the loop will execute (the loop trip count) just before the loop is entered. The trip count itself does not have to be a compile-time constant.
- There are no FLOW (READ after WRITE), OUTPUT (WRITE after WRITE), or ANTI (WRITE after READ) loop-carried data dependencies. A loop-carried data dependency occurs when the same memory location is referenced in different iterations of the loop and at least one of those references is a write. At the compiler's discretion, a loop may still be parallelized if any assumed inhibiting loop-carried dependencies can be resolved by run-time dependency testing.
The following coding guidelines make loops easier for the auto-parallelizer to handle:
- Expose the trip count of loops whenever possible: use constants where the trip count is known, and save loop parameters in local variables.
- Avoid placing constructs inside loop bodies that the compiler may assume carry dependent data, such as procedure calls, ambiguous indirect references, or global references.
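The criteria and guidelines above can be illustrated with a minimal C sketch. The type `vec_t` and the functions `scale_vec`, `prefix_sum`, and `shift_left` are hypothetical names introduced only for this illustration, and whether a given compiler actually parallelizes any of these loops depends on its own analysis and options.

```c
#include <stddef.h>

/* Hypothetical container used only for this illustration. */
typedef struct {
    double *data;
    size_t  len;
} vec_t;

/* Candidate for parallelization: iteration i reads and writes only data[i],
   so no memory location is shared between iterations. The loop parameters
   are saved in local variables, which exposes the trip count and spares the
   compiler from reasoning about whether the loop body could modify v->len
   or v->data. */
void scale_vec(vec_t *v, double factor)
{
    const size_t count = v->len;   /* trip count fixed before loop entry */
    double *data = v->data;        /* base pointer saved in a local */

    for (size_t i = 0; i < count; i++)
        data[i] *= factor;
}

/* FLOW (READ after WRITE) dependency: iteration i reads a[i - 1], which
   iteration i - 1 has just written. The iterations must run in order. */
void prefix_sum(double *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}

/* ANTI (WRITE after READ) dependency: iteration i reads a[i + 1], which
   iteration i + 1 later overwrites. Running iteration i + 1 first would
   make iteration i read the wrong value. */
void shift_left(double *a, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        a[i] = a[i + 1];
}
```

Only `scale_vec` meets both criteria as written. The other two loops are countable but carry a dependency between iterations, so they would have to be restructured before they could run in parallel.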
Auto-parallelization Data Flow
When auto-parallelizing a program, the compiler performs the following steps:
- Data flow analysis: Computing the flow of data through the program.
- Dependency analysis: Analyzing the dependencies among references in each loop nest.
- High-level parallelization: Analyzing the dependency graph to determine which loops can execute in parallel, and computing run-time dependency tests.
- Data partitioning: Examining data references and partitioning them based on the following types of access: SHARED, PRIVATE, and FIRSTPRIVATE.
- Multithreaded code generation: Modifying loop parameters, generating entry/exit per threaded task, and generating calls to parallel run-time routines for thread creation and synchronization.
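The last two steps can be pictured with a hand-written C sketch of what the transformed loop might resemble. This is an illustration under stated assumptions, not the code the compiler actually emits: the names `chunk_t`, `chunk_bounds`, `scale_task`, and `scale_all` are hypothetical, and the tasks are invoked one after another here so the sketch stays portable, whereas a real run-time library would hand each task to its own thread and synchronize at the end.

```c
#include <stddef.h>

/* Hypothetical descriptor for the modified loop parameters of one task. */
typedef struct {
    size_t lower;   /* first iteration owned by the task (inclusive) */
    size_t upper;   /* one past the last iteration owned by the task */
} chunk_t;

/* Split iterations [0, n) into num_tasks contiguous chunks whose sizes
   differ by at most one. Assumes num_tasks > 0. */
chunk_t chunk_bounds(size_t n, size_t num_tasks, size_t task_id)
{
    size_t base  = n / num_tasks;   /* minimum iterations per task */
    size_t extra = n % num_tasks;   /* the first 'extra' tasks get one more */
    chunk_t c;

    c.lower = task_id * base + (task_id < extra ? task_id : extra);
    c.upper = c.lower + base + (task_id < extra ? 1 : 0);
    return c;
}

/* Outlined loop body: one call corresponds to one threaded task.
   The original loop is assumed to be
       for (i = 0; i < n; i++) { t = scale * in[i]; out[i] = t + 1.0; }  */
void scale_task(const double *in, double *out,  /* SHARED: same arrays in every task */
                double scale,                   /* FIRSTPRIVATE: per-task copy, initialized at entry */
                chunk_t c)
{
    double t;                                   /* PRIVATE: per-task scratch, no initial value needed */

    for (size_t i = c.lower; i < c.upper; i++) {
        t = scale * in[i];
        out[i] = t + 1.0;
    }
}

/* Stand-in for the generated driver. A real run-time would create threads
   here and wait for all of them before returning. */
void scale_all(const double *in, double *out, double scale,
               size_t n, size_t num_tasks)
{
    for (size_t id = 0; id < num_tasks; id++)
        scale_task(in, out, scale, chunk_bounds(n, num_tasks, id));
}
```

Because each task writes a disjoint range of `out`, the tasks could run in any order or concurrently and still produce the same result as the original loop.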