The auto-parallelization feature of the Intel® C++ Compiler automatically translates serial portions of the input program into equivalent multithreaded code. Automatic parallelization determines which loops are good worksharing candidates, performs the dataflow analysis needed to verify correct parallel execution, and partitions the data for threaded code generation as needed. Auto-parallelization can provide performance gains on shared-memory multiprocessor and dual-core systems.
The auto-parallelizer analyzes the dataflow of the loops in the application source code and generates multithreaded code for those loops which can safely and efficiently be executed in parallel.
This behavior enables the potential exploitation of the parallel architecture found in symmetric multiprocessor (SMP) systems.
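Automatic parallelization frees developers from having to find loops that are good worksharing candidates, perform the dataflow analysis to verify correct parallel execution, and partition the data for threaded code generation.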
During compilation, the compiler automatically attempts to deconstruct the code sequences into separate threads for parallel processing. No other effort is needed.
Using this option enables parallelization for both Intel® microprocessors and non-Intel microprocessors. The resulting executable may see greater performance gains on Intel® microprocessors than on non-Intel microprocessors.
Serial code can be divided so that it executes concurrently on multiple threads. For example, consider the following serial code.
Example 1: Original Serial Code |
---|
|
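As an illustration of the kind of loop meant here, the following is a minimal sketch of a serial loop whose iterations are independent of one another. The function name `serial_update`, the array length `N`, and the element type are assumptions made for this sketch, not details taken from the compiler documentation.

```c
#include <assert.h>
#include <stddef.h>

#define N 100 /* assumed array length, chosen only for illustration */

/* Hypothetical serial routine: iteration i reads and writes only a[i],
   so no iteration depends on the result of another. */
void serial_update(int *a, const int *b, const int *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (int i = 0; i < N; i++) {
        a[i] = a[i] + b[i] * c[i];
    }
}
```

Whether a compiler actually parallelizes a loop like this depends on its own analysis, for example on whether it can establish that the arrays do not overlap in memory.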
The following example shows one way the loop iteration space from the previous example might be divided to execute on two threads.
Example 2: Transformed Parallel Code |
---|
|
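A conceptual sketch of such a division is shown below: the iteration space of a loop computing `a[i] = a[i] + b[i] * c[i]` is split into two contiguous halves, one per thread. The names `update_range` and `split_update` and the length `N` are hypothetical. To stay self-contained, the sketch runs the two chunks one after the other instead of creating real threads, so it shows only the partitioning, not the threading code a compiler would generate.

```c
#include <assert.h>
#include <stddef.h>

#define N 100 /* assumed array length, chosen only for illustration */

/* Work for one contiguous chunk [begin, end) of the iteration space. */
static void update_range(int *a, const int *b, const int *c, int begin, int end)
{
    for (int i = begin; i < end; i++) {
        a[i] = a[i] + b[i] * c[i];
    }
}

/* Hypothetical two-way split. Each call stands for the work one thread
   would perform; here the calls simply run sequentially. */
void split_update(int *a, const int *b, const int *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    update_range(a, b, c, 0, N / 2); /* chunk for thread 1: iterations 0..N/2-1 */
    update_range(a, b, c, N / 2, N); /* chunk for thread 2: iterations N/2..N-1 */
}
```

Because the two chunks touch disjoint elements of `a`, they could run at the same time without changing the result.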
Auto-vectorization detects low-level operations in the program that can be done in parallel, and then converts the sequential program to process 2, 4, 8, or (up to) 16 elements in one operation, depending on the data type. In some cases, auto-parallelization and vectorization can be combined for better performance results.
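The element counts above can be read as the number of values that fit into one vector register. The sketch below works through that arithmetic under the assumption of a 128-bit (16-byte) register. The register width, the macro `VECTOR_BYTES`, and the helper `lanes_per_vector` are assumptions made for illustration and are not part of any compiler interface.

```c
#include <assert.h>
#include <stddef.h>

#define VECTOR_BYTES 16 /* assumption: 128-bit vector registers */

/* Number of elements of the given size (in bytes) that fit in one
   vector register of VECTOR_BYTES bytes. */
size_t lanes_per_vector(size_t element_size)
{
    assert(element_size > 0 && element_size <= VECTOR_BYTES);
    return VECTOR_BYTES / element_size;
}
```

Under this assumption, an 8-byte element gives 2 elements per operation, a 4-byte element gives 4, a 2-byte element gives 8, and a 1-byte element gives 16.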
The following example demonstrates how code can be designed to explicitly benefit from parallelization and vectorization.
Example |
---|
|
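One way such code might be structured is sketched below: a loop nest in which the outer loop iterates over independent rows (a candidate for auto-parallelization) and the inner loop writes contiguous elements (a candidate for auto-vectorization). The function name `fill_matrix`, the dimension `ARR_SIZE`, and the arithmetic are assumptions made for this sketch. It is not the original listing, so line numbers in any compiler report would differ.

```c
#include <assert.h>
#include <stddef.h>

#define ARR_SIZE 500 /* assumed dimension, chosen only for illustration */

/* Hypothetical kernel: every element of row i gets a value that depends
   only on arr_a[i] and arr_b[i]. */
void fill_matrix(int matrix[ARR_SIZE][ARR_SIZE], const int *arr_a, const int *arr_b)
{
    assert(matrix != NULL && arr_a != NULL && arr_b != NULL);
    /* Outer loop: rows are independent of each other, which makes this
       loop a candidate for auto-parallelization. */
    for (int i = 0; i < ARR_SIZE; i++) {
        /* The value is the same for every column of row i. */
        const int value = arr_b[i] * (arr_a[i] % 2 + 10);
        /* Inner loop: contiguous stores with no cross-iteration
           dependence, which makes this loop a candidate for
           auto-vectorization. */
        for (int j = 0; j < ARR_SIZE; j++) {
            matrix[i][j] = value;
        }
    }
}
```

Keeping the parallel-friendly loop outermost and the unit-stride loop innermost gives the compiler the chance to apply both transformations to the same nest, although whether it does so depends on the compiler version and the options used.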
When the example code is compiled with the appropriate options, the compiler should report results similar to the following:
vectorization.c(18) : (col. 6) remark: LOOP WAS VECTORIZED.
vectorization.c(16) : (col. 3) remark: LOOP WAS AUTO-PARALLELIZED.