Guided Auto-Parallel (GAP)

Guided Auto-Parallel Overview
The guided auto-parallelization feature of the Intel® Compiler is a tool that offers selective advice which, when followed, can improve the performance of serially coded applications.
The advice generated by the compiler typically falls under three broad categories:

• Advice to use a local variable: the compiler advises you to make simple source changes that are localized to a loop nest or a routine. For example, you may receive advice to use a local variable for the upper bound of a loop (instead of a class member), to initialize a local variable unconditionally at the top of the loop body, or to add the restrict keyword to the pointer arguments of a function definition (if it is appropriate).

• Advice to apply pragmas: the compiler advises you to apply a new pragma on a certain loop-level if the pragma semantics can be satisfied (you have to verify this). In many cases, you may be able to apply the pragma (thus implicitly asserting new program/loop properties) that the compiler can take advantage of to perform enhanced optimizations. 

• Advice to add compiler options: the compiler advises you to add new command-line options that assert new properties.

The advice is specific but optional; you can either implement it or reject it. To receive this advice, use the -guide [Linux* and Mac OS* X] or /Qguide [Windows*] set of compiler options. The compiler does not generate any object files or executables during the guided auto-parallelization run.
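
As an illustration of the first category of advice, the following is a minimal sketch of replacing a class-member loop bound with a local variable. The class Scaler and its member names are hypothetical and are not part of the tutorial sample. The assumption is that, because the loop stores through an int pointer, the compiler may be unable to prove that the int member holding the bound stays unchanged inside the loop; copying it to a local removes that doubt.

```cpp
#include <vector>

// Hypothetical class used only for illustration; not from the tutorial sample.
class Scaler {
public:
    explicit Scaler(int n) : count_(n) {}

    // Before: the upper bound is a class member. A store through `data`
    // could, as far as the compiler knows, modify count_.
    void scale_member_bound(int* data, int factor) const {
        for (int i = 0; i < count_; ++i) {
            data[i] *= factor;
        }
    }

    // After: the bound is copied into a local variable, so it is clearly
    // invariant for the duration of the loop.
    void scale_local_bound(int* data, int factor) const {
        const int n = count_;
        for (int i = 0; i < n; ++i) {
            data[i] *= factor;
        }
    }

private:
    int count_;
};
```

Both member functions compute the same result as long as data does not actually overlap the object; that is the property you would need to verify before accepting this kind of advice.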

Use the -guide or /Qguide options in addition to your normally used compiler options. The compiler advice targets the optimizations enabled at the chosen optimization level. If you decide to take the advice suggested by the guided auto-parallelization compilation run, then make the suggested code changes or use the suggested compiler options and recompile the program, this time without using the -guide or /Qguide options. The performance of your program should improve.

Use guided auto-parallelization along with auto-parallelization when you have serial code you wish to parallelize using the auto-parallelization options [-parallel or /Qparallel] and also wish to get advice on further parallelizing opportunities that the guided auto-parallelization may suggest.

Use guided auto-parallelization without enabling auto-parallelization when you are interested in improving the performance of your single-threaded code or when you want to improve the performance of your applications with explicit threading without relying on the compiler for auto-parallelization.

Preparing the project to run Guided Auto Parallel (GAP)
1) Convert the project to an Intel C++ project.
2) Change the configuration to “Release”.
a. GAP only works with /O2 or higher optimization.
3) After conversion, go to menu -> Build -> Clean All.

 Project Conversion

Figure 1:  Convert to using Intel compiler project.

Running Guided Auto Parallel (GAP)
There are several ways to invoke Guided Auto Parallel (GAP) in the IDE, depending on whether you want analysis for the whole solution, the project, a single file, a function, or a range of source lines. For the purpose of this tutorial, we will use single-file analysis.

1) Select scalar_dep.cpp, right-click, and choose Intel C++ Composer XE -> Guide Auto Parallel -> Run Analysis on file “scalar_dep.cpp”.
a. Click Run Analysis in the Configure Analysis dialog box.

Figure 2:  Run Guided Auto-Parallel Analysis

Figure 3:  Configuring Analysis

Viewing the results from Guided Auto Parallel (GAP)
The output generated by GAP analysis can be viewed in the standard Output Window of the IDE, or in the “Error List” window filtered by “Messages”.  Note that GAP messages in the standard Output Window are enclosed between “GAP REPORT LOG OPENED” and “END OF GAP REPORT LOG”.

Figure 4 GAP messages in IDE standard Output Window.

Figure 5 GAP message in Error List window filtered by Messages.

You can also redirect GAP output to a file. To do so, check the box “Send remarks to a file”.

Figure 6 Add option to output GAP messages to a file.

Note that GAP messages will not be available in the IDE standard Output Window or Error List Window if this option is enabled.

Analyzing GAP messages
Analyze the output generated by GAP analysis and determine whether each suggestion provided by GAP is appropriate for the source code in question.
For this sample tutorial, GAP generates the output shown below for the following loop at line 49 of scalar_dep.cpp:

for (i=0; i<n; i++) {
    if (A[i] > 0) {b=A[i]; A[i] = 1 / A[i]; }
    if (A[i] > 1) {A[i] += b;}
}

1>GAP REPORT LOG OPENED ON Tue Jun 29 12:13:54 2010
1>remark #30761: Add -Qparallel option if you want the compiler to generate recommendations for improving auto-parallelization.
1>C:\gap_test\test\scalar_dep.cpp(49): remark #30515: (VECT) Loop at line 49 cannot be vectorized due to conditional assignment(s) into the following variable(s): b. This loop will be vectorized if
the variable(s) become unconditionally initialized at the top of every iteration. [VERIFY] Make sure that the value(s) of the variable(s) read in any iteration of the loop must have been written earlier
in the same iteration.
1>Number of advice-messages emitted for this compilation session: 1.

By default, the compiler emits remark #30761, which suggests enabling auto-parallelization so that GAP can also generate recommendations for improving auto-parallelization. Remark #30515 indicates that if variable b is assigned unconditionally, the compiler will be able to vectorize the loop.
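
The [VERIFY] clause in remark #30515 matters because a conditionally assigned scalar can carry a value from one iteration into a later one. The sketch below uses a hypothetical loop (deliberately not the one from scalar_dep.cpp) in which b is read in every iteration but written only in some, and compares it with a version that initializes b unconditionally. The function names and the double element type are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical loop: b is written only when A[i] > 0 but read in every
// iteration, so an iteration may read a value written by an earlier one.
// That cross-iteration dependence is what blocks vectorization.
std::vector<double> conditional_scalar(std::vector<double> A) {
    double b = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        if (A[i] > 0) { b = A[i]; }
        A[i] += b;  // may use b left over from a previous iteration
    }
    return A;
}

// Same body with b initialized unconditionally at the top of each
// iteration. Iterations are now independent, but the results differ
// from the loop above whenever some A[i] <= 0.
std::vector<double> unconditional_scalar(std::vector<double> A) {
    for (std::size_t i = 0; i < A.size(); ++i) {
        double b = A[i];
        A[i] += b;
    }
    return A;
}
```

For an input such as {2, -1} the two functions return different arrays, so applying the advice blindly to this hypothetical loop would change the program. The advice is safe only when every read of the scalar is preceded by a write in the same iteration.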

To get GAP advice for parallelization, enable parallelization (/Qparallel) and rerun the GAP analysis.

Figure 7: Enabling Parallelization (/Qparallel).

1>GAP REPORT LOG OPENED ON Tue May 18 11:42:58 2010
1>C:\test\scalar_dep.cpp(49): remark #30521: (PAR) Loop at line 49 cannot be parallelized due to conditional assignment(s) into the following variable(s): b. This loop will be parallelized if the
variable(s) become unconditionally initialized at the top of every iteration. [VERIFY] Make sure that the value(s) of the variable(s) read in any iteration of the loop must have been written earlier in
the same iteration.
1>C:\test\scalar_dep.cpp(49): remark #30525: (PAR) If the trip count of the loop at line 49 is greater than 188, then use "#pragma loop count min(188)" to parallelize this loop. [VERIFY] Make
sure that the loop has a minimum of 188 iterations.
1>Number of advice-messages emitted for this compilation session: 2.


Remark #30521 indicates that the loop at line 49 cannot be parallelized because the variable b is conditionally assigned. Remark #30525 indicates that the compiler will parallelize the loop only if you assert, using #pragma loop count min(188), that the loop runs for at least 188 iterations.
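
When the trip count is not known at compile time, one possible way to honor the [VERIFY] clause of remark #30525 is to guard the annotated loop with a runtime check. The following is only a sketch under stated assumptions: the function add_one and the constant kMinTripCount are hypothetical, the loop body is a stand-in, and the pragma spelling is copied from the GAP remark. Compilers that do not recognize the pragma typically ignore it.

```cpp
#include <cstddef>
#include <vector>

// Threshold taken from remark #30525 in the GAP log.
constexpr std::size_t kMinTripCount = 188;

// Hypothetical function: the pragma is applied only on the path where
// the minimum trip count is known to hold.
void add_one(std::vector<double>& A) {
    const std::size_t n = A.size();
    if (n >= kMinTripCount) {
        // The claim "at least 188 iterations" is true on this path.
        #pragma loop count min(188)
        for (std::size_t i = 0; i < n; ++i) {
            A[i] += 1.0;
        }
    } else {
        // Short loops take the plain serial path.
        for (std::size_t i = 0; i < n; ++i) {
            A[i] += 1.0;
        }
    }
}
```

If n is always known to be large enough in your application, the guard is unnecessary and the pragma can be placed directly on the loop.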

You need to verify that the changes recommended by GAP are appropriate and do not change the semantics of the program.  Apply the necessary change(s) and recompile the source file.  For this loop, we made the following changes to enable parallelization and vectorization of the loop as recommended by GAP:


#pragma loop count min(188)
for (i=0; i<n; i++) {
    b = A[i];
    if (A[i] > 0) {A[i] = 1 / A[i];}
    if (A[i] > 1) {A[i] += b;}
}
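
One way to gain confidence that this rewrite preserves the contents of A is to run both versions of the loop on the same input and compare the results. The sketch below does that; the function names, the double element type, and the initial value of b are assumptions, since the tutorial does not show the declarations in scalar_dep.cpp.

```cpp
#include <cstddef>
#include <vector>

// Original loop as quoted from scalar_dep.cpp: b is assigned conditionally.
std::vector<double> original_loop(std::vector<double> A) {
    double b = 0.0;  // assumed initial value; the declaration is not shown
    const std::size_t n = A.size();
    for (std::size_t i = 0; i < n; i++) {
        if (A[i] > 0) { b = A[i]; A[i] = 1 / A[i]; }
        if (A[i] > 1) { A[i] += b; }
    }
    return A;
}

// Rewritten loop: b is assigned unconditionally at the top of the body.
std::vector<double> rewritten_loop(std::vector<double> A) {
    const std::size_t n = A.size();
    for (std::size_t i = 0; i < n; i++) {
        double b = A[i];
        if (A[i] > 0) { A[i] = 1 / A[i]; }
        if (A[i] > 1) { A[i] += b; }
    }
    return A;
}
```

The two versions agree on A because b is only read when A[i] > 1 after the first statement, which can happen only if the original A[i] was positive and b was therefore written in the same iteration. The value of b after the loop can differ between the two versions, so the rewrite is safe only if b is not used afterwards.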

To verify that the loop was parallelized and vectorized:

1)   For each piece of code that GAP reported on, check the vectorization report or parallelization report after applying the change(s) suggested by GAP.
2)   Add the options /Qvec-report1 and /Qpar-report1 to the Linker's Additional Options box.  (Note that in Visual Studio /GL is on by default, which enables /Qipo if you are using the Intel compiler.  If /Qipo is not enabled, add the options to the C/C++ Additional Options instead.)
3)   Recompile without Guided Auto Parallel.

Figure 8 Add /Qvec-report1 /Qpar-report1 to get parallelization and vectorization reports.

The output window will report that a function call at line 23 in main.cpp was vectorized and parallelized.  This is because /Qipo (inlining across multiple files) was enabled: the function containing the loop in scalar_dep.cpp was inlined at its call site in main.cpp, so the report attributes the vectorization and parallelization to line 23.

Adding parallelism to a serial application can be difficult. The Intel compiler's Guided Auto-Parallel feature provides a low-cost and effective tool for adding parallelism to your application.
