Developer Guide and Reference

Profile-Guided Optimization via HW counters

A lightweight profiling mechanism is available that provides many of the benefits of instrumentation-based profiling without the overhead of inserting instrumentation into the application binary. This mode of operation is useful when the growth in code or data size, or the change in run time, caused by instrumentation makes regular Profile-Guided Optimization (PGO) infeasible. This approach requires Intel® VTune™ Amplifier to collect information from the hardware counters. The information is collected with minimal overhead and is combined with debug information produced by the compiler to identify the primary code paths to optimize.
Follow these steps to use this method:
Phase 1: Compile the application with the option prof-gen-sampling.
This option instructs the compiler to generate additional debug information for the application, which is used to map the data collected by the hardware counters to specific source code. Unlike instrumented PGO, however, this option does not change the generated instruction sequence. Optimizations may be enabled during this build, but disabling function inlining is recommended.
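As a rough illustration, the Python sketch below assembles a Phase 1 command line. The driver name icc, the Linux-style "-" option prefix, the -O2 level, and -inline-level=0 as the way to turn off inlining are all assumptions, not taken from this guide; check the compiler option reference for your platform before relying on them.

```python
import shlex

def phase1_compile_cmd(sources, output="app", compiler="icc", opt_level="-O2"):
    """Build a hypothetical Phase 1 command line.

    Assumptions (not from this guide): the driver name `icc`, the
    Linux-style `-` option prefix, and `-inline-level=0` to disable inlining.
    """
    return [
        compiler,
        opt_level,             # optimizations may stay enabled in Phase 1
        "-prof-gen-sampling",  # extra debug info used to map HW samples to source
        "-inline-level=0",     # assumed spelling: turn off function inlining
        *sources,
        "-o", output,
    ]

cmd = phase1_compile_cmd(["main.c", "util.c"], output="myapp")
print(shlex.join(cmd))
# prints: icc -O2 -prof-gen-sampling -inline-level=0 main.c util.c -o myapp
```

The function only builds the argument list; it could be handed to subprocess.run(cmd, check=True) once the flags have been confirmed for your toolchain.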
Phase 2: Run the generated executable on one or more representative workloads with the Intel VTune Amplifier tool:
<installation-root>/bin64/amplxe-pgo-report.sh <your application and command line>
Additional information regarding data collection options can be found in the Intel VTune Amplifier documentation. This step generates files named rNNNpgo_icc.pgo, where NNN is a three-digit number; these files are the input to the following phases.
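A small helper can pick the phase 2 output files out of a directory by their rNNNpgo_icc.pgo name pattern. This is a minimal Python sketch that assumes the reports land in a single directory (the current directory by default); the function names are hypothetical.

```python
import re
from pathlib import Path

# Phase 2 output names: "r", exactly three digits, then "pgo_icc.pgo".
PGO_REPORT_RE = re.compile(r"r(\d{3})pgo_icc\.pgo")

def select_pgo_reports(names):
    """Keep only names of the form rNNNpgo_icc.pgo, ordered by NNN."""
    matched = []
    for name in names:
        m = PGO_REPORT_RE.fullmatch(name)  # whole name must match
        if m:
            matched.append((int(m.group(1)), name))
    return [name for _, name in sorted(matched)]

def find_pgo_reports(directory="."):
    """List report files in `directory` (assumed location of the reports)."""
    base = Path(directory)
    names = [p.name for p in base.iterdir() if p.is_file()]
    return [str(base / n) for n in select_pgo_reports(names)]

if __name__ == "__main__":
    for report in find_pgo_reports("."):
        print(report)
```

Splitting the name filtering from the directory scan keeps the pattern logic easy to check without touching the file system.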
Phase 3 (optional): Merge the report files produced in phase 2.
The profmergesampling tool can produce an indexed file of results that speeds up processing of the data in the next phase:
profmergesampling -file <input_file[:input_file]*> -out <output_name>
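The merge command takes its inputs as a single colon-separated -file argument. The Python sketch below builds that argument list under stated assumptions: profmergesampling is on the PATH, and no input path contains a colon (such a name, for example a Windows drive-letter path, could not be expressed unambiguously in the list, so the sketch rejects it). merged_pgo is a hypothetical output name.

```python
import shlex

def profmergesampling_cmd(input_files, output_name):
    """Build: profmergesampling -file a:b:c -out <output_name>."""
    if not input_files:
        raise ValueError("profmergesampling needs at least one input file")
    for f in input_files:
        # ':' separates entries in the -file list, so it cannot appear in a name.
        if ":" in f:
            raise ValueError(f"colon in file name would break the -file list: {f}")
    return ["profmergesampling", "-file", ":".join(input_files), "-out", output_name]

cmd = profmergesampling_cmd(["r000pgo_icc.pgo", "r001pgo_icc.pgo"], "merged_pgo")
print(shlex.join(cmd))
# prints: profmergesampling -file r000pgo_icc.pgo:r001pgo_icc.pgo -out merged_pgo
```

To run the merge, the list could be passed to subprocess.run(cmd, check=True).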
Phase 4: Compile the application with the option prof-use-sampling:<input_file[:input_file]*>
In this phase, one or more result files produced in phase 2 (or an indexed file from phase 3) are passed to the compiler to direct the optimizations. Multiple file names are separated by colons.
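Phase 4 can be sketched the same way. The guide spells the option prof-use-sampling: followed by the file list; the sketch keeps that spelling behind a Linux-style "-" prefix and defaults to icc and -O2, all of which are assumptions. The separator after the option name may differ by platform, so it is exposed as a parameter; check your compiler's option reference for the exact form.

```python
import shlex

def phase4_compile_cmd(sources, profile_files, output="app",
                       compiler="icc", opt_level="-O2",
                       option="-prof-use-sampling:"):
    """Build a hypothetical Phase 4 command line.

    profile_files: one or more Phase 2 reports (rNNNpgo_icc.pgo) or the
    indexed file from Phase 3. `compiler`, `opt_level` and the exact
    `option` spelling are assumptions; verify them for your platform.
    """
    if not profile_files:
        raise ValueError("at least one profile file is required")
    return [
        compiler,
        opt_level,
        option + ":".join(profile_files),  # colon-separated list of profiles
        *sources,
        "-o", output,
    ]

cmd = phase4_compile_cmd(["main.c", "util.c"],
                         ["r000pgo_icc.pgo", "r001pgo_icc.pgo"],
                         output="myapp")
print(shlex.join(cmd))
# prints: icc -O2 -prof-use-sampling:r000pgo_icc.pgo:r001pgo_icc.pgo main.c util.c -o myapp
```

Passing the single indexed file from phase 3 instead of the raw reports only changes the profile_files argument.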

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804