Intel® Fortran Compiler Classic and Intel® Fortran Compiler Developer Guide and Reference

ID 767251
Date 9/08/2022
Public



Profile-Guided Optimization via Hardware Counters

A lightweight profiling mechanism can achieve many of the benefits of instrumentation-based profiling without the overhead of inserting instrumentation into the application binary. This mode is useful when the increased code and data size, or the changes in run time caused by instrumentation, make regular Profile-Guided Optimization (PGO) infeasible. The approach requires Intel® VTune™ Profiler to collect information from the hardware counters. The information is collected with minimal overhead and combined with debug information produced by the compiler to identify the primary code path for optimization.

Follow these steps to use this method:

  1. Compile the application with the option prof-gen-sampling.

    This option instructs the compiler to generate additional debug information for the application, which is used to map the information collected by the hardware counters to specific source code locations. Unlike instrumented PGO, this option does not affect the generated instruction sequence. Optimizations may be enabled during this build, but it is recommended that you disable function inlining.

  2. Run the generated executable on one or more representative workloads with the Intel VTune Profiler tool:
    <installation-root>/bin64/amplxe-pgo-report.sh <your application and command line>

    Additional information about data collection options can be found in the Intel VTune Profiler documentation. This step generates files with names of the form rNNNpgo_icc.pgo, where NNN is a three-digit number. These files are used as input in the next steps.

  3. Merge the report files produced during step 2.

    The profmergesampling tool can be used to produce an indexed results file, which speeds up data processing in the next step.

    profmergesampling -file <input-file[:input_file]*> -out <output_name>

  4. Compile the application with the option prof-use-sampling:input-file[:input_file]*

    In this step, one or more result files produced during step 2 (or an indexed file from step 3) can be fed into the compiler to direct the optimizations.
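
The four steps above can be sketched as a small shell script. This is a dry run only: it prints the command lines and executes none of them. Several details are assumptions rather than statements from this guide: the driver name `ifort`, the Linux spelling of the options with a leading dash, the use of `-inline-level=0` to disable inlining, the source and executable names, and the output name `merged_profile`, which is assumed to be accepted verbatim by the final compile. The `<installation-root>` placeholder is left unresolved, as in the text.

```shell
#!/bin/sh
# Dry-run sketch of the sampling-based PGO workflow described above.
# Nothing is executed; each step only prints the command it would run.

FC=${FC:-ifort}                                   # assumption: compiler driver name
VTUNE_ROOT=${VTUNE_ROOT:-'<installation-root>'}   # placeholder kept from the text
MERGED=${MERGED:-merged_profile}                  # hypothetical output name for step 3

# Join all arguments with ':' to build the input-file[:input_file]* list.
join_profiles() {
    list=""
    for f in "$@"; do
        list="${list:+$list:}$f"
    done
    printf '%s\n' "$list"
}

# Print one command line.
step() {
    printf '%s\n' "$*"
}

# Usage: pgo_sampling_workflow <source> <executable> <profile-file>...
# The profile files are the rNNNpgo_icc.pgo files that step 2 would produce.
pgo_sampling_workflow() {
    src=$1
    exe=$2
    shift 2
    # Step 1: build with extra debug information, inlining disabled (assumed flag).
    step "$FC" -prof-gen-sampling -inline-level=0 "$src" -o "$exe"
    # Step 2: run a representative workload under the VTune collection script.
    step "$VTUNE_ROOT/bin64/amplxe-pgo-report.sh" "./$exe"
    # Step 3: merge the collected profiles into one indexed file.
    step profmergesampling -file "$(join_profiles "$@")" -out "$MERGED"
    # Step 4: rebuild, feeding the merged profile to the compiler.
    step "$FC" "-prof-use-sampling:$MERGED" "$src" -o "$exe"
}
```

For example, `pgo_sampling_workflow app.f90 app r000pgo_icc.pgo r001pgo_icc.pgo` prints the four command lines in order, with the two profile names joined by a colon in the merge step. Printing instead of executing keeps the sketch safe to try on a machine without the compiler or profiler installed.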
