How to Optimize Code for the Most-Often Used Code Path


Overcome a limitation of optimizing compilers: they do not know which code-execution path is most likely to be used. For example, an optimizer can refine a long series of if statements so that it runs at great speed, but if it does not know that the very last test is the one that succeeds in the majority of runs, it cannot rearrange the sequence for the best possible performance. It has to assume that all if tests in the sequence are equally probable.
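The cost of a badly ordered sequence can be made concrete with a small sketch. The message kinds, function names, and the assumption that MSG_DATA dominates the workload are all hypothetical; the code only counts how many if tests each ordering performs and is not a model of what any particular compiler emits.

```c
#include <stddef.h>

/* Hypothetical message kinds; this sketch assumes MSG_DATA is by far the most common. */
enum msg_kind { MSG_OPEN, MSG_CLOSE, MSG_ERROR, MSG_DATA };

/* Tests in source order: the common MSG_DATA case is reached last. */
int dispatch_source_order(enum msg_kind k, unsigned long *tests)
{
    *tests += 1; if (k == MSG_OPEN)  return 1;
    *tests += 1; if (k == MSG_CLOSE) return 2;
    *tests += 1; if (k == MSG_ERROR) return 3;
    *tests += 1; if (k == MSG_DATA)  return 4;
    return 0;
}

/* Same results, but the test assumed to be most frequent comes first. */
int dispatch_hot_first(enum msg_kind k, unsigned long *tests)
{
    *tests += 1; if (k == MSG_DATA)  return 4;
    *tests += 1; if (k == MSG_OPEN)  return 1;
    *tests += 1; if (k == MSG_CLOSE) return 2;
    *tests += 1; if (k == MSG_ERROR) return 3;
    return 0;
}

/* Run one dispatcher over a workload and return the total number of tests performed. */
unsigned long tests_for_workload(int (*dispatch)(enum msg_kind, unsigned long *),
                                 const enum msg_kind *msgs, size_t n)
{
    unsigned long tests = 0;
    for (size_t i = 0; i < n; ++i)
        (void)dispatch(msgs[i], &tests);
    return tests;
}
```

Calling tests_for_workload with each dispatcher on the same input shows how much work the ordering alone saves when one case dominates. Without frequency information, neither a programmer nor a compiler can know that the second ordering is the better one.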


Use profile-guided optimization (PGO). First, compile the program with the -prof_gen switch set, which generates an instrumented executable. The instrumentation records data about the program's execution path in a dynamic profile, which is stored as a separate file. Each additional run of the program adds data to the profile. Then recompile the original program with the -prof_use switch, which tells the compiler to examine the collected profiles when making optimization decisions.
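To picture what the instrumentation collects, here is a hand-written toy model of a profile. It is only an illustration under stated assumptions: the struct, the function names, and the fixed number of branches are hypothetical, and the real profile format written by the instrumented executable is not shown here.

```c
#include <stddef.h>

#define NUM_BRANCHES 4   /* hypothetical: one counter per instrumented branch */

/* Toy stand-in for a dynamic profile: how often each branch was taken. */
struct profile {
    unsigned long taken[NUM_BRANCHES];
};

/* What an instrumented branch would do in this model: bump its own counter. */
void profile_record(struct profile *p, size_t branch)
{
    if (branch < NUM_BRANCHES)   /* ignore out-of-range branch ids */
        p->taken[branch] += 1;
}

/* Fold the counts from one more run into the accumulated profile. */
void profile_merge(struct profile *total, const struct profile *run)
{
    for (size_t i = 0; i < NUM_BRANCHES; ++i)
        total->taken[i] += run->taken[i];
}

/* The question a profile can answer at recompile time: which branch is hottest? */
size_t profile_hottest(const struct profile *p)
{
    size_t best = 0;
    for (size_t i = 1; i < NUM_BRANCHES; ++i)
        if (p->taken[i] > p->taken[best])
            best = i;
    return best;
}
```

Merging several runs matters because a single run may not be representative. The more typical workloads the profile covers, the more trustworthy the "hottest" answer becomes.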

PGO enables the compiler to improve the following decisions:

  • Branch prediction - When a program is running on Intel® processors, various parts of the chip pre-execute code, using a variety of techniques to work out which instructions are about to be executed. For the most part, this guessing is straightforward, except for conditional jumps. Here, the processor must guess whether the condition will be true. Current algorithms have a hit rate of more than 90%. If the guess is right, the pre-executed instructions from the correct branch are folded into the stream of retired instructions, their results are mapped to the appropriate registers, and processing continues. If the guess is wrong, however (a situation known as a branch mispredict), all speculative processing is discarded, and the entire execution pipeline stalls until the correct branched-to instruction is fetched. Execution then starts up again at this new location. This near-halt in execution is extremely expensive. With PGO, the compiler knows which path through a series of if statements is most frequently taken and can structure the code so that the processor makes better guesses at execution time.
  • Better register allocation - Compilers can substantially improve the performance of code by choosing well which variables to keep in registers and which to keep in memory. Variables in registers can be accessed significantly faster. However, registers are a precious commodity, and in some functions there are not enough registers to hold all the variables. Complex algorithms can guess which variables to store in registers, but all these methods suffer from not knowing which variables are used the most. Profiles provide this critical information and enable the most-accessed variables to be stored in registers.
  • Movement of code blocks - The location of code in executable files is the subject of numerous types of optimizations. Ideally, executable files would have all executable code in the same segment, with every routine in close proximity to the code it calls. In this way, branches could be short, and where they occur, the branched-to destination code might already be in the processor's instruction cache (called the i-cache), meaning that it could be fetched with no appreciable delay. With profile data, the compiler knows which routines and blocks run together most often and can place them near one another.
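Without a profile, a programmer can supply a rough, hand-written version of the same information. The sketch below is not PGO and not taken from the compiler documentation discussed here. It assumes a compiler that supports the GCC/Clang-style __builtin_expect builtin and cold function attribute, and falls back to plain code elsewhere. The function names and the assumption that negative samples are rare are hypothetical.

```c
#include <stddef.h>

/* Assumption: GCC/Clang-style extensions; other compilers get plain expressions. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define COLD        __attribute__((cold, noinline))
#else
#define UNLIKELY(x) (x)
#define COLD
#endif

/* Rare path, marked cold as a hint that it may be placed away from hot code. */
COLD static void reject_sample(size_t *rejected)
{
    ++*rejected;
}

/* Sum the non-negative samples; negative samples are assumed to be rare. */
long sum_valid_samples(const int *samples, size_t n, size_t *rejected)
{
    long total = 0;
    *rejected = 0;
    for (size_t i = 0; i < n; ++i) {
        if (UNLIKELY(samples[i] < 0)) {   /* hinted as the uncommon outcome */
            reject_sample(rejected);
            continue;
        }
        total += samples[i];              /* hinted as the common path */
    }
    return total;
}
```

Hints like these only encode the programmer's belief about the workload, and a wrong belief can make the generated code worse. A measured profile replaces that guess with data from real runs, which is the advantage PGO offers over manual annotation.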





For more complete information about compiler optimizations, see our Optimization Notice.