Eliminate unpredictable conditional branches in code. Eliminating these branches improves performance because it does the following:
- Reduces the possibility of mispredictions
- Reduces the number of required branch target buffer (BTB) entries; conditional branches that are never taken do not consume BTB resources
Consider a line of C code that has a condition dependent upon one of the constants:
X = (A < B) ? CONST1 : CONST2;
This code conditionally compares two values, A and B. If the condition is true, X is set to CONST1; otherwise, it is set to CONST2. An assembly code sequence equivalent to the above C code can contain branches that are not predictable if there is no correlation in the two values.
The following example code shows assembly code with unpredictable branches:
cmp A, B ; condition jge L30 ; conditional branch mov ebx, CONST1 ; ebx holds X jmp L31 ; unconditional branch L30: mov ebx, CONST2 L31:
Use the setcc and cmov instructions to eliminate unpredictable conditional branches where possible. Do not do this for predictable branches. Do not use these instructions to eliminate all unpredictable conditional branches, because using these instructions will incur execution overhead due to executing both paths of a conditional branch. In addition, converting conditional branches to cmovs or setcc trades control-flow dependence for data dependence and restricts the capability of the out-of-order engine. When tuning, note that all 32-bit Intel® processors have very high branch-prediction rates. Consistently mispredicted branches are rare. Use these instructions only if the increase in computation time is less than the expected cost of a mispredicted branch.
There are four principal ways of eliminating branches:
- arrange code to make basic blocks contiguous
- unroll loops
- use the cmov instruction
- use the setcc instruction
The following optimized code sets ebx to zero, then compares A and B. If A is greater than or equal to B, ebx is set to one. Then ebx is decreased and “and-ed” with the difference of the constant values. This sets ebx to either zero or the difference of the values. By adding CONST2 back to ebx, the correct value is written to ebx. When CONST2 is equal to zero, the last instruction can be deleted.
xor ebx, ebx ; clear ebx (X in the C code) cmp A, B setge bl ; When ebx = 0 or 1 ; OR the complement condition sub ebx, 1 ; ebx=11...11 or 00...00 and ebx, CONST3 ; CONST3 = CONST1-CONST2 add ebx, CONST2 ; ebx=CONST1 or CONST2
Another way to remove branches on Intel® Pentium® II processors and later is to use the cmov and fcmov instructions. The following example shows changing a test and branch instruction sequence using cmov and eliminating a branch. If the test sets the equal flag, the value in ebx will be moved to eax. This branch is data-dependent, and is representative of an unpredictable branch.
test ecx, ecx jne 1h mov eax, ebx 1h: ; To optimize code, combine jne and mov into one cmovcc ; instruction that checks the equal flag test ecx, ecx ; test the flags cmoveq eax, ebx ; if the equal flag is set, move ; ebx to eax - the lh: tag no longer ; needed
The cmov and fcmov instructions are available on the Intel® Pentium® II processors and later, but not on Intel® Pentium® processors and earlier 32-bit Intel® processors. Be sure to check whether a processor supports these instructions with the cpuid instruction.