Schedule Instructions Optimally on 64-Bit Intel® Architecture

March 10, 2009 1:00 AM PDT



Challenge

Schedule instructions properly for optimal performance on the Intel® Itanium® processor. Optimal scheduling will minimize the chances of implicit stops or unexpected dispersal-related stalls.


Solution

Whenever possible, observe the following heuristics, which are based on best-known methods for instruction scheduling on 64-bit Intel architecture:

  • Schedule the most restricted instructions early in the bundle. This lessens the chance that a generic subtype instruction will consume a port that is needed by a later, more restricted instruction.
  • In some cases, placing A-type instructions in I slots rather than M slots might achieve denser bundling. If this is done, place any I-type instructions (which must go in I slots) earlier in the issue group when possible. This way, the later A-type instructions in I slots can be issued to available M ports. Because some processors, such as the Itanium processor, do not support this strategy, it is generally preferable to place A-type instructions in M slots.
  • Most floating-point load types can be issued to any of the four memory ports, not just M0 and M1. The exceptions are data-speculation-related (advanced and check) loads and pair floating-point loads, which can be issued only to ports M0 and M1. When scheduling a mix of FP loads, advanced FP loads, integer loads, and lfetch instructions, schedule regular FP loads late in the issue group so that, if necessary, they can be issued to the M2 and M3 ports. This frees the M0 and M1 ports for lfetch instructions or more restrictive load types.
  • Avoid using nop.f. It risks unintended stalls behind outstanding long-latency instructions. For example, a write to the FPSR is a multiple-cycle operation, and any floating-point operation, including a nop.f, stalls until the write completes.
  • The MFI template should not be necessary. On the original Itanium® processor, MFI was a commonly used template to facilitate dual issue. Later processors, such as the Itanium 2 processor, support many other dual-issue template pairs, so using this template should no longer be necessary.

This item summarizes conclusions that can be drawn from important points noted in the Intel® Itanium® Processor Reference Manual (listed under Source below). These guidelines are not applicable in all situations, and profiling should be used to guide the use of optimizations.
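
The port-ordering heuristics above (most restricted instructions first, regular FP loads late) can be illustrated with a small toy model. The sketch below is not the processor's actual dispersal logic: the `ALLOWED_PORTS` table, the `disperse` function, and the greedy "first free port, in program order" rule are all assumptions made for illustration, with the port restrictions taken from the description of memory ports M0 through M3 given above.

```python
# Toy model only. The port table and the greedy in-order assignment rule are
# simplifying assumptions for illustration, not documented hardware behavior.
ALLOWED_PORTS = {
    "ldf": ("M0", "M1", "M2", "M3"),  # regular FP load: any memory port
    "ldf.a": ("M0", "M1"),            # advanced FP load: restricted to M0/M1
    "lfetch": ("M0", "M1"),           # prefetch: treated here as needing M0/M1
}


def disperse(issue_group):
    """Assign each instruction, in order, to the first free port it may use.

    Returns (assignments, leftover). `leftover` holds the instructions that
    could not issue in this cycle because one of them found no free port;
    the model treats that point as an implicit stop.
    """
    free = {"M0", "M1", "M2", "M3"}
    assignments = []
    for index, op in enumerate(issue_group):
        port = next((p for p in ALLOWED_PORTS[op] if p in free), None)
        if port is None:
            return assignments, list(issue_group[index:])
        free.remove(port)
        assignments.append((op, port))
    return assignments, []


# Same four instructions, two orderings.
generic_first = ["ldf", "ldf", "ldf.a", "lfetch"]
restricted_first = ["ldf.a", "lfetch", "ldf", "ldf"]

for name, group in (("generic first", generic_first),
                    ("restricted first", restricted_first)):
    issued, leftover = disperse(group)
    print(name, "issued:", issued, "leftover:", leftover)
```

In this model, the generic-first ordering lets the two regular FP loads take M0 and M1, so the advanced load and the lfetch are left over for a later cycle. The restricted-first ordering issues all four instructions together, with the regular FP loads falling through to M2 and M3. Real hardware behavior depends on templates and the other port types, so profiling remains the final check.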

 


Source

Intel® Itanium® Processor Reference Manual