March 10, 2009 1:00 AM PDT
Remove bank conflicts from high-level loops. Removing bank conflicts is, in many cases, rather simple. Consider the following double-precision matrix multiply in Fortran:
Do k=1,MAX
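A minimal sketch of such a loop nest follows. It is an illustration, not the original listing: the loop order (k, i, j), the array names `a`, `b`, `c`, and the size `n` (playing the role of MAX) are assumptions, chosen so that the inner loop runs over `j`, the contiguous first subscript in Fortran's column-major layout.

```fortran
program matmul_baseline
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision (8-byte elements)
  integer, parameter :: n = 64             ! stands in for MAX; value is arbitrary
  real(dp) :: a(n,n), b(n,n), c(n,n)
  integer :: i, j, k

  call random_number(a)
  call random_number(b)
  c = 0.0_dp

  ! Assumed loop order: k outermost, j innermost.
  ! j is the first subscript of c and a, so successive inner
  ! iterations touch adjacent addresses in memory.
  do k = 1, n
    do i = 1, n
      do j = 1, n
        c(j,i) = c(j,i) + a(j,k) * b(k,i)
      end do
    end do
  end do

  ! Sanity check against the intrinsic matmul.
  print *, 'max abs difference vs matmul:', maxval(abs(c - matmul(a, b)))
end program matmul_baseline
```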
Unroll the inner loop and then interlace the unrolled lines. The first step toward better performance is to unroll the inner loop:
Do k=1,MAX
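As a sketch of what unrolling the inner loop can look like, the program below unrolls an assumed inner `j` loop by a factor of four. The unroll factor, the loop order, and the names `a`, `b`, `c`, and `n` (standing in for MAX) are assumptions made for illustration, not the original listing. The four statements are written in address order, so loads of adjacent elements such as `c(j,i)` and `c(j+1,i)` sit next to each other in the instruction stream.

```fortran
program matmul_unrolled
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision (8-byte elements)
  integer, parameter :: n = 64             ! stands in for MAX; must be a multiple of 4
  real(dp) :: a(n,n), b(n,n), c(n,n)
  integer :: i, j, k

  call random_number(a)
  call random_number(b)
  c = 0.0_dp

  do k = 1, n
    do i = 1, n
      ! Inner loop unrolled by 4 (assumed factor). No remainder loop
      ! is needed because n is chosen as a multiple of 4.
      do j = 1, n, 4
        c(j,  i) = c(j,  i) + a(j,  k) * b(k,i)
        c(j+1,i) = c(j+1,i) + a(j+1,k) * b(k,i)
        c(j+2,i) = c(j+2,i) + a(j+2,k) * b(k,i)
        c(j+3,i) = c(j+3,i) + a(j+3,k) * b(k,i)
      end do
    end do
  end do

  ! Sanity check against the intrinsic matmul.
  print *, 'max abs difference vs matmul:', maxval(abs(c - matmul(a, b)))
end program matmul_unrolled
```

For a general MAX that is not a multiple of the unroll factor, a short cleanup loop over the leftover iterations would be needed.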
Although unrolling improves performance significantly, it can cause many bank conflicts, because successive elements in memory are loaded during the same cycle. The simple solution is to interleave the unrolled lines as follows:
Do k=1,MAX
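One way to interleave a four-way unrolled body is sketched below. The statement order `j`, `j+2`, `j+1`, `j+3` is an assumption chosen so that no two consecutive statements touch adjacent elements; the loop order, the unroll factor, and the names `a`, `b`, `c`, and `n` (standing in for MAX) are likewise assumptions rather than the original listing. The four updates are independent of one another, so reordering them does not change the result.

```fortran
program matmul_interleaved
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision (8-byte elements)
  integer, parameter :: n = 64             ! stands in for MAX; must be a multiple of 4
  real(dp) :: a(n,n), b(n,n), c(n,n)
  integer :: i, j, k

  call random_number(a)
  call random_number(b)
  c = 0.0_dp

  do k = 1, n
    do i = 1, n
      do j = 1, n, 4
        ! Same four updates as a plain 4-way unroll, reordered so that
        ! neighbouring statements access elements two apart, not adjacent.
        c(j,  i) = c(j,  i) + a(j,  k) * b(k,i)
        c(j+2,i) = c(j+2,i) + a(j+2,k) * b(k,i)
        c(j+1,i) = c(j+1,i) + a(j+1,k) * b(k,i)
        c(j+3,i) = c(j+3,i) + a(j+3,k) * b(k,i)
      end do
    end do
  end do

  ! Sanity check against the intrinsic matmul.
  print *, 'max abs difference vs matmul:', maxval(abs(c - matmul(a, b)))
end program matmul_interleaved
```

Source order is only a hint: an optimizing compiler may reschedule these statements, so whether the interleaving survives should be confirmed in the generated code or by measurement.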
This change separates loads from the same bank (adjacent addresses in memory) by at least one cycle, which removes the bank conflict. For single-precision data (four bytes), a greater degree of unrolling (over j) and a more complex interleaving are required to remove the bank conflicts. In a measurement of the above example, interleaving improved performance by more than 10%.
Introduction to Microarchitectural Optimization for Itanium® Processors

