Prepare applications for optimization on the Intel® Itanium® processor family. The first step toward high-performance code on Itanium-based systems is to port or write the code so that it runs correctly in the 64-bit environment. Code that functions correctly in a 32-bit environment often has latent bugs that are exposed only when it is moved to a 64-bit environment.
Code that has been cleaned of such problems will likely benefit in both 32-bit and 64-bit environments. It will be easier to read and maintain, and because your compiler may not need to make as many assumptions about clean code, it may even do a better job of optimizing it.
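One common latent bug of this kind is a pointer stored in a 32-bit integer variable. The following Python sketch only simulates that truncation. The addresses and function names are hypothetical, chosen for illustration rather than taken from any real program.

```python
# Simulate storing a pointer in a 32-bit integer, as in the C idiom
# `unsigned int handle = (unsigned int)ptr;`. Illustrative only.

UINT32_MASK = 0xFFFFFFFF


def store_in_uint32(address: int) -> int:
    """Keep only the low 32 bits, as a 32-bit integer variable would."""
    return address & UINT32_MASK


def survives_round_trip(address: int) -> bool:
    """True if the address is unchanged after passing through 32 bits."""
    return store_in_uint32(address) == address


# Hypothetical addresses: one below 4 GiB, one above it.
low_address = 0x7FFF_1000
high_address = 0x0000_6000_0000_1000

for address in (low_address, high_address):
    status = "intact" if survives_round_trip(address) else "corrupted"
    print(f"{address:#018x} -> {store_in_uint32(address):#010x} ({status})")
```

On a 32-bit system every address fits in 32 bits, so the bug stays hidden. In C, holding such values in uintptr_t or intptr_t from <stdint.h> avoids the truncation in both environments.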
Understand your application's performance characteristics at system and application levels. By system-level performance, we mean making sure that the platform (CPU, memory, disk drives, graphics cards, network cards, etc.) is performing appropriately for the application. In particular, because the Itanium architecture has new instructions and alignment considerations, the best system configuration for a program running on an Itanium-based system might differ from the configuration used to run the same program on IA-32.
Many system-level tools are useful for this kind of performance work. These tools can help characterize your application and provide basic performance information. Some very useful tools are Perfmon*, NetMon*, Pview* and vmstat*. For more information on NetMon, see the NetMon.chm help file that is installed with the Microsoft Platform SDK. For more information on Pview, look at the MSDN library article at MSDN Library - PView Overview*. Finally, for more information on vmstat, go to http://linuxcommand.org/man_pages/vmstat8.html*.
At the application level, the developer should make every effort to understand the basic nature of the application. Is it inherently IO bound? Is it memory bound? Is it CPU bound? It is also important to understand the basic memory requirements of an application. The data footprint of an application will expand in 64-bit code because of differences in basic data sizes. In both Unix* and Windows* operating systems, pointers are 64 bits wide, so any pointer data is twice as big as on 32-bit systems.
Furthermore, on Unix systems, the data type long also grows to 64 bits, making it twice as big as it would be on 32-bit systems. Because of this, the basic data footprint of an Itanium-based application may be much larger than that of the equivalent 32-bit version. In addition, the text portion of the application (the actual instructions) may also be larger. The net result is that an application that has been tuned to a specific memory configuration on 32-bit machines may require more memory in its 64-bit version, or further tuning to reduce its memory footprint.
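To make the footprint growth concrete, the sketch below estimates the size of one small C struct under three common data models. It assumes natural alignment (each field aligned to its own size), and the list node is a hypothetical example. Actual sizes depend on the compiler and ABI.

```python
# Estimate the size of a C struct under three data models.
# ILP32: typical 32-bit systems; LP64: 64-bit Unix; LLP64: 64-bit Windows.
# Assumes natural alignment (each field aligned to its own size).

DATA_MODELS = {
    "ILP32": {"int": 4, "long": 4, "pointer": 4},
    "LP64": {"int": 4, "long": 8, "pointer": 8},
    "LLP64": {"int": 4, "long": 4, "pointer": 8},
}


def struct_size(field_types, sizes):
    """Return the padded size in bytes of a struct with the given fields."""
    offset = 0
    max_align = 1
    for field_type in field_types:
        size = sizes[field_type]
        offset = (offset + size - 1) // size * size  # pad to field alignment
        offset += size
        max_align = max(max_align, size)
    # Pad the tail so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


# Hypothetical node: struct node { int key; long count; struct node *next; };
NODE_FIELDS = ["int", "long", "pointer"]

for model, sizes in DATA_MODELS.items():
    print(f"{model}: {struct_size(NODE_FIELDS, sizes)} bytes per node")
```

Under these assumptions the node doubles from 12 to 24 bytes on 64-bit Unix, partly because of padding inserted before the 8-byte long, and grows to 16 bytes on 64-bit Windows, where only the pointer widens.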
If an application is IO bound or memory bound, then compiler optimizations and sophisticated microarchitecture features will do little to improve its performance. In general, the goal is to remove IO and memory bottlenecks first and drive the application toward being CPU bound. Once the application is CPU bound, compiler optimizations, libraries, and the power of the architecture can help you achieve maximum performance.
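A rough first check for whether a piece of code spends its time computing or waiting is to compare CPU time with elapsed wall-clock time. The sketch below uses Python's standard time module. The two workloads and the 0.8 threshold are arbitrary assumptions for illustration, not a recommended methodology.

```python
import time


def measure(workload):
    """Run workload() and return (cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds


def classify(cpu_seconds, wall_seconds, threshold=0.8):
    """Rough label from the CPU/wall ratio; the threshold is an assumption."""
    if wall_seconds <= 0:
        return "too short to measure"
    ratio = cpu_seconds / wall_seconds
    return "mostly on CPU" if ratio >= threshold else "mostly waiting"


def waiting_workload():
    time.sleep(0.2)  # stands in for blocking on disk or network


def computing_workload():
    sum(i * i for i in range(2_000_000))  # pure computation


for workload in (waiting_workload, computing_workload):
    cpu_s, wall_s = measure(workload)
    print(f"{workload.__name__}: cpu={cpu_s:.3f}s wall={wall_s:.3f}s "
          f"-> {classify(cpu_s, wall_s)}")
```

This ratio cannot separate memory-bound code from CPU-bound code, because time spent stalled on memory is still counted as CPU time. Telling those two apart generally requires a profiler that reads hardware performance counters.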
A number of products are available for determining the performance characteristics of an application. One that has been particularly useful is the Intel® VTune™ Performance Analyzer, which can isolate performance bottlenecks on both IA-32 and Itanium-based systems. Information on the VTune Analyzer and other software-development tools from Intel can be found at the Intel Software Development Products Web site.
An Introduction to the Optimization of Applications for Itanium® Processors