Last Modified On: October 23, 2008 11:21 AM PDT
by Max Alt
Intel® Solution Services
As compiler technologies mature, compile-time analysis can be a very powerful tool for optimizing code on any architecture. In combination with run-time performance analysis, the compiler becomes the most effective tuning tool. The methods described in this paper can be applied to any parallel architecture with multiple microprocessor execution units.
In the era of parallel and platform computing, we rely less on the performance of a single execution unit, assuming that the compiler has maximized each component. It becomes more important to evaluate the performance of the platform as a whole, which means both distributing the processing requests and using the compiler to maximize single execution unit performance.
Knowing coding techniques, writing good code, and having a good compiler take care of the latter two issues. Once the code is running, however, a developer still has questions. What computing capacity does the code implementation have? What hardware features would increase that capacity? And how would the code's runtime performance be impacted if the hardware were different?
Some compile-time optimization techniques allow a developer to estimate and improve performance without running the program (perhaps with slight guidance from runtime tools). Others allow developers to estimate performance on similar architectures without using simulators. This paper discusses the philosophy behind key performance parameters of the Intel® Itanium® architecture and the functions of those parameters that factor into the performance modeling formulas.
This paper takes a slightly different angle on performance modeling and analysis of parallel architectures. It explores a compile-time approach for predicting the performance of compute-intensive code blocks on the Intel Itanium architecture. The author uses simulator runs to validate the correctness and feasibility of the proposed techniques. This also suggests that discrete simulator runs could be used to refine and guide the interpretation of results from compile-time performance modeling approximations.
