Performance analysis is an essential step in the development of HPC codes. It will even gain in importance with the rising complexity of machines and applications that we are seeing today. Many tools exist to help with this analysis, but the user is too often left alone with interpreting the results. In this tutorial we will provide a practical road map for the performance analysis of HPC codes and will provide users step by step advice on how to approach the optimization of their codes as well as on how to investigate observed performance bottlenecks in detail.
Develop a methodology for the analysis phase of the development cycle. Typically, the analysis stage for a threaded application involves profiling a serial application to determine regions of the application that are potential candidates for parallelization.
Develop a methodology for the design phase of the development cycle. Regions identified by the analysis phase are examined during the design phase to determine changes that must be made to accommodate a threading paradigm.
Choose task-level or data-parallel threading for various parts of an application. Choosing the right threading method minimizes the amount of time spent modifying, debugging, and tuning threaded code.
Describe your application (or an individual operation in that application) in terms of one of two models based on fit for the particular job: