Performance analysis is an essential step in the development of HPC codes, and its importance will only grow with the rising complexity of the machines and applications we are seeing today. Many tools exist to help with this analysis, but users are too often left alone to interpret the results. In this tutorial we will provide a practical road map for the performance analysis of HPC codes and give users step-by-step advice on how to approach the optimization of their codes, as well as on how to investigate observed performance bottlenecks in detail. We will cover both on-node performance and communication optimization. Throughout this tutorial we will show live demos using Open|SpeedShop, a comprehensive and easy-to-use performance analysis tool set, to demonstrate the individual analysis steps. All techniques will, however, apply broadly to any tool, and we will point out alternative tools where useful.