Performance analysis is an essential step in the development of HPC codes, and it will become even more important as machines and applications grow more complex. Many tools exist to help with this analysis, but users are too often left on their own when interpreting the results. This tutorial provides a practical road map for the performance analysis of HPC codes, with step-by-step advice on how to approach the optimization of a code and how to investigate observed performance bottlenecks in detail.
Tuning Phase of Threaded Application Development
Challenge
Develop a methodology for the tuning phase of the development cycle. The tuning phase increases performance incrementally where possible.
Analysis Phase of Threaded Application Development Cycle
Challenge
Develop a methodology for the analysis phase of the development cycle. Typically, the analysis stage for a threaded application involves profiling a serial application to determine regions of the application that are potential candidates for parallelization.
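As a rough illustration of profiling a serial application to find candidate regions, the sketch below uses Python's standard cProfile and pstats modules. The application and its two stages (serial_app, cheap_step, expensive_step) and the helper rank_hotspots are hypothetical names invented for this example. A real analysis would more likely use a dedicated profiler on the actual code.

```python
import cProfile
import pstats


def cheap_step(n):
    """A lightweight stage of the hypothetical serial program."""
    return sum(range(n))


def expensive_step(n):
    """A compute-heavy stage: the kind of region worth considering for threading."""
    total = 0.0
    for i in range(n):
        total += (i % 7) ** 0.5
    return total


def serial_app():
    """Hypothetical serial application made of two stages."""
    cheap_step(1_000)
    expensive_step(200_000)


def rank_hotspots(func, top=3):
    """Profile func and return the names of the functions with the most self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (calls, primitive calls,
    # self time, cumulative time, callers); index 2 is the self time.
    ranked = sorted(stats.stats.items(), key=lambda kv: kv[1][2], reverse=True)
    return [key[2] for key, _ in ranked[:top]]


if __name__ == "__main__":
    print(rank_hotspots(serial_app))
```

Ranking by self time rather than cumulative time keeps the top-level driver from dominating the list, so the functions that actually do the work appear first.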
Event Based Sampling on Yocto Project* based Platforms
Debugging & Testing Phase of Threaded Application Development
Challenge
Develop a methodology for the debugging and testing phase of the development cycle. In most situations, the debugging and testing stages of threading go hand in hand.
Structure the Design Phase of Threaded Application Development Cycle
Challenge
Develop a methodology for the design phase of the development cycle. Regions identified by the analysis phase are examined during the design phase to determine changes that must be made to accommodate a threading paradigm.
Apply Data Decomposition to Create Threaded Code
Challenge
Implement data decomposition on a serial function in order to produce a threaded version. The threaded version creates threads, each performing individual pieces of a computationally intensive operation.
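A minimal sketch of data decomposition, assuming the computationally intensive operation is a sum of squares over a list. The function names (serial_sum_of_squares, split_into_chunks, threaded_sum_of_squares) are illustrative and do not come from the article. In CPython the global interpreter lock limits the speedup of CPU-bound threads, so the code shows the structure of the technique rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum_of_squares(values):
    """The original serial function: one pass over all the data."""
    return sum(v * v for v in values)


def split_into_chunks(values, num_chunks):
    """Split values into num_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(values), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def threaded_sum_of_squares(values, num_threads=4):
    """Threaded version: each thread runs the serial function on its own chunk."""
    chunks = split_into_chunks(values, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(serial_sum_of_squares, chunks))
    # Combine the per-thread partial results into the final answer.
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1000))
    print(threaded_sum_of_squares(data) == serial_sum_of_squares(data))
```

Because each thread reads only its own slice and returns a private partial result, no locking is needed until the final combination step.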
Choose the Right Threading Model (Task-Parallel or Data-Parallel Threading)
Challenge
Choose task-level or data-parallel threading for various parts of an application. Choosing the right threading method minimizes the amount of time spent modifying, debugging, and tuning threaded code.
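The difference between the two threading methods can be sketched as follows, assuming a simple list of numbers as the workload. The functions task_parallel and data_parallel are hypothetical names chosen for this illustration. In the first, different independent operations run concurrently on the same data. In the second, the same operation runs concurrently on different slices of the data.

```python
from concurrent.futures import ThreadPoolExecutor


def task_parallel(values):
    """Task-parallel: different, independent operations run concurrently."""
    tasks = {"min": min, "max": max, "total": sum}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, values) for name, fn in tasks.items()}
        return {name: fut.result() for name, fut in futures.items()}


def data_parallel(values, num_threads=4):
    """Data-parallel: the same operation runs concurrently on different slices."""
    step = max(1, -(-len(values) // num_threads))  # ceiling division
    slices = [values[i:i + step] for i in range(0, len(values), step)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Each thread sums one slice; the partial sums are then combined.
        return sum(pool.map(sum, slices))


if __name__ == "__main__":
    numbers = list(range(101))
    print(task_parallel(numbers))
    print(data_parallel(numbers))
```

As a rule of thumb, the task-parallel form fits work that divides naturally into distinct activities, while the data-parallel form fits a single operation applied across a large data set.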
Solution
Describe your application (or an individual operation in that application) in terms of one of two models based on fit for the particular job:
Structure the Implementation Phase of Threaded Application Development
Challenge
Develop a methodology for the implementation phase of the development cycle. The implementation phase involves converting design issues to actual code by selecting an appropriate threading model.
