Challenge
Develop a methodology for the design phase of the development cycle. Regions identified by the analysis phase are examined during the design phase to determine changes that must be made to accommodate a threading paradigm.
Create a summary of performance issues related to Hyper-Threading Technology for an application. This summary is a starting point for identifying which optimizations and source code changes will make the biggest difference.
Ascertain whether the level of performance improvement from Hyper-Threading Technology for a specific application is acceptable. There is a misconception that equal performance on two workloads means equal Hyper-Threading Technology effectiveness. Equal performance alone does not give the full picture, because the performance each workload could achieve is unknown.
Determine whether performance degradation (or lower-than-expected performance benefit) from Hyper-Threading Technology is due to exceeding the write-combining buffer capacity. A write-combining (WC) store buffer accumulates multiple stores in the same cache line before eventually writing the combined data farther out into the memory hierarchy, to accelerate processor write performance.
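As a rough illustration of the capacity check described above, the following sketch counts how many distinct cache lines a single loop iteration writes to and compares that count with the number of write-combining buffers available to a thread. The 64-byte line size, the function names, and the `buffers_per_thread` parameter are assumptions made for this example; the real buffer count depends on the processor and is not given here.

```python
CACHE_LINE_BYTES = 64  # assumed cache-line size; check the target processor


def cache_lines_written(store_addresses, line_bytes=CACHE_LINE_BYTES):
    """Return the set of distinct cache-line indices touched by the stores."""
    return {addr // line_bytes for addr in store_addresses}


def exceeds_wc_capacity(store_addresses, buffers_per_thread,
                        line_bytes=CACHE_LINE_BYTES):
    """Return True if one iteration writes to more cache lines than there
    are write-combining buffers available (hypothetical parameter)."""
    lines = cache_lines_written(store_addresses, line_bytes)
    return len(lines) > buffers_per_thread


# Byte addresses stored to in one iteration of a hypothetical inner loop.
stores = [0, 8, 16, 64, 72, 4096]
print(sorted(cache_lines_written(stores)))
print(exceeds_wc_capacity(stores, buffers_per_thread=2))
```

If a check of this kind suggests the loop writes to too many lines at once, one possible response is to split the loop so that each pass stores to fewer distinct cache lines.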
Implement data decomposition on a serial function in order to produce a threaded version. The threaded version creates threads, each performing individual pieces of a computationally intensive operation.
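A minimal sketch of data decomposition, assuming a simple sum-of-squares workload chosen only for illustration: the serial function is kept as the reference, the input is split into contiguous pieces, and each thread runs the same serial code on its own piece. All function names here are hypothetical. Note that in standard CPython the global interpreter lock prevents a pure-Python loop like this from running faster on multiple threads, so the sketch shows the structure of the decomposition rather than a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares_serial(values):
    """Serial reference version: one loop over the whole input."""
    total = 0
    for v in values:
        total += v * v
    return total


def split_into_chunks(values, num_chunks):
    """Divide the data into contiguous, nearly equal pieces."""
    chunk_size, remainder = divmod(len(values), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `remainder` chunks each take one extra element.
        end = start + chunk_size + (1 if i < remainder else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def sum_of_squares_threaded(values, num_threads=2):
    """Threaded version: each thread processes one piece of the data."""
    chunks = split_into_chunks(values, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partial_sums = list(pool.map(sum_of_squares_serial, chunks))
    # Combine the per-thread partial results into the final answer.
    return sum(partial_sums)
```

Keeping the serial function unchanged and reusing it as the per-thread worker makes it easy to check the threaded result against the original.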
Choose task-level or data-parallel threading for various parts of an application. Choosing the right threading method minimizes the amount of time spent modifying, debugging, and tuning threaded code.
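To make the distinction concrete, here is a small sketch contrasting the two approaches. In the task-level helper, each thread runs a different function; in the data-parallel helper, every thread runs the same function on its own slice of the data. The helper names and the example workload are assumptions made for illustration only.

```python
import threading


def run_task_level(tasks):
    """Task-level threading: one thread per distinct task.

    `tasks` maps a result name to a (function, argument) pair.
    """
    results = {}

    def worker(name, func, arg):
        results[name] = func(arg)  # each thread writes its own key

    threads = [threading.Thread(target=worker, args=(name, func, arg))
               for name, (func, arg) in tasks.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_data_parallel(func, data, num_threads=2):
    """Data-parallel threading: the same function runs on every slice."""
    results = [None] * num_threads

    def worker(index):
        # Thread `index` takes every num_threads-th element, starting at index.
        results[index] = func(data[index::num_threads])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


samples = [3, 1, 4, 1, 5, 9, 2, 6]
print(run_task_level({"lowest": (min, samples), "total": (sum, samples)}))
print(sum(run_data_parallel(sum, samples, num_threads=2)))
```

Independent operations with different code paths tend to suit the task-level form, while a single hot loop over a large data set tends to suit the data-parallel form.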
Describe your application (or an individual operation in that application) in terms of one of two models based on fit for the particular job:
Evaluate instructions-retired data in conjunction with performance data to examine the correctness of threading methodology. The Instructions Retired processor event in the VTune™ Performance Analyzer is a key performance indicator. Instructions Retired can give you quick insight into possible performance problems in your application.