Develop a methodology for the tuning phase of the development cycle. The tuning phase increases performance incrementally where possible.
Determine whether performance degradation (or lower-than-expected performance benefit) from Hyper-Threading Technology is due to exceeding the write-combining buffer capacity. A write-combining (WC) store buffer accumulates multiple stores in the same cache line before eventually writing the combined data farther out into the memory hierarchy, to accelerate processor write performance.
Choose task-level or data-parallel threading for various parts of an application. Choosing the right threading method minimizes the amount of time spent modifying, debugging, and tuning threaded code.
Describe your application (or an individual operation in that application) in terms of one of two models based on fit for the particular job:
Evaluate instructions-retired data in conjunction with performance data to examine the correctness of threading methodology. The Instructions Retired processor event in the VTune™ Performance Analyzer is a key performance indicator. Instructions Retired can give you quick insight into possible performance problems in your application.