Identify the source of performance degradation, or of low performance gains, in applications running on systems that support Hyper-Threading Technology. Once an application has been tuned for the Pentium® 4 processor, it can be tuned for processors that support Hyper-Threading Technology as a separate step. In some cases, however, this tuning may not yield acceptable performance increases.
First verify that the issue is related to Hyper-Threading Technology, and then find its root cause with the VTune™ Performance Analyzer. This analysis follows a standard five-step methodology:
- Assuming that performance is not as expected on processors with Hyper-Threading Technology, the next step is to review the Intel® Pentium® 4 and Intel® Xeon® Processor Optimization Manual and the Hyper-Threading Technology white papers available on the Intel® Developer Services Web site [http://shareit.intel.com/cd/ids/developer/asmo-na/eng/technologies/threading/hyperthreading/index.htm]. Use these resources to identify known Hyper-Threading Technology optimization opportunities and coding pitfalls that may still be present in the application.
- If performance is still not as expected, the next step is to confirm that the problem is specific to processors with Hyper-Threading Technology enabled. Gather performance results from the following system configurations:
- a single-processor system with a uni-processor kernel
- a single-processor system with a multi-processor kernel
- a single-processor system with Hyper-Threading Technology enabled and a multi-processor kernel
- a dual-processor system with a multi-processor kernel
Compare these results to rule out a general multi-processor issue. Verify that performance on the dual Pentium 4 processor system is as expected and exceeds that of a single Pentium 4 processor without Hyper-Threading Technology enabled. If it does not, or if the gain is very low, follow the standard SMP tuning methodology instead.
- Next, verify that performance on a single Pentium 4 processor with a multi-processor kernel degrades by less than 5% relative to a single Pentium 4 processor with a uni-processor kernel. Note that single-threaded (or effectively single-threaded) applications may degrade because of multi-processor kernel overhead that uni-processor kernels do not incur.
- Finally, determine whether performance on the Hyper-Threading Technology-enabled processor degrades relative to a single Pentium 4 processor with a uni-processor kernel.
- If SMP performance is reasonable but performance degrades on Hyper-Threading Technology-enabled processors, the next step is to find the root cause of the degradation with the VTune Performance Analyzer.
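The comparisons in the steps above can be sketched as a small triage function. This is a minimal sketch, assuming each configuration is summarized by one elapsed time in seconds for the same workload (lower is better). The `Timings` field names, the `triage` function, its return labels, and the `min_smp_speedup` cutoff for a "very low" SMP gain are hypothetical; only the 5% kernel-overhead limit comes from the text. It also assumes that the SMP check compares the dual-processor system against the single processor running the same multi-processor kernel.

```python
from dataclasses import dataclass


@dataclass
class Timings:
    """Elapsed seconds for the same workload (lower is better).

    Field names are hypothetical labels for the four configurations.
    """
    up_kernel_1p: float     # single processor, uni-processor kernel
    mp_kernel_1p: float     # single processor, multi-processor kernel
    ht_mp_kernel_1p: float  # single processor, HT enabled, multi-processor kernel
    mp_kernel_2p: float     # dual processor, multi-processor kernel


def triage(t: Timings, min_smp_speedup: float = 1.1,
           max_mp_overhead: float = 0.05) -> str:
    """Return a label for the tuning path the comparisons point to.

    min_smp_speedup is a hypothetical cutoff for a "very low" SMP gain;
    max_mp_overhead is the 5% limit from the text.
    """
    # The dual-processor system should clearly beat a single processor.
    smp_speedup = t.mp_kernel_1p / t.mp_kernel_2p
    if smp_speedup < min_smp_speedup:
        return "smp"  # follow the standard SMP tuning methodology

    # Multi-processor kernel overhead on one processor should stay under 5%.
    mp_overhead = (t.mp_kernel_1p - t.up_kernel_1p) / t.up_kernel_1p
    if mp_overhead >= max_mp_overhead:
        return "mp_kernel_overhead"  # hypothetical label: investigate first

    # Does enabling HT degrade performance versus the uni-processor baseline?
    if t.ht_mp_kernel_1p > t.up_kernel_1p:
        return "ht"  # root-cause with the VTune Performance Analyzer
    return "ok"
```

For example, `triage(Timings(100.0, 102.0, 110.0, 55.0))` returns `"ht"`: the dual-processor speedup is about 1.85, the kernel overhead is 2%, and the Hyper-Threading run is slower than the uni-processor baseline.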
Use the tuning assistant feature of the VTune Performance Analyzer, sometimes referred to as Automatic Hotspot Analysis, with Hyper-Threading Technology-enabled processors. Its tuning methodology and data-collection support guide the user on which events to collect initially and which event ratios are reasonable to expect. In addition to the data collected on Hyper-Threading Technology-enabled processors, collect the same data on single Pentium 4 processor systems without Hyper-Threading Technology enabled and on dual Pentium 4 processor systems.
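Collecting the same events on every platform makes it possible to compare event ratios directly. The sketch below is a hedged illustration, assuming event totals have already been exported from each VTune session into dictionaries; the event names `"Clockticks"` and `"Instructions Retired"`, the configuration labels, and the `relative_ratios` helper are assumptions for illustration, not the analyzer's own output format.

```python
from __future__ import annotations


def relative_ratios(
    counts: dict[str, dict[str, int]],
    numerator: str,
    denominator: str,
    baseline: str,
) -> dict[str, float]:
    """Return each configuration's event ratio divided by the baseline's ratio.

    counts maps a (hypothetical) configuration label to its event totals,
    e.g. {"1p": {"Clockticks": ..., "Instructions Retired": ...}, ...}.
    """
    ratios = {}
    for config, events in counts.items():
        if events[denominator] == 0:
            raise ValueError(f"{config}: {denominator} count is zero")
        # Ratio such as clock ticks per retired instruction for this platform.
        ratios[config] = events[numerator] / events[denominator]
    base = ratios[baseline]
    # Values above 1.0 mean a worse ratio than the baseline platform.
    return {config: ratio / base for config, ratio in ratios.items()}
```

With clock ticks per retired instruction of 2.0 on the single processor, 3.0 with Hyper-Threading Technology, and 1.0 on the dual-processor system, the result relative to the single processor is 1.0, 1.5, and 0.5, which points to the Hyper-Threading configuration as the one to examine further.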
Comparing clock ticks across these systems narrows down where processor time is being spent. The remaining task is to use the other recommended processor events to understand what causes the difference in clock ticks between platforms.
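The clock-tick comparison can be sketched as a ranking of per-function differences between two platforms. This is a minimal sketch, assuming the clock-tick sample counts per function have already been exported from each platform's VTune session into dictionaries (the export step is not shown); the `tick_deltas` helper and the function names in the example data are hypothetical.

```python
from __future__ import annotations


def tick_deltas(
    baseline: dict[str, int],
    target: dict[str, int],
    top: int = 5,
) -> list[tuple[str, int]]:
    """Rank functions by how many more clock ticks they take on target than on baseline.

    Functions missing from one profile are treated as having zero ticks there.
    Ties are broken by name so the output is deterministic.
    """
    names = set(baseline) | set(target)
    deltas = [(name, target.get(name, 0) - baseline.get(name, 0)) for name in names]
    # Largest increase first; alphabetical order among equal deltas.
    deltas.sort(key=lambda item: (-item[1], item[0]))
    return deltas[:top]
```

For example, with a single-processor profile of `{"compute": 1000, "spin_wait": 100, "io": 50}` and a Hyper-Threading profile of `{"compute": 1100, "spin_wait": 900, "io": 50, "lock_acquire": 200}`, `tick_deltas(single, ht, top=3)` ranks `spin_wait` (+800), `lock_acquire` (+200), and `compute` (+100). The functions at the top of the list are where the other recommended events are most worth examining.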