by Shawn D. Casey
Hyper-Threading Technology from Intel allows one physical processor package to be seen by the operating system as two separate logical processors. In a processor enabled for Hyper-Threading Technology, resources are duplicated, tagged, or shared, and most of them are shared. Sharing resources uses the processor more efficiently and gives a significant performance increase. The cost is less than 5% more die size and power consumption than a single processor package. However, Hyper-Threading Technology should not be expected to match the performance of full multi-processing, in which all processor resources are replicated.
Measured performance on the Intel® Xeon® processor MP with Hyper-Threading Technology shows gains of up to 30% on common server application benchmarks¹. The speedup that any benchmark shows with Hyper-Threading Technology depends on several factors, so the gain can vary widely. However, a gain of 30% or more within a parallel section is considered acceptable.
This paper explains how to calculate Hyper-Threading Technology effectiveness. This derived quantity describes how well an application/workload uses Hyper-Threading Technology, taking the scalability of the workload into account.
¹ Intel Technology Journal, Volume 6, Issue 1, p. 11
A common misconception is that equal performance gains on two workloads mean equal Hyper-Threading Technology effectiveness. Raw speedup does not give the full picture, because it does not show how much gain was achievable in the first place.
Which application is more effective with Hyper-Threading Technology? Application A has a 5% speedup with Hyper-Threading Technology while application B has a 7% speedup.
Indeterminate. Suppose application A has a 90% speedup with dual processors and application B has only a 10% speedup with dual processors. Application A had much more potential to scale with Hyper-Threading Technology but failed to take advantage of it. Application B, on the other hand, had only 10% potential and used most of it with Hyper-Threading Technology. Calculating Hyper-Threading Technology effectiveness resolves this ambiguity.
According to Amdahl's Law², the speedup from any multi-processing system depends on two factors: the amount of parallelism in the application/workload and the speedup of the parallel parts.
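$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} \tag{1}$$

$\text{Speedup}_{\text{overall}}$ = overall speedup of the application/workload with enhanced sections of code.
$\text{Fraction}_{\text{enhanced}}$ = fraction of the original execution time spent in code that has been enhanced or made parallel ($0.0 \le \text{Fraction}_{\text{enhanced}} \le 1.0$).
$\text{Speedup}_{\text{enhanced}}$ = speedup of the enhanced (parallel) portion of the code.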
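The following is a minimal Python sketch of Amdahl's Law: overall speedup = 1 / ((1 - F) + F / S), where F is the enhanced fraction and S is its speedup. The helper name `amdahl_speedup` is hypothetical and not part of any library. The example also shows the ceiling the law implies. However fast the parallel part runs, the overall speedup cannot exceed 1 / (1 - F).

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup per Amdahl's Law.

    fraction_enhanced: fraction of execution time that is enhanced/parallel (0.0..1.0)
    speedup_enhanced:  speedup of that enhanced portion (must be positive)
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0.0 and 1.0")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    serial_time = 1.0 - fraction_enhanced                   # portion left unchanged
    enhanced_time = fraction_enhanced / speedup_enhanced    # portion that got faster
    return 1.0 / (serial_time + enhanced_time)


if __name__ == "__main__":
    # 75% of the execution time runs in parallel on two processors.
    print(f"{amdahl_speedup(0.75, 2.0):.3f}")  # 1.600
    # Make the parallel part almost arbitrarily fast: the result only
    # approaches the limit 1 / (1 - 0.75) = 4.
    print(f"{amdahl_speedup(0.75, 1e9):.3f}")
```

The second call shows why the fraction of parallel work matters as much as how fast the parallel part runs.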
In other words, Amdahl's Law states that the performance gain depends on two things: the fraction of time spent in the parallel regions of the application, and how much faster the code runs in those regions. On a dual-processor system, where a parallel region can run at most twice as fast, the formula becomes:
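$$\text{DP Scaling}_{\text{Max}} = \frac{1}{(1 - \text{Parallel}_{\text{Observed}}) + \dfrac{\text{Parallel}_{\text{Observed}}}{2}} \tag{2}$$

$\text{DP Scaling}_{\text{Max}}$ = maximum possible dual-processor scaling for the given amount of parallelism (ignoring unusual super-linear cases). Synchronization overhead is included in the speedup value of 2.
$\text{Parallel}_{\text{Observed}}$ = amount of parallel activity that occurs in the application/workload pair.

² Computer Architecture: A Quantitative Approach, 2nd Edition, §1.6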
Since Hyper-Threading Technology is a form of multi-processing, similar techniques that are used to determine the effectiveness of multi-processing can be used to determine the effectiveness of Hyper-Threading Technology.
Assume that an application needs a 30% gain within a parallel section to be effective with Hyper-Threading Technology. We can then set $\text{Speedup}_{\text{enhanced}} = 1.3$ in Equation 1. This gives the effective Hyper-Threading Technology scaling for a known amount of parallel activity (Equation 3).

$$\text{HT Scaling}_{\text{Effective}} = \frac{1}{(1 - \text{Parallel}_{\text{Observed}}) + \dfrac{\text{Parallel}_{\text{Observed}}}{1.3}} \tag{3}$$

$\text{HT Scaling}_{\text{Effective}}$ = effective Hyper-Threading Technology scaling, assuming an acceptable speedup of 30% within the parallel sections (which includes synchronization overhead).
$\text{Parallel}_{\text{Observed}}$ = amount of parallel activity that occurs in the application/workload pair.
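To compare the dual-processor ceiling with the Hyper-Threading Technology target, the Python sketch below tabulates both for several amounts of parallel activity. Both are Amdahl's Law, with the parallel speedup fixed at 2 in one case and at the assumed 30% target (1.3) in the other. The function names are hypothetical helpers written for this illustration.

```python
def dp_scaling_max(parallel_observed: float) -> float:
    """Amdahl's Law with the parallel sections running 2x faster (dual processor)."""
    return 1.0 / ((1.0 - parallel_observed) + parallel_observed / 2.0)


def ht_scaling_effective(parallel_observed: float) -> float:
    """Amdahl's Law with the assumed 30% HT gain (1.3x) inside parallel sections."""
    return 1.0 / ((1.0 - parallel_observed) + parallel_observed / 1.3)


if __name__ == "__main__":
    print("Parallel  DP max  HT effective")
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"{p:8.2f}  {dp_scaling_max(p):6.3f}  {ht_scaling_effective(p):12.3f}")
```

The HT column never exceeds 1.3. A workload that is only partly parallel therefore meets its target with a smaller measured gain than a fully parallel one.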
The degree of parallelism in a particular application is usually unknown, and performance measurements do not reveal it directly. The derivation of Hyper-Threading Technology effectiveness avoids the need to know it by inferring it empirically from performance measurements. To determine the Hyper-Threading Technology effectiveness of an application/workload pair, three measurements are needed: performance on a single-processor system, on a dual-processor system, and on a single-processor system with Hyper-Threading Technology enabled.
Each performance measurement must be a metric for which higher values are better. When elapsed time is the performance metric, use its reciprocal. Most importantly, the application/workload pair must perform the same amount of work on all configurations.
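From the measurements, we can calculate the following:

$$\text{HT Scaling}_{\text{Observed}} = \frac{\text{Perf}_{\text{HT}}}{\text{Perf}_{\text{1P}}}, \qquad \text{DP Scaling}_{\text{Observed}} = \frac{\text{Perf}_{\text{DP}}}{\text{Perf}_{\text{1P}}} \tag{4}$$

Here $\text{Perf}_{\text{1P}}$, $\text{Perf}_{\text{DP}}$, and $\text{Perf}_{\text{HT}}$ are the performance measured on the single-processor system, the dual-processor system, and the Hyper-Threading Technology enabled single-processor system, respectively.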
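When elapsed time is the metric, performance is its reciprocal, so Equation 4 reduces to a ratio of times. The Python sketch below makes that conversion explicit. The helper names are hypothetical, and the sketch assumes all three runs perform the same amount of work.

```python
def performance_from_time(elapsed_seconds: float) -> float:
    """Convert an elapsed time into a higher-is-better performance metric."""
    if elapsed_seconds <= 0.0:
        raise ValueError("elapsed time must be positive")
    return 1.0 / elapsed_seconds


def observed_scaling(t_1p: float, t_dp: float, t_ht: float) -> tuple[float, float]:
    """Observed scaling from elapsed times on the single-processor (t_1p),
    dual-processor (t_dp), and HT-enabled single-processor (t_ht) systems.

    Returns (HT Scaling_Observed, DP Scaling_Observed).
    """
    perf_1p = performance_from_time(t_1p)
    perf_dp = performance_from_time(t_dp)
    perf_ht = performance_from_time(t_ht)
    return perf_ht / perf_1p, perf_dp / perf_1p


if __name__ == "__main__":
    ht_obs, dp_obs = observed_scaling(t_1p=40.0, t_dp=25.0, t_ht=35.0)
    print(f"HT Scaling_Observed = {ht_obs:.3f}")
    print(f"DP Scaling_Observed = {dp_obs:.3f}")
```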
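Hyper-Threading Technology effectiveness is defined as:

$$\text{HT Effectiveness} = \frac{\text{HT Scaling}_{\text{Observed}}}{\text{HT Scaling}_{\text{Effective}}} \tag{5}$$

$\text{HT Scaling}_{\text{Observed}}$ = measured Hyper-Threading Technology scaling (includes synchronization overhead).
$\text{HT Effectiveness}$ = how effective an application/workload pair is with Hyper-Threading Technology.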
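We know $\text{HT Scaling}_{\text{Observed}}$ from our measurements. However, $\text{HT Scaling}_{\text{Effective}}$ in Equation 3 is still expressed in terms of the degree of parallelism in the application/workload. Assume the parallel sections of the application scale by exactly 2 on the dual-processor system. We can then set $\text{DP Scaling}_{\text{Max}}$ in Equation 2 equal to the measured $\text{DP Scaling}_{\text{Observed}}$ and solve for $\text{Parallel}_{\text{Observed}}$, as shown in Equation 6.

$$\frac{1}{\text{DP Scaling}_{\text{Observed}}} = 1 - \frac{\text{Parallel}_{\text{Observed}}}{2} \quad\Longrightarrow\quad \text{Parallel}_{\text{Observed}} = 2 - \frac{2}{\text{DP Scaling}_{\text{Observed}}} \tag{6}$$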
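$\text{Parallel}_{\text{Observed}}$ from Equation 6 can then be substituted into Equation 3 to express $\text{HT Scaling}_{\text{Effective}}$ in terms of the measured DP scaling. Because $(1 - \text{Parallel}_{\text{Observed}}) + \text{Parallel}_{\text{Observed}}/1.3 = 1 - \frac{0.3}{1.3}\,\text{Parallel}_{\text{Observed}}$, this gives Equation 7.

$$\text{HT Scaling}_{\text{Effective}} = \frac{1}{1 - \dfrac{0.3}{1.3}\left(2 - \dfrac{2}{\text{DP Scaling}_{\text{Observed}}}\right)} \tag{7}$$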
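Using Equation 5 and Equation 7 together, Hyper-Threading Technology effectiveness can now be expressed entirely in terms of the measured Hyper-Threading Technology scaling and DP scaling:

$$\text{HT Effectiveness} = \text{HT Scaling}_{\text{Observed}} \left[1 - \frac{0.3}{1.3}\left(2 - \frac{2}{\text{DP Scaling}_{\text{Observed}}}\right)\right]$$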
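The substitutions in Equations 6 and 7 are easy to get wrong by hand, so the Python sketch below checks them numerically. It assumes, as the text does, that parallel sections scale by exactly 2 on the dual-processor system. For a range of observed DP scalings, it confirms two things. First, Equation 6 inverts Equation 2. Second, the closed form of Equation 7 matches Equation 3 evaluated at the recovered parallel fraction. All function names are hypothetical.

```python
import math


def dp_scaling_max(parallel: float) -> float:
    """Equation 2: dual-processor scaling for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel) + parallel / 2.0)


def ht_scaling_effective(parallel: float) -> float:
    """Equation 3: effective HT scaling, assuming a 1.3x gain in parallel sections."""
    return 1.0 / ((1.0 - parallel) + parallel / 1.3)


def parallel_from_dp(dp_scaling_observed: float) -> float:
    """Equation 6: solve Equation 2 for the parallel fraction."""
    return 2.0 - 2.0 / dp_scaling_observed


def ht_scaling_effective_from_dp(dp_scaling_observed: float) -> float:
    """Equation 7: Equation 3 after substituting Equation 6."""
    return 1.0 / (1.0 - (0.3 / 1.3) * (2.0 - 2.0 / dp_scaling_observed))


if __name__ == "__main__":
    for dp in (1.0, 1.1, 1.25, 1.6, 1.9, 2.0):
        p = parallel_from_dp(dp)
        # Equation 6 must undo Equation 2 ...
        assert math.isclose(dp_scaling_max(p), dp)
        # ... and Equation 7 must agree with Equation 3 at that fraction.
        assert math.isclose(ht_scaling_effective_from_dp(dp), ht_scaling_effective(p))
        print(f"DP {dp:4.2f} -> parallel {p:.3f}, HT effective {ht_scaling_effective(p):.3f}")
```

The checks stay within observed DP scalings of 1.0 to 2.0. Values outside that range give a parallel fraction below 0 or above 1, which the text's assumptions exclude because they ignore super-linear cases.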
A Hyper-Threading Technology effectiveness of 1.0 or higher is desirable. It means the application is achieving the gain typical of measured performance on common benchmarks. Anything less than 1.0 is undesirable and should be investigated from the application's perspective with performance analysis tools such as the Intel VTune™ Performance Analyzer.
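The Python sketch below revisits the question posed at the start of this paper by applying the effectiveness calculation to applications A and B. It takes Hyper-Threading Technology effectiveness to be observed HT scaling divided by the effective HT scaling from Equation 7. It also treats the quoted speedups (5% and 7% with Hyper-Threading Technology, 90% and 10% on dual processors) as scaling values of 1.05, 1.07, 1.90, and 1.10. The function name and the 1.0 threshold check are illustrative, not part of any Intel tool.

```python
def ht_effectiveness(ht_scaling_observed: float, dp_scaling_observed: float) -> float:
    """HT effectiveness from measured HT and DP scaling.

    Assumes parallel sections scale by 2 on a dual-processor system and that
    a 30% gain (1.3x) inside parallel sections is the HT target.
    """
    parallel_observed = 2.0 - 2.0 / dp_scaling_observed                    # Equation 6
    ht_scaling_effective = 1.0 / (1.0 - (0.3 / 1.3) * parallel_observed)   # Equation 7
    return ht_scaling_observed / ht_scaling_effective


if __name__ == "__main__":
    apps = {
        "A": (1.05, 1.90),  # 5% HT speedup, 90% DP speedup
        "B": (1.07, 1.10),  # 7% HT speedup, 10% DP speedup
    }
    for name, (ht_obs, dp_obs) in apps.items():
        eff = ht_effectiveness(ht_obs, dp_obs)
        verdict = "acceptable" if eff >= 1.0 else "investigate"
        print(f"Application {name}: effectiveness {eff:.2f} ({verdict})")
```

Under these assumptions, application B (about 1.03) is more effective than application A (about 0.82), even though B's raw Hyper-Threading Technology gain is only slightly larger.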
An application/workload takes 40.0s to complete on a single-processor system, 25.0s on a dual-processor system, and 35.0s on a Hyper-Threading Technology enabled single-processor system. What is its Hyper-Threading Technology effectiveness? Is it acceptable? What if the Hyper-Threading Technology system completed its task in 30s?
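First, calculate the observed Hyper-Threading Technology scaling. Because elapsed time is the metric, performance is its reciprocal:

$$\text{HT Scaling}_{\text{Observed}} = \frac{1/35.0}{1/40.0} = \frac{40.0}{35.0} \approx 1.143$$

Next, calculate the observed DP scaling:

$$\text{DP Scaling}_{\text{Observed}} = \frac{1/25.0}{1/40.0} = \frac{40.0}{25.0} = 1.60$$

Now, calculate the Hyper-Threading Technology effectiveness. The observed DP scaling implies $\text{Parallel}_{\text{Observed}} = 2 - 2/1.60 = 0.75$, so

$$\text{HT Scaling}_{\text{Effective}} = \frac{1}{1 - \frac{0.3}{1.3}(0.75)} \approx 1.209, \qquad \text{HT Effectiveness} = \frac{1.143}{1.209} \approx 0.945$$

This application/workload combination is not very effective. However, if the Hyper-Threading Technology system took 30s to complete, then

$$\text{HT Scaling}_{\text{Observed}} = \frac{40.0}{30.0} \approx 1.333$$

The DP scaling is unchanged, so $\text{HT Scaling}_{\text{Effective}} \approx 1.209$ is also unchanged. This results in the following Hyper-Threading Technology effectiveness:

$$\text{HT Effectiveness} = \frac{1.333}{1.209} \approx 1.10$$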
When the Hyper-Threading Technology system completes in 30s, the application/workload has a terrific Hyper-Threading Technology Effectiveness.
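The arithmetic in this example can be reproduced with a short Python script that goes from elapsed times to effectiveness in one step. As above, it takes effectiveness to be observed HT scaling divided by effective HT scaling. The function name is hypothetical.

```python
def ht_effectiveness_from_times(t_1p: float, t_dp: float, t_ht: float) -> float:
    """HT effectiveness from elapsed times (lower is better) on the
    single-processor, dual-processor, and HT-enabled single-processor systems."""
    # Performance is the reciprocal of elapsed time.
    ht_scaling_observed = (1.0 / t_ht) / (1.0 / t_1p)
    dp_scaling_observed = (1.0 / t_dp) / (1.0 / t_1p)
    # Infer the parallel fraction from DP scaling, then the HT target.
    parallel_observed = 2.0 - 2.0 / dp_scaling_observed
    ht_scaling_effective = 1.0 / (1.0 - (0.3 / 1.3) * parallel_observed)
    # Observed HT scaling relative to the target.
    return ht_scaling_observed / ht_scaling_effective


if __name__ == "__main__":
    for t_ht in (35.0, 30.0):
        eff = ht_effectiveness_from_times(t_1p=40.0, t_dp=25.0, t_ht=t_ht)
        print(f"HT run of {t_ht:.0f}s: effectiveness {eff:.3f}")
```

It prints effectiveness values of 0.945 and 1.103 for the 35s and 30s runs.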
Understanding how effectively your application runs on a Hyper-Threading Technology enabled system takes more than comparing performance numbers. As the previous sections show, however, a few straightforward measurements and calculations reveal how much your application/workload can gain from Hyper-Threading Technology. See the Resources below for additional information.
For information on taking advantage of Hyper-Threading Technology in developing your applications, visit our community for Parallel Programming.
Hennessy, John L. and Patterson, David A. Computer Architecture: A Quantitative Approach, 2nd Edition.
Shawn Casey is a Senior Application Engineer with Intel's Software and Solutions Group.