Introduction to Hyper-Threading Technology

Using this Tutorial

 

System Requirements


  • Internet Connectivity: 56 kbps or greater.

  • Browser: Internet Explorer* 4 / Netscape Communicator* 4.7 or higher (IE 5 recommended) with JavaScript* enabled and Flash Player* version 4 or higher installed. If you can see the animation shown here, you have the Flash Player version 4 or higher. If not, you can download it for free from www.adobe.com*.

  • Minimum Screen Resolution: 1024 x 768

  • Hardware:

    • A computer system with an Intel® processor supporting Hyper-Threading Technology

    • A chipset and BIOS that support Hyper-Threading Technology


  • OS:

    • An operating system that includes optimizations for Hyper-Threading Technology


Performance will vary depending on the specific hardware and software you use. See http://www.intel.com/products/ht/hyperthreading_more.htm for more information.

 


Course Introduction

 

Course Description and Objectives

Description: This course describes Hyper-Threading Technology (HT Technology) and introduces its key aspects and benefits.

Objectives: After completing this course, you will


  • Understand what HT Technology is and be able to identify its key aspects

  • Be able to identify which processor resources are shared, partitioned, and replicated for HT Technology

  • Know how to prepare code for HT Technology

  • Know which Intel® processors support Hyper-Threading Technology and where it can be applied

 


Understanding Hyper-Threading Technology

 

Overview and Lesson Objectives

In this lesson, you will learn


  • The definition of Hyper-Threading Technology (HT Technology) and important related terms

  • How HT Technology improves performance of software applications




When you finish this lesson, you will


  • Know key aspects of HT Technology, such as how it simultaneously processes two threads of code in a single physical package

  • Know to what extent resources are under-utilized in the Intel® NetBurst® microarchitecture by a typical mix of code and how Hyper-Threading Technology improves resource utilization

 

Hyper-Threading Technology Defined
Hyper-Threading Technology (HT Technology) is a groundbreaking technology that allows processors to work more efficiently. It enables the processor to execute two series, or threads, of instructions at the same time, thereby improving performance and system responsiveness.

Many of today's operating systems and applications are multi-threaded for use on multi-processor systems, providing higher performance. In these systems, separate processors and supporting hardware and firmware schedule and execute code in parallel.

With HT Technology, a single processor can simultaneously process two threads of code, improving the performance of multi-threaded code running on a single processor.

HT Technology is based on the inherent performance enhancements of the Intel NetBurst® microarchitecture.

Click the 'On' button to see how Hyper-Threading Technology works.

 

Important Terms Related to Hyper-Threading Technology
Hyper-Threading Technology enables a single processor to run two threads of multi-threaded code simultaneously. The following terms are important to know when learning about Hyper-Threading Technology.

Intel® Pentium® 4 processor
A PC based on the Intel® Pentium® 4 processor with HT Technology delivers advanced performance and multitasking capabilities for today's digital home and digital office applications.

Intel® Xeon® processor
This processor has all of the attributes of the Intel Pentium® 4 processor, with enhancements targeted at the server market, including a larger cache, an integrated L3 cache in the Intel Xeon processor MP, and additional circuitry for server security.

Intel NetBurst® Microarchitecture
This new microarchitecture adds significant enhancements to the core architecture of the Intel Pentium® III Processor, including:


  • Changes in the execution stages

  • Changes in the L1 code cache

  • Changes in system bus speed

  • Changes in maximum core speed

 

Concurrency
Concurrency is the simultaneous execution of multiple, structurally different application activities. Concurrency can include a system multitasking on the same activity or multitasking on multiple activities – for example, computation, disk access, and network access at the same time. Concurrency improves efficiency, reduces latency, and improves throughput. High concurrency reduces the amount of time an application spends idle. Concurrency is one of three metrics used by Intel® Solution Services, along with latency and throughput.

Multi-processor
More than one processor sharing the same system bus. The operating system (OS) must detect the presence of more than one physical processor.

Thread
A part of a program that can execute independently of other parts and concurrently with other parts on supporting operating systems. A thread usually has a many-to-one relationship with processes or applications (multi-threaded applications). Threads are specifically designed into software architecture and implemented through an operating system’s application programming interfaces.

Multi-threading
The ability to execute different parts of a program simultaneously. This can be done by distributing the processing burden across more than one processor, through Intel® Hyper-Threading Technology which provides multiple logical processors per physical processor package, or through dual core processors.

Hyper-Threading Technology
Circuitry added to a processor that makes a single physical processor appear as two logical processors to the operating system and to multi-threaded applications. Each logical processor can execute a thread of a multi-threaded program. Hyper-Threading is Intel's simultaneous multi-threading design: it allows a single processor to handle instructions from two threads in parallel rather than one at a time. Hyper-Threading Technology is designed to improve system performance and efficiency.

Execution of threads
If multi-threaded programs are present, the OS can schedule their threads for execution, one thread per processor. If the OS detects processors that support Hyper-Threading Technology, it can schedule a thread to each logical processor.


  • 1 to 8 physical processors: up to 8 threads

  • 1 to 8 Hyper-Threading Technology enabled processors = up to 16 logical processors = up to 16 threads

Comparing Environments

Uni-Processor Systems
In a typical uni-processor system, applications can reside in memory with the operating system, but only one application can access the processor’s resources at any given moment. Code is executed sequentially.

 

Uni-Processor Hyper-Threading Technology Supported Systems
When the physical processor is detected as a Hyper-Threading Technology supported processor, the operating system can schedule threads to each logical processor, much as it would on a dual-processor system. On an HT Technology enabled processor, two code threads can execute simultaneously on one processor, allowing multiple applications to access the processor's resources at the same time. System performance improves without a second physical processor.


Dual-Processor Systems
With two processors, multi-threaded code can execute simultaneously on both processors, allowing two applications to access processor resources at the same time. Code executes along two parallel paths, providing a significant speedup in throughput.


Dual Processor Hyper-Threading Technology supported Systems
With two Hyper-Threading Technology supported processors, four code threads can execute simultaneously on the two processors, allowing multiple applications to access both processors' resources at the same time. Code executes along four virtually parallel paths, which can yield additional speedup.


Hyper-Threading Technology Benefits
Intel Hyper-Threading Technology (HT Technology) improves the utilization of onboard resources so that a second thread can be processed on the same processor. HT Technology provides two logical processors in a single processor package.

Hyper-Threading Technology offers


  • Improved overall system performance

  • Increased number of users a platform can support

  • Improved throughput, because tasks run on separate threads

  • Improved reaction and response time

  • Increased number of transactions that can be executed

  • Compatibility with existing IA-32 software


Code written for dual-processor (DP) and multi-processor (MP) systems is compatible with Intel Hyper-Threading Technology supported platforms. A Hyper-Threading Technology supported processor will automatically process multiple threads of multi-threaded code.

In addition, Intel Hyper-Threading Technology further increases performance as processors are added. Multi-processor systems with HT Technology can outperform multi-processor systems without Hyper-Threading Technology.

Click the 'Advance' button to see the effects of Hyper-Threading.


Utilization of Processor Resources
Intel® Hyper-Threading Technology improves performance of multi-threaded applications by increasing the utilization of the on-chip resources available in the Intel NetBurst® microarchitecture. The Intel NetBurst microarchitecture provides optimal performance when executing a single instruction stream. A typical thread of code with a typical mix of Intel IA-32-based instructions, however, utilizes only about 35 percent of the Intel NetBurst microarchitecture execution resources.

By adding the necessary logic and resources to the processor die in order to schedule and control two threads of code, HT Technology makes these underutilized resources available to a second thread of code, offering increased throughput and overall system performance.

Hyper-Threading Technology provides a second logical processor in a single package for higher system performance. Systems containing multiple HT Technology supported processors can expect further performance improvement, processing two code threads for each processor.

Click the 'Advance' button to see how Hyper-Threading utilizes resources.


Review
Intel® Hyper-Threading Technology provides two logical processors in one processor package by adding minimal logic that enables the use of underutilized execution resources on the chip. HT Technology offers higher throughput of multi-threaded code and greater overall system performance for today's demanding applications and multi-threaded operating systems.

 

Check Your Progress

 

Aspects of Hyper-Threading Technology


Check all that apply to Hyper-Threading Technology.


Improves overall system performance by processing multiple threads of a multi-threaded application

Two logical processors in a single processor package

Adds clock cycles

Utilizes more of the processor resources

Two physical processors in a single processor package

Provides additional performance on multi-processor systems


Feedback:


Intel Hyper-Threading Technology provides two logical processors in one physical package. Hyper-Threading Technology utilizes more processor resources for multiple threads of code, providing improved performance of multi-threaded applications. Hyper-Threading offers additional performance on multi-processor systems.


Resource Utilization


On average, the typical mix of Intel® IA-32 instructions utilizes only (select only one):


25% of resources of the Intel NetBurst® microarchitecture

50% of resources of the Intel NetBurst microarchitecture

35% of resources of the Intel NetBurst microarchitecture


Feedback:


A typical mix of Intel® IA-32 instructions utilizes only 35% of the resources of the Intel NetBurst® microarchitecture. Hyper-Threading Technology enables more of these resources to be utilized and made available to a second thread of code, for improved throughput and system performance.

 


 

How Hyper-Threading Technology Works

 

Overview and Lesson Objectives

In this lesson, you will learn


  • How Hyper-Threading Technology utilizes more of the resources of the Intel® NetBurst® microarchitecture

  • How threads are processed


When you finish this lesson, you will


  • Be able to identify which resources of the Intel NetBurst microarchitecture are shared, partitioned, and replicated

  • Be able to identify in which state a logical processor is set when not running a thread

  • Be able to identify how tasks are scheduled and how each logical processor is controlled


Multi-threaded Applications
Virtually all contemporary operating systems (including Microsoft Windows* and Linux*) divide their workload up into processes and threads that can be independently scheduled and dispatched. The same division of workload can be found in many high-performance applications such as database engines, scientific computation programs, engineering-workstation tools, multimedia programs, and graphic-intensive single-player and networked games.

To gain access to increased processing power, programmers design these programs to execute in dual-processor (DP) or multiprocessor (MP) environments. Through the use of symmetric multiprocessing (SMP), processes and threads can be dispatched to run on a pool of several physical processors.

With multi-threaded, MP-aware applications, instructions from several threads are simultaneously dispatched for execution by the processor's core. In processors supporting Hyper-Threading Technology, a single processor core executes these two threads concurrently, using out-of-order instruction scheduling to keep as many of its execution units as possible busy during each clock cycle.

Intel® NetBurst® Microarchitecture Pipeline
Without Hyper-Threading Technology enabled, the Intel® NetBurst® microarchitecture processes a single thread through the pipeline. Recall that a typical mix of instructions utilizes only about 35% of the resources in the Intel NetBurst microarchitecture.

Click the 'Advance' button to move through the pipeline


Intel® NetBurst® Microarchitecture Pipeline with Hyper-Threading Technology
When Hyper-Threading Technology is enabled, resources at each stage within the Intel® NetBurst® microarchitecture are replicated, partitioned, and shared to execute two threads through the pipeline.

Click the 'Advance' button to move through the pipeline.



Intel® NetBurst® Microarchitecture Selection Points
As the microarchitecture processes code through the pipeline, each logical processor makes selections at distinct points in the pipeline to access the resources it needs. The logical processors switch between the resources as necessary.



Managing Resources
Intel® Hyper-Threading Technology enables two logical processors on a single physical processor by replicating, partitioning, and sharing the resources of the Intel NetBurst® microarchitecture.

Replicated resources create copies of the resources for the two threads for:


  • All per-CPU architectural states

  • Instruction Pointers, Renaming logic

  • Some smaller resources (e.g., return stack predictor, ITLB, etc.)


Partitioned resources divide the resources between the executing threads for:


  • Several buffers (Re-Order Buffer, Load/Store Buffers, queues, etc.)


Shared resources make use of the resources as needed between the two executing threads for:


  • Out-of-Order execution engine

  • Caches


Task Scheduling
The operating system (OS) schedules and dispatches threads of code to each processor. When a thread is not dispatched, the associated logical processor is kept idle.

When a thread is scheduled and dispatched to a logical processor (LP0), Hyper-Threading Technology utilizes the necessary processor resources to execute the thread.

When a second thread is scheduled and dispatched on LP1, resources are replicated, partitioned, or shared to execute the second thread. As each thread finishes, the operating system idles the unused logical processor, freeing its resources for the logical processor that is still running.

To optimize performance in multi-processor systems with Hyper-Threading Technology, the OS should schedule and dispatch threads to alternate physical processors before dispatching to different logical processors on the same physical processor.

Click the 'Advance' button to move through the pipeline.


Review
Hyper-Threading Technology provides two logical processors by replicating, partitioning, and sharing resources within the Intel® NetBurst® microarchitecture pipeline. The OS schedules and dispatches threads to each logical processor, just as it would in a dual-processor or multi-processor system. As the system schedules and introduces threads into the pipeline, resources are utilized as necessary to process two threads. Each processor makes selections at points in the pipeline to control and process the threads.

 

Check Your Progress

 

Resources


Drag the resources into the proper buckets.



Task Scheduling


True or False (select only one):


When a thread completes execution and no other thread is ready to run, the operating system idles the logical processor running that thread.


True

False


Feedback:


The operating system idles a processor, waiting for the next thread to be processed.

 


 

Code for Hyper-Threading Technology

 

Overview and Lesson Objectives

In this lesson, you will learn


  • How threads that compete for shared resources can slow performance in Hyper-Threading Technology (HT Technology)

  • How to prepare code for HT Technology

When you finish this lesson, you will


  • Know which programming technique to use when a thread lock is expected to be longer or shorter than an OS quantum of time

  • Be able to identify when to use OS synchronization techniques to improve thread execution in a Hyper-Threading Technology environment


Reduce Resource Competition
A platform with one or more Hyper-Threading Technology supported processors will automatically process two threads on each processor. Some code modules, however, might compete for shared resources, reducing the performance of the HT Technology processor. In a multi-processor system with Hyper-Threading Technology supported processors, the OS should dispatch code to different physical processors before scheduling multiple threads on the same processor. This kind of scheduling will keep resource competition to a minimum.

When you suspect that two modules that compete for resources might be scheduled on the same Hyper-Threading Technology supported processor, you should prepare your code to detect the presence of HT Technology and modify the code to prevent resource competition.

For information on how to detect a Hyper-Threading Technology processor, see the course Detecting Hyper-Threading Technology.

Spin-wait loops are an example where code written for a multi-processor (MP) environment might compete for resources in a Hyper-Threading Technology environment.



Spin-Wait Loops in MP Environment
In multi-processor environments, a thread might sit in a spin-wait loop while waiting for release of a lock that another thread holds. Until the lock is released, the thread in the loop will execute the loop as fast as it can.

A simple spin-wait loop might consist of a load, a compare, and a branch (or else a sleep(0) call). An efficient execution of such a loop by physical processor 0 can include simultaneous loads, compares, and branches, an endless consumption of the processor's cycles until the lock is released. In a multi-processor environment, one processor does not impact the resource availability of a code thread running on another processor.

Processor 0                          Processor 1

Spin wait:                           Important code
  Try to get lock                      ...
    If fail,                         Release lock
      jump to spin wait

Spin-Waits in Hyper-Threading Technology
With Hyper-Threading Technology, spin-wait loops can delay the completion of one thread when another thread holds a lock, because the looping thread is repeatedly scheduled and executed, consuming vital shared resources in its loop. Considering our simple load, compare, and branch loop, logical processor 0 keeps executing its loop as fast as it can, consuming shared resources and clock cycles that logical processor 1 needs to complete execution of its thread. This effect impacts performance of multi-threaded code in a Hyper-Threading Technology processor.

Use the PAUSE Instruction to Optimize Code
Intel recommends that software developers always use the PAUSE instruction in spin-wait loops.

Starting with the Intel® Pentium® 4 processor, this recommendation was made for the benefit of power savings. Executing a PAUSE in a spin-wait loop forces the spin-wait to finish one iteration of the loop before starting the next. Pausing the spin-wait loop reduces the consumption of non-productive resources, and thus power.

With a Hyper-Threading Technology supported processor, using the PAUSE instruction optimizes processing of two threads. Pausing the spin-wait loop frees more resources for the other processor, allowing faster completion of its thread.

Use OS Synchronization Techniques on Long Waits
When a thread suspects that the thread holding a lock will take longer than an OS quantum to release it, use OS synchronization techniques to idle the waiting logical processor until the lock is released. Idling that logical processor frees the Hyper-Threading Technology supported processor to use all available resources to complete execution of the lock-holding thread. Use OS primitives to release the lock when that thread's execution has completed.

If a thread suspects that a lock will be released within an OS quantum, use a spin-wait.

Review
In Hyper-Threading Technology supported environments, some code threads might compete for shared resources, which can impact system performance. Spin-wait loops are an example of where this competition can occur.

Where resource competition might be a concern, your code should detect the presence of Hyper-Threading Technology and implement techniques so that competition does not impact performance. The course Detecting Hyper-Threading Technology describes how to detect the presence of Hyper-Threading Technology supported processors.

 

Check Your Progress

 

Spin-Wait Loops
Fill in the blanks from the list of possibilities.














Possibilities: HOLD, OS WAIT, OS LOAD, HALT, PAUSE, OS SYNCHRONIZATION

 

If a thread suspects a lock will be held longer than an OS quantum, use



If a thread suspects a lock will be held less than an OS quantum, use


Feedback:


Your code should use OS SYNCHRONIZATION techniques when the lock is expected to be held longer than an OS quantum. Use a spin-wait loop with the PAUSE instruction to wait for a lock that is expected to be held less than an OS quantum.


OS Synchronization Techniques


When should you use OS synchronization techniques to improve thread execution in a Hyper-Threading Technology environment? (select only one):


Use synchronization techniques to halt execution of the thread needing resources and to release its lock.

Use synchronization techniques to idle a thread waiting for release of a lock and to release a thread holding a lock when it completes its thread.

Use synchronization techniques to halt threads to save power and free resources.


Feedback:


Use synchronization techniques to idle the thread running the spin-wait when you suspect the other thread might take a long time to complete. Also use OS primitives to tell the thread holding the lock to release it when it has completed execution.

 


 

Supporting Hyper-Threading Technology

 

Overview and Lesson Objectives

In this lesson, you will learn about


  • Which Intel® processors support Hyper-Threading Technology

  • The environments in which you might see Hyper-Threading Technology implemented

When you finish this lesson, you will be able to identify


  • Which Intel processors support Hyper-Threading Technology


Hyper-Threading Technology is Based on Intel® NetBurst® Microarchitecture
Hyper-Threading Technology is built around the Intel® NetBurst® microarchitecture. Currently, the following processors support Hyper-Threading Technology:


  • Intel Pentium® 4 processor Extreme Edition supporting HT
    Technology

  • Intel Pentium 4 processors supporting HT Technology

  • Mobile Intel Pentium 4 processors supporting HT Technology

  • Intel® Xeon® processor


Hyper-Threading Technology Applications
Multi-threaded code can be found in contemporary operating systems and high-end applications. The Intel Xeon processor with Hyper-Threading Technology is well-suited for servers and high-end scientific computing workstations, as well as demanding applications such as graphics, multimedia, and gaming.

Review
Intel® Hyper-Threading Technology (HT Technology) is based on the Intel NetBurst® microarchitecture. HT Technology is enabled in the Intel Xeon processor and in the members of the Intel Pentium® 4 family that explicitly support Hyper-Threading Technology. Processors that support HT Technology are well suited for many demanding applications.

 

Check Your Progress

 

Hyper-Threading Technology enabled Processors


Which Intel® processor can run multi-threaded code with Hyper-Threading Technology? (select only one):


All Intel processors

Only the Intel Xeon® processor

Intel Xeon® processor and Intel Pentium® 4 Processor supporting Hyper-Threading Technology

Only the Intel Pentium® 4 Processor with Hyper-Threading Technology


Feedback:


Hyper-Threading Technology is supported by all Intel® Xeon® processors and by members of the Intel Pentium® 4 family which are explicitly designated as supporting Hyper-Threading Technology.


 


 

Course Summary

 

This course introduced you to Intel® Hyper-Threading Technology. The course covered introductory information on Hyper-Threading Technology, its benefits to system performance, how it works, and optimizing issues that developers need to consider.

As a developer interested in preparing code for Hyper-Threading Technology, your code might need to detect Hyper-Threading Technology. For information on detecting Hyper-Threading Technology, see the tutorial Detecting Hyper-Threading Technology.

 


 

Additional Resources

 

Intel supports developer efforts to prepare their code for Hyper-Threading Technology with information, seminars, and tutorials. For more information on Hyper-Threading Technology, visit the following sites:


 

