by Robert Godley
There are several issues that all operating systems must consider in regard to Hyper-Threading Technology enabled processors. This paper will give you an overview of several of the more important ones and offer some guidance in responding to them. Note that some of these same issues will also apply to application programs.
The Windows* 2000, Windows XP and .NET operating systems support Hyper-Threading Technology enabled processors. This paper describes the different levels of support between the Windows 2000 family and the Windows XP and .NET family of operating systems. The Windows operating systems are licensed to support varying numbers of processors, and this paper shows the number of both logical and physical processors supported. Finally, the paper provides a list of parameters for the boot.ini file that relate to Hyper-Threading Technology.
What the OS Must Consider
First, the operating system must detect the presence of Hyper-Threading Technology enabled processors. There are two methods that an operating system can use to detect this class of processors:
- Execute the CPUID instruction and examine the information in the return registers.
Please see the Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 2A, for a description of the information returned by the CPUID instruction.
- Examine the information in the MPS and ACPI tables returned by the BIOS.
The BIOS provides information in these two tables about the physical and logical processors in the system. The MPS table reports location information about the physical processors, and the ACPI table reports information about the logical processors. Intel Corporation recommends that BIOS vendors order the information returned in the ACPI table so that information about logical processor zero for all the physical processors is listed first, followed by information about logical processor one for all the physical processors, etc.
Application programs can also use the CPUID instruction to detect the presence of Hyper-Threading Technology enabled processors. This information can be used to set affinity between threads in your application and specific logical processors.
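As a minimal sketch of the CPUID method, assuming GCC or Clang on an x86 target: CPUID leaf 1 reports the HTT feature flag in EDX bit 28 and, when that flag is set, the number of addressable logical processor IDs per physical package in EBX bits 23:16. The function names below are hypothetical and chosen for illustration; `__get_cpuid` comes from the compiler's `<cpuid.h>` header. Note that the value is an upper bound reported by the hardware; the BIOS or operating system may still have Hyper-Threading Technology disabled.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>   /* __get_cpuid(), GCC and Clang only */
#endif

/* CPUID leaf 1: EDX bit 28 is the HTT feature flag. */
bool htt_flag_set(uint32_t edx)
{
    return ((edx >> 28) & 1u) != 0;
}

/* CPUID leaf 1: EBX bits 23:16 hold the number of addressable logical
   processor IDs per physical package (meaningful only when HTT is set). */
unsigned logical_ids_per_package(uint32_t ebx)
{
    return (ebx >> 16) & 0xFFu;
}

/* Returns the reported logical processor IDs per package, or 1 when the
   HTT flag is clear or CPUID leaf 1 cannot be queried on this build. */
unsigned detect_logical_per_package(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && htt_flag_set(edx)) {
        unsigned n = logical_ids_per_package(ebx);
        return n ? n : 1;
    }
#endif
    return 1;
}
```

The two decoding helpers take raw register values, so they can also be applied to CPUID output captured by other means.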
The operating system must eliminate all execution-based timing loops once it is running in multi-threaded mode. Since the execution resources on a Hyper-Threading Technology enabled processor are shared by logical processors, a timing loop cannot be guaranteed the use of the same resources on each iteration. Therefore, the elapsed time of a timing loop will vary, depending on the code being executed on the other logical processor. Application programs should also eliminate all execution-based timing loops from their code.
Another issue that applies to both operating systems and application code is that spin-wait loops must be enhanced by adding the PAUSE instruction to the loop. A spin-wait loop is a short code segment that reads a memory location and then compares it to a particular value. If the contents of that memory location are equal to this value, then the loop completes and execution resumes with the code following the loop. Otherwise, the memory location is re-read and the comparison is done again. A spin-wait loop is commonly used to synchronize two or more threads of execution when the expected wait time is "short".
On a Hyper-Threading Technology enabled processor, a thread executing a spin-wait loop can consume a large percentage of the processor's shared resources. This will decrease the performance of the thread executing on the other logical processor. The PAUSE instruction in a spin-wait loop gives the other logical processor access to most of the processor's shared resources. Note that while a PAUSE instruction adds latency to the spin-wait loop, overall system performance is improved when using the PAUSE instruction.
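A spin-wait loop with PAUSE might look like the following sketch. It assumes a C11 compiler with `<stdatomic.h>` and uses the `_mm_pause()` intrinsic from `<immintrin.h>`, which emits the PAUSE instruction on x86 targets. The `CPU_RELAX` macro and the `spin_wait_until` function are hypothetical names introduced for illustration.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>            /* _mm_pause() */
#define CPU_RELAX() _mm_pause()   /* emits the PAUSE instruction */
#else
#define CPU_RELAX() ((void)0)     /* assumption: no equivalent used elsewhere */
#endif

/* Spin until *flag equals expected. Returns how many times the loop body
   (and therefore PAUSE) ran, which is zero if the value was already there. */
unsigned long spin_wait_until(atomic_int *flag, int expected)
{
    unsigned long spins = 0;
    while (atomic_load_explicit(flag, memory_order_acquire) != expected) {
        CPU_RELAX();   /* yield shared execution resources to the sibling */
        ++spins;
    }
    return spins;
}
```

Another thread would release the waiter with `atomic_store_explicit(&flag, expected, memory_order_release)`. This pattern is only appropriate for short waits.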
The operating system must also optimize the idle loop with a HLT instruction. An idle loop, like a spin-wait loop, can consume a high percentage of a physical processor's execution resources. When the idle logical processor is halted, all shared resources can be fully utilized by the other logical processor.
The operating system's scheduling algorithm needs to be aware that Hyper-Threading Technology enabled processors have two logical processors that share some of the execution resources in each physical processor. The OS should not treat all logical processors on a physical processor as separate physical processors. The operating system has two competing goals that it must balance in its dispatch algorithm. First, it is desirable to assign a thread that is ready to execute on an idle logical processor. However, it is also desirable to assign a thread to the last processor on which it executed. This is predicated on the probability that the instruction and data caches will still contain some valid entries for this thread. For more information read the article by Intel's Henry Ou, Long Duration Spin-wait Loops on Hyper-Threading Technology Enabled Intel Processors.
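The trade-off between the two dispatch goals can be illustrated with a small sketch. This is not the Windows scheduler; it is a hypothetical policy written only to show the idea. It assumes two logical processors per physical processor, numbered so that logical processors 2p and 2p+1 share physical processor p. It first tries the thread's last processor (cache affinity), then an idle logical processor whose sibling is also idle, and finally any idle logical processor.

```c
#include <stdbool.h>

/* busy[i] is true when logical processor i is running a thread.
   Assumption: logical processors 2p and 2p+1 share physical processor p.
   Returns the chosen logical processor, or -1 if none is idle. */
int pick_logical_processor(const bool busy[], int n_logical, int last)
{
    /* Cache affinity: reuse the last processor when it is idle. */
    if (last >= 0 && last < n_logical && !busy[last])
        return last;

    int fallback = -1;
    for (int i = 0; i < n_logical; ++i) {
        if (busy[i])
            continue;
        int sibling = i ^ 1;   /* the other logical processor in the package */
        bool sibling_idle = sibling >= n_logical || !busy[sibling];
        if (sibling_idle)
            return i;          /* whole physical processor is idle */
        if (fallback < 0)
            fallback = i;      /* idle, but shares resources with a busy sibling */
    }
    return fallback;
}
```

Preferring a fully idle physical processor over an idle sibling of a busy one reflects the point that logical processors share execution resources. A real scheduler would weigh these goals with more information than this sketch uses.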
Finally, since the Windows operating systems have licenses that limit the number of processors they will boot, these operating systems must decide how to count logical processors towards the operating system's processor license limit. Intel Corporation recommends that both operating systems and application programs count only the physical processors towards a license limit, not the total number of logical processors. (See Multi-core processors raise software licensing questions for additional information.)
Windows 2000 operating systems can detect Hyper-Threading Technology enabled processors, but they have not been fully optimized to support them. The Windows XP and .NET operating systems have been optimized to support Hyper-Threading Technology enabled processors. Windows 2000 and Windows XP (and .NET server) operating systems count the number of CPUs differently. Windows 2000 counts all logical processors in the system towards the processor license limits. Windows XP* and .NET* count only the physical processors. The license limits are given in the following table.
|Operating System Version||Maximum Logical Processors||Corresponding Physical Processors|
|Windows 2000 Professional||2||2|
|Windows 2000 Server||4||4|
|Windows 2000 Advanced Server||8||8|
|Windows 2000 Datacenter Server||32||32|
|Windows XP Professional||4||2|
|Windows .NET Web Server||4||2|
|Windows .NET Standard Server||4||2|
|Windows .NET Enterprise Server||16||8|
|Windows .NET Datacenter Server||32||32|
During the booting process, the Microsoft kernel will only start as many CPUs as the given license allows. For example,
- If you install Windows XP Professional on a 4-processor Xeon MP server, it will boot two physical processors, each with two logical processors (the logical processors on the other two physical processors are not booted).
- If you install Windows 2000 Professional on a 4-processor Xeon MP server, it will boot two physical processors, each with just one logical processor (the logical processors on the other two physical processors are not booted). If you install Windows 2000 Advanced Server on this system, it will boot four physical processors, each with two logical processors active.
Also note that there is an upper limit of 32 (logical and/or physical) processors for the Datacenter operating systems. Please see the Microsoft Technet Web site* for more information.
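The arithmetic behind the 4-processor examples can be sketched as follows. The function and enum names are hypothetical, and the model is deliberately simplified: it only distinguishes whether the license limit counts logical or physical processors, and it does not model the separate 32-processor ceiling. The limits in the usage note are taken from the table (two physical processors for Windows XP Professional, two logical for Windows 2000 Professional, eight logical for Windows 2000 Advanced Server).

```c
/* How a license limit is applied, following the distinction drawn above. */
enum limit_kind { LIMIT_COUNTS_LOGICAL, LIMIT_COUNTS_PHYSICAL };

/* Returns the number of logical processors that would be booted on a system
   with `physical` packages of `logical_per_physical` logical processors each. */
int booted_logical_processors(int physical, int logical_per_physical,
                              int limit, enum limit_kind kind)
{
    if (kind == LIMIT_COUNTS_PHYSICAL) {
        /* Each licensed physical processor brings all its logical processors. */
        int booted_physical = physical < limit ? physical : limit;
        return booted_physical * logical_per_physical;
    }
    /* Every logical processor consumes one unit of the limit. */
    int total = physical * logical_per_physical;
    return total < limit ? total : limit;
}
```

For a 4-processor system with two logical processors per package, `booted_logical_processors(4, 2, 2, LIMIT_COUNTS_PHYSICAL)` gives 4, `booted_logical_processors(4, 2, 2, LIMIT_COUNTS_LOGICAL)` gives 2, and `booted_logical_processors(4, 2, 8, LIMIT_COUNTS_LOGICAL)` gives 8.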
Microsoft Boot Parameters
There is a hidden system file in the root directory of the C: drive, named boot.ini, that controls the boot process. There are two ways to view and/or edit this file:
- Bring up Windows Explorer, go to Tools -> Folder Options -> View and clear the "Hide protected operating system files (Recommended)" option. You can then edit this file using the Notepad or WordPad text editing programs.
- Bring up a command prompt window and execute the attrib -s -r -h c:\boot.ini command. When you are done viewing or editing this file, execute the attrib +s +r +h c:\boot.ini command.
Each command line in the boot.ini file can take a number of options. Please see the Microsoft MSDN Library for a complete description of the options*. The arguments of interest for HT support are
- /NumProcs = N
The processing for this option is done by the kernel during system initialization. It is used to boot fewer logical processors than the maximum allowed for the system and OS license.
- /ONECPU
The processing for this option is done in the multi-processor HAL (Hardware Abstraction Layer), and it limits the number of processors booted to one. The option /ONECPU has the same effect as /NumProcs = 1.
There are three operating system kernel files on the Microsoft install CD:
ntoskrnl.exe - OS kernel for uni-processor systems
ntkrnlmp.exe - OS kernel for multi-processor systems
ntkrnlpa.exe - OS kernel that supports 36-bit addresses for physical memory
The Microsoft operating system install program will install the MP kernel if the motherboard is capable of having more than one microprocessor (even if only one processor is installed) or if it detects a single Hyper-Threading Technology enabled processor. The install program renames the kernel file to ntoskrnl.exe.
HAL is an abbreviation for Hardware Abstraction Layer. There are three HAL files on the Microsoft install CD:
halacpi.dll - "vanilla" ACPI-enabled motherboards
halaacpi.dll - single processor APIC motherboards
halmacpi.dll - ACPI motherboards with multi-processor support
Note that ACPI stands for Advanced Configuration and Power Interface and APIC stands for Advanced Programmable Interrupt Controller. The Microsoft operating system install program determines what type of motherboard is installed and selects the appropriate HAL file to install. The install program renames the file to hal.dll.
The complete set of HAL and kernel files is on the Windows install CD in the directory I386. They are compressed files; to extract them, use the Microsoft expand utility.
The previous sections have given you a short overview that should prepare you to work with Hyper-Threading Technology enabled processors under several Windows operating systems. Here is a final example to close the discussion.
Example boot.ini file
If you plan to boot fewer than the maximum number of processors or different kernels and HALs, it is a good idea to leave the original entry that follows the [operating systems] line. Here is a sample boot.ini file for a 4-processor Xeon system that boots eight (default), four, and one logical processors:
Enterprise Server" /fastdetect
Enterprise Server - 4 procs" /fastdetect /NumProcs=4
Enterprise Server - 1 CPU" /fastdetect /ONECPU
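To show how the two options relate, here is a small sketch that scans a boot.ini entry for /NumProcs and /ONECPU and reports the processor limit it requests. The function names are hypothetical, and this is not how the Windows loader parses the file. The sketch assumes option names are matched without regard to case and that the first matching option wins; it does not treat text inside the quoted description specially.

```c
#include <ctype.h>
#include <stdlib.h>

/* Case-insensitive prefix match. Returns a pointer just past the prefix
   in s, or NULL when s does not start with prefix. */
static const char *match_prefix_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return NULL;
        ++s;
        ++prefix;
    }
    return s;
}

/* Returns 1 for /ONECPU, N for /NumProcs=N (spaces around '=' allowed),
   or 0 when the entry carries neither option. */
int requested_processor_limit(const char *entry)
{
    for (const char *p = entry; *p; ++p) {
        if (*p != '/')
            continue;
        if (match_prefix_ci(p, "/ONECPU") != NULL)
            return 1;
        const char *rest = match_prefix_ci(p, "/NumProcs");
        if (rest != NULL) {
            while (*rest == ' ')
                ++rest;
            if (*rest != '=')
                continue;          /* malformed option; keep scanning */
            return atoi(rest + 1); /* atoi skips leading whitespace */
        }
    }
    return 0;
}
```

Applied to the tail of the sample entries above, the function yields 0 for the default entry, 4 for the /NumProcs=4 entry, and 1 for the /ONECPU entry.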
- Intel® Developer Zone
- Information on Hyper-Threading Technology and Windows 2000*
- Long Duration Spin-wait Loops on Hyper-Threading Technology Enabled Intel Processors
- Detailed discussion of Hyper-Threading Technology Architecture and Microarchitecture
Information about the boot.ini file is contained in several MSDN references. The following titles are especially helpful:
- Chapter 6 - Setup and Startup, Windows 2000 Professional Resource Kit
- Selecting the Operating System, Windows 2000 Server Resource Kit Online Books
- Parameters for the Boot.ini File, Driver Development Tools: Windows DDK
About the Author
Robert Godley is a Senior Applications Engineer working with Intel's Software and Solutions Group. Bob has worked at Intel for 14 years, eight of which were with the Supercomputer Systems Division, where he worked on mathematical libraries and operating systems. He is now in a group that helps software vendors optimize their products for the latest Intel Architecture processors.