By Robert Godley
There are several issues that all operating systems must consider in regard to Hyper-Threading Technology enabled processors. This paper will give you an overview of several of the more important ones and offer some guidance in responding to them. Note that some of these same issues will also apply to application programs.
At the time this article was written in August, 2002, the operating system kernel that comes with most Linux* distributions did not support access to the second logical processor. This includes the Red Hat 7.2 distribution. To enable operating system support for the second logical processor, you need to install a kernel that has been optimized for Hyper-Threading Technology. This paper shows two ways to do this, either by installing a prepackaged kernel from Red Hat or by building a kernel yourself from the latest source files from the Linux Kernel web site.
What the OS Must Consider
First, the operating system must detect the presence of Hyper-Threading Technology enabled processors. There are two methods that an operating system can use to detect this class of processors:
- Execute the CPUID instruction and examine the information in the return registers. Please see the Intel® 64 and IA-32 Architectures Developer's Manual: Vol. 2A for a description of the information returned by the CPUID instruction.
- Examine the information in the MPS and ACPI tables returned by the BIOS. The BIOS provides information in these two tables about the physical and logical processors in the system. The MPS table reports location information about the physical processors, and the ACPI table reports information about the logical processors. Intel Corporation recommends that BIOS vendors order the information returned in the ACPI table so that information about logical processor zero for all the physical processors is listed first, followed by information about logical processor one for all the physical processors, etc.
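The recommended ACPI ordering can be illustrated with a short sketch. The function name and the (physical, logical) tuples are hypothetical and used only for illustration; a real ACPI table identifies processors by APIC ID, not by pairs like these.

```python
def recommended_acpi_order(physical_count, logical_per_physical=2):
    """List (physical, logical) pairs in the recommended order.

    Logical processor 0 of every physical processor comes first,
    then logical processor 1 of every physical processor, and so on.
    """
    return [
        (physical, logical)
        for logical in range(logical_per_physical)
        for physical in range(physical_count)
    ]


# Two physical processors, two logical processors each.
print(recommended_acpi_order(2))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

One consequence, assuming the operating system boots processors in table order: a system limited to the first N entries spreads its work across separate physical processors before it starts using second logical processors.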
Application programs can also use the CPUID instruction to detect the presence of Hyper-Threading Technology enabled processors. This can be used to set affinity between threads in your application and specific logical processors.
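As a rough illustration of what "examining the return registers" involves, the sketch below decodes two fields of CPUID leaf 1: the Hyper-Threading Technology flag in EDX bit 28 and the logical processor count per physical package in EBX bits 23:16. Python cannot execute CPUID directly, so the sketch assumes the register values were obtained elsewhere (for example, from a small C or assembly routine); the function name and the sample register values are hypothetical.

```python
HTT_FLAG_BIT = 28         # CPUID leaf 1, EDX bit 28: Hyper-Threading Technology flag
LOGICAL_COUNT_SHIFT = 16  # CPUID leaf 1, EBX bits 23:16: logical processors per package
LOGICAL_COUNT_MASK = 0xFF


def decode_cpuid_leaf1(ebx, edx):
    """Return (htt_flag, logical_processors_per_package) from leaf 1 registers."""
    htt_flag = bool((edx >> HTT_FLAG_BIT) & 1)
    # The count field is only meaningful when the HTT flag is set.
    if htt_flag:
        logical = (ebx >> LOGICAL_COUNT_SHIFT) & LOGICAL_COUNT_MASK
    else:
        logical = 1
    return htt_flag, logical


# Hypothetical register values, chosen only to exercise both paths.
print(decode_cpuid_leaf1(0x00020800, 0xBFEBFBFF))  # (True, 2)
print(decode_cpuid_leaf1(0x00000800, 0x0383FBFF))  # (False, 1)
```

Note that the flag only says the package can report more than one logical processor; whether the second logical processor is actually usable also depends on the BIOS and the kernel, as described later in this paper.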
The operating system must eliminate all execution-based timing loops once it is running in threaded mode. Since the execution resources on a Hyper-Threading Technology enabled processor are shared by logical processors, a timing loop cannot be guaranteed the use of the same resources on each iteration. Therefore, the elapsed time of a timing loop will vary, depending on the code being executed on the other logical processor. Application programs should also eliminate all execution-based timing loops from their code.
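The difference between an execution-based delay and a clock-based one can be sketched as follows. This is an application-level illustration only, with hypothetical function names; a kernel would use its own timer facilities rather than these calls.

```python
import time


def delay_by_iterations(iterations):
    """Execution-based delay (the pattern to avoid).

    The elapsed time depends on how quickly the loop body executes,
    which varies with the load on the other logical processor.
    """
    for _ in range(iterations):
        pass


def delay_by_clock(seconds):
    """Clock-based delay: wait until a monotonic clock passes a deadline."""
    deadline = time.monotonic() + seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        # Sleeping yields the processor instead of consuming shared resources.
        time.sleep(remaining)


start = time.monotonic()
delay_by_clock(0.05)
print(time.monotonic() - start >= 0.05)  # True
```

The clock-based version measures time directly, so its duration does not depend on how many execution resources the loop happens to receive.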
Another issue that applies to both operating systems and application code is that spin-wait loops must be enhanced to add the PAUSE instruction to the loop. A spin-wait loop is a short code segment that reads a memory location and then compares it to a particular value. If the contents of that memory location are equal to this value, then the loop completes and execution resumes with the code following the loop. Otherwise, the memory location is re-read and the comparison is done again. A spin-wait loop is commonly used to synchronize two or more threads of execution when the expected wait time is "short".
On a Hyper-Threading Technology enabled processor, a thread executing a spin-wait loop can consume a large percentage of the processor's shared resources. This will decrease the performance of the thread executing on the other logical processor. The PAUSE instruction in a spin-wait loop gives the other logical processor access to most of the processor's shared resources. Note that while a PAUSE instruction adds latency to the spin-wait loop, overall system performance is improved when using the PAUSE instruction.
The operating system must also optimize the idle loop with a HLT instruction. An idle loop, like a spin-wait loop, can consume a high percentage of a physical processor's execution resources. When the idle logical processor is halted, all shared resources can be fully utilized by the other logical processor.
The operating system's scheduling algorithm needs to be aware that Hyper-Threading Technology enabled processors have two logical processors that share some of the execution resources in each physical processor. The OS should not treat all logical processors on a physical processor as separate physical processors. The operating system has two competing goals that it must balance in its dispatch algorithm. First, it is desirable to assign a thread that is ready to execute to an idle logical processor. However, it is also desirable to assign a thread to the last processor on which it executed, because the instruction and data caches will probably still contain some valid entries for this thread. For more information, read the article by Intel's Henry Ou, Long Duration Spin-wait Loops on Hyper-Threading Technology Enabled Intel Processors.
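One way to picture the trade-off is the toy dispatch policy below. It is a sketch under stated assumptions, not the algorithm used by the Linux scheduler or recommended by Intel: the function name, the `sibling` mapping, and the priority order (cache affinity first, then a fully idle physical processor, then any idle logical processor) are all hypothetical choices made for illustration.

```python
def pick_logical_cpu(last_cpu, idle, sibling):
    """Choose a logical processor for a thread that is ready to run.

    last_cpu: logical processor the thread last ran on
    idle:     set of currently idle logical processor ids
    sibling:  dict mapping each logical processor to the other logical
              processor on the same physical processor
    """
    if not idle:
        return None  # nothing is idle; the thread has to wait
    if last_cpu in idle:
        return last_cpu  # caches may still hold this thread's data
    # Prefer a physical processor whose logical processors are both idle,
    # so the thread does not share execution resources with another thread.
    for cpu in sorted(idle):
        if sibling[cpu] in idle:
            return cpu
    return min(idle)  # otherwise take any idle logical processor


# Two physical processors: logical 0/1 share one, logical 2/3 share the other.
sibling = {0: 1, 1: 0, 2: 3, 3: 2}
print(pick_logical_cpu(0, {1, 2, 3}, sibling))  # 2
print(pick_logical_cpu(2, {1, 2}, sibling))     # 2
print(pick_logical_cpu(0, {1}, sibling))        # 1
```

A real scheduler weighs these goals with far more information (run-queue lengths, priorities, how long ago the thread last ran), so the fixed priority order here should be read only as an illustration of the two goals named above.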
Finally, although the Linux operating system does not have a license that limits the number of processors it will boot, application programs must decide how to count logical processors towards any processor license limit. Intel Corporation recommends that application programs count only the physical processors towards a license limit, not the total number of logical processors. (See "Multi-core processors raise software licensing questions.")
Linux Hyper-Threading Technology Support From Red Hat
The SMP kernel that ships with Red Hat Linux* 7.2 (kernel version 2.4.7-10smp) does not recognize the second logical processor on a Xeon processor. You can visit Red Hat* to get a Hyper-Threading Technology enabled kernel, along with instructions for installing it. As of August 2002, the version of the kernel posted on this site is 2.4.9-21.1smp. Note that this kernel does not apply only to the Red Hat 7.2 distribution: it should be valid for use on any Linux distribution that is based on the 2.4 kernel.
The general steps for installing a Hyper-Threading Technology enabled Linux kernel on a system are listed below. The instructions use the file and directory names from the Red Hat distribution. Other distributions may have slightly different names and directory locations.
- Install the Linux distribution.
- Modify the following networking support files as appropriate:
- In the /etc/xinetd.d directory, change "disable = yes" to "disable = no" in the appropriate files.
- If you want to preserve boot files from your original installation, then copy the following files and directories to a save area:
- Execute the following rpm commands:
rpm -Uvh kernel-headers-2.4.9-21.1.i386.rpm (optional)
rpm -Uvh kernel-doc-2.4.9-21.1.i386.rpm (optional)
rpm -Uvh modutils-2.4.10-1.i386.rpm
rpm -Uvh tux-2.2.0-1.i386.rpm
rpm -Uvh kernel-smp-2.4.9-21.1.i686.rpm
- Edit the appropriate boot loader file:
If you are using the /etc/lilo.conf boot loader file, add an entry like the following:
image = /boot/vmlinuz-2.4.9-21.1smp
and then execute the /sbin/lilo command.
If you are using the /boot/grub/grub.conf boot loader file, add an entry like the following (with GRUB, you do not need to execute the /sbin/grub command afterward):
title Red Hat Linux-ht (2.4.9-21.1smp)
kernel /boot/vmlinuz-2.4.9-21.1smp ro root=/dev/sda1 acpismp=force
- Restore the four files and directory saved in the previous step.
When booted without the "acpismp=force" argument, this Hyper-Threading Technology enabled kernel boots only one logical processor in each physical processor in the system. It is also possible to boot fewer than the maximum number of logical processors. To do this, add the line append="maxcpus=N" to the lilo.conf boot loader file, or add maxcpus=N to the kernel line in the grub.conf boot loader file.
Building a Hyper-Threading Technology Enabled Kernel
At the time of this writing, August 2002, the 2.4.19 kernel is the latest stable version of the 2.4 kernel. This kernel supports Hyper-Threading Technology on Xeon processor-based DP and MP systems.
To build a 2.4.19 kernel, first download the 2.4.19 kernel source from the http://www.kernel.org/* web site and then follow these instructions:
- Copy the Linux kernel source compressed tar file to the target system.
- Extract the 2.4.19 source:
tar xzvf [source path]/linux-2.4.19.tar.gz
mv linux linux-2.4.19
ln -s linux-2.4.19 linux
- Build the 2.4.19 kernel:
// (Optional) Change the EXTRAVERSION variable to -ht
// This command prepares the kernel source tree
// This command configures the build options; the output from this step is the .config file. Make sure you change 'Processor Family' to P4, check 'yes' for MTRR support, and check 'yes' for ACPI support.
// This command checks for dependencies
// This step removes any data from a previous build
// This step compiles the new kernel
// This step compiles the new modules
// This step installs the new modules
- Install the 2.4.19-ht kernel and module files:
either execute the command
or copy the two files by hand:
cp -p System.map /boot/System.map-2.4.19-ht
cp -p arch/i386/boot/bzImage /boot/vmlinuz-2.4.19-ht
- Add a new entry in the boot loader file
The previous sections have given you a short overview that should prepare you to work with Hyper-Threading Technology enabled processors under one distribution of the Linux operating system. Visit some of the links given above or in the Resources section below to get the latest information on changes to the various Linux distributions that may affect how to work with Hyper-Threading Technology enabled processors.
A Parting Note ...
There are several commands and files of interest on your Hyper-Threading Technology enabled system.
- The /proc file system generates system information when its files are read. The /proc/cpuinfo file reports the values returned by the CPUID instruction for each booted processor. You can easily verify that all logical processors were booted by reading this file.
- The uname -a command prints information about the currently booted kernel.
- The xcpustate & command opens a graphical display of the activity of each booted processor.
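The check against /proc/cpuinfo can also be scripted. The sketch below counts the "processor" entries (booted logical processors) and the distinct "physical id" values (physical processors). It assumes the running kernel reports a "physical id" field, which not every kernel version does; the function name and the abbreviated sample text are hypothetical.

```python
def count_processors(cpuinfo_text):
    """Return (logical_count, physical_count) parsed from /proc/cpuinfo text.

    physical_count is None when no "physical id" field is present.
    """
    logical = 0
    physical_ids = set()
    for line in cpuinfo_text.splitlines():
        key, separator, value = line.partition(":")
        if not separator:
            continue  # blank line between processor entries
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_ids.add(value.strip())
    physical = len(physical_ids) if physical_ids else None
    return logical, physical


# Abbreviated, made-up sample: two physical processors, two logical each.
SAMPLE = """processor : 0
physical id : 0

processor : 1
physical id : 0

processor : 2
physical id : 3

processor : 3
physical id : 3
"""

print(count_processors(SAMPLE))  # (4, 2)
```

On a live system, pass the contents of /proc/cpuinfo instead of the sample text. The second number is also the count that the licensing recommendation earlier in this paper refers to: physical processors rather than logical ones.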
About the Author
Robert Godley is a Senior Applications Engineer working with Intel's Software and Solutions Group. Bob has worked at Intel for 14 years, eight of which were with the Supercomputer Systems Division, where he worked on mathematical libraries and operating systems. He is now in a group that helps software vendors optimize their products for the latest Intel Architecture processors.
Resources
- Red Hat Web Site*
- Introduction to Linux and Open Source
- Long Duration Spin-wait Loops on Hyper-Threading Technology Enabled Intel Processors
- Detailed discussion of Hyper-Threading Technology Architecture and Microarchitecture