by Gail Lyons
Introduction
Recent advances in silicon technology have changed the trade-offs in micro-processor architecture. Larger transistor counts have facilitated the design of Hyper-Threading Technology (HT Technology) and multi-core processors. A processor with HT Technology can provide two logical processors in one physical core. The physical resources of the core are shared between the two logical processors, and the state information that is needed to support each logical processor is duplicated. Applications rarely consume all of the physical resources of a core, so HT Technology enabled cores can improve the performance of many applications.
Multi-core technology places more than one core in a physical package. A multi-core processor may share a cache among its cores, or give each core a dedicated cache. The performance characteristics of HT Technology and multi-core processors depend on the processor topology and the cache topology of the platform. Optimal performance of multithreaded software requires effective management of the shared and dedicated resources available to each logical processor.
The processor and cache topology of a modern platform is more complex than that of a traditional symmetric multi-processor design. This paper presents a detailed algorithm for enumerating processor topology, and shows how to bind a thread to a specific processor in a Linux environment.
Before delving into this topic, a few terms should be defined:
Processor Core: The circuitry that provides the dedicated ability to decode and execute instructions, and transfer data between certain components in a physical package. A processor core may contain one or more logical processors.
Logical Processor: The basic unit of processor hardware that allows the software executive in the operating system to dispatch a task or execute a thread context. Each logical processor can execute only one thread context at a time.
Physical Package: The physical package of a microprocessor capable of executing one or more threads of software at the same time. Each physical package plugs into a physical socket. Each physical package may contain one or more processor cores.
Hyper-Threading Technology: A feature within the IA-32 family of processors, where each processor core provides the functionality of more than one logical processor.
Multi-Core Processor: A physical package that contains more than one processor core.
Multi-Processor Platform: A computer system made of two or more physical sockets.
SMT: Abbreviated name for simultaneous multithreading. This is an efficient means in silicon to provide the functionalities of multiple logical processors within the same processor core by sharing execution resources and cache hierarchy between logical processors.
Our goal is to discover the processor topology of the system. We want to know how many physical packages are present on the system. We want to find out if a physical package is multi-core, and if so, how many cores it contains. We want to know if the processor core contains Hyper-Threading Technology, and if so, if all logical processors are enabled by the BIOS and visible to the application.
This is the algorithm that we will use to discover the topology of the system.
- Determine the maximum cpuid input value.
- Determine the vendor ID.
- Determine if hardware multithreading capability is supported.
- Determine the maximum number of logical processors in a physical package.
- Determine the maximum number of cores in a physical package.
- Determine the initial APIC ID for all logical processors.
- Determine the number of available cores in the system.
- Determine the number of physical packages in the system.
Note that recent versions of some operating systems have updated the /proc/cpuinfo file to support multi-processor platforms. If the /proc/cpuinfo file on your system properly reflects multi-processor platform support, then there is no need to go through these steps. Please refer to the end of this article for information on how to interpret this data.
Determine the Maximum cpuid Input Value
The following is a code snippet from the program cpucount_linux.cpp:
// This macro returns the cpu id data.
//
#define cpuid( in, a, b, c, d ) \
asm ( "cpuid" : \
"=a" (a), "=b" (b), "=c" (c), "=d" (d) : "a" (in));
// Returns the maximum input value for the cpuid instruction
// supported by the chip.
//
INT32 get_max_input_value()
{
UINT32 a,b,c,d;
try {
cpuid( 0, a, b, c, d );
}
catch (...) {
return 0;
}
return a;
}
// Returns a non-zero value if this system is running Genuine Intel hardware.
//
unsigned int GenuineIntel(void)
{
unsigned int a,b,c,d;
unsigned int venB = ('u' << 24) | ('n' << 16) | ('e' << 8) | 'G';
unsigned int venD = ('I' << 24) | ('e' << 16) | ('n' << 8) | 'i';
unsigned int venC = ('l' << 24) | ('e' << 16) | ('t' << 8) | 'n';
cpuid( 0, a, b, c, d );
return ( ( b == venB ) &&
( d == venD ) &&
( c == venC ) );
}
// EDX[28] - Bit 28 set indicates multithreading is supported in hardware.
#define MT_BIT 0x10000000
// Returns non-zero if hardware multithreading is supported.
//
INT32 MTSupported(void)
{
UINT32 a,b,c,d;
if ( GenuineIntel() ) {
INT32 max = get_max_input_value();
if ( max >= 1 ) {
// Get the chip information.
cpuid( 1, a, b, c, d );
//Indicate if MT is supported by the hardware
return ( d & MT_BIT );
}
}
return 0;
}
// EBX[23:16] indicates number of logical processors per package
#define NUM_LOGICAL_BITS 0x00FF0000
// Returns the number of logical processors per processors package.
//
INT32 LogicalProcessorsPerPackage(void)
{
unsigned int a, b, c, d;
if ( ! MTSupported() ) return 1;
cpuid( 1, a, b, c, d );
return ( b & NUM_LOGICAL_BITS ) >> 16;
}
// EAX[31:26] - Bit 26 thru 31 contains the cores per processor pack -1.
#define CORES_PER_PROCPAK 0xFC000000
// This macro calls cpuid with 2 inputs, eax and ecx.
//
#define cpuid2( in1, in2, a, b, c, d ) \
asm ( "cpuid" : \
"=a" (a), "=b" (b), "=c" (c), "=d" (d) : \
"a" (in1), "c" (in2) );
// Returns the number of cores per processor pack.
//
INT32 multiCoresPerProcPak( )
{
UINT32 a,b,c,d;
INT32 max, num;
max = get_max_input_value();
if ( max >= 4 ) {
cpuid2 (4, 0, a, b, c, d );
num = (( a & CORES_PER_PROCPAK ) >> 26 ) + 1;
}
else
num = 1;
return num;
}
// EBX[31:24] Bits 24-31 contains the 8-bit initial APIC ID for the
// processor this code is running on.
#define INITIAL_APIC_ID_BITS 0xFF000000
// Return the initial Advanced Programmable Interrupt Controller (APIC) ID.
//
UINT32 get_apic_id()
{
UINT32 a,b,c,d;
UINT32 apic_id;
cpuid( 1, a, b, c, d );
apic_id = ( b & INITIAL_APIC_ID_BITS ) >> 24;
return apic_id;
}
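The bit layouts used by the functions above can be checked without executing cpuid at all. The following is a minimal sketch and is not part of cpucount_linux.cpp: the struct CpuidFields and the helper decode_cpuid are hypothetical names introduced here, and the register values are assumed to have been captured already (EBX and EDX from cpuid input 1, EAX from cpuid input 4 with ECX = 0). Passing 0 for the input 4 value mirrors the case where the maximum cpuid input value is below 4.

```cpp
#include <cstdint>

// Hypothetical container for the fields the article reads from cpuid.
struct CpuidFields {
    bool     mt_supported;         // input 1, EDX[28]
    unsigned logical_per_package;  // input 1, EBX[23:16]
    unsigned initial_apic_id;      // input 1, EBX[31:24]
    unsigned cores_per_package;    // input 4, EAX[31:26] + 1
};

// Decode register values that were captured earlier (assumption: the
// caller obtained them with the cpuid and cpuid2 macros or similar).
CpuidFields decode_cpuid(std::uint32_t leaf1_ebx,
                         std::uint32_t leaf1_edx,
                         std::uint32_t leaf4_eax)
{
    CpuidFields f;
    f.mt_supported = (leaf1_edx & 0x10000000u) != 0;
    // Without hardware multithreading there is one logical processor
    // per package, matching LogicalProcessorsPerPackage().
    f.logical_per_package =
        f.mt_supported ? (leaf1_ebx & 0x00FF0000u) >> 16 : 1u;
    f.initial_apic_id   = (leaf1_ebx & 0xFF000000u) >> 24;
    // The hardware stores "cores per package minus one".
    f.cores_per_package = ((leaf4_eax & 0xFC000000u) >> 26) + 1u;
    return f;
}
```

For instance, EBX = 0x03040800 with EDX bit 28 set decodes to an initial APIC ID of 3 and four logical processors per package.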
The initial APIC ID is 8 bits long. Bits 7:0 contain the physical package ID, the SMT ID if Hyper-Threading Technology is enabled, and the core ID if the physical package is multi-core. Intel designed the initial APIC ID to represent the SMT ID and the core ID as dynamically sized fields, which means that each of these two fields is only as wide as it needs to be. To know how many bits each field occupies, you need to know the number of cores in the physical package and the number of logical processors per core.
At this writing, only two logical processors can exist on one core, and only two cores can exist on a physical package. Therefore, only one bit each is currently needed to represent the IDs. If four cores were present on a physical package, then two bits would be needed to represent the core ID. The interpretation of these fields is dependent upon what is supported on the system.
As an example, four different configurations that all produce the same initial APIC ID are given below:
- Package ID = 7 with Hyper-Threading Technology disabled and no multi-core support, initial APIC ID = 0x7. Bits 2, 1, and 0 contain the package ID.
- Package ID = 3 with Hyper-Threading Technology enabled, SMT ID = 1, but no multi-core support, initial APIC ID = 0x7. Bits 2 and 1 contain the package ID, and bit 0 contains the SMT ID.
- Package ID = 3 with Hyper-Threading Technology disabled, core ID = 1, initial APIC ID = 0x7. Bits 2 and 1 contain the package ID, and bit 0 contains the core ID.
- Package ID = 1 with Hyper-Threading Technology enabled, SMT ID = 1, core ID=1, initial APIC ID = 0x7. Bit 2 contains the package ID, bit 1 contains the core ID, and bit 0 contains the SMT ID.
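The four configurations above can be reproduced with a short worked example. This is only a sketch: ApicFields, field_width and decode_apic_id are hypothetical names, the counts of cores per package and logical processors per core are assumed to be known inputs, and, as a choice made for readability here, each field is shifted down to a zero-based value.

```cpp
// Hypothetical result type: zero-based IDs taken from an initial APIC ID.
struct ApicFields {
    unsigned package;
    unsigned core;
    unsigned smt;
};

// Smallest number of bits able to hold the values 0 .. count-1.
unsigned field_width(unsigned count)
{
    unsigned width = 0;
    while ((1u << width) < count) ++width;
    return width;
}

// Split an 8-bit initial APIC ID. The SMT ID sits in the lowest bits,
// the core ID above it, and the package ID in whatever bits remain.
ApicFields decode_apic_id(unsigned apic_id,
                          unsigned cores_per_package,
                          unsigned logical_per_core)
{
    unsigned smt_bits  = field_width(logical_per_core);
    unsigned core_bits = field_width(cores_per_package);
    ApicFields f;
    f.smt     = apic_id & ((1u << smt_bits) - 1u);
    f.core    = (apic_id >> smt_bits) & ((1u << core_bits) - 1u);
    f.package = (apic_id & 0xffu) >> (smt_bits + core_bits);
    return f;
}
```

Calling decode_apic_id(0x7, 2, 2) gives package 1, core 1, SMT 1, while decode_apic_id(0x7, 1, 1) gives package 7 with both other fields zero, which matches the first and last configurations in the list.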
The following code shows how to find the width of the SMT ID and core ID fields. The mask and shift values needed to correctly interpret the initial APIC ID are determined by the number of logical processors per core and the number of cores per package.
//
// Determine the width of the bit field that can represent the value countItem.
//
unsigned int find_maskwidth(unsigned int countItem)
{
unsigned int maskWidth = 0;
// The field must be able to hold the values 0 .. countItem-1.
unsigned int maxValue = countItem - 1;
// bsr leaves its destination undefined when the source is zero,
// so a single item needs no bits at all.
if ( maxValue == 0 ) return 0;
// A single asm statement with explicit operands lets the compiler
// track every register; no label or stack state crosses statements.
asm ( "bsrl %1, %0"
: "=r" (maskWidth)
: "r" (maxValue)
: "cc" );
// bsr yields the index of the highest set bit; the width is one more.
return maskWidth + 1;
}
// Extract a sub-field from the 8-bit value fullID.
// The field is returned in place (not shifted down to bit 0),
// with all other bits cleared.
//
unsigned int getSubID(unsigned int fullID,
unsigned int maxSubIDValue,
unsigned int shiftCount)
{
unsigned int maskWidth;
unsigned int maskBits;
maskWidth = find_maskwidth( maxSubIDValue );
maskBits = (0xff << shiftCount) ^ (0xff << (shiftCount + maskWidth));
return (fullID & maskBits);
}
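For checking the mask arithmetic on a machine without an x86 compiler, the same two helpers can be written without inline assembly. This is a minimal sketch under the assumption that the item count is at least 1; mask_width and sub_id are hypothetical names that mirror find_maskwidth and getSubID, and like getSubID the result is left in place rather than shifted down.

```cpp
// Portable stand-in for find_maskwidth(): the number of bits needed to
// represent the values 0 .. countItem-1. Assumes countItem >= 1.
unsigned int mask_width(unsigned int countItem)
{
    unsigned int width = 0;
    unsigned int maxValue = countItem - 1;  // largest ID the field holds
    while (maxValue != 0) {
        ++width;
        maxValue >>= 1;
    }
    return width;
}

// Same contract as getSubID(): keep the selected field where it is in
// the 8-bit ID and clear every other bit.
unsigned int sub_id(unsigned int fullID,
                    unsigned int maxSubIDValue,
                    unsigned int shiftCount)
{
    unsigned int width = mask_width(maxSubIDValue);
    unsigned int mask  = ((1u << width) - 1u) << shiftCount;
    return fullID & mask & 0xffu;
}
```

With two logical processors per core and two cores per package, an initial APIC ID of 0x7 yields sub_id(0x7, 2, 0) == 1 for the SMT field and sub_id(0x7, 2, mask_width(2)) == 2 for the core field, since the core bit stays at bit position 1.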
The initial APIC ID is different for each logical processor on the system, so each processor’s initial APIC ID must be examined. This is done by binding the thread to each processor in turn and reading that processor’s initial APIC ID. Binding the thread is done using the sched_getaffinity() and sched_setaffinity() calls and the cpu_set_t macros (such as CPU_ZERO and CPU_SET) defined in /usr/include/sched.h. Once the thread is running on a processor, the initial APIC ID is read. This value is then decoded to determine the SMT ID and the core ID of the processor. The number of logical processors in the system is also counted in this algorithm.
// Find the affinity and IDs of each logical processor in the
// system by reading the initial APIC for each processor.
//
unsigned int affinityMask = 1;
cpu_set_t currentCPU;
unsigned int packageIDMask;
int j = 0;
while ( j < numProcessors )
{
CPU_ZERO(&currentCPU);
CPU_SET(j, &currentCPU);
if ( misc_sched_setaffinity (0, &currentCPU) == 0 )
{
// Ensure system has switched to the right CPU
sleep(0);
// Get the initial APIC ID for this processor.
apicID = get_apic_id();
// Obtain SMT ID and core ID of each logical processor from
// initial APIC ID. Shift value for SMT ID is 0.
// Shift value for core ID is the mask width for maximum logical
// processors per core
//
tblSMTID[j] = getSubID(apicID, logicalPerCore, 0);
tblCoreID[j] = getSubID(apicID, corePerPack,
find_maskwidth(logicalPerCore));
// Extract package ID from the initial APIC ID.
// Shift value is the mask width for max Logical per package
//
packageIDMask = (0xff << find_maskwidth( logicalPerPack ));
tblPkgID[j] = apicID & packageIDMask;
// Number of available logical processors in the system.
//
numLPEnabled ++;
// Hold results to print at end.
//
sprintf(tmp,
"AffinityMask = 0x%x; Initial APIC = 0x%x; Physical ID = %d, Core ID = %d, SMT ID = %d",
affinityMask, apicID, tblPkgID[j], tblCoreID[j],
tblSMTID[j]);
strcat(procData, tmp);
}
j++;
affinityMask = 1 << j;
}
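The loop above relies on misc_sched_setaffinity, whose definition is not shown in this excerpt. As a rough sketch of the same binding step, the code below calls the glibc functions directly, assuming a Linux system whose sched_setaffinity takes three arguments (pid, cpusetsize, mask). The function visit_each_cpu and its callback parameter are hypothetical names; the callback is where a call such as get_apic_id() would go.

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity, CPU_* macros
#include <vector>

// Bind the calling thread to each CPU it is allowed to run on, one at a
// time, and invoke on_cpu (if non-null) while bound. Returns the CPU
// numbers for which binding succeeded. The original affinity mask is
// restored before returning.
std::vector<int> visit_each_cpu(void (*on_cpu)(int cpu))
{
    std::vector<int> visited;

    cpu_set_t original;
    CPU_ZERO(&original);
    if (sched_getaffinity(0, sizeof(original), &original) != 0)
        return visited;              // could not read the current mask

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &original))
            continue;                // not available to this thread
        cpu_set_t one;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        if (sched_setaffinity(0, sizeof(one), &one) == 0) {
            if (on_cpu)
                on_cpu(cpu);         // e.g. read the initial APIC ID here
            visited.push_back(cpu);
        }
    }

    // Put the thread back on its original set of processors.
    sched_setaffinity(0, sizeof(original), &original);
    return visited;
}
```

Starting from the current affinity mask, rather than from 0 up to a processor count, avoids trying CPUs the process is not permitted to use, which can matter when the program runs inside a restricted CPU set.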
Determine the Number of Available Cores in the System
Now that we know how many logical processors are enabled on the system, and we have captured the package ID and core ID, the total number of available cores can be determined.
// Count the total number of available cores in the system.
//
int countAvailableCores( unsigned int tblPkgID[],
unsigned int tblCoreID[],
unsigned int numLPEnabled )
{
unsigned int CoreIDBucket[256];
int i, procNum;
int coreIDFound;
int total = 1;
CoreIDBucket[0] = tblPkgID[0] | tblCoreID[0];
for (procNum = 1; procNum < numLPEnabled; procNum++)
{
coreIDFound=0;
for (i = 0; i < total; i++)
{
// Comparing bit-fields of logical processors residing in
// different packages. Assuming the bit-masks are the same
// on all processors in the system.
//
if ((tblPkgID[procNum] | tblCoreID[procNum]) == CoreIDBucket[i])
{
coreIDFound = 1;
break;
}
}
// Did not match any bucket. Create a new one.
//
if (! coreIDFound )
{
CoreIDBucket[i] = tblPkgID[procNum] | tblCoreID[procNum];
// Number of available cores in the system
//
total++;
}
}
return total;
}
Determine the Number of Physical Packages in the System
A similar algorithm is used to count the number of physical packages in the system.
// Count the physical processors in the system
//
int countPhysicalPacks( unsigned int tblPkgID[],
unsigned int numLPEnabled )
{
unsigned int packageIDBucket[256];
int i, procNum;
int packageIDFound;
int total = 1;
packageIDBucket[0] = tblPkgID[0];
for (procNum = 1; procNum < numLPEnabled; procNum++)
{
packageIDFound = 0;
for (i = 0; i < total; i++)
{
// Comparing bit-fields of logical processors residing in
// different packages. Assuming the bit-masks are the same
// on all processors in the system.
//
if (tblPkgID[procNum] == packageIDBucket[i])
{
packageIDFound = 1;
break;
}
}
// Did not match any bucket. Create a new one.
//
if ( ! packageIDFound )
{
packageIDBucket[i] = tblPkgID[procNum];
// Total number of physical packages in the system
//
total++;
}
}
return total;
}
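Both counting routines amount to counting distinct values, so they can also be expressed with standard containers. The following is a sketch of that alternative, not the method used in cpucount_linux.cpp: TopologyCounts and count_topology are hypothetical names, and the two input vectors are assumed to hold the per-processor package and core fields in the same form as tblPkgID and tblCoreID.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical result type for the two totals.
struct TopologyCounts {
    std::size_t packages;
    std::size_t cores;
};

// Count distinct packages and distinct (package, core) pairs. A core is
// identified by its package as well as its core field, because the same
// core field value appears in every package.
TopologyCounts count_topology(const std::vector<unsigned>& pkgID,
                              const std::vector<unsigned>& coreID)
{
    std::set<unsigned> packages;
    std::set<std::pair<unsigned, unsigned> > cores;

    std::size_t n = pkgID.size() < coreID.size() ? pkgID.size()
                                                 : coreID.size();
    for (std::size_t i = 0; i < n; ++i) {
        packages.insert(pkgID[i]);
        cores.insert(std::make_pair(pkgID[i], coreID[i]));
    }

    TopologyCounts counts = { packages.size(), cores.size() };
    return counts;
}
```

A set removes the fixed 256-entry bucket arrays and the inner search loop, at the cost of pulling in the standard library; for the handful of processors involved either approach is adequate.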
At this point, the topology of the system is known. The number of logical processors per core, the number of cores per physical package, and the number of physical packages have been calculated. It is known whether Hyper-Threading Technology is supported and enabled on this system, and the initial APIC ID of each logical processor on the system has been deconstructed and reported. Now that you have determined how the processors on your system are related, you can tune your application to take advantage of the features of Hyper-Threading and multi-core technologies.
Refer to the program associated with this paper to see a complete example that calculates the system topology on Linux.
Determining the System’s Topology Using /proc/cpuinfo
Recent versions of some operating systems have updated the /proc/cpuinfo file to support multi-processor platforms. If the /proc/cpuinfo file on your system correctly reflects the processor information, then there is no need to go through the steps listed above. Instead, the information in this file can be interpreted.
The /proc/cpuinfo file contains a paragraph of data for each logical processor on the system. Six entries in the /proc/cpuinfo description apply to multi-core and Hyper-Threading Technology detection: processor, vendor_id, physical id, siblings, core id and cpu cores.
- The processor entry contains a unique identifier for this logical processor.
- The physical id entry contains a unique identifier for each physical package.
- The core id entry holds a unique identifier for each core.
- The siblings entry lists the number of logical processors that exist on the same physical package.
- The cpu cores entry contains the number of cores that exist on the same physical package.
- The vendor_id entry holds the string GenuineIntel if the processor is an Intel processor.
All logical processors that have the same physical id share the same physical socket. Each physical id represents a unique physical package. Siblings indicate the number of logical processors that exist on this physical package. They may or may not support Hyper-Threading Technology. Each core id represents a unique processor core. All the logical processors with the same core id exist on the same processor core. If more than one logical processor has the same core id, and the same physical id, then the system supports Hyper-Threading Technology. If there are two or more logical processors with the same physical id, but different core ids, then this represents a multi-core processor. Multi-core support is also indicated by the cpu cores entry.
As an example, if a system contained two physical packages, each of which contained two processor cores that supported HT Technology, the /proc/cpuinfo file would contain this data. (Note that the data would not be in a table.)
| processor | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| physical id | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| core id | 0 | 2 | 1 | 3 | 0 | 2 | 1 | 3 |
| siblings | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| cpu cores | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
This example shows that logical processors 0 & 4 reside on core 0, physical package 0. This indicates that logical processors 0 & 4 are enabled for HT Technology. The same observation can be made for logical processors 2 & 6 on core 1, package 0, logical processors 1 & 5 on core 2, package 1, and logical processors 3 & 7 on core 3, package 1. The system is enabled for HT Technology because two logical processors share the same core. Multi-core support can be determined in two ways. Since cores 0 & 1 exist on package 0, and cores 2 & 3 exist on package 1, this is a multi-core system. Also, the cpu cores entry is 2, which indicates that two cores reside in the physical package. It is a multi-processor system because there are two packages.
It is important to note that the numbering of the physical id and core id may or may not be contiguous. It is not uncommon to have two physical packages on the system, and have the physical ids equal to 0 and 3.
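The interpretation rules above can be applied mechanically. The sketch below is one possible way to do so and makes several assumptions: it expects the x86 "key : value" layout with a processor line starting each paragraph, it treats a missing physical id or core id as 0, and CpuinfoSummary and summarize_cpuinfo are hypothetical names. Because it only collects distinct values, non-contiguous IDs are handled naturally.

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <utility>

struct CpuinfoSummary {
    std::size_t logical_processors;
    std::size_t packages;
    std::size_t cores;
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(ws) - b + 1);
}

// Read cpuinfo-style text and count logical processors, distinct
// physical ids, and distinct (physical id, core id) pairs.
CpuinfoSummary summarize_cpuinfo(std::istream& in)
{
    std::set<long> packages;
    std::set<std::pair<long, long> > cores;
    std::size_t logical = 0;
    long physical_id = 0, core_id = 0;
    bool have_processor = false;

    // Record the paragraph collected so far, if there is one.
    auto flush = [&]() {
        if (!have_processor) return;
        ++logical;
        packages.insert(physical_id);
        cores.insert(std::make_pair(physical_id, core_id));
    };

    std::string line;
    while (std::getline(in, line)) {
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::string key   = trim(line.substr(0, colon));
        std::string value = trim(line.substr(colon + 1));
        if (key == "processor") {
            flush();                       // close the previous paragraph
            have_processor = true;
            physical_id = core_id = 0;     // defaults if fields are absent
        } else if (key == "physical id") {
            physical_id = std::strtol(value.c_str(), 0, 10);
        } else if (key == "core id") {
            core_id = std::strtol(value.c_str(), 0, 10);
        }
    }
    flush();                               // last paragraph

    CpuinfoSummary s = { logical, packages.size(), cores.size() };
    return s;
}
```

To use it on a live system, open /proc/cpuinfo with std::ifstream and pass the stream in. If the number of logical processors exceeds the number of cores, more than one logical processor shares a core, which is the Hyper-Threading Technology case described above.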
Conclusion
The performance characteristics of Hyper-Threading Technology and multi-core processors are more complex than those of a traditional symmetric multi-processor. To take advantage of the features of HT Technology and multi-core technology, it is important to know the type of processors on your system, and how they are related.
This paper has demonstrated how to discover the topology of a system. By invoking the cpuid instruction with various inputs, we have discovered whether multi-core is supported on this system, and if it is, the number of cores in a physical package. We also determined if the system supports Hyper-Threading Technology, and if so, whether Hyper-Threading Technology is enabled in the BIOS. We bound a thread to each logical processor in the system, and decoded its initial APIC ID.
The code that is displayed in the paper was developed and tested on two systems running Red Hat distributions. One system had RH 4AS-2.8 installed, which contains gcc 3.4.4. The other system had RH3AS-7.3 installed, which contains gcc 3.2.3. Other Linux distributions may implement the affinity calls and cpu_set macros differently. Please refer to /usr/include/sched.h on your system.
Additional Resources
- Hyper-Threading Technology and Multi-Core Processor Detection by Phil Kerly
- Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference Manual
