Intel® 64 Architecture Processor Topology Enumeration

Processor topology information is important for a number of processor-resource management practices, including task/thread scheduling, licensing policy enforcement, and affinity control/migration. Topology information of the cache hierarchy can be important for optimizing software performance. This white paper covers topology enumeration algorithms for single-socket to multiple-socket platforms using Intel 64 and IA-32 processors. The topology enumeration algorithms (both processor and cache) that use the initial APIC ID have been extended to use the x2APIC ID; the latter mechanism is required for future platforms supporting more than 256 logical processors in a coherent domain.

Download "Intel® 64 Architecture Processor Topology Enumeration" [PDF 153KB]

Download Code Package: 20160519-cpuid_topo.tar.gz


Hardware multithreading in microprocessors has proliferated in recent years. The majority of Intel® architecture processors shipping today provide one or more forms of hardware multithreading support (multi-core and/or simultaneous multithreading (SMT), the latter introduced as Hyper-Threading Technology in 2002). From a processor hardware perspective, the physical package of an Intel 64 processor can support SMT and multi-core. Consequently, a physical processor is effectively a hierarchically ordered collection of logical processors with some forms of shared system resources (for example, memory, bus/system links, caches). From a platform hardware perspective, hardware multithreading in a multi-processor system may consist of two or more physical processors organized in either a uniform or a non-uniform configuration with respect to the memory subsystem.

Application programming using hardware multithreading features must follow the programming models and software constructs provided by the underlying operating system. For example, an OS scheduler generally assigns a software task from a queue to a hardware resource at the granularity of a logical processor; an OS may define its own data structures and provide services that allow multithreaded applications to customize the assignment between tasks and logical processors via an affinity construct. The OS and the software stack underneath an application (the BIOS, the OS loader) also play significant roles in bringing up the hardware multithreading features and configuring the software constructs defined by the OS.

The CPUID instruction in Intel 64 architecture defines a rich set of information to assist the BIOS, the OS, and applications in querying the processor topology information that each member of the software stack needs for efficient operation. Generally, the BIOS needs to gather topology information of a physical processor, determine how many physical processors are present in the system, prepare the necessary software constructs related to system topology, and pass along the system topology information to the next layer of the software stack that takes over control of the system. The OS and the application layers have a wide range of uses for topology information. This document covers several common software usages by the OS and applications for using CPUID to analyze processor topology in a single-processor or multi-processor system.

The primary software usage of processor topology enumeration deals with querying and identifying the hierarchical relationship of logical processors, processor cores, and physical packages in a single-processor or multi-processor system. We'll refer to this usage as system topology enumeration. System topology enumeration may be needed by the OS or certain applications to implement licensing policy based on physical processors. It is used by the OS to implement efficient task scheduling, minimize thread migration, configure application thread management interfaces, and configure memory allocation services appropriate to the processor/memory topology. Multithreaded applications need system topology information to determine optimal thread binding, manage memory allocation for optimal locality, and improve performance scaling in multi-processor systems.

Intel 64 processors fulfill system topology enumeration requirements:

  • Each logical processor in an Intel 64 or IA-32 platform supporting coherent memory is assigned a unique ID (APIC ID) within the coherent domain. A multi-node cluster installation may employ vendor-specific BIOS that preserves the APIC IDs assigned (during processor reset) within each coherent domain and extends them with node IDs to form a superset of unique IDs within the clustered system. This document covers only the CPUID interfaces providing unique IDs within a coherent domain.
  • The values of unique IDs assigned within a coherent Intel 64 or IA-32 platform conform to an algorithm based on bit-field decomposition of the APIC ID into three sub-fields. The three sub-fields correspond to three hierarchical levels defined as “SMT”, “processor core” (or “core”), and “physical package” (or “package”). This allows each hierarchical level to be mapped to a sub-field (a sub ID) within the APIC ID.


Conceptually, a topology enumeration algorithm simply extracts the sub ID corresponding to a given hierarchical level from the APIC ID, based on two derived parameters that define the subset of bits within the APIC ID. The relevant parameters are: (a) the width of a mask that can be used to mask off unneeded bits in the APIC ID, and (b) an offset relative to bit 0 of the APIC ID.

The “SMT” level corresponds to the innermost constituent of the processor topology, so it is located in the least significant portion of the APIC ID. If the corresponding width for “SMT” is 0, it implies there is only 1 logical processor within the next outer level of the hierarchy. For example, Intel® Core™2 Duo processors generally produce an “SMT_Mask_Width” of 0. If the corresponding width is 1 bit wide, there could be two logical processors within the next outer level of the hierarchy.

If the corresponding width for “core” is 0, it implies there is only 1 processor core within a physical processor. If the corresponding width for “core” is 1 bit wide, there could be two processor cores within a physical processor.

Note that the values of APIC ID that are assigned across all logical processors in the system need not be contiguous. However, the bit fields corresponding to the three hierarchical levels are contiguous at bit boundaries. Because of this requirement, the bit offset of the mask to extract a given sub ID can be derived from the “mask width” of the inner hierarchical levels.
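The width/offset idea described above can be sketched in a few lines of C. This is a minimal illustration, not code from the reference package: the helper names `low_bits_mask` and `extract_sub_id` are hypothetical, and the widths used in the usage note are made-up example values rather than values read from CPUID.

```c
#include <stdint.h>

/* Mask with the low `width` bits set.
 * Guards against shifting a 32-bit value by 32 or more (undefined in C). */
static uint32_t low_bits_mask(unsigned width)
{
    return (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Extract the sub ID that occupies `width` bits starting at bit `offset`
 * of the APIC ID. The offset of a level is the sum of the widths of all
 * inner levels, because the sub-fields are contiguous. */
static uint32_t extract_sub_id(uint32_t apic_id, unsigned width, unsigned offset)
{
    if (offset >= 32)
        return 0;
    return (apic_id >> offset) & low_bits_mask(width);
}
```

For example, assuming an SMT width of 1 and a core width of 2, the APIC ID 0x0D (binary 1101) decomposes as `extract_sub_id(0x0D, 1, 0)` for the SMT sub ID (1), `extract_sub_id(0x0D, 2, 1)` for the core sub ID (2), and `extract_sub_id(0x0D, 29, 3)` for the package sub ID (1).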

Unique APIC ID in a Multi-Processor System

Although legacy IA-32 multi-processor systems assign a unique APIC ID to each logical processor in the system, the programming interfaces have evolved several times. For Intel Pentium Pro processors and Pentium III Xeon processors, APIC IDs are accessible only from local APIC registers (local APIC registers use memory-mapped I/O interfaces and are managed by the OS). In the first generation of Intel Pentium 4 and Intel Xeon processors (2000, 2001), the CPUID instruction provided information on the initial APIC ID that is assigned during processor reset. The CPUID instruction in the first generation of the Intel Xeon MP processor and the Intel Pentium 4 processor supporting Hyper-Threading Technology (2002) provided additional information that allows software to decompose initial APIC IDs into a two-level topology enumeration. With the introduction of dual-core Intel 64 processors in 2005, system topology enumeration using CPUID evolved into a three-level algorithm on the 8-bit wide initial APIC IDs. Future Intel 64 platforms may be capable of supporting a number of logical processors that exceeds the capacity of the 8-bit initial APIC ID field. The x2APIC extension in Intel 64 architecture defines a 32-bit x2APIC ID, and the CPUID instruction in future Intel 64 processors will allow software to enumerate system topology using x2APIC IDs. The extended topology enumeration leaf of CPUID (leaf 11) is the preferred interface for system topology enumeration on future Intel 64 processors.

The CPUID instruction in future Intel 64 processors may support leaf 11 independent of x2APIC hardware. For many future Intel 64 platforms, system topology enumeration may be performed using either CPUID leaf 11 or legacy initial APIC ID (via CPUID leaf 1 and leaf 4). Figure 1 shows an example of how software can choose which CPUID leaf information to use for system topology enumeration.

Figure 1 Example of Choosing CPUID Leaf Information for System Topology Enumeration


The maximum supported CPUID leaf can be determined by setting EAX = 0, executing CPUID, and examining the value returned in EAX, i.e. CPUID.0:EAX. If CPUID.0:EAX >= 11, software can determine whether CPUID leaf 11 exists by setting EAX=11, ECX=0, executing CPUID, and checking for a non-zero value returned in EBX, i.e. CPUID.(EAX=11, ECX=0):EBX != 0.

Fully functional hardware multithreading requires full-reporting of CPUID leaves.

If software observes that CPUID.0:EAX < 4 on a newer Intel 64 or IA-32 processor (newer than 2004), it should examine the MSR IA32_MISC_ENABLE[bit 22].

If IA32_MISC_ENABLE[bit 22] was set to 1 (by BIOS or other means), the user can restore full reporting of CPUID leaf functions by having IA32_MISC_ENABLE[bit 22] set to 0 (modify the BIOS CMOS setting or use WRMSR).

The three-level system topology enumeration algorithm (using CPUID leaf 1 and leaf 4) is fully compatible with older IA-32 processors that support only a two-level topology (SMT and physical package). Processors that report CPUID.1:EBX[23:16] as reserved (i.e. 0) support only one level of topology.

Table A-1 shows a code example of dealing with CPUID leaf functions across three categories of processor hardware.
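As a compact companion to the description above, the following sketch expresses the three-way choice as a pure function over register values that the caller has already read with CPUID. The enum and function names are hypothetical and are not part of the reference code package; the decision itself follows the checks described in this section (maximum leaf, non-zero EBX for leaf 11, and the hardware multithreading flag CPUID.1:EDX[28] used in Table A-1).

```c
#include <stdint.h>

/* Which enumeration method software should use (hypothetical names). */
typedef enum {
    TOPO_USE_LEAF_0B,      /* extended topology leaf (leaf 11), x2APIC ID */
    TOPO_USE_LEAF_1_AND_4, /* legacy decomposition of the initial APIC ID */
    TOPO_SINGLE_LEVEL      /* one logical processor per physical package  */
} topo_method_t;

/* max_leaf        : CPUID.0:EAX
 * leaf0b_sub0_ebx : CPUID.(EAX=11, ECX=0):EBX, pass 0 if max_leaf < 11
 * leaf1_edx       : CPUID.1:EDX */
static topo_method_t choose_topo_method(uint32_t max_leaf,
                                        uint32_t leaf0b_sub0_ebx,
                                        uint32_t leaf1_edx)
{
    int has_leaf_b = (max_leaf >= 0xB) && (leaf0b_sub0_ebx != 0);
    int has_hwmt   = (int)((leaf1_edx >> 28) & 1u); /* CPUID.1:EDX[28] */

    if (!has_hwmt)
        return TOPO_SINGLE_LEVEL;
    return has_leaf_b ? TOPO_USE_LEAF_0B : TOPO_USE_LEAF_1_AND_4;
}
```

Keeping the decision separate from the CPUID calls makes it easy to exercise with recorded register values from different processor generations.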

System Topology Enumeration Using CPUID Extended Topology Leaf

The algorithm of system topology enumeration can be summarized as three phases of operation:

  • Derive “mask width” constants that will be used to extract each sub ID.
  • Gather the unique APIC ID of each logical processor in the system, and extract/decompose each APIC ID into three sets of sub IDs.
  • Analyze the relationship of the hierarchical sub IDs to establish mapping tables between the OS's thread management services and the three hierarchical levels of processor topology.


Table A-2 shows an example of the basic structure of the three phases of system wide topology as applied to processor topology and cache topology.

Figure 2 outlines the procedures of querying CPUID leaf 11 for the x2APIC ID and extracting sub IDs corresponding to the “SMT”, “Core”, “physical package” levels of the hierarchy.

Figure 2 Procedures to Extract Sub IDs from the x2APIC ID of Each Logical Processor

Table A-3 lists a data structure that holds the APIC ID, the various sub IDs, and a hierarchical set of ordinal numbering schemes to enumerate each entity in the processor topology and/or cache topology of a system.

System topology enumeration at the application level using CPUID involves executing the CPUID instruction on each logical processor in the system. This implies context switching using services provided by an OS. On-demand context switching by user code generally relies on a thread affinity management API provided by the OS. The capabilities and limitations of the thread affinity API vary across operating systems. For example, in some operating systems, the thread affinity API has a limit of 32 or 64 logical processors. It is expected that enhancements to thread affinity APIs to manage larger numbers of logical processors will be available in future versions.

System Topology Enumeration Using CPUID Leaf 1 and Leaf 4

Figure 3 outlines the procedures of querying initial APIC ID via CPUID leaf 1 and extracting sub IDs corresponding to the SMT, Core, physical package levels of the hierarchy using CPUID leaf 1 and leaf 4. Extraction of sub ID from initial APIC ID is based on querying CPUID leaf 1 and 4 to derive the bit widths of three select masks (SMT mask, Core mask, Pkg mask) that make up the 8-bit initial APIC ID field. The select masks allow software to extract the sub IDs corresponding to “SMT”, “core”, “package” from the initial APIC ID of each logical processor.

Figure 3 Procedures to Extract Sub IDs from the Initial APIC ID of Each Logical Processor

Table A-4 shows an example of querying the APIC ID for each logical processor in the system and parsing each APIC ID into respective sub IDs for later analysis of the topological makeup.

Table A-5 lists support routines to extract various sub IDs from the APIC ID of the logical processor to which we have bound the current execution context.

Table A-6 shows OS-specific wrapper functions.

Cache Topology Enumeration

The physical package of an Intel 64 processor has a hierarchy of caches. A given level of the cache hierarchy may be shared by one or more logical processors. Some software may wish to optimize performance by taking advantage of the shared cache at a particular level of the cache hierarchy. Performance tuning using cache topology can be accomplished by combining the system topology information with cache topology information. Figure 4 outlines the procedures for decomposing sub IDs to enumerate the logical processors sharing the target cache level and to enumerate the target level caches visible in the system. The Cache_ID can be extracted from the x2APIC ID for processors that report a 32-bit x2APIC ID, or from the initial APIC ID for processors that do not report an x2APIC ID. The array of “Cache_ID” values can be used to enumerate different caches, in conjunction with other sub IDs derived from the processor topology, to implement code-tuning techniques.

Figure 4 Procedures to Extract Cache_ID of the Target Cache Level of Each Logical Processor

The three-level sub IDs, SMT_ID[k], Core_ID[k], Pkg_ID[k], k = 0, .., N-1 can be used by software in a number of application-specific ways. Some of the more common usages include:

  • Determine the number of physical processors to implement a per-package licensing policy. Each unique value in the Pkg_ID[] array represents a physical processor.
  • A thread-binding strategy may choose to favor binding each new task to a separate core in the system. This may require the software to know the relationships between the affinity mask of each logical processor relative to each distinct processor core.
  • An MP-scaling optimization strategy may wish to partition its data working set according to the size of the large last-level cache and allow multiple threads to process the data tile residing in each last level cache. This will require software to manage the affinity masks and thread binding relative to each Cache_ID and APIC ID in the system.
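As an illustration of the thread-binding usage above, the sketch below groups logical processors by core and builds one affinity mask per core. It is a simplified example under stated assumptions: logical processor k is assumed to correspond to bit k of a 64-bit affinity mask (so at most 64 logical processors), and the `sub_ids_t` type and `build_core_affinity_masks` function are hypothetical names, not part of the reference code.

```c
#include <stddef.h>
#include <stdint.h>

/* Sub IDs already extracted from the APIC ID of one logical processor. */
typedef struct {
    uint32_t pkg_id;
    uint32_t core_id;
    uint32_t smt_id;
} sub_ids_t;

/* For each logical processor i, compute the affinity mask of all logical
 * processors that share its core (same Pkg_ID and same Core_ID).
 * Assumption: bit k of the mask stands for logical processor k, n <= 64. */
static void build_core_affinity_masks(const sub_ids_t *ids, size_t n,
                                      uint64_t *core_mask)
{
    size_t i, j;
    for (i = 0; i < n; i++) {
        core_mask[i] = 0;
        for (j = 0; j < n; j++) {
            if (ids[j].pkg_id == ids[i].pkg_id &&
                ids[j].core_id == ids[i].core_id)
                core_mask[i] |= (uint64_t)1 << j;
        }
    }
}
```

A scheduler that wants one thread per core could then pick one bit from each distinct mask value, while a licensing check would only need to count distinct `pkg_id` values.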


Data Processing of Sub IDs of the Topology

Each hierarchy of the sub IDs represents a subset of the APIC ID (either x2APIC ID or initial APIC ID). It allows software to address each distinct entity within the parent hierarchical level. For processor topology enumeration:

  • SMT_ID: each unique SMT_ID allows software to distinguish different logical processors within a processor core,
  • Core_ID: each unique Core_ID allows software to distinguish different processor cores within a physical package,
  • Pkg_ID: each unique Pkg_ID allows software to distinguish different physical packages in a multi-processor system.

For cache topology enumeration:

  • CacheSMT_ID: each unique CacheSMT_ID allows software to distinguish different logical processors sharing the same target cache level.
  • Cache_ID: each unique Cache_ID allows software to distinguish different target level cache in the system.


The extraction of sub IDs from the APIC ID makes use of constant parameters that are derived from the CPUID instruction. From a platform hardware perspective, Intel 64 and IA-32 multi-processor systems require each physical processor to support the same hardware multithreading capabilities. Therefore, system topology enumeration can execute the relevant CPUID leaf functions on one logical processor to derive system-wide sub-ID extraction parameters. The APIC IDs, however, must be queried by executing the CPUID instruction on each logical processor in the system.

Sub ID Extraction Parameters for x2APIC ID

Extraction of sub IDs from the x2APIC ID is based on querying the value of CPUID.(EAX=11, ECX=n):EAX[4:0] for a valid sub leaf index n to obtain the bit width parameter from which an extraction mask is derived, while the x2APIC ID itself is queried by CPUID.(EAX=11, ECX=0):EDX[31:0]. The extraction mask allows software to extract a subset of bits from the x2APIC ID as a sub ID for the respective level of the hierarchy. In order to enumerate the sub IDs, increase the sub leaf index (ECX=n) by 1 until CPUID.(EAX=11, ECX=n):EBX[15:0] == 0.

  • SMT_ID: CPUID.(EAX=11, ECX=0):EAX[4:0] provides the width parameter to derive an SMT select mask to extract the SMT_IDs of logical processors within the same processor core. The sub leaf index (ECX=0) is architecturally defined and associated with the “SMT” level type (CPUID.(EAX=11, ECX=0):ECX[15:8] == 1).
    • SMT_Mask_Width = CPUID.(EAX=11, ECX=0):EAX[4:0] if CPUID.(EAX=11, ECX=0):ECX[15:8] is 1
    • SMT_Select_Mask = ~((-1) << SMT_Mask_Width)
    • SMT_ID = x2APIC_ID & SMT_Select_Mask
  • Core_ID: The level type associated with the sub leaf index (ECX=1) may vary across processors with different hardware multithreading capabilities. If CPUID.(EAX=11, ECX=1):ECX[15:8] is 2, it is associated with the “processor core” level type. Then, CPUID.(EAX=11, ECX=1):EAX[4:0] provides the width parameter to derive a select mask of all logical processors within the same physical package. The “processor core” includes “SMT” in this case, and enumerating different cores in the package can be done by zeroing out the SMT portion of the inclusive mask derived from this.
    • CorePlus_Mask_Width = CPUID.(EAX=11, ECX=1):EAX[4:0] if CPUID.(EAX=11, ECX=1):ECX[15:8] is 2
    • CoreOnly_Select_Mask = (~((-1) << CorePlus_Mask_Width)) ^ SMT_Select_Mask
    • Core_ID = (x2APIC_ID & CoreOnly_Select_Mask) >> SMT_Mask_Width
  • Pkg_ID: Within a coherent domain of the three-level topology, the upper bits of the APIC_ID (except the lower “CorePlus_Mask_Width” bits) can enumerate different physical packages in the system. In a clustered installation, software may need to consult vendor-specific documentation to determine how many physical packages are organized within a given node.
    • Pkg_Select_Mask = (-1) << CorePlus_Mask_Width
    • Pkg_ID = (x2APIC_ID & Pkg_Select_Mask) >> CorePlus_Mask_Width
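The formulas above can be sketched in C as follows. To keep the example independent of any particular CPUID intrinsic, it operates on register values that the caller has already captured for successive sub-leaves of leaf 11; the types and function names (`leaf0b_regs_t`, `x2apic_params_t`, `derive_x2apic_params`, `split_x2apic_id`) are hypothetical. Treating a missing “processor core” level as CorePlus_Mask_Width = SMT_Mask_Width is an assumption of this sketch, not something stated in the text.

```c
#include <stdint.h>

/* Registers returned by one CPUID.(EAX=11, ECX=n) query. */
typedef struct { uint32_t eax, ebx, ecx, edx; } leaf0b_regs_t;

typedef struct {
    unsigned smt_mask_width;
    unsigned core_plus_mask_width;
    uint32_t smt_select_mask;
    uint32_t core_only_select_mask;
    uint32_t pkg_select_mask;
} x2apic_params_t;

static uint32_t mask_below(unsigned width)
{
    return (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Walk the sub-leaves until EBX[15:0] == 0 and derive the select masks.
 * Returns 0 on success, -1 if no "SMT" level type was reported. */
static int derive_x2apic_params(const leaf0b_regs_t *subleaf, unsigned count,
                                x2apic_params_t *out)
{
    unsigned n;
    int saw_smt = 0, saw_core = 0;

    out->smt_mask_width = 0;
    out->core_plus_mask_width = 0;
    for (n = 0; n < count; n++) {
        unsigned level_type = (subleaf[n].ecx >> 8) & 0xFFu; /* ECX[15:8] */
        unsigned width = subleaf[n].eax & 0x1Fu;             /* EAX[4:0]  */
        if ((subleaf[n].ebx & 0xFFFFu) == 0)
            break;                                           /* no more levels */
        if (level_type == 1) { out->smt_mask_width = width; saw_smt = 1; }
        else if (level_type == 2) { out->core_plus_mask_width = width; saw_core = 1; }
    }
    if (!saw_smt)
        return -1;
    if (!saw_core) /* assumption: no separate core level reported */
        out->core_plus_mask_width = out->smt_mask_width;

    out->smt_select_mask = mask_below(out->smt_mask_width);
    out->core_only_select_mask =
        mask_below(out->core_plus_mask_width) ^ out->smt_select_mask;
    out->pkg_select_mask = ~mask_below(out->core_plus_mask_width);
    return 0;
}

/* Decompose one x2APIC ID into its three sub IDs. */
static void split_x2apic_id(uint32_t id, const x2apic_params_t *p,
                            uint32_t *smt, uint32_t *core, uint32_t *pkg)
{
    *smt  = id & p->smt_select_mask;
    *core = (id & p->core_only_select_mask) >> p->smt_mask_width;
    *pkg  = (id & p->pkg_select_mask) >> p->core_plus_mask_width;
}
```

The parameters need to be derived only once per system, whereas `split_x2apic_id` is applied to the x2APIC ID read on each logical processor.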


An example of deriving the extraction parameters for x2APIC ID can be found in the support function “CPUTopologyLeafBConstants()” in the Appendix.

Table A-7 lists the support function to derive bitmask extraction parameters from CPUID leaf 0BH to extract sub IDs from x2APIC ID.

Sub ID Extraction Parameters for Initial APIC ID

Topological sub ID extraction from an INITIAL_APIC_ID (CPUID.1:EBX[31:24]) uses parameters derived from CPUID.1:EBX[23:16] and CPUID.(EAX=04H, ECX=0):EAX[31:26]. CPUID.1:EBX[23:16] represents the maximum number of addressable IDs (initial APIC ID) that can be assigned to logical processors in a physical package. The value may not be the same as the number of logical processors that are present in the hardware of a physical package. The value of (1 + (CPUID.(EAX=4, ECX=0):EAX[31:26] )) represents the maximum number of addressable IDs (Core_ID) that can be used to enumerate different processor cores in a physical package. The value also can be different than the actual number of processor cores that are present in the hardware of a physical package.

  • SMT_ID: The equivalent “SMT_Mask_Width” can be derived by dividing the maximum number of addressable initial APIC IDs by the maximum number of addressable Core IDs.
    • SMT_Mask_Width = Log2( RoundToNearestPof2(CPUID.1:EBX[23:16]) / ((CPUID.(EAX=4, ECX=0):EAX[31:26]) + 1)), where Log2 is the base-2 logarithm and RoundToNearestPof2() rounds the input integer up to the nearest power-of-two integer that is not less than the input value.
    • SMT_Select_Mask = ~((-1) << SMT_Mask_Width)
    • SMT_ID = INITIAL_APIC_ID & SMT_Select_Mask
  • Core_ID: The value of (1 + (CPUID.(EAX=04H, ECX=0):EAX[31:26])) can also be used to derive an equivalent “CoreOnly_Mask_Width”.
    • CoreOnly_Mask_Width = Log2(1 + (CPUID.(EAX=4, ECX=0):EAX[31:26]))
    • CoreOnly_Select_Mask = (~((-1) << (CoreOnly_Mask_Width + SMT_Mask_Width))) ^ SMT_Select_Mask
    • Core_ID = (INITIAL_APIC_ID & CoreOnly_Select_Mask) >> SMT_Mask_Width
  • Pkg_ID: Pkg_Select_Mask can be derived as follows:
    • CorePlus_Mask_Width = CoreOnly_Mask_Width + SMT_Mask_Width
    • Pkg_Select_Mask = ((-1) << CorePlus_Mask_Width)
    • Pkg_ID = (INITIAL_APIC_ID & Pkg_Select_Mask) >> CorePlus_Mask_Width
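A minimal C sketch of the legacy derivation above is shown below. It takes the raw CPUID.1:EBX and CPUID.(EAX=4, ECX=0):EAX values as inputs so that it does not depend on a specific CPUID intrinsic. The names `ceil_log2_u32`, `legacy_widths_t`, `derive_legacy_widths`, and `split_initial_apic_id` are hypothetical. Rounding the core count up to a power of two before taking the logarithm is an assumption of this sketch for the case where the count is not already a power of two.

```c
#include <stdint.h>

/* Log2 of the smallest power of two that is not less than n
 * (n == 0 and n == 1 both give 0). */
static unsigned ceil_log2_u32(uint32_t n)
{
    unsigned width = 0;
    while (width < 32 && (1u << width) < n)
        width++;
    return width;
}

typedef struct {
    unsigned smt_mask_width;
    unsigned core_only_mask_width;
    unsigned core_plus_mask_width;
} legacy_widths_t;

/* leaf1_ebx      : CPUID.1:EBX
 * leaf4_sub0_eax : CPUID.(EAX=4, ECX=0):EAX */
static legacy_widths_t derive_legacy_widths(uint32_t leaf1_ebx,
                                            uint32_t leaf4_sub0_eax)
{
    uint32_t max_lp_ids   = (leaf1_ebx >> 16) & 0xFFu;            /* EBX[23:16]     */
    uint32_t max_core_ids = ((leaf4_sub0_eax >> 26) & 0x3Fu) + 1; /* EAX[31:26] + 1 */
    uint32_t lp_pow2      = 1u << ceil_log2_u32(max_lp_ids);
    legacy_widths_t w;

    w.smt_mask_width       = ceil_log2_u32(lp_pow2 / max_core_ids);
    w.core_only_mask_width = ceil_log2_u32(max_core_ids);
    w.core_plus_mask_width = w.smt_mask_width + w.core_only_mask_width;
    return w;
}

/* Decompose an 8-bit initial APIC ID (CPUID.1:EBX[31:24]) into sub IDs. */
static void split_initial_apic_id(uint32_t id, const legacy_widths_t *w,
                                  uint32_t *smt, uint32_t *core, uint32_t *pkg)
{
    uint32_t smt_mask       = (1u << w->smt_mask_width) - 1u;
    uint32_t core_plus_mask = (1u << w->core_plus_mask_width) - 1u;

    id &= 0xFFu;
    *smt  = id & smt_mask;
    *core = (id & (core_plus_mask ^ smt_mask)) >> w->smt_mask_width;
    *pkg  = (id & ~core_plus_mask) >> w->core_plus_mask_width;
}
```

For instance, with EBX[23:16] = 16 and EAX[31:26] = 7 the widths come out as 1 (SMT), 3 (core) and 4 (core plus SMT), so the initial APIC ID 0x35 splits into SMT_ID 1, Core_ID 2 and Pkg_ID 3.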


Table A-8 lists the support function to derive bitmask extraction parameters from CPUID leaf 01H and Leaf 04H to extract sub IDs from initial APIC ID. Table A-9 shows the support function to derive mask widths from the system-wide extraction parameters.

Cache ID Extraction Parameters

Cache IDs are specific to a target level cache of the cache hierarchy. Software must determine, a priori, the target cache level (the sub-leaf index n associated with CPUID leaf 4) it wishes to optimize with respect to the processor topology. After it has chosen the sub-leaf index ECX=n, Log2(RoundToNearestPof2(1 + CPUID.(EAX=4, ECX=n):EAX[25:14])) is the equivalent “Cache_Mask_Width” parameter. The “Cache_Mask_Width” parameter forms the basis for constructing either a select mask to extract the sub IDs of logical processors sharing the target cache level, or a complementary mask to select the upper bits from the APIC ID to identify different cache entities of the specified target level in the system. The mask to extract sub IDs of different logical processors sharing a cache is simply ~((-1) << Cache_Mask_Width).
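The cache mask derivation can be sketched along the same lines. The function names below are hypothetical, and the choice to keep Cache_ID as the unshifted upper bits of the APIC ID is an assumption of this sketch; either the shifted or the unshifted form distinguishes cache entities equally well.

```c
#include <stdint.h>

/* Log2 of the smallest power of two that is not less than n. */
static unsigned log2_round_up(uint32_t n)
{
    unsigned width = 0;
    while (width < 32 && (1u << width) < n)
        width++;
    return width;
}

/* leaf4_subn_eax : CPUID.(EAX=4, ECX=n):EAX for the chosen cache level n */
static unsigned cache_mask_width(uint32_t leaf4_subn_eax)
{
    /* EAX[25:14] + 1 = maximum number of addressable IDs for logical
     * processors sharing this cache. */
    uint32_t max_sharing_ids = ((leaf4_subn_eax >> 14) & 0xFFFu) + 1u;
    return log2_round_up(max_sharing_ids);
}

/* Sub ID of a logical processor among those sharing the target cache. */
static uint32_t cache_smt_id(uint32_t apic_id, unsigned width)
{
    uint32_t select = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return apic_id & select;
}

/* Identifier of the cache entity: the remaining upper bits (unshifted). */
static uint32_t cache_id(uint32_t apic_id, unsigned width)
{
    uint32_t select = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return apic_id & ~select;
}
```

Two logical processors share the target cache exactly when their `cache_id` values are equal.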

The derivation of bitmask extraction parameters for cache topology is analogous to those shown in Table A-8. Software may choose to focus on one specific cache level in the cache hierarchy. In the companion full source code package that is released separately with this paper, the reader can find code examples of derivation for the bitmask extraction parameters for each cache level, and corresponding cache topology sorting examples. For space consideration, the full source of the cache topology code is not listed in the Appendix.

Analyzing Topology Enumeration Result and Customization

How can software make use of topological information (in the form of hierarchical sub IDs)? This depends on the needs and situations specific to each application, and may require adaptation to the differences in the APIs provided by different operating systems. For the purpose of illustration, we consider some examples of using sub IDs to manage affinity masks hierarchically.

The knowledge of sub IDs of each topological hierarchy may be useful in several ways. For example:

  • Count the number of entities in a given hierarchical level across the system;
  • Use OS threading management services (e.g. affinity masks) while adding topological insights (per-core, per-package, per-target-level-cache) to optimize application performance.


An affinity mask is a data structure that is defined within a specific OS; different operating systems may use the same concept but provide different application programming interfaces. For example, Microsoft Windows* provides the affinity mask as a data type that applications can manipulate directly as a bit field for affinity control. Linux implements a similar data structure internally but abstracts it so that applications manipulate affinity through an iterative interface that assigns zero-based numbers to each logical processor.

The affinity mask or the equivalent numbering scheme provided by OS does not carry attributes that can store hierarchical attributes of the system topology. We will use the “affinity mask” terminology generically in this section (as the technique can be easily generalized to the numbered interface of affinity control).

In the reference code example, we use the sub IDs to create an ordinal numbering scheme (zero-based) for each hierarchical level. Different entities in the system topology (packages, cores) can be referenced by applications using a set of hierarchical ordinal numbers. Using the hierarchical ordinal number scheme and a look-up table to the corresponding affinity masks, software can easily control thread binding, optimize cache usage, etc.

Figure 4 depicts a basic example of data processing of Pkg_IDs and Core_IDs in the system to acquire information on the number of software-visible physical packages and processor cores in the system. This basic technique can also be adapted to acquire affinity mappings, hierarchical breakdowns, and asymmetry information in the system.
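The kind of data processing described above can be sketched as follows: given the Pkg_ID and Core_ID of every logical processor, count the distinct packages and cores and assign zero-based ordinals to each. This is a simplified quadratic-time illustration with hypothetical names (`lp_entry_t`, `analyze_hierarchy`); it is not the algorithm of Table A-10, and it assumes the sub IDs have already been extracted.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t pkg_id;      /* Pkg_ID sub ID (input)                      */
    uint32_t core_id;     /* Core_ID sub ID (input)                     */
    unsigned package_ord; /* zero-based package ordinal (output)        */
    unsigned core_ord;    /* zero-based core ordinal within its package */
} lp_entry_t;

/* Assign ordinals in order of first appearance and count the distinct
 * packages and cores visible to software. Sub IDs need not be contiguous. */
static void analyze_hierarchy(lp_entry_t *lp, size_t n,
                              unsigned *num_packages, unsigned *num_cores)
{
    size_t i, j;
    *num_packages = 0;
    *num_cores = 0;
    for (i = 0; i < n; i++) {
        int pkg_seen = 0, core_seen = 0;
        unsigned cores_in_pkg = 0; /* cores already numbered in this package */
        for (j = 0; j < i; j++) {
            if (lp[j].pkg_id != lp[i].pkg_id)
                continue;
            pkg_seen = 1;
            lp[i].package_ord = lp[j].package_ord;
            if (lp[j].core_id == lp[i].core_id) {
                core_seen = 1;
                lp[i].core_ord = lp[j].core_ord;
            }
            if (lp[j].core_ord + 1 > cores_in_pkg)
                cores_in_pkg = lp[j].core_ord + 1;
        }
        if (!pkg_seen)
            lp[i].package_ord = (*num_packages)++;
        if (!core_seen) {
            lp[i].core_ord = cores_in_pkg;
            (*num_cores)++;
        }
    }
}
```

Comparing the number of cores found in each package is one simple way to detect the software-visible asymmetry mentioned above.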

Table A-10 parts a, b, and c list an algorithm to analyze the sub IDs of all logical processors in the system and derive a triplet of zero-based numbering schemes to index unique entities within each topological level.

Table A-11 lists a data structure that organizes miscellaneous global variables, arrays, and workspace items that are used throughout the rest of the code example. The full set of source code is provided in a separate package. The full source code can be compiled under 32-bit and 64-bit Windows and Linux operating systems. A limited set of OS and compiler tools have been tested.

Dynamic Software Visibility of Topology Enumeration

When application software examines or uses topology information, it must keep in mind the dynamic nature of software visibility. The hardware capability present at the platform level may be presented differently through a BIOS setting, an OS boot option, or OS-supported user interfaces. For example, Intel 64 and IA-32 multi-processor systems require each physical processor to support the same hardware multithreading capabilities. This symmetry at the platform hardware level may nevertheless be presented differently at the application level. System topology enumeration can uncover dynamic, software-visible asymmetry irrespective of whether such asymmetry is caused by a BIOS setting, an OS boot option, or UI configuration.

The appendix lists the bulk of the supporting functions that are used in enumerating processor topology of the system as visible to the current software process. A complete source code is provided separately for download. The reference code can be compiled in either 32-bit or 64-bit Windows* environment. In 64-bit environment, the cpuid64.asm file is needed to provide an enhanced intrinsic function for querying CPUID sub-leaves. An equivalent reference code implementation for 32-bit and 64-bit Linux environment will also be available.


Physical Processor: The physical package of a microprocessor capable of executing one or more threads of software at the same time; also referred to as a physical package. Each physical package plugs into a physical socket and may contain one or more processor cores.

Processor Core: The circuitry that provides dedicated functionalities to decode, execute instructions, and transfer data between certain sub-systems in a physical package. A processor core may contain one or more logical processors.

Logical Processor: The basic unit of processor hardware resource that allows a software executive (OS) to dispatch a task or execute a thread context. Each logical processor can execute only one thread context at a time.

Hyper-Threading Technology: A feature within the IA-32 family of processors, where each processor core provides the functionality of more than one logical processor.

SMT: Abbreviated name for Simultaneous Multi-Threading. An efficient means in silicon to provide the functionalities of multiple logical processors within the same processor core by sharing execution resources and cache hierarchy between logical processors.

Multi-core Processor: A physical processor that contains more than one processor core.

Multi-processor Platform: A computer system made of two or more physical sockets.

Hardware Multi-threading: Refers to any combination of hardware support to allow a system to run multi-threaded software. The forms of hardware support for multi-threading are: SMT, multi-core, and multi-processor.

Processor Topology: Hierarchical relationships of processor entities (logical processors, processor cores) within a physical package relative to the sharing hierarchy of hardware resources within the physical processor.

Cache Hierarchy: Physical arrangement of cache levels that buffers data transport between a processor entity and the physical memory subsystem.

Cache Topology: Hierarchical relationships of a cache level relative to the logical processors in a physical processor.


Table A-1 Determination of System-wide CPU Topology Constant


// Derive parameters used to extract/decompose APIC ID for CPU topology
// The algorithm assumes CPUID feature symmetry across all physical packages.
// Since CPUID reporting by each logical processor in a physical package is
// identical, we only execute CPUID on one logical processor to derive these
// system-wide parameters
// return 0 if successful, non-zero if error occurred
static int CPUTopologyParams()
{
    DWORD maxCPUID; // highest CPUID leaf index this processor supports
    CPUIDinfo info; // data structure to store register data reported by CPUID

    _CPUID(&info, 0, 0);
    maxCPUID = info.EAX;
    // cpuid leaf B detection
    if (maxCPUID >= 0xB)
    {
        CPUIDinfo CPUInfoB;
        _CPUID(&CPUInfoB, 0xB, 0);
        // glbl_ptr points to assortment of global data, workspace, etc
        glbl_ptr->hasLeafB = (CPUInfoB.EBX != 0);
    }
    _CPUID(&info, 1, 0);
    // Use HWMT feature flag CPUID.01:EDX[28] to treat three configurations:
    if (getBitsFromDWORD(info.EDX, 28, 28))
    {
        // #1, Processors that support CPUID leaf 0BH
        if (glbl_ptr->hasLeafB)
        {
            // use CPUID leaf B to derive extraction parameters
            CPUTopologyLeafBConstants();
        }
        else
        {
            // #2, Processors that support legacy parameters
            // using CPUID leaf 1 and leaf 4
            CPUTopologyLegacyConstants(&info, maxCPUID);
        }
    }
    else
    {
        // #3, Prior to HT, there is only one logical
        // processor in a physical package
        glbl_ptr->CoreSelectMask = 0;
        glbl_ptr->SMTMaskWidth = 0;
        glbl_ptr->PkgSelectMask = (-1);
        glbl_ptr->PkgSelectMaskShift = 0;
        glbl_ptr->SMTSelectMask = 0;
    }
    if (glbl_ptr->error) return -1;
    else return 0;
}


Table A-2 Modular Structure of Deriving System Topology Enumeration Information


/*
 * BuildSystemTopologyTables
 * Construct the processor topology tables and values necessary to
 * support the external functions that display CPU topology and/or
 * cache topology derived from system topology enumeration.
 * Arguments: None
 * Return: None, sets glbl_ptr->error if tables or values can not be calculated.
 */
static void
BuildSystemTopologyTables(void)
{
    unsigned lcl_OSProcessorCount, subleaf;
    int numMappings = 0;

    // call OS-specific service to find out how many logical processors
    // are supported by the OS
    glbl_ptr->OSProcessorCount = lcl_OSProcessorCount = GetMaxCPUSupportedByOS();

    // allocate the memory buffers within the global pointer
    // (the allocation call is not shown in this listing)

    // Gather all the system-wide constant parameters needed to
    // derive topology information
    if (CPUTopologyParams()) return;

    if (CacheTopologyParams()) return;

    // For each logical processor, collect APIC ID and
    // parse sub IDs for each APIC ID
    numMappings = QueryParseSubIDs();
    if (numMappings < 0) return;
    // Derive separate numbering schemes for each level of the cpu topology
    if (AnalyzeCPUHierarchy(numMappings) < 0) {
        // record the failure in glbl_ptr->error
        // (the statement is not shown in this listing)
    }
    // an example of building cache topology info for each cache level
    if (glbl_ptr->maxCacheSubleaf != -1) {
        for (subleaf = 0; subleaf <= glbl_ptr->maxCacheSubleaf; subleaf++) {
            if (glbl_ptr->EachCacheMaskWidth[subleaf] != 0xffffffff) {
                // ensure there is at least one core in the target level cache
                if (AnalyzeEachCHierarchy(subleaf, numMappings) < 0) {
                    // record the failure in glbl_ptr->error
                    // (the statement is not shown in this listing)
                }
            }
        }
    }
}


Table A-3 Data Structure of APIC ID, Sub IDs, and Mapping of Ordinal Based Numbering Schemes

typedef struct {
    unsigned __int32 APICID;
        // the full x2APIC ID or initial APIC ID of a logical
        // processor assigned by HW
    unsigned __int32 OrdIndexOAMsk;
        // An ordinal index (zero-based) for each logical
        // processor in the system, 1:1 with "APICID"

    // Next three members are the sub IDs for processor topology enumeration
    unsigned __int32 pkg_IDAPIC;
        // Pkg_ID field, subset of APICID bits
        // to distinguish different packages
    unsigned __int32 Core_IDAPIC;
        // Core_ID field, subset of APICID bits to
        // distinguish different cores in a package
    unsigned __int32 SMT_IDAPIC;
        // SMT_ID field, subset of APICID bits to
        // distinguish different logical processors in a core

    // The next three members store a numbering scheme of ordinal index
    // for each level of the processor topology.
    unsigned __int32 packageORD;
        // a zero-based numbering scheme for each physical
        // package in the system
    unsigned __int32 coreORD;
        // a zero-based numbering scheme for each core in the
        // same package
    unsigned __int32 threadORD;
        // a zero-based numbering scheme for each thread in
        // the same core

    // Next two members are the sub IDs for cache topology enumeration
    unsigned __int32 EaCacheSMTIDAPIC[MAX_CACHE_SUBLEAFS];
        // SMT_ID field, subset of
        // APICID bits to distinguish different logical processors
        // sharing the same cache level
    unsigned __int32 EaCacheIDAPIC[MAX_CACHE_SUBLEAFS];
        // sub ID to enumerate
        // different cache entities of the cache level corresponding
        // to the array index/cpuid leaf 4 subleaf index

    // The next two members store a numbering scheme of ordinal index
    // for enumerating different cache entities of a cache level, and enumerating
    // logical processors sharing the same cache entity.
    unsigned __int32 EachCacheORD[MAX_CACHE_SUBLEAFS];
        // a zero-based numbering
        // scheme for each cache entity of the specified cache level in the system
    unsigned __int32 threadPerEaCacheORD[MAX_CACHE_SUBLEAFS];
        // a zero-based
        // numbering scheme for each logical processor sharing the same cache of the
        // specified cache level
} IdAffMskOrdMapping;
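
To make the relationship between the sub IDs and the ordinal members concrete, the following is a minimal sketch of one way to assign zero-based ordinals in order of first appearance. It is a simple quadratic scan written for illustration and is not the AnalyzeCPUHierarchy routine from the package. SubIdEntry and assign_ordinals are hypothetical names; only the three ordinal member names are taken from the structure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced version of the mapping record. */
typedef struct {
    uint32_t pkg_id, core_id, smt_id;         /* sub IDs extracted from the APIC ID */
    uint32_t packageORD, coreORD, threadORD;  /* zero-based ordinals, filled in here */
} SubIdEntry;

/* Assign ordinals in order of first appearance:
 *   packageORD counts packages across the system,
 *   coreORD    counts cores within one package,
 *   threadORD  counts threads within one core. */
static void assign_ordinals(SubIdEntry *e, size_t n)
{
    uint32_t next_pkg = 0;
    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int pkg_known = 0, core_known = 0;
        uint32_t cores_in_pkg = 0;     /* cores already numbered in this package */
        uint32_t threads_in_core = 0;  /* threads already numbered in this core */
        for (size_t j = 0; j < i; j++) {
            if (e[j].pkg_id != e[i].pkg_id) continue;
            pkg_known = 1;
            e[i].packageORD = e[j].packageORD;
            if (e[j].coreORD + 1 > cores_in_pkg) cores_in_pkg = e[j].coreORD + 1;
            if (e[j].core_id == e[i].core_id) {
                core_known = 1;
                e[i].coreORD = e[j].coreORD;
                threads_in_core++;
            }
        }
        if (!pkg_known) e[i].packageORD = next_pkg++;
        if (!core_known) e[i].coreORD = cores_in_pkg;
        e[i].threadORD = threads_in_core;
    }
}
```

The ordinals are dense even when the hardware sub IDs are sparse, which is why a package whose Pkg_ID is 3 can still receive package ordinal 1 if it is the second package encountered.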

/* Alternate technique for ring 3 code to infer the effect of a CMOS setting in BIOS
 * that restricts the CPUID instruction to report a highest leaf index of 2, i.e.
 * MSR IA32_MISC_ENABLE[22] was set to 1. This situation
 * will prevent software from using CPUID to conduct topology enumeration.
 * The RDMSR instruction is privileged; this alternate routine can run in ring 3.
 */
int InferBIOSCPUIDLimitSetting()
{
    DWORD maxleaf, max8xleaf;
    CPUIDinfo info; // data structure to store register data reported by CPUID

    // check CPUID leaf reporting capability is intact
    CPUID(&info, 0);
    maxleaf = info.EAX;
    CPUID(&info, 0x80000000);
    max8xleaf = info.EAX;
    // Earlier Pentium 4 and Intel Xeon processors (prior to the 90nm Intel Pentium 4
    // processor) support extended leaves with max extended leaf index 0x80000004;
    // the 90nm Intel Pentium 4 processor and later processors support extended
    // leaf indexes greater than 0x80000004.
    if (maxleaf <= 4 && max8xleaf > 0x80000004) return 1;
    else return 0;
}


Table A-4 Query APIC ID and Parsing APIC ID into Sub IDs

/*
 * QueryParseSubIDs
 * Use OS specific service to find out how many logical processors can be accessed
 * by this application.
 * Querying CPUID on each logical processor requires using OS-specific API to
 * bind current context to each logical processor first.
 * After gathering the APIC IDs for each logical processor,
 * we can parse each APIC ID into sub IDs for each topological level.
 * The thread affinity API used to bind the current context is subject
 * to the limits of the specific OS.
 * The loop to iterate over each logical processor managed by the OS can be done
 * in a manner that abstracts the OS-specific affinity mask data structure.
 * Here, we construct a generic affinity mask that can handle an arbitrary number
 * of logical processors.
 * Return: number of logical processors parsed, or -1 on error
 */
long QueryParseSubIDs(void)
{
    unsigned i;
    //DWORD_PTR processAffinity;
    //DWORD_PTR systemAffinity;
    unsigned long numMappings = 0, lcl_OSProcessorCount;
    unsigned long APICID;

    // we already queried the OS for how many logical processors it sees.
    lcl_OSProcessorCount = glbl_ptr->OSProcessorCount;
    // we will use our generic affinity bitmap that can be generalized from
    // OS specific affinity mask constructs or the bitmap representation of an OS
    AllocateGenericAffinityMask(&glbl_ptr->cpuid_values_processAffinity, lcl_OSProcessorCount);
    AllocateGenericAffinityMask(&glbl_ptr->cpuid_values_systemAffinity, lcl_OSProcessorCount);
    // Set the affinity bits of our generic affinity bitmap according to
    // the system affinity mask and process affinity mask
    if (glbl_ptr->error) return -1;

    for (i = 0; i < glbl_ptr->OSProcessorCount; i++) {
        // can't assume the OS affinity bit mask is contiguous,
        // but we are using our generic bitmap representation for affinity
        if (TestGenericAffinityBit(&glbl_ptr->cpuid_values_processAffinity, i) == 1) {
            // bind the execution context to the ith logical processor
            // using OS-specific API
            if (BindContext(i, glbl_ptr->cpuid_values_OSProcessorCount)) {
                glbl_ptr->error |= _MSGTYP_UNKNOWNERR_OS;
                break;
            }
            // now the execution context is on the i'th cpu, call the parsing routine
            ParseIDS4EachThread(i, numMappings);
            numMappings++;
        }
    }
    glbl_ptr->EnumeratedThreadCount = numMappings;
    if (glbl_ptr->error) return -1;
    else return numMappings;
}
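
The generic affinity bitmap mentioned in the comments above can be pictured as a byte array with one bit per OS logical processor index. The following is a minimal sketch under that assumption; GenericMask, mask_alloc, mask_set, mask_test and mask_free are hypothetical names and do not reproduce the AllocateGenericAffinityMask or TestGenericAffinityBit routines from the package.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical bitmap that is not tied to any OS affinity mask type. */
typedef struct {
    unsigned max_cpus;     /* number of logical processors the bitmap can hold */
    unsigned char *bits;   /* one bit per OS logical processor index */
} GenericMask;

/* Allocate a zeroed bitmap; returns 0 on success, -1 on allocation failure. */
static int mask_alloc(GenericMask *m, unsigned max_cpus)
{
    assert(m != NULL);
    m->max_cpus = max_cpus;
    m->bits = calloc(max_cpus / CHAR_BIT + 1, 1);  /* always at least one byte */
    return m->bits ? 0 : -1;
}

/* Mark processor 'cpu' as present; returns -1 if the index is out of range. */
static int mask_set(GenericMask *m, unsigned cpu)
{
    if (cpu >= m->max_cpus) return -1;
    m->bits[cpu / CHAR_BIT] |= (unsigned char)(1u << (cpu % CHAR_BIT));
    return 0;
}

/* Returns 1 if the bit is set, 0 if clear, -1 if the index is out of range. */
static int mask_test(const GenericMask *m, unsigned cpu)
{
    if (cpu >= m->max_cpus) return -1;
    return (m->bits[cpu / CHAR_BIT] >> (cpu % CHAR_BIT)) & 1;
}

static void mask_free(GenericMask *m)
{
    free(m->bits);
    m->bits = NULL;
    m->max_cpus = 0;
}
```

Because the bitmap is sized at run time, the same loop works whether the OS reports 8 logical processors or more than 256, which a single fixed-width machine word cannot represent.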

Table A-5 Support Routine for Parsing APIC ID into Sub IDs

/*
 * ParseIDS4EachThread
 * After the execution context has been bound to the target logical processor,
 * query the 32-bit x2APIC ID if the processor supports it, or
 * query the 8-bit initial APIC ID for older processors. Apply the various
 * system-wide topology constants to parse the APIC ID into its sub IDs.
 * Arguments:
 * i : the ordinal index to reference a logical processor in the system
 * numMappings : running count of how many processors we've parsed
 * Return: 0 is no error
 */
unsigned ParseIDS4EachThread(unsigned i, unsigned numMappings)
{
    unsigned APICID;
    unsigned subleaf;

    APICID = glbl_ptr->PApicAffOrdMapping[numMappings].APICID = GetApicID(i);
    // this is an ordinal number that can relate to the generic affinity mask
    glbl_ptr->PApicAffOrdMapping[numMappings].OrdIndexOAMsk = i;
    glbl_ptr->PApicAffOrdMapping[numMappings].pkg_IDAPIC = ((APICID & glbl_ptr->PkgSelectMask)
        >> glbl_ptr->PkgSelectMaskShift);
    glbl_ptr->PApicAffOrdMapping[numMappings].Core_IDAPIC = ((APICID & glbl_ptr->CoreSelectMask)
        >> glbl_ptr->SMTMaskWidth);
    glbl_ptr->PApicAffOrdMapping[numMappings].SMT_IDAPIC = (APICID & glbl_ptr->SMTSelectMask);
    if (glbl_ptr->maxCacheSubleaf != -1) {
        for (subleaf = 0; subleaf <= glbl_ptr->maxCacheSubleaf; subleaf++) {
            glbl_ptr->PApicAffOrdMapping[numMappings].EaCacheSMTIDAPIC[subleaf]
                = (APICID & glbl_ptr->EachCacheSelectMask[subleaf]);
        }
    }
    return 0;
}
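
The cache-level split used in the listing can be illustrated with a small worked sketch. It assumes the sharing count comes from CPUID.(EAX=4,ECX=n):EAX[25:14], which reports the maximum number of addressable IDs for logical processors sharing that cache, minus 1. The names mask_width, CacheSubIds and split_for_cache are hypothetical, and keeping the cache ID unshifted is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest w such that (1 << w) >= count; count must be at least 1. */
static unsigned mask_width(uint32_t count)
{
    unsigned w = 0;
    assert(count >= 1);
    while (w < 32 && ((uint64_t)1 << w) < count) w++;
    return w;
}

/* Hypothetical result pair for one cache level. */
typedef struct {
    uint32_t cache_id;         /* upper APIC ID bits: which cache entity */
    uint32_t thread_in_cache;  /* lower APIC ID bits: which sharer of that cache */
} CacheSubIds;

/* max_sharing_minus1: the 12-bit field read from CPUID leaf 4 for this level. */
static CacheSubIds split_for_cache(uint32_t apic_id, uint32_t max_sharing_minus1)
{
    unsigned w = mask_width(max_sharing_minus1 + 1);
    uint32_t select = (w >= 32) ? UINT32_MAX : ~(UINT32_MAX << w);
    CacheSubIds s;
    s.thread_in_cache = apic_id & select;
    s.cache_id = apic_id & ~select;   /* left unshifted in this sketch */
    return s;
}
```

For APIC ID 0x35, a cache shared by 2 logical processors gives cache ID 0x34 and sharer index 1, while a cache shared by 16 gives cache ID 0x30 and sharer index 5. A sharing count that is not a power of two, such as 12, is rounded up to a 4-bit field.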
There are downloads available under the BSD 3-clause License.


Edwin X. (Intel)'s picture

Hi Shih Kuo,

Thanks for the article & sample code.

The attached source code compiles fine for the x86 platform, but I get errors when compiling the file get_cpuid.asm for the x64 platform:


1>get_cpuid.asm(82): error A2070: invalid instruction operands

1>get_cpuid.asm(83): error A2070: invalid instruction operands

1>get_cpuid.asm(90): error A2070: invalid instruction operands

1>get_cpuid.asm(91): error A2070: invalid instruction operands


I compiled with Visual Studio 2015. We want to compile for the x64 platform; please help to resolve the issue.

Code Snippet from file for reference & corresponding error lines marked:


PUBLIC _get_cpuid_info
; Function compile flags: /Ogtpy

_get_cpuid_info PROC
    mov edx, DWORD PTR 8[esp-4] ; addr of start of output array
    mov eax, DWORD PTR 12[esp-4] ; leaf
    mov ecx, DWORD PTR 16[esp-4] ; subleaf
    push edi ; <----------Error Here
    push ebx ; <----------Error Here
    mov edi, edx ; edi has output addr

    mov DWORD PTR [edi], eax
    mov DWORD PTR [edi+4], ebx
    mov DWORD PTR [edi+8], ecx
    mov DWORD PTR [edi+12], edx
    pop ebx ; <----------Error Here
    pop edi ; <----------Error Here

_get_cpuid_info ENDP




CyrIng's picture

FYI, the CoreFreq Linux driver (which executes in ring 0) provides, in its function Map_Extended_Topology(), a Core/SMT (single-package) topology for 64-bit Core 2 and later processors.

Geof S.'s picture

It seems there's a typo / error here:  "to obtain the bit width parameter to derive an extraction mask while x2APIC ID is queried by CPUID.(EAX=1,ECX=0):EDX[31:0]."

That should be EAX=[11|BH], ya?

This threw me off for a while

Shih Kuo (Intel)'s picture

An updated sample code package is released and available for download. The update primarily addresses the shortcoming that 5 years ago, I did not anticipate processors might report an L4 in CPUID leaf 4.

CyrIng's picture


I have solved the issue in this new code x2topology_np.c

In short, a loop should create one thread per core, and each thread function changes its own affinity prior to calling CPUID.0BH.

This means that changing the affinity in the main thread makes CPUID return no value.


CyrIng's picture

Hello Shih Kuo

First of all, thank you for your white paper.

So far, I have successfully built an APIC list in my C code inside a shell loop where the CPU affinity is handled by the taskset command.

Going further, I want to program an APIC array using Linux API (pthread_setaffinity_np or sched_setaffinity) instead of taskset :

#define	LEVEL_THREAD	1
#define	LEVEL_CORE	2

typedef	struct
		OS_ID    : 32-0,
		APIC_ID	 : 32-0,
		Core_ID	 : 32-0,
		Thread_ID: 32-0;

int main(int argc, char *argv[]) {
	FEATURES Features; // see previous code for definition & usage.

	int	cpu=0, InputLevel=0, NoMoreLevels=0,
		SMT_Mask_Width=0, SMT_Select_Mask=0,
		CorePlus_Mask_Width=0, CoreOnly_Select_Mask=0, rc=0;

	Topology=calloc(Features.ThreadCount, sizeof(TOPOLOGY));
	for(cpu=0; cpu < Features.ThreadCount; cpu++)
		cpu_set_t cpuset;
		pthread_t thisTID=pthread_self();
//		pid_t thisPID=getpid();
		CPU_SET(Topology[cpu].OS_ID, &cpuset);
		if((rc=pthread_setaffinity_np(thisTID, sizeof(cpu_set_t), &cpuset)) == 0)
//		if((rc=sched_setaffinity(thisPID, sizeof(cpu_set_t), &cpuset)) == 0)
				__asm__ volatile
					"movq	$0xb, %%rax;"
					: "=a"	(Features.ExtTopology.EAX),
					  "=b"	(Features.ExtTopology.EBX),
					  "=c"	(Features.ExtTopology.ECX),
					  "=d"	(Features.ExtTopology.EDX)
					: "c"	(InputLevel)
				if(!Features.ExtTopology.EAX.Register && !Features.ExtTopology.EBX.Register)
							case LEVEL_THREAD:
							case LEVEL_CORE:
						case LEVEL_THREAD:
							SMT_Mask_Width = Features.ExtTopology.EAX.SHRbits;
							SMT_Select_Mask= ~((-1) << SMT_Mask_Width );
							Topology[cpu].Thread_ID=Features.ExtTopology.EDX.x2APIC_ID & SMT_Select_Mask;
						case LEVEL_CORE:
							CorePlus_Mask_Width = Features.ExtTopology.EAX.SHRbits;
							CoreOnly_Select_Mask = (~((-1) << CorePlus_Mask_Width ) ) ^ SMT_Select_Mask;
							Topology[cpu].Core_ID=(Features.ExtTopology.EDX.x2APIC_ID & CoreOnly_Select_Mask) >> SMT_Mask_Width;

My issue is that the first iteration of the loop stops with the EAX and EBX registers equal to zero right after the CPUID call.

This does not happen when using taskset. Could you please tell me what is wrong with my code?

Best Regards,


Shih Kuo (Intel)'s picture


Just to reset expectations of where the reference code would be useful...

In situations where a user process wishes its thread contexts to be scheduled with specific knowledge of the topology of the SMT or cache hierarchy, the reference code can provide information on the native hardware that is under OS control.

The reference code assumes that the OS is running natively (not under virtualization), that the hardware configuration is static between boot and shutdown, and that the process control attributes of the process executing the reference code are unconstrained.

While the API differences between Windows and Linux are taken care of in the reference code, the code does not try to address ancillary environment-specific issues. For example, different compilers use different default settings (e.g. some treat all warnings as compile-time errors), and the C runtime library may differ slightly between environments. Programmers who adapt this code should expect some degree of customization for their specific environment and tools.

I think your interest in cache hits/misses is a subject of an entirely different domain.

Typically, attempts to characterize cache hits/misses are associated with some target workload that runs for some macroscopically meaningful duration, so that bulk cache hits/misses can be monitored using performance monitoring hardware without the monitoring tool disturbing/distorting the characteristics of the workload. Intel VTune and various other utilities are available under Linux for programmers.

In such tools, the dynamics of cache hits/misses of the workload are sampled at a granularity much larger than an individual cache hit/miss.


sol s.'s picture

On executing the following command:
I get the error:
cpu_topo.c: In function ‘getPkgCoreThrdStr’:
cpu_topo.c:1986:4: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 4 has type ‘long long int’ [-Wformat]
I am running the code on an Intel Core i3-350M machine. Is there a way to get an individual "cache id", or to get cache hits for individual physical cores? For example, logical cores 0 and 1 share an L2 cache, so I would like the hits/misses for that L2 cache. Thanks in advance.

anonymous's picture

I need the Intel instruction set with the clock cycles and other criteria for each instruction.

constm's picture

nice Figures :)

