The following questions were received by Intel Software Network Support. Each is followed by the response from our engineering contacts and the authors of the original article at http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration:
Q. Has anyone done testing on the cpucount.exe on a multi-node platform (for example, an IBM x460/x3950-MXE460/x3950) running Microsoft* Windows* 2003 32-bit with SP1?
A. The paper clarifies that cluster installations are not covered; naturally, the reference code does not consider multi-node clusters within its scope. The bottom line is that the 3-level topology can be extended for cluster installations, but the cluster vendor needs to architect a 4-level topology scheme that extends the 3-level scheme and observes the 8-bit size limit of the initial APIC ID infrastructure. In such cases, our reference code will report the lower 3 levels of the topology; it is up to the user to extend the reference code to report the cluster node ID (the 4th level). The cluster vendor may choose other schemes, and our reference code cannot possibly know how to deal with vendor-specific schemes.
Q. I would like to know more about determining cluster IDs. All documents, including the SDM, mention cluster but stop short of describing how to identify cluster IDs separate from package IDs. This makes sense since cluster is a system-level issue and not something that the processor u-arch determines. Aside from knowing that a specific chipset, system or node has a clustered topology, can system software discover cluster ID masks in a portable manner, possibly involving a BIOS or ACPI standard?
A. Cluster ID is really a cluster vendor issue. The cluster software stack can choose whatever protocol or schema it uses to identify the entities within its topology. The identity in a cluster obviously need not be constrained by the 8-bit width of the initial APIC ID that CPU microcode negotiates and assigns to logical processors within a node. The overall identification scheme in a cluster is really vendor specific. Intel's BIOS writer guide provides a recommendation that allows cluster vendors to assign IDs within a cluster in a manner compatible with the way package_ID, core_ID, and SMT_ID are laid out.
For example, if a particular 4-socket SMP node uses 6 bits for the package ID, a cluster vendor can carve out the upper bits to use as the cluster ID (or even extend that cluster ID beyond the 8-bit constraint of the initial APIC ID). Leaving clusters aside, the initial APIC ID is the identifier that hardware uses to uniquely identify each logical processor in a system. The OS uses its own constructs (an affinity mask bit in Windows*; 0-based contiguous integer numbers in Linux). These OS constructs have their lineage in the initial APIC ID on Intel platforms, but they allow the OS to fulfill its hardware-independence objectives.
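The carving-out described above can be illustrated with a small sketch. It is not taken from the reference code: the type names, the function names, and the field widths (1 SMT bit, 1 core bit, 2 package bits for four sockets, everything above that treated as cluster ID) are assumptions chosen only to make the arithmetic concrete. A real cluster vendor defines its own layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: field widths in bits, least significant field first. */
typedef struct {
    unsigned smt_bits;      /* width of SMT_ID */
    unsigned core_bits;     /* width of core_ID */
    unsigned package_bits;  /* width of package_ID within one node */
} id_layout;

typedef struct {
    uint32_t smt_id;
    uint32_t core_id;
    uint32_t package_id;
    uint32_t cluster_id;    /* all bits above package_ID (vendor-defined) */
} topo_ids;

/* Return `width` bits of `value`, starting at bit position `shift`. */
static uint32_t extract_bits(uint32_t value, unsigned shift, unsigned width)
{
    assert(shift + width <= 32);
    if (width == 0)
        return 0;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> shift) & mask;
}

/* Split an ID into its sub-IDs according to the assumed layout. */
static topo_ids split_id(uint32_t id, id_layout layout)
{
    topo_ids t;
    unsigned shift = 0;

    t.smt_id = extract_bits(id, shift, layout.smt_bits);
    shift += layout.smt_bits;
    t.core_id = extract_bits(id, shift, layout.core_bits);
    shift += layout.core_bits;
    t.package_id = extract_bits(id, shift, layout.package_bits);
    shift += layout.package_bits;

    assert(shift < 32);          /* at least one bit must remain for the cluster */
    t.cluster_id = id >> shift;  /* may extend past bit 7 if the vendor widens the ID */
    return t;
}
```

With the assumed layout {1, 1, 2}, the ID 0xB6 (binary 1011 0110) splits into SMT_ID 0, core_ID 1, package_ID 1, and cluster ID 11. Passing a wider value such as 0x1B6 shows how a cluster ID can grow beyond the 8-bit initial APIC ID without disturbing the lower three levels.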
Q. Can the CPUID.4 field Logical Processors Sharing a Cache be used to infer the actual identities of the logical processors that share a cache?
A. The "Logical Processors Sharing a Cache" field is used to derive the cache sharing topology in a system, similar to enumerating processor topology. See more information about this in the Software Optimization Manuals at http://www.intel.com/products/processor/manuals/.
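As a rough sketch of how this field can be combined with the initial APIC ID: CPUID.4:EAX[25:14] + 1 gives the maximum number of addressable IDs for logical processors sharing the cache, and rounding that count up to a power of two yields a mask over the low APIC ID bits. Two logical processors whose APIC IDs agree above that mask share the cache. The function names below are hypothetical, and the register value is passed in as a parameter so the arithmetic can be followed without executing CPUID.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits able to represent `count` distinct IDs.
   `count` is limited to the range the 12-bit CPUID field can produce. */
static unsigned id_field_width(uint32_t count)
{
    assert(count >= 1 && count <= 4096);
    unsigned width = 0;
    while ((1u << width) < count)
        width++;
    return width;
}

/* CPUID.4:EAX[25:14] + 1 for one cache level. */
static uint32_t max_ids_sharing_cache(uint32_t cpuid4_eax)
{
    return ((cpuid4_eax >> 14) & 0xFFFu) + 1u;
}

/* Cache ID: the APIC ID with the bits that vary among sharers cleared. */
static uint32_t cache_id(uint32_t apic_id, uint32_t cpuid4_eax)
{
    unsigned width = id_field_width(max_ids_sharing_cache(cpuid4_eax));
    uint32_t low_mask = (1u << width) - 1u;
    return apic_id & ~low_mask;
}

/* Nonzero if the two logical processors share the cache described by cpuid4_eax. */
static int shares_cache(uint32_t apic_a, uint32_t apic_b, uint32_t cpuid4_eax)
{
    return cache_id(apic_a, cpuid4_eax) == cache_id(apic_b, cpuid4_eax);
}
```

So the field alone does not name the sharers; it has to be applied to the APIC ID collected from each logical processor, one cache level (one CPUID.4 subleaf) at a time.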
Q. I would like some supplementary information about the error message "assertion failed: PhysicalNum * MaxLogicalProcPerPhysicalProc() >= ToAvailLogical, file cpucountcode.c, line 171" when trying to use this with certain older processors.
A. As the topology of MP platforms evolves, the CPUID instruction is extended with new fields that help software enumerate processor topology. The CPUID features needed to detect platforms from 2001, 2003, and 2005 may not be present in older processors such as the Intel Pentium III Xeon processors. Despite our best effort to make the processor enumeration algorithm robust and backward compatible, it cannot reach back far enough to cover the Intel Pentium III Xeon processors. In the white paper, we explained the roles of several key CPUID features: i) CPUID.1:EBX[31:24]: initial APIC ID, ii) CPUID.1:EBX[23:16]: maximum number of logical processors per physical package, iii) CPUID.4:EAX[31:26] + 1: maximum number of cores per physical package. If any of these three pieces of data is not present or valid, the algorithm cannot work. Therefore we put in the assert statement to guard against problems such as this. When you attempt to run the code on very old processors, the assertion fires, which is the expected behavior, because not all of the basic input data needed to enumerate processor topology are present or valid.
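To make the three inputs concrete, here is a minimal sketch of how they are decoded from raw register values, together with a conservative availability check. It is not the code from cpucountcode.c; the struct and function names are hypothetical. The check relies on two CPUID facts: CPUID.0:EAX reports the highest supported basic leaf, and CPUID.1:EDX[28] (the HTT flag) indicates whether EBX[23:16] holds a valid value.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t initial_apic_id;      /* CPUID.1:EBX[31:24] */
    uint32_t max_logical_per_pkg;  /* CPUID.1:EBX[23:16] */
    uint32_t max_cores_per_pkg;    /* CPUID.4:EAX[31:26] + 1 */
} topo_inputs;

/* Decode the three fields from raw register values. */
static topo_inputs decode_topo_inputs(uint32_t cpuid1_ebx, uint32_t cpuid4_eax)
{
    topo_inputs in;
    in.initial_apic_id     = (cpuid1_ebx >> 24) & 0xFFu;
    in.max_logical_per_pkg = (cpuid1_ebx >> 16) & 0xFFu;
    in.max_cores_per_pkg   = ((cpuid4_eax >> 26) & 0x3Fu) + 1u;
    assert(in.max_cores_per_pkg >= 1 && in.max_cores_per_pkg <= 64);
    return in;
}

/* Conservative check that all three inputs can be trusted:
   leaf 4 must exist and the HTT flag must mark EBX[23:16] as valid. */
static int topo_inputs_present(uint32_t max_basic_leaf, uint32_t cpuid1_edx)
{
    int htt = (int)((cpuid1_edx >> 28) & 1u);
    return max_basic_leaf >= 4 && htt;
}

/* Same shape as the assertion quoted in the question: the packages found,
   times the per-package maximum, must cover every available logical processor. */
static int counts_consistent(uint32_t packages, uint32_t max_logical_per_pkg,
                             uint32_t available_logical)
{
    return packages * max_logical_per_pkg >= available_logical;
}
```

On a processor where `topo_inputs_present` returns 0, the decoded fields may hold zero or reserved values, and a consistency test like `counts_consistent` can then fail even though the machine itself is working correctly.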
Q. I have downloaded the cpucount.cpp and successfully built it on Linux and Windows operating systems. Could you please let me know what precautions to take if I want to convert it to the C programming language?
A. The code uses C++-specific features minimally, primarily in local variable declarations. Therefore, converting it to the C programming language should be straightforward for a developer with a good knowledge of C programming.
Q. I'm using CPUCount to detect the CPU on an Intel Core 2 machine. I'm developing an ActiveX that runs in the context of another application, which opens threads other than the one CPUCount is running in. My problem is that in this scenario, GetAPIC_ID returns the same ID. When I run a standalone executable, or when my ActiveX uses a process that doesn't open additional threads, GetAPIC_ID works fine and returns different IDs. I'm encountering this problem only on the Core 2 platform; on Duo or Pentium D everything is fine.
The process affinity mask is the same for the process in which the problem is reproduced and the one in which it isn't (the affinity is 3 for both of them). The only difference is that the 'problematic' process has additional active threads when the CPUCount code is called, while the other process has only the main thread. In addition, the behavior is inconsistent on other Intel Core platforms.
Do you have any workaround or recommended code change for that?
A. It sounds like the issue lies in the interaction between the OS and process attributes.
CPUCount needs to bind the execution context to each logical processor in turn so that it can query the APIC_ID of each one; this requires that the execution context be free to migrate.
Whether you run the CPUcount code as a standalone EXE or as an ActiveX, each process inherits certain attributes from its parent process. One of these attributes is the affinity mask, which determines the logical processors the process is allowed to run on; similarly, the affinity mask of a child process can be set.
In a normal situation, the OS allows user processes to run on any logical processor in the system. When a child process is created, it usually inherits the parent's affinity mask by default. CPUcount relies on this underlying assumption.
If the other application that creates additional child threads decides that the main thread should be bound to a single logical processor (not allowing migration), your ActiveX injected into the main thread has probably inherited the no-migration restriction; in that case the free-to-migrate assumption no longer holds.
You will need to figure out how to ensure that the ActiveX injected into the other application satisfies the free-to-migrate (bind to any logical processor) requirement of the CPUcount code.
The key is that the execution context of the ActiveX (running cpucount-like code) needs to be freely migratable across all logical processors in the system. There is probably more than one API in the OS that can change process attributes, including those that create a new thread. You are seeing the symptom of a thread execution context that got stuck; its cause lies in the interaction between the application and the OS, not in the hardware. To our knowledge, the affinity mask and task scheduling mechanisms of the OS don't treat the dual-core processors you referred to any differently.
The other point is that software can call GetProcessAffinityMask to retrieve the affinity mask for the process dynamically. The dynamic aspect means the information is guaranteed valid only at the point you made the inquiry. There are OS APIs and services that can probably change it dynamically as well.
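One way to check whether the free-to-migrate assumption holds inside the host application is to try binding the calling thread to each bit of the process affinity mask and watch for failures. The sketch below is illustrative only and is not part of CPUCount: the helper names are hypothetical, the Windows portion compiles only when _WIN32 is defined, and the step that reads the APIC ID is left as a comment.

```c
#include <assert.h>
#include <stdint.h>

/* Number of logical processors permitted by an affinity mask. */
static unsigned mask_count(uint64_t mask)
{
    unsigned n = 0;
    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Write one single-bit mask per permitted processor into `out`
   (at most `cap` entries) and return how many were written. */
static unsigned split_mask(uint64_t mask, uint64_t *out, unsigned cap)
{
    assert(out != 0);
    unsigned n = 0;
    while (mask != 0 && n < cap) {
        out[n++] = mask & (~mask + 1);  /* isolate the lowest set bit */
        mask &= mask - 1;
    }
    return n;
}

#ifdef _WIN32
#include <windows.h>

/* Returns 1 if the calling thread could be bound, in turn, to every
   logical processor in the process affinity mask; 0 otherwise. */
static int can_visit_all_processors(void)
{
    DWORD_PTR process_mask = 0, system_mask = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask))
        return 0;

    uint64_t bits[64];
    unsigned n = split_mask((uint64_t)process_mask, bits, 64);
    HANDLE thread = GetCurrentThread();
    DWORD_PTR original = 0;
    int ok = 1;

    for (unsigned i = 0; i < n; i++) {
        /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
        DWORD_PTR previous = SetThreadAffinityMask(thread, (DWORD_PTR)bits[i]);
        if (previous == 0) { ok = 0; break; }
        if (original == 0) original = previous;
        /* The APIC ID of this logical processor would be queried here. */
    }
    if (original != 0)
        SetThreadAffinityMask(thread, original);  /* restore the caller's mask */
    return ok;
}
#endif
```

For the affinity value of 3 reported in the question, `split_mask` yields the two single-bit masks 1 and 2. If a bind fails, or succeeds but the APIC ID read afterwards does not change, the thread is not actually migrating, which matches the symptom of GetAPIC_ID returning the same ID.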
Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.
*Other names and brands may be claimed as the property of others.