Last Modified On: May 7, 2008 1:06 AM PDT
When targeting x64 platforms in Visual Studio .NET* 2005, programmers can no longer use inline assembly code as they could for 32-bit code. They must either write C/C++ code using intrinsics or create a 64-bit MASM (.asm) version of the function, which is tedious. Unfortunately, the VS .NET 2005 implementation of the intrinsic for CPUID (__cpuid) accepts an input value only for the eax register and not for ecx, whose more recently defined inputs are required for queries about cache parameters and certain multi-core characteristics. Thus, a 64-bit .asm listing is required for full use of the CPUID instruction.
The following code samples demonstrate how to use the CPUID and RDTSC instructions with VS .NET 2005 for 64-bit (x64) platforms. The CPUID instruction is commonly used to obtain detailed information about the system’s CPU(s), and RDTSC reads the CPU’s internal time-stamp counter for timing and performance measurement. Unlike __cpuid, the RDTSC intrinsic (__rdtsc) works as expected and can replace inline assembly.
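As a rough illustration of timing with the intrinsic, the sketch below reads the time-stamp counter before and after a loop and subtracts the two readings. The names read_tsc, elapsed_ticks, and time_busy_loop are hypothetical and are not taken from the article's listings. The sketch assumes a compiler that accepts C99 and provides __rdtsc through intrin.h (Microsoft compilers) or x86intrin.h (GCC/Clang on x86).

```c
#if defined(_MSC_VER)
#include <intrin.h>       /* Microsoft compilers: declares __rdtsc */
#define HAVE_TSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>    /* GCC/Clang on x86: declares __rdtsc */
#define HAVE_TSC 1
#else
#define HAVE_TSC 0        /* no time-stamp counter intrinsic on this target */
#endif

/* Hypothetical wrapper: read the time-stamp counter.
   Returns 0 on targets without the intrinsic (placeholder only). */
unsigned long long read_tsc(void)
{
#if HAVE_TSC
    return (unsigned long long)__rdtsc();
#else
    return 0ULL;
#endif
}

/* Hypothetical helper: ticks between two readings.
   Unsigned subtraction stays correct even if the counter wrapped once. */
unsigned long long elapsed_ticks(unsigned long long start,
                                 unsigned long long end)
{
    return end - start;
}

/* Hypothetical workload: time a simple loop in counter ticks. */
unsigned long long time_busy_loop(unsigned iterations)
{
    volatile unsigned sink = 0;   /* volatile keeps the loop from being removed */
    unsigned i;
    unsigned long long start, end;

    start = read_tsc();
    for (i = 0; i < iterations; ++i) {
        sink += i;
    }
    end = read_tsc();
    return elapsed_ticks(start, end);
}
```

The result is in counter ticks, not seconds. Converting to time needs the counter frequency, which this sketch does not attempt to determine.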
To build the 64-bit .asm file, create a custom build step that calls the 64-bit MASM, "ml64.exe", as shown in the screen-shot below. For the 32-bit configuration, the cpuid64.asm file should not be built, so for platform Win32, set General -> Excluded From Build to Yes.
The header file below (cpuid_32_64.h) creates a single definition of the functions _CPUID and _RDTSC that can be used in both 32-bit and 64-bit builds. For 64-bit builds, _CPUID uses the .asm function cpuid64, and _RDTSC uses the intrinsic __rdtsc. For 32-bit builds, _CPUID uses the inline-assembly function cpuid32, and _RDTSC uses the inline-assembly function _inl_rdtsc32.
There are two examples shown in the C file below (cpuid_32_64.c). The first is GetCoresPerPackage(), which calls _CPUID with eax=4 and ecx=0 to read the first set of deterministic cache parameters reported by the CPU and extract the field indicating the number of processor cores per processor package. (For example, this function would return 1 for a single-core Intel® Pentium® 4 processor, and 2 for a dual-core Intel® Pentium® D processor.) If the intrinsic __cpuid were used in this function on an x64 platform instead of the cpuid64 function, the input value of ecx would be undefined, and the output would be unreliable. The second example function is timeSomethingExample(), which calls _RDTSC twice and calculates the timer ticks elapsed in the loop between the two calls. The _CPUID example shows how to use one definition to invoke either 64-bit .asm code or 32-bit inline assembly, and the _RDTSC example shows how to use one definition to invoke either a 64-bit intrinsic or 32-bit inline assembly.
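The field extraction that GetCoresPerPackage() performs can be sketched on its own, separate from how the register value is obtained. The function below is a minimal sketch with a hypothetical name (cores_per_package_from_eax), not the article's listing. It assumes its argument is the eax value returned by CPUID with eax=4 and ecx=0. Bits 31:26 of that value hold the maximum number of addressable core IDs in the package minus one, which the article treats as the cores-per-package count. Bits 4:0 hold the cache type, and zero there means no cache parameters were reported.

```c
/* Hypothetical helper: decode cores per package from the EAX value
   returned by CPUID leaf 4, sub-leaf 0 (input eax=4, ecx=0). */
unsigned cores_per_package_from_eax(unsigned eax)
{
    unsigned cache_type = eax & 0x1Fu;        /* bits 4:0: cache type field */

    if (cache_type == 0u) {
        /* No cache parameters reported: fall back to a single core. */
        return 1u;
    }

    /* Bits 31:26 store (count - 1), so add one after masking. */
    return ((eax >> 26) & 0x3Fu) + 1u;
}
```

For instance, an eax value of 0x04000121 has 1 in bits 31:26, so the function reports 2 cores. Falling back to 1 when the cache type is zero is a design choice of this sketch, made so that callers always receive a usable count.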
Both the _CPUID and _RDTSC examples show how to create utility functions that are transparently portable from Win32 to x64 platforms in cases where different underlying code is required for each platform. Furthermore, the cpuid64 function provides a workaround for a deficiency in the __cpuid intrinsic, allowing both 32-bit and 64-bit applications to fully utilize the capability of the CPUID instruction.
Header file (cpuid_32_64.h):
#pragma once |
32/64-bit .c file (cpuid_32_64.c):
#include "windows.h" |
64-bit .asm file (cpuid64.asm):
; call cpuid with args in eax, ecx |
| April 1, 2008 1:00 PM PDT
Eric Palmer |
You cannot detect the number of physical processors without some use of OS APIs. When you issue the CPUID instruction, it runs on the processor on which the OS has scheduled your thread. The enumeration algorithms in the samples below work by forcing the current thread to each of the processors in the system and then reading the APIC-related fields from the CPUID instruction. This requires that the OS provide the total number of logical processors and a way to force the current thread to run on each (set affinity). See http://softwarecommunity.intel.com/articles/eng/2728.htm and/or http://softwarecommunity.intel.com/articles/eng/2728.htm. The first has the most up-to-date detection code. |
| April 1, 2008 1:01 PM PDT
Eric Palmer |
The 2nd link below should have been http://softwarecommunity.intel.com/articles/eng/1855.htm |
| January 12, 2009 3:07 PM PST
Cobra El Diablo |
For CPU count determination use the following snippet, something I found whilst researching some asm copy routines for MMX- and SSE-enabled processors.

    unsigned int cpuCount( void )
    {
        unsigned int count = 1; // Always assume 1.
    #if defined( LINUX )
        count = sysconf( _SC_NPROCESSORS_CONF );
    #elif defined( WINDOWS )
        SYSTEM_INFO si;
        GetSystemInfo( &si );
        count = si.dwNumberOfProcessors;
    #endif
        return( count );
    }

Make sure you have the right headers: 'windows.h' for Windows (like duh!) and unistd.h (which declares sysconf) for Linux. Hope it helps, have fun and may The Source be with you :) |

Eric Palmer (Intel)
Heath
Is it possible to detect the number of physical processors using this technique? The sample works perfectly but I need a method to query the processor(s) directly to get a count of physical processors in the system without using Windows API.
Many thanks.