Limiting the Number of Cores of Execution on a Windows System



If you have a Windows test machine with many CPU cores and you want to test performance and compatibility on systems with fewer cores, you may be able to use that one machine to test several multicore scenarios. For example, an 8-core machine can be configured so that Windows does not use all 8 cores, forcing it to use only 4, 2, or even 1 core. This technique lets you compare system performance more evenly as the number of cores varies, and verify correct operation and scalability of your software across a variety of multicore configurations.

Note:
You will have to modify your BOOT.INI file (Windows XP) or Boot Configuration Data (BCD, Windows Vista) to set up a machine for this purpose.

To restrict the number of CPU cores Windows will use on a multicore machine, follow these instructions:

1.) Determine how many logical CPUs your system has. To do this, open the following registry key (for example, in regedit):

HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\CentralProcessor

There will be one subkey for each logical CPU. The number of logical CPUs depends on the number of cores in your processor(s) and on whether those cores use Hyper-Threading technology. For example, a four-core machine with Hyper-Threading technology appears to have eight logical CPUs (four cores x two threads per core = eight logical CPUs, or hardware threads).
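If you would rather check the count from a program than in regedit, the following C sketch counts the subkeys of the same CentralProcessor key using the Windows registry calls RegOpenKeyExA and RegQueryInfoKeyA, and also spells out the cores x threads-per-core arithmetic from the example above. The function names are hypothetical, and the non-Windows branch (a fallback to sysconf(_SC_NPROCESSORS_ONLN)) is an assumption added only so the sketch builds elsewhere; it is not part of the procedure described here.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: logical CPUs = physical cores x hardware threads
   per core, e.g. 4 cores x 2 Hyper-Threading threads = 8. */
long logical_cpus_from_topology(long cores, long threads_per_core)
{
    assert(cores >= 1 && threads_per_core >= 1);
    return cores * threads_per_core;
}

/* Hypothetical helper: returns the number of logical CPUs, or -1 on error.
   On Windows it counts the subkeys of ...\System\CentralProcessor, the same
   key inspected by hand in step 1. */
long count_logical_cpus(void)
{
#ifdef _WIN32
    HKEY key;
    DWORD subkeys = 0;
    if (RegOpenKeyExA(HKEY_LOCAL_MACHINE,
                      "HARDWARE\\DESCRIPTION\\System\\CentralProcessor",
                      0, KEY_READ, &key) != ERROR_SUCCESS)
        return -1;
    /* Only the subkey count is needed; every other output pointer is NULL. */
    LONG rc = RegQueryInfoKeyA(key, NULL, NULL, NULL, &subkeys,
                               NULL, NULL, NULL, NULL, NULL, NULL, NULL);
    RegCloseKey(key);
    return rc == ERROR_SUCCESS ? (long)subkeys : -1;
#else
    /* Assumption: non-Windows fallback, not part of the article's method. */
    return sysconf(_SC_NPROCESSORS_ONLN);
#endif
}
```

On the eight-thread machine used in the next steps, count_logical_cpus() would be expected to agree with logical_cpus_from_topology(4, 2).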

2.) On a Windows XP system, edit the BOOT.INI file located in the root directory of your boot drive (usually C:).

Find the line in the [Operating Systems] section that boots the Windows system you are using (on systems with multiple Windows boot partitions you must take care to identify the correct line). This line usually looks something like the following:

multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP" /fastdetect

3.) Add one or more copies of that line, each with a /NUMPROC=n parameter, to create additional boot options that limit Windows to the desired number of logical CPUs.

For example, assuming a system with eight logical CPUs (eight possible execution threads), you might add the following boot options (the first line is the original entry; the four lines after it are new):

multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP" /fastdetect
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP, 1 core" /fastdetect /NUMPROC=1
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP, 2 cores" /fastdetect /NUMPROC=2
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP, 4 cores" /fastdetect /NUMPROC=4
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP, 6 cores" /fastdetect /NUMPROC=6

This configuration lets you choose, at boot time, to run Windows with 1, 2, 4, 6, or all 8 execution threads. The Windows installation is identical in every case; only the number of available execution threads differs.

Note:
Give each entry a distinct description so you can easily identify the boot options at boot time.
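When several core counts need to be covered, generating the extra lines can help avoid typos. The C sketch below is a hypothetical helper (format_numproc_entry is not an existing tool): it takes the ARC path and OS name from your existing entry and produces a line of the same form as the examples above, with a description that names the core count. It only builds the text; copying the result into BOOT.INI is still done by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds one BOOT.INI [operating systems] entry that
   reuses an existing ARC path and OS name, limited to `numproc` logical
   CPUs. Returns what snprintf returns: the full length of the line, even
   if `size` was too small to hold all of it. */
int format_numproc_entry(char *buf, size_t size, const char *arc_path,
                         const char *os_name, int numproc)
{
    /* The description is quoted, so it must not contain a double quote. */
    assert(strchr(os_name, '"') == NULL);
    assert(numproc >= 1);
    return snprintf(buf, size, "%s=\"%s, %d %s\" /fastdetect /NUMPROC=%d",
                    arc_path, os_name, numproc,
                    numproc == 1 ? "core" : "cores", numproc);
}
```

For example, passing the ARC path multi(0)disk(0)rdisk(0)partition(1)\WINDOWS, the name "Microsoft Windows XP", and 2 yields the "2 cores" entry from step 3.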

For full details of all the BOOT.INI options, see the Sysinternals website (now part of Microsoft) at:
http://www.microsoft.com/technet/sysinternals/information/bootini.mspx

To configure a Windows Vista system with additional boot options, you will need a BCD editor. For example, you can use VistaBootPRO (www.pro-networks.org) to create and name additional boot entries in your Vista BCD, and then use the Microsoft msconfig tool to configure each of those boot options.

-- Use VistaBootPRO to create and name the boot options.
-- Configure the number of threads for each option with msconfig (Boot tab, Advanced options, Number of processors).

msconfig seems to work better than VistaBootPRO for configuring boot options, but VistaBootPRO is easier to use for creating additional BCD boot entries.

Related articles:

Using KMP_AFFINITY to create OpenMP thread mapping to OS proc IDs

Control OpenMP thread affinity with ippSetAffinity function

/sites/products/documentation/studio/composer/en-us/2009/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm


Comments

I usually do this by calling SetProcessAffinityMask(GetCurrentProcess(), mask) at the beginning of main(), with mask = 1, 3, 7, 15, ... to allow 1, 2, 3, 4, ... cores. It's probably not as complete a simulation as yours, but I find it quite sufficient for scalability testing. The advantage is that you don't have to reboot the system every time.
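A minimal C sketch of this approach follows. The masks 1, 3, 7, 15, ... are just the lowest n bits set, which selects logical CPUs 0 through n-1. The helper names first_n_cpus_mask and limit_process_to_n_cpus are hypothetical; only GetCurrentProcess and SetProcessAffinityMask are Windows API calls, and the non-Windows branch simply reports failure.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: mask selecting logical CPUs 0..n-1,
   i.e. 1, 3, 7, 15, ... for n = 1, 2, 3, 4, ... */
unsigned long long first_n_cpus_mask(unsigned n)
{
    assert(n >= 1 && n <= 64);
    /* Shifting a 64-bit value by 64 is undefined, so handle n == 64 apart. */
    return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Hypothetical helper: restricts the calling process to its first n
   logical CPUs. Returns nonzero on success; Windows only in this sketch. */
int limit_process_to_n_cpus(unsigned n)
{
#ifdef _WIN32
    /* DWORD_PTR is 32 bits in a 32-bit process, so keep n <= 32 there. */
    return SetProcessAffinityMask(GetCurrentProcess(),
                                  (DWORD_PTR)first_n_cpus_mask(n)) != 0;
#else
    (void)n;
    return 0;  /* not implemented outside Windows in this sketch */
#endif
}
```

Unlike /NUMPROC, this restricts only the calling process: Windows itself and every other process still use all cores, which is why it is a less complete simulation than the BOOT.INI method.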

All about lock-free algorithms, multicore, scalability, parallel computing and related topics:
http://www.1024cores.net


Such techniques can be used not only for performance/scalability testing but also for more extensive correctness testing. I think this is very important in the multi-core world, and for correctness testing your BOOT.INI-based method is preferable to just calling SetProcessAffinityMask().
Here is a piece of my recent email correspondence:
------------------------------------------------------------------------
When I manually test multi-threaded software, I usually don't want to test only on a multi-core machine. I want to test on as many and as diverse machines as possible. Of course that includes multi-core machines, but also hyper-threaded and single-core machines, as well as Windows XP, Windows 7, etc.
Manual testing on a single machine (whatever it is) may reveal only, say, several thousand different thread interleavings (out of zillions), even if you run tests for hours. A single-core machine will produce completely different interleavings than a multi-core machine, and that's very important.
Consider the following program. There are two threads, thread 1 and thread 2. Thread 2 is blocked on an event/semaphore. Thread 1 is currently running and signals thread 2.
On a multi-core machine, thread 2 will be scheduled for execution within tens of microseconds or so. If the core count is low and the system is busy, thread 2 may be scheduled only after several milliseconds.
On a single-core machine, depending on the scheduler and thread priorities, thread 2 may be scheduled instantly, preempting thread 1, or only after thread 1's time slice ends, i.e. in tens of milliseconds.
Any of these four scenarios may reveal a bug with roughly equal probability (depending on the nature of the bug).
So it is the same as with single-threaded algorithm testing: you don't want to test a single-threaded algorithm on a single input data set, even if that data is considered "the worst case". You want to test it on as many data sets as possible. The number of processors/cores/hardware threads, together with the OS scheduler's characteristics, is an important "input" for multi-threaded algorithms (along with the actual user input data).
Since the hardware/OS becomes "input data", an interesting consequence arises: a single dedicated build/test machine is no longer enough. A team needs a number of dedicated build/test machines...
------------------------------------------------------------------------

... which may be overcome with your BOOT.INI-based technique.
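The two-thread scenario from the email can be made concrete with a small timing experiment. The sketch below assumes POSIX threads and unnamed semaphores (on Windows the same experiment would use an event object instead): thread 1 is the caller, thread 2 blocks in sem_wait, and the function returns how long thread 2 took to resume after being signalled. The function name measure_wakeup_latency_us is hypothetical, and the short sleep before signalling is only a heuristic to give thread 2 time to block first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

static sem_t wake_sem;          /* thread 2 blocks on this semaphore */
static struct timespec t_wake;  /* when thread 2 actually resumed */

static double ts_to_us(const struct timespec *ts)
{
    return (double)ts->tv_sec * 1e6 + (double)ts->tv_nsec / 1e3;
}

/* Thread 2: block until signalled, then record when it actually ran. */
static void *thread2(void *arg)
{
    (void)arg;
    while (sem_wait(&wake_sem) != 0 && errno == EINTR)
        ;  /* retry if a signal interrupted the wait */
    clock_gettime(CLOCK_MONOTONIC, &t_wake);
    return NULL;
}

/* Hypothetical helper. Thread 1 (the caller) signals thread 2; returns the
   delay in microseconds between the signal and thread 2 resuming. */
double measure_wakeup_latency_us(void)
{
    pthread_t t2;
    struct timespec t_signal;
    struct timespec head_start = {0, 10 * 1000 * 1000};  /* 10 ms */

    int rc = sem_init(&wake_sem, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, thread2, NULL);
    assert(rc == 0);
    (void)rc;

    /* Heuristic only: give thread 2 time to block in sem_wait. */
    nanosleep(&head_start, NULL);

    clock_gettime(CLOCK_MONOTONIC, &t_signal);
    sem_post(&wake_sem);        /* thread 1 signals thread 2 */
    pthread_join(t2, NULL);
    sem_destroy(&wake_sem);

    return ts_to_us(&t_wake) - ts_to_us(&t_signal);
}
```

Collecting many samples under different core counts is one way to see how the wake-up delay, and with it the thread interleaving, shifts between configurations.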
