This document addresses some of the performance issues inherent in a virtualized environment. Its intended audience includes individuals and groups evaluating virtualized environments for the first (or very nearly the first) time. Once you have gone through the exercise of determining the performance of your servers and applications in a virtualized environment, you will be able to move forward with virtualization more comfortably.
FSB scales effectively. That is, a higher FSB frequency shows better performance with VMs, so using the fastest available memory with the fastest FSB gives the best performance. Core frequency (CPU speed) also appears to scale effectively.
Virtualization deployments benefit from larger caches. As in non-virtualized environments, the cache will show different amounts of benefit for different workloads. If your workloads use very little cache, there is no benefit in a larger cache. However, remember that the amount of cache used is the sum of the cache needed by all workloads plus that needed by your VMM.
The prefetch settings which provide the best performance for native O/S’s usually provide the best performance for the same application/server running in a virtualized environment. You should investigate which prefetcher settings to use for your set of VMs.
Some Virtual Machine Managers have optimizations for NUMA architecture. Typically this requires node interleaving to be disabled in BIOS. In such cases, disabling node interleaving can bring better virtualization performance than enabling it.
In order to run 64-bit guest O/S's, Intel® Virtualization Technology (Intel® VT) must be enabled in the BIOS on Intel® systems.
Scoring the results
Scoring requires a proper balance between the VMs and the relative importance of each individual VM. The easiest method is to assign equal weighting to each. The method of scoring also depends on the presentation to be used. In a comparison, use one set of results as the baseline and score each comparable VM against it. To determine a final score, use the geometric mean of the relative scores of the different VMs and clients. This gives you a single score for the relative performance of the systems being compared.
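As a minimal sketch of this scoring step (the relative scores below are made-up examples, not measured data), the geometric mean of the per-VM relative scores can be computed with a one-line awk pipeline:

```shell
# Hypothetical relative scores (VM score / baseline score) for three VMs.
# The geometric mean is the n-th root of the product of the n scores.
echo "1.10 0.95 1.20" | awk '{
  prod = 1
  for (i = 1; i <= NF; i++) prod *= $i
  printf "geometric mean: %.4f\n", exp(log(prod) / NF)
}'
```

Unlike the arithmetic mean, the geometric mean is not skewed by one VM that happens to score far above the others, which is why it is the conventional choice for combining relative scores.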
Time of test
Since running VMs have to share resources (CPU time, memory, disk, network, etc.), it is best to have all benchmarks use equivalent execution times. This is necessary to prevent a benchmark from running alone (i.e., after all others have finished), thereby artificially inflating its performance.
You should determine the number of virtual machines (VMs), as well as the O/S for each VM. You will get better performance by spreading the VMs across the available physical disks for their "local HDD". The best performance comes from having a separate physical disk for each VM, thereby utilizing more spindles. However, two VMs per spindle usually gives acceptable performance for non-disk-intensive VMs (VMs that use very little of their virtualized disks).
How the vCPUs of a VM are assigned to the physical cores (pCPU) or threads (logical CPU) can affect the performance of the VMs and the platform. The following possible vCPU configurations are examined.
- No Affinity – the algorithms of the VMM scheduler determine how the tasks that will execute on the vCPUs are assigned to the cores or threads
- Affinity or Pinned – the algorithms of the VMM scheduler are overridden by making a hard assignment of the vCPUs to one or more cores or threads. If a VM is pinned to more than one core, the location of those cores determines whether the VM is pinned within a single socket (package) or across sockets.
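As a sketch of what pinning looks like in practice: on Linux, `taskset` pins an ordinary process to a core, and VMM tools expose the analogous per-vCPU control (the `virsh` line and the domain name "vm1" below are assumptions about a libvirt-managed host, shown for illustration only):

```shell
# Pin a short-lived process to pCPU 0 with taskset (util-linux).
taskset -c 0 true

# The equivalent idea for a VM's vCPU on a libvirt-managed VMM
# (hypothetical domain name "vm1"; requires a running hypervisor):
#   virsh vcpupin vm1 0 2    # pin vCPU 0 of "vm1" to pCPU 2
```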
Figure 1 vCPU to pCPU Affinity Configurations
VMMs typically configure newly created VMs to run their vCPUs with No Affinity. In general, running with no affinity is the best configuration option, as it allows the VMM scheduler to share the vCPU load of all VMs across all of the available pCPUs. Configuring VMs with no affinity is particularly beneficial when the total number of vCPUs across all active VMs is greater than the number of pCPUs, i.e., the pCPUs are overcommitted. In an overcommitted configuration, the scheduler's algorithms can load-balance the tasks assigned to each vCPU across all of the pCPUs.
VMMs may allow the user to make hard assignments or ‘pin’ VM vCPU(s) to specific pCPU(s). When a VM is pinned to pCPUs, the VMM scheduler will only run those VM vCPUs on those specific pCPUs and no others. Pinning is typically a special case configuration where the characteristics of the workload would benefit from executing on specific cores. One workload characteristic of particular interest is the impact of the pCPU caches on the workload performance.
The performance of some workloads can benefit from always executing on the same core, reusing the caches associated with that core and thereby minimizing the cache misses per instruction. If a VM is pinned to more than one pCPU, pinning the pCPUs within a socket or across sockets can also impact performance. Given the differences between workloads, this performance impact is best determined by experimentation. Note that platform BIOSes number the pCPUs differently, so the user must be aware of how the number presented by the BIOS maps to the physical location of a core.
Example: In a dual core, dual processor platform (4 pCPUs), the following pCPU numbering has been observed.
The user should know the exact location of each physical CPU before beginning any pinning experiments. The following link contains a simple utility that reports the O/S numbering compared to the physical location (socket) of the CPU for Intel® systems.
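As a quick alternative on Linux, the mapping between the O/S's logical CPU numbers and their physical sockets and cores can be read directly from /proc/cpuinfo (a sketch; the exact field names vary somewhat by kernel version):

```shell
# Print each logical CPU's number ("processor") with its socket
# ("physical id") and core ("core id") as reported by the kernel.
grep -E 'processor|physical id|core id' /proc/cpuinfo
```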
The inaccuracy of the system time of VMs has been well documented; this behavior has been labeled time drift. Because of time drift during execution, it is recommended to use the Network Time Protocol (NTP). Set up a Linux NTP server within the network of the physical machine(s) being used; this allows the VMs to synchronize their system time. The observed overhead appears to be less than 1%.
On Windows 2003*, which uses Simple Network Time Protocol (SNTP), there are 2 UDP packets received and 1 UDP packet sent by the system under test.
On SLES 9 SP1, which uses NTP, there are 4 TCP packets received and 1 TCP packet sent.
On a lightly loaded system, updating every 10 minutes is sufficient. On a heavily loaded system, it is recommended to update every minute. In testing, updates every minute added no significant load to the system.
If the VMM being used has a time-sync option, it should be enabled.
There are just a few steps necessary to have your Windows 2003 system synchronizing with your NTP server, even from within a Virtual Machine.
Right-click on your time display and click Adjust Date/Time. Then click on the Internet Time tab and enter the IP address or name of the NTP server you wish to synchronize with.
You will need to create a batch file for your scheduled task to execute. The batch file needs only one line:
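The original line is not shown here; on Windows 2003 the standard command-line way to force a synchronization with the configured time source is the w32tm tool, so the batch file would plausibly contain:

```
w32tm /resync
```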
Go to the C:\Windows\Tasks folder. You will see an entry to "Add a scheduled task". Click on it to bring up the wizard. Browse to specify your newly created batch file as the executable for the new task and specify daily execution. After you have created the task, use the advanced features to execute it every 10 minutes or every 5 minutes as necessary.
There are just a few steps necessary to have your Linux system synchronizing with your NTP server, even from within a Virtual Machine.
Edit the file /etc/ntp.conf. Specify your server by adding a line like:
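The `server` keyword names the NTP server to poll; the address below is a placeholder for your own server:

```
server 192.168.1.10
```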
Create a bash shell script with the lines:

#!/bin/sh
/etc/init.d/xntpd restart > /dev/null 2>&1

(Note the redirection order: `> /dev/null 2>&1` silences both stdout and stderr, whereas `2>&1 > /dev/null` would still print errors to the terminal.)
Edit /etc/crontab to add an entry for your newly created shell script. It would look like (to execute every 5 minutes):
0,5,10,15,20,25,30,35,40,45,50,55 * * * * root /root/myNewScript
In order to have all tests start and end simultaneously, use network trigger files. Samba is an Open Source/Free Software suite that has, since 1992, provided file and print services to all manner of SMB/CIFS clients, including the numerous versions of Microsoft Windows operating systems; it is freely available under the GNU General Public License. Set up a Samba share on your NTP server. Trigger files can also be used for transmitting necessary initialization data to tests running on VMs as well as to clients. You can also store results from the clients and the VMs on the share to allow for automatic generation of results.
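The trigger-file idea can be sketched as follows. In practice the directory would be the Samba share mounted on every VM and client; /tmp and the file name "start.trig" are hypothetical stand-ins so the sketch runs stand-alone:

```shell
#!/bin/sh
# Sketch of trigger-file synchronization over a shared directory.
DIR=/tmp/triggers          # in practice: the mounted Samba share
mkdir -p "$DIR"

# Controller side: creating the trigger releases all waiting clients.
touch "$DIR/start.trig" &

# Client side: poll until the trigger appears, then start the benchmark.
while [ ! -f "$DIR/start.trig" ]; do sleep 1; done
wait
echo "start trigger seen; benchmark begins"
rm -f "$DIR/start.trig"
```

Because every client blocks on the same file, all benchmarks begin within one polling interval of each other, which supports the equivalent-execution-time requirement discussed above.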
For more information on virtualization, visit the virtualization community at: http://software.intel.com/en-us/virtualization.
About the Author
Steven Allen Thomsen is a Senior Performance Engineer at Intel Corporation, working on Server Platform performance in the areas of Virtualization, Databases, Encryption, Scaling, Telephony, Web Server and Linux*. He has worked with Virtualization since 2000.