Contents
- Overview
- Benchmark Tests
- Benchmark Test Suite Explanation
- Terminology
- Benchmark Results
- Conclusion
Overview
At Intel, we constantly benchmark virtual environment deployment scenarios on platforms with Intel® Virtualization Technology. This testing has helped us develop valuable techniques for optimizing virtual environments for various types of workloads. These techniques are outlined in the paper “Optimizing Virtual Environments on Intel® Virtualization Technology-enabled Platforms.”
The performance goal for a fully virtualized workload is to achieve as close to native performance as possible. Native performance is defined as running the same workload on the same system configuration in a non-virtualized environment and measuring the results. There is typically a performance difference, or delta, between servers running virtual environments and non-virtualized servers. Managing the virtual environment always requires some overhead, and other factors can affect this delta.
The benchmark results in this paper show how our optimization techniques helped minimize the delta between virtual and native performance. The benchmarks were run on SUSE Linux Enterprise Server* 10 hosts using the Xen* virtual machine manager (VMM), with both Linux and Microsoft Windows* guests, on Intel Virtualization Technology-enabled platforms.
Benchmark Tests
Our tests covered three different types of environments:
Test Environment 1: A single IA32 SUSE Linux Enterprise Server 10 virtual machine guest running on an IA32pae (physical address extension) SUSE Linux Enterprise Server 10 host. We compared fully virtualized Linux guest performance vs. native performance using SysBench and SPECjbb.
Test Environment 2: A single Windows* 2003 SP1 x32 virtual machine guest running on an IA32e SUSE Linux Enterprise Server 10 host. We compared fully virtualized Windows guest performance vs. native performance using SPECjbb2005.
Test Environment 3: A single Windows* 2003 SP1 x32 virtual machine guest running on an IA32pae SUSE Linux Enterprise Server 10 host. We compared fully virtualized Windows guest performance vs. native performance by running a SysBench OLTP workload on SQL Server 2005.
Benchmark Test Suite Explanation
SPECjbb is a Java* program emulating a three-tier system with emphasis on the middle tier. Random input selection represents the first-tier user interface. SPECjbb fully implements the middle-tier business logic. The third-tier database is replaced by binary trees.
SPECjbb is inspired by the TPC-C benchmark and loosely follows the TPC-C specification for its schema, input generation, and transaction profile. SPECjbb replaces database tables with Java classes, and it replaces data records with Java objects. The objects are held by either binary trees (also Java objects) or other data objects. SPECjbb runs in a single JVM in which threads represent terminals in a warehouse. Each thread independently generates random input (tier 1 emulation) before calling transaction-specific business logic. The business logic operates on the data held in the binary trees (tier 3 emulation). The benchmark does no disk I/O or network I/O.
The metric is ops/second (bigger is better). SPECjbb ops/second is a composite throughput measurement representing throughput averaged over a range of points. For more information, see: www.spec.org/jbb2005 or www.spec.org/jbb2000.
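To make the three-tier structure concrete, here is a toy Python sketch of the SPECjbb pattern. It is not SPECjbb (a Java program) and reproduces neither its transactions nor its scoring. Each thread plays a warehouse terminal that generates random input (tier 1), applies a small piece of business logic (tier 2), and updates an in-memory binary search tree that stands in for a database table (tier 3). No disk or network I/O takes place, and the result is reported as operations per second. The class and function names, the item-ID range, and the single "order" transaction are assumptions made for illustration.

```python
import random
import threading
import time


class _Node:
    __slots__ = ("key", "value", "left", "right")

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


class BinaryTree:
    """Unbalanced binary search tree standing in for a database table (tier 3)."""

    def __init__(self):
        self._root = None
        self._lock = threading.Lock()  # one lock guards the whole tree

    def add(self, key, delta):
        """Insert key with value delta, or add delta to the existing value."""
        with self._lock:
            if self._root is None:
                self._root = _Node(key, delta)
                return
            node = self._root
            while True:
                if key == node.key:
                    node.value += delta
                    return
                if key < node.key:
                    if node.left is None:
                        node.left = _Node(key, delta)
                        return
                    node = node.left
                else:
                    if node.right is None:
                        node.right = _Node(key, delta)
                        return
                    node = node.right

    def get(self, key):
        """Return the value stored for key, or None if key is absent."""
        with self._lock:
            node = self._root
            while node is not None and node.key != key:
                node = node.left if key < node.key else node.right
            return None if node is None else node.value


def terminal(tree, seed, deadline, results, slot):
    """One warehouse terminal: random input, then business logic on the tree."""
    rng = random.Random(seed)            # tier 1: random input generation
    ops = 0
    while time.perf_counter() < deadline:
        item_id = rng.randrange(10_000)  # hypothetical item-ID range
        quantity = rng.randint(1, 10)
        tree.add(item_id, quantity)      # tier 2 logic updating tier 3 data
        ops += 1
    results[slot] = ops


def run_toy_jbb(n_terminals=4, seconds=0.5):
    """Run all terminals for roughly `seconds` and return ops/second."""
    tree = BinaryTree()
    results = [0] * n_terminals
    start = time.perf_counter()
    deadline = start + seconds
    threads = [
        threading.Thread(target=terminal, args=(tree, i, deadline, results, i))
        for i in range(n_terminals)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results) / (time.perf_counter() - start)


if __name__ == "__main__":
    print(f"{run_toy_jbb():.0f} ops/second")
```

Because of CPython's global interpreter lock, these threads interleave rather than run in parallel, so the number only illustrates the shape of the metric and says nothing about real SPECjbb scores.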
SysBench is a modular, cross-platform, multithreaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load. Its purpose is to give a quick impression of system performance without setting up complex database benchmarks. SysBench provides the following test modes:
CPU: The CPU test mode is one of the simplest benchmarks in SysBench. In this mode, each request consists of calculating prime numbers up to the value specified by the --cpu-max-prime option. All calculations use 64-bit integers. Each thread executes requests concurrently until either the total number of requests or the total execution time exceeds the limit specified by the common command-line options.
Threads: This test mode benchmarks scheduler performance, specifically the case in which the scheduler has a large number of threads competing for a set of mutexes. SysBench creates a specified number of threads and a specified number of mutexes. Each thread then runs requests that consist of locking a mutex, yielding the CPU (so the scheduler places the thread back in the run queue), and unlocking the mutex when the thread is rescheduled. Each request repeats these actions several times in a loop, so the more iterations performed, the more contention is placed on each mutex.
Mutex: This test mode examines the performance of the mutex implementation. It emulates a situation in which all threads run concurrently most of the time and acquire the mutex lock only briefly.
Memory: This test mode benchmarks sequential memory reads or writes. Depending on command-line options, each thread accesses either a global block or its own local block for all memory operations.
Fileio: This test mode produces various kinds of file I/O workloads. In the prepare stage, SysBench creates a number of files with a specified total size. In the run stage, each thread performs I/O operations on that set of files.
OLTP: This test mode benchmarks the performance of a real database.
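The threads test described above can be approximated in a few lines of Python. This is a sketch, not SysBench's code: the function and parameter names (`threads_test`, `num_threads`, `num_locks`, `requests`, `yields`) are hypothetical, each thread is assumed to map to a mutex by `thread_id % num_locks`, and `os.sched_yield()` (Unix only) gives up the CPU while the mutex is held, with `time.sleep(0)` as a fallback elsewhere. The timing reflects the Python runtime and the OS scheduler together, so treat it as a relative indicator only.

```python
import os
import threading
import time

# Give up the CPU: os.sched_yield() exists on Unix; time.sleep(0) is a fallback.
_yield_cpu = getattr(os, "sched_yield", lambda: time.sleep(0))


def threads_test(num_threads=4, num_locks=2, requests=200, yields=50):
    """Return (elapsed_seconds, completed_requests) for a SysBench-style threads test."""
    mutexes = [threading.Lock() for _ in range(num_locks)]
    remaining = [requests]              # shared request budget
    budget_lock = threading.Lock()
    completed = [0] * num_threads

    def worker(thread_id):
        mutex = mutexes[thread_id % num_locks]  # assumed thread-to-mutex mapping
        while True:
            with budget_lock:                   # claim the next request, if any
                if remaining[0] == 0:
                    return
                remaining[0] -= 1
            for _ in range(yields):             # one request = several rounds
                with mutex:                     # lock; released when the block exits after rescheduling
                    _yield_cpu()                # yield so the scheduler can run another thread
            completed[thread_id] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, sum(completed)


if __name__ == "__main__":
    elapsed, done = threads_test()
    print(f"{done} requests in {elapsed:.3f} s")
```

Raising `yields` or lowering `num_locks` increases how often threads collide on the same mutex, which is the contention effect the test is designed to expose.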
The metric for each test is its execution time (smaller is better). For more information, see the SysBench project documentation.
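For the CPU test mode described above, a request is a prime-number search up to a limit, and the reported metric is how long a fixed number of requests takes. The sketch below approximates that under stated assumptions: it checks each candidate by trial division up to its integer square root, shares a fixed request count among threads, and times the whole run. `max_prime` plays the role of the `--cpu-max-prime` option, but the function names are hypothetical and the code is not SysBench's implementation. Under CPython's global interpreter lock the threads interleave rather than run in parallel, so only relative comparisons on the same interpreter are meaningful.

```python
import math
import threading
import time


def cpu_request(max_prime):
    """One request: count primes in [3, max_prime] by trial division."""
    count = 0
    for candidate in range(3, max_prime + 1):
        limit = math.isqrt(candidate)
        divisor = 2
        while divisor <= limit and candidate % divisor != 0:
            divisor += 1
        if divisor > limit:  # no divisor found: candidate is prime
            count += 1
    return count


def cpu_test(num_threads=4, total_requests=40, max_prime=10_000):
    """Run total_requests requests across threads; return elapsed seconds."""
    remaining = [total_requests]
    lock = threading.Lock()

    def worker():
        while True:
            with lock:  # claim the next request, if any are left
                if remaining[0] == 0:
                    return
                remaining[0] -= 1
            cpu_request(max_prime)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"{cpu_test(total_requests=10, max_prime=2_000):.3f} s")
```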
Terminology
Native: Performance results from running a workload on a non-virtualized system.
VMX: Performance results from running a workload on a fully virtualized guest that uses Intel® Virtualization Technology.
VMX/Native: Fully virtualized performance expressed as a percentage of native performance; the gap between this value and 100% is the delta between virtualized and native performance.
Physical address extension (PAE): A processor feature that allows up to 64 GB of physical memory to be used on 32-bit systems, given appropriate OS support.
Note: In every figure, 1 or 100% equals the performance of the same configuration in a native environment.
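Because the tests report two kinds of metric, the VMX/Native value has to be oriented so that 1 (100%) always means native performance. For throughput metrics such as SPECjbb (bigger is better), the VMX result is divided by the native result; for execution-time metrics such as the SysBench tests in Test Environment 1 (smaller is better), the ratio is inverted, and the published SysBench percentages match that inverted form. With $P$ denoting throughput and $T$ execution time:

```latex
\[
\text{VMX/Native} =
\begin{cases}
\dfrac{P_{\text{VMX}}}{P_{\text{Native}}} & \text{throughput metric (bigger is better)}\\[2ex]
\dfrac{T_{\text{Native}}}{T_{\text{VMX}}} & \text{execution-time metric (smaller is better)}
\end{cases}
\]

\[
\text{e.g. SysBench CPU: } \frac{45.31}{45.71} \approx 0.9912 = 99.12\%,
\qquad
\text{SPECjbb: } \frac{48506}{54970} \approx 0.8824 = 88.24\%
\]
```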
Benchmark Results
Test Environment 1 – SysBench and SPECjbb Results
Table 1 lists the system configuration for this environment.
Table 1. Test Environment 1 Configuration
| Component | Configuration |
|---|---|
| CPU | Intel® Xeon® 5100, 3 GHz |
| Guest Operating System | SUSE Linux Enterprise Server 10 |
| Hypervisor | Xen 3.0.2 (integrated in SUSE Linux Enterprise Server 10) |
| Native Operating System | SUSE Linux Enterprise Server 10 |
| Guest Memory | 512 MB |
| Native Memory | 512 MB |

SysBench Results (execution time, smaller is better)

| SysBench | CPU | Threads | Mutex | Memory | Fileio | OLTP |
|---|---|---|---|---|---|---|
| Native | 45.31 | 115.66 | 32.6 | 44.4 | 40.95 | 132.25 |
| VMX | 45.71 | 121.45 | 33.45 | 45 | 76.74 | 178.23 |
| VMX/Native | 99.12% | 95.22% | 97.46% | 98.67% | 53.37% | 74.20% |
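The VMX/Native row for Test Environment 1 can be reproduced from the raw numbers. Since SysBench reports execution time, the sketch below divides the native time by the VMX time so that 100% still means native performance, while the SPECjbb throughput ratio is taken directly. The dictionary copies the values from the Test Environment 1 tables; the names `SYSBENCH_ENV1` and `vmx_over_native` are hypothetical. Recomputed values agree with the table to within 0.01 percentage point; for example, Threads computes to 95.23% and Fileio to 53.36%, differences consistent with the published times having been rounded.

```python
# Test Environment 1 SysBench execution times, copied from the table (smaller is better).
SYSBENCH_ENV1 = {
    "CPU": (45.31, 45.71),  # (native, vmx)
    "Threads": (115.66, 121.45),
    "Mutex": (32.6, 33.45),
    "Memory": (44.4, 45.0),
    "Fileio": (40.95, 76.74),
    "OLTP": (132.25, 178.23),
}


def vmx_over_native(native, vmx, higher_is_better):
    """Return VMX performance as a fraction of native (1.0 == native)."""
    return vmx / native if higher_is_better else native / vmx


if __name__ == "__main__":
    for test, (native, vmx) in SYSBENCH_ENV1.items():
        ratio = vmx_over_native(native, vmx, higher_is_better=False)
        print(f"{test:8s} {ratio:.2%}")
    # SPECjbb throughput is higher-is-better, so the ratio is not inverted.
    print(f"SPECjbb  {vmx_over_native(54970, 48506, higher_is_better=True):.2%}")
```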
SPECjbb Results

| SPECjbb | Throughput (ops/second) |
|---|---|
| Native | 54970 |
| VMX | 48506 |
| VMX/Native | 88.24% |
Test Environment 2 – SPECjbb Results: SPECjbb2005
Table 2 lists the system configuration for this environment.
Table 2. Test Environment 2 Configuration
| Component | Configuration |
|---|---|
| CPU | Intel® Xeon® 5100, 3 GHz |
| Guest Operating System | Microsoft Windows* 2003 SP1 x32 |
| Hypervisor | Xen 3.0.2 (integrated in SUSE Linux Enterprise Server 10) |
| Host Operating System | SUSE Linux Enterprise Server 10 (IA32e) |
| Native Operating System | Microsoft Windows* 2003 SP1 x32 |

| SPECjbb2005 | Native Win2003 SP1 x32 | VMX Win2003 SP1 x32 | VMX/Native |
|---|---|---|---|
| Run1 | 25171 | 23951 | 0.951531624 |
| Run2 | 25267 | 23895 | 0.945699925 |
| Run3 | 25372 | 23999 | 0.945885228 |
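With three runs per configuration, a mean and a spread give a quick sense of how repeatable the delta is. The sketch below applies Python's standard `statistics` module to the SPECjbb2005 throughputs copied from the Test Environment 2 table; summarizing by the mean of the per-run ratios is a choice made here for illustration, not a method stated in this paper. The mean works out to about 94.77%.

```python
import statistics

# SPECjbb2005 throughput per run for Test Environment 2 (bigger is better).
native_runs = [25171, 25267, 25372]
vmx_runs = [23951, 23895, 23999]

# Per-run VMX/Native ratio, matching the table's last column.
ratios = [v / n for v, n in zip(vmx_runs, native_runs)]

mean_ratio = statistics.mean(ratios)
spread = statistics.stdev(ratios)  # sample standard deviation across runs

if __name__ == "__main__":
    for i, r in enumerate(ratios, start=1):
        print(f"Run{i}: {r:.4f}")
    print(f"mean VMX/Native = {mean_ratio:.2%} (stdev {spread:.4f})")
```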
Test Environment 3 – SysBench Results: SQL Server 2005 OLTP
Table 3 lists the system configuration for this environment.
Table 3. Test Environment 3 Configuration
| Component | Configuration |
|---|---|
| CPU | Intel® Xeon® 5100, 3 GHz |
| Guest Operating System | Microsoft Windows* 2003 SP1 x32 |
| Hypervisor | Xen 3.0.2 (integrated in SUSE Linux Enterprise Server 10) |
| Host Operating System | SUSE Linux Enterprise Server 10 (IA32pae) |
| Native Operating System | Microsoft Windows* 2003 SP1 x32 |

| SysBench | Native Win2003 SP1 x32 | VMX Win2003 SP1 x32 | VMX/Native |
|---|---|---|---|
| Run1 | 270.27 | 170.94 | 0.632478632 |
| Run2 | 284.9 | 173.61 | 0.609371709 |
| Run3 | 271.72 | 172.12 | 0.626528829 |
Conclusion
Minimizing the delta between virtualized and native performance gives your virtualized systems more resources for running your workloads. Good optimization techniques can reduce this delta. The virtualization performance shown in the benchmarks in this paper is the result of optimization techniques that our lab engineers developed and adopted over many benchmark runs.
Results in your environment depend on the specific workloads you run, so they may differ from those shown here. Experimenting on non-production systems often leads to further optimizations for your particular environment. Your VMM developer may also have valuable recommendations for additional optimizations; consult the developer’s web site for such information.
