Performance Impacts with Optimized Virtual Environments on Intel® Virtualization Technology-based Platforms

Overview

At Intel, we constantly benchmark virtual environment deployment scenarios on platforms with Intel® Virtualization Technology. This testing has helped us develop valuable techniques for optimizing virtual environments and various types of workloads. These techniques are outlined in the paper “Optimizing Virtual Environments on Intel® Virtualization Technology-enabled Platforms.”

The performance goal for a fully virtualized workload is to come as close to native performance as possible. Native performance is measured by running the same workload on the same system configuration in a non-virtualized environment. There is typically a performance difference, or delta, between servers running virtual environments and non-virtualized servers: managing the virtual environment always imposes some overhead, and other factors can widen this delta.

The benchmark results in this paper show how our optimization techniques helped minimize the delta between virtual and native performance. The benchmarks are based on SUSE Linux Enterprise Server* 10 hosts running the Xen* virtual machine monitor (VMM), with both Linux and Microsoft Windows* guests, on Intel Virtualization Technology-enabled platforms.

Benchmark Tests

Our tests covered three different types of environments:

Test Environment 1: Single IA32 SUSE Linux Enterprise Server 10 Virtual Machine guest running on an IA32pae (physical address extension) SUSE Linux Enterprise Server 10 host. We compared fully virtualized Linux guest performance vs. native performance using SysBench and SPECjbb.

Test Environment 2: Single Windows* 2003 SP1 x32 Virtual Machine guest running on an IA32e SUSE Linux Enterprise Server 10 host. We compared fully virtualized Windows guest performance vs. native performance using SPECjbb.

Test Environment 3: Single Windows 2003 SP1 x32 Virtual Machine guest running on an IA32pae SUSE Linux Enterprise Server 10 host. We compared fully virtualized Windows guest performance vs. native performance using SysBench to run a SQL Server workload.

Benchmark Test Suite Explanation

SPECjbb

SPECjbb is a Java* program emulating a three-tier system with emphasis on the middle tier. Random input selection represents the first-tier user interface. SPECjbb fully implements the middle-tier business logic. The third-tier database is replaced by binary trees.

SPECjbb is inspired by the TPC-C benchmark and loosely follows the TPC-C specification for its schema, input generation, and transaction profile. SPECjbb replaces database tables with Java classes, and it replaces data records with Java objects. The objects are held by either binary trees (also Java objects) or other data objects. SPECjbb runs in a single JVM in which threads represent terminals in a warehouse. Each thread independently generates random input (tier 1 emulation) before calling transaction-specific business logic. The business logic operates on the data held in the binary trees (tier 3 emulation). The benchmark does no disk I/O or network I/O.

The metric is ops/second (bigger is better). SPECjbb ops/second is a composite throughput measurement representing throughput averaged over a range of points. For more information, see: www.spec.org/jbb2005 or www.spec.org/jbb2000.
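
To make this structure concrete, here is a minimal, hypothetical Java sketch in the spirit of SPECjbb (not the benchmark's actual code): worker threads generate random input, run trivial business logic against an in-memory binary tree, and perform no disk or network I/O. All class names and parameter values are invented for illustration.

import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical SPECjbb-style skeleton (illustration only): threads stand in
// for warehouse terminals, random input selection emulates tier 1, simple
// business logic emulates tier 2, and a binary tree of Java objects stands
// in for the tier-3 database. No disk or network I/O is performed.
public class WarehouseSketch {
    public static void main(String[] args) throws InterruptedException {
        final int warehouses = 4;               // invented thread count
        final long runMillis = 10_000;          // invented measurement interval
        final LongAdder ops = new LongAdder();  // completed transactions

        Thread[] threads = new Thread[warehouses];
        for (int w = 0; w < warehouses; w++) {
            threads[w] = new Thread(() -> {
                // Tier-3 stand-in: an ordered tree of "records" (Java objects).
                TreeMap<Integer, String> records = new TreeMap<>();
                long deadline = System.currentTimeMillis() + runMillis;
                while (System.currentTimeMillis() < deadline) {
                    // Tier-1 stand-in: random input selection.
                    int key = ThreadLocalRandom.current().nextInt(100_000);
                    // Tier-2 stand-in: trivial business logic on the tree.
                    records.merge(key, "order-" + key, (oldVal, newVal) -> newVal);
                    if (records.size() > 50_000) {
                        records.pollFirstEntry();   // bound the working set
                    }
                    ops.increment();
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) t.join();
        System.out.printf("throughput: %.0f ops/second%n",
                ops.sum() * 1000.0 / runMillis);
    }
}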

SysBench

SysBench is a modular, cross-platform, multithreaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load. The idea behind the suite is to get a quick impression of system performance without setting up complex database benchmarks.

Test Modes

CPU: The CPU test mode is one of the simplest benchmarks in SysBench. In this mode, each request consists of calculating prime numbers up to a value specified by the --cpu-max-prime option. All calculations are performed using 64-bit integers. Threads execute requests concurrently until either the total number of requests or the total execution time exceeds the limit specified by the common command-line options.
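
As a rough illustration of this test mode (not sysbench's actual implementation), the following sketch issues one CPU-bound request that counts primes by trial division using 64-bit integers; the bound and class name are invented, with the bound playing the role of the max-prime option.

// Illustrative sketch of a SysBench-CPU-style request: count primes up to a
// bound by trial division, using 64-bit (long) arithmetic throughout.
public class CpuRequestSketch {
    static long countPrimes(long maxPrime) {
        long found = 0;
        for (long c = 3; c <= maxPrime; c++) {
            boolean prime = true;
            for (long t = 2; t * t <= c; t++) {   // trial division up to sqrt(c)
                if (c % t == 0) { prime = false; break; }
            }
            if (prime) found++;
        }
        return found;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long primes = countPrimes(20_000);        // one "request"; bound is invented
        double ms = (System.nanoTime() - start) / 1e6;
        System.out.printf("%d primes found, %.2f ms per request%n", primes, ms);
    }
}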

Threads: This test mode benchmarks scheduler performance, specifically the case in which a scheduler has a large number of threads competing for a set of mutexes. SysBench creates a specified number of threads and a specified number of mutexes. Each thread then runs requests that consist of locking a mutex, yielding the CPU so that the scheduler places the thread back in the run queue, and unlocking the mutex when the thread is rescheduled. For each request, these actions run several times in a loop, so the more iterations performed, the more concurrency is placed on each mutex. A minimal sketch of this pattern follows.
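
This hypothetical Java sketch reproduces the lock/yield/unlock pattern using ReentrantLock and Thread.yield(); the Mutex mode described next is similar, except that threads spend most of their time working outside the lock. Thread and mutex counts are invented.

import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the SysBench "threads" pattern: many threads contend
// for a small set of mutexes, yielding the CPU while holding a lock so the
// scheduler must repeatedly requeue and resume them.
public class ThreadsSketch {
    public static void main(String[] args) throws InterruptedException {
        final int numThreads = 64;         // invented values; sysbench takes
        final int numMutexes = 8;          // these as command-line options
        final int loopsPerRequest = 1_000;

        final ReentrantLock[] mutexes = new ReentrantLock[numMutexes];
        for (int i = 0; i < numMutexes; i++) mutexes[i] = new ReentrantLock();

        Runnable request = () -> {
            for (int i = 0; i < loopsPerRequest; i++) {
                ReentrantLock m = mutexes[i % numMutexes];
                m.lock();              // compete for the mutex
                try {
                    Thread.yield();    // go back through the run queue
                } finally {
                    m.unlock();        // release once rescheduled
                }
            }
        };

        long start = System.nanoTime();
        Thread[] threads = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            threads[t] = new Thread(request);
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.printf("execution time: %.2f s%n", (System.nanoTime() - start) / 1e9);
    }
}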

Mutex: The purpose of this benchmark is to examine the performance of the mutex implementation. This test mode emulates a situation in which all threads run concurrently most of the time, acquiring the mutex lock only for a short period.

Memory: This test mode benchmarks sequential memory reads or writes. Depending on command-line options, each thread can use either a global or a thread-local block for all memory operations.
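
A minimal sketch of this pattern, assuming a simple sequential write-then-read pass over a block that is either shared by all threads (global) or allocated per thread (local); block size and thread count are invented.

// Hypothetical sketch of the SysBench "memory" pattern: each thread makes a
// sequential write-then-read pass over a block, mirroring the global/local
// command-line choice described above.
public class MemorySketch {
    static final int BLOCK_LONGS = (16 * 1024 * 1024) / 8;  // 16 MiB block
    static final long[] GLOBAL = new long[BLOCK_LONGS];

    public static void main(String[] args) throws InterruptedException {
        final boolean useGlobal = false;  // false = thread-local block
        final int numThreads = 4;

        Runnable worker = () -> {
            long[] block = useGlobal ? GLOBAL : new long[BLOCK_LONGS];
            long sum = 0;
            for (int i = 0; i < block.length; i++) {
                block[i] = i;       // sequential write
                sum += block[i];    // sequential read back
            }
            if (sum == -1) System.out.println();  // keep the loop live
        };

        long start = System.nanoTime();
        Thread[] threads = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            threads[t] = new Thread(worker);
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.printf("execution time: %.2f s%n", (System.nanoTime() - start) / 1e9);
    }
}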

Fileio: This test mode produces various kinds of file I/O workloads. At the Prepare stage, SysBench creates a number of files with a particular total size. Then at the Run stage, each thread performs certain I/O operations on the set of files.
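
Here is a hypothetical sketch of the two stages with invented file counts and sizes: the Prepare stage creates the file set, and the Run stage issues random reads against it. (Real SysBench also provides a cleanup stage and several other I/O modes.)

import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the SysBench "fileio" stages: Prepare creates a set
// of files with a given total size; Run performs random reads against them.
public class FileioSketch {
    public static void main(String[] args) throws Exception {
        final int numFiles = 4;                 // invented values; sysbench
        final int fileBytes = 1 << 20;          // takes these as options
        final int requests = 10_000;
        final byte[] buf = new byte[4096];

        // Prepare stage: create the test file set.
        Path dir = Files.createTempDirectory("fileio-sketch");
        RandomAccessFile[] handles = new RandomAccessFile[numFiles];
        for (int f = 0; f < numFiles; f++) {
            Path p = dir.resolve("test_file." + f);
            Files.write(p, new byte[fileBytes]);
            handles[f] = new RandomAccessFile(p.toFile(), "r");
        }

        // Run stage: random 4 KiB reads spread across the file set.
        long start = System.nanoTime();
        for (int r = 0; r < requests; r++) {
            RandomAccessFile raf = handles[ThreadLocalRandom.current().nextInt(numFiles)];
            raf.seek(ThreadLocalRandom.current().nextInt(fileBytes - buf.length));
            raf.readFully(buf);
        }
        System.out.printf("execution time: %.2f s%n", (System.nanoTime() - start) / 1e9);

        for (RandomAccessFile h : handles) h.close();
    }
}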

OLTP: This test mode benchmarks the performance of a real database server under a transactional (OLTP) workload.

The metric is the execution time of each test (smaller is better). For more information, see the SysBench project documentation.

Terminology

Native: Performance results from running a workload on a non-virtualized system.

VMX: Performance results from running a workload on a fully virtualized guest that uses Intel® Virtualization Technology.

VMX/Native: Performance percentage, showing the delta between fully virtualized and native performance.

Physical Address Extension (PAE): A processor feature that allows up to 64 GB of memory to be addressed on 32-bit systems, given appropriate OS support.

Note: In all results, 1 (or 100%) equals the performance of the same configuration in a native environment.
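
Because SysBench reports execution times (smaller is better) while SPECjbb reports throughput (bigger is better), the VMX/Native percentage is derived differently for the two benchmarks, as the published figures confirm. Here is a small, hypothetical sketch of the arithmetic, using two figures from the Test Environment 1 tables that follow:

// Sketch of how the VMX/Native percentages in the results tables are derived.
// SysBench reports execution time (smaller is better), so its ratio is
// native time / VMX time; SPECjbb reports throughput (bigger is better), so
// its ratio is VMX ops / native ops. Either way, 100% means parity with native.
public class DeltaSketch {
    static double timeRatio(double nativeTime, double vmxTime) {
        return nativeTime / vmxTime * 100.0;
    }

    static double throughputRatio(double nativeOps, double vmxOps) {
        return vmxOps / nativeOps * 100.0;
    }

    public static void main(String[] args) {
        // Figures taken from the Test Environment 1 tables below:
        System.out.printf("SysBench CPU: %.2f%%%n", timeRatio(45.31, 45.71));       // 99.12%
        System.out.printf("SPECjbb:      %.2f%%%n", throughputRatio(54970, 48506)); // 88.24%
    }
}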


Benchmark Results

Test Environment 1

Table 1 lists the system configuration for this environment.

Table 1. Test Environment 1 Configuration

CPU: Intel® Xeon® 5100, 3 GHz
Guest Operating System: SUSE Linux Enterprise Server 10
Hypervisor: Xen 3.0.2 (integrated within SUSE Linux Enterprise Server 10)
Native Operating System: SUSE Linux Enterprise Server 10
Guest Memory: 512 MB
Native Memory: 512 MB

SysBench Results

SysBench      CPU       Threads    Mutex     Memory    Fileio    OLTP
Native        45.31     115.66     32.6      44.4      40.95     132.25
VMX           45.71     121.45     33.45     45        76.74     178.23
VMX/Native    99.12%    95.22%     97.46%    98.67%    53.37%    74.20%


SPECjbb Results

SPECjbb       Throughput (ops/second)
Native        54970
VMX           48506
VMX/Native    88.24%

Test Environment 2 – SPECjbb Results: SPECjbb2005

Table 2 lists the system configuration for this environment.

Table 2. Test Environment 2 Configuration

CPU: Intel® Xeon® 5100, 3 GHz
Guest Operating System: Microsoft Windows* 2003 SP1
Hypervisor: Xen 3.0.2 (integrated within SUSE Linux Enterprise Server 10)
Native Operating System: SUSE Linux Enterprise Server 10


SPECjbb2005   Native Win2003 SP1 x32   VMX Win2003 SP1 x32   VMX/Native
Run1          25171                    23951                 95.15%
Run2          25267                    23895                 94.57%
Run3          25372                    23999                 94.59%

Test Environment 3 – SysBench Results: SQL Server 2005 OLTP

Table 3 lists the system configuration for this environment.

Table 3. Test Environment 3 Configuration

CPU: Intel® Xeon® 5100, 3 GHz
Guest Operating System: Microsoft Windows* 2003 SP1
Hypervisor: Xen 3.0.2 (integrated within SUSE Linux Enterprise Server 10)
Native Operating System: SUSE Linux Enterprise Server 10

SysBench      Native Win2003 SP1 x32   VMX Win2003 SP1 x32   VMX/Native
Run1          270.27                   170.94                63.25%
Run2          284.9                    173.61                60.94%
Run3          271.72                   172.12                62.65%

Conclusion

Minimizing the delta between virtualized and native performance leaves your virtualized systems more resources for running your workloads, and good optimization techniques can reduce this delta. The techniques our lab engineers developed and refined over many benchmark tests produced the levels of virtualization performance shown in the benchmarks presented here.

In your environment, the specific workloads you run will affect your results, which may vary from those shown here. Experimentation on non-production systems can often yield further optimizations for your particular environment, and your VMM developer might also have valuable recommendations for added optimizations; consult the developer’s web site for such information.
