Intel® Intelligent Storage Acceleration Library Performance under Xen* Project Hypervisor

Abstract

Intel® Intelligent Storage Acceleration Library (Intel® ISA-L) provides the tools to help accelerate and optimize storage on Intel architecture (IA) for everything from small office NAS appliances to enterprise storage systems. Intel® ISA-L can run on a variety of Intel® server processors and provides operation acceleration via the following instruction sets:

  • Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI),
  • Intel® Streaming SIMD Extensions (Intel® SSE),
  • Intel® Advanced Vector Extensions (Intel® AVX), and
  • Intel® Advanced Vector Extensions 2 (Intel® AVX2).

The Xen* Project hypervisor is an open source hypervisor (also called a virtual machine monitor, or VMM) that uses a microkernel design and provides services that allow multiple operating systems to execute concurrently on the same computer hardware.

In today’s private and public cloud infrastructure, software-defined storage (SDS) relies on virtualization. To answer storage developers’ questions about how Intel® ISA-L performs under a VMM, the Intel team ran Intel® ISA-L using a single-threaded test suite (warm cache) both on a barebones system and in a virtual machine under the Xen* Project hypervisor. This article captures the performance data and lists the setup instructions for developers interested in reproducing this experiment in their own environment.

Results

The warm cache performance data in Table 1 compares Intel® ISA-L running single threaded on a single core of a barebones system with the same tests running in a virtual machine under the VMM. The largest delta is about 9%. The raw data (GB/s) has been normalized: the barebones system running SLES is set to a relative throughput of 1.0, and the Ubuntu* on Xen* results are shown relative to that baseline:

Table 1: Running Intel® ISA-L on a barebones system vs. under a virtual machine

Intel ISA-L function                          SLES (barebones system)   Ubuntu, Xen* Project Hypervisor

PQ Gen (16+2)                                 1                         0.92
XOR Gen (16+1)                                1                         0.96
Reed Solomon EC (10+4)                        1                         0.91
Multibuffer SHA-1 & SHA-256                   1                         0.99
Multibuffer SHA-512                           1                         0.98
Multibuffer MD5                               1                         0.97
AES-XTS 128                                   1                         0.99
AES-XTS 256                                   1                         0.98
CRC T10                                       1                         1.0
CRC IEEE (802.3)                              1                         0.97
CRC32 iSCSI                                   1                         1.0
Compress “Deflate” & Compress “Stateless”     1                         0.98
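The normalization behind Table 1 can be sketched in a few lines of Python. This is only an illustration: the relative values are copied from Table 1, while the function names (`relative_throughput`, `delta_percent`, `worst_case`) are hypothetical helpers and not part of Intel® ISA-L.

```python
# Relative throughput under Xen*, copied from Table 1 (barebones SLES = 1.0).
TABLE_1 = {
    "PQ Gen (16+2)": 0.92,
    "XOR Gen (16+1)": 0.96,
    "Reed Solomon EC (10+4)": 0.91,
    "Multibuffer SHA-1 & SHA-256": 0.99,
    "Multibuffer SHA-512": 0.98,
    "Multibuffer MD5": 0.97,
    "AES-XTS 128": 0.99,
    "AES-XTS 256": 0.98,
    "CRC T10": 1.0,
    "CRC IEEE (802.3)": 0.97,
    "CRC32 iSCSI": 1.0,
    "Compress Deflate & Stateless": 0.98,
}


def relative_throughput(baseline, guest):
    """Normalize a guest measurement against the barebones baseline.

    Both arguments must use the same unit (for example GB/s).
    """
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return guest / baseline


def delta_percent(relative):
    """Throughput loss versus the baseline, in percent."""
    return (1.0 - relative) * 100.0


def worst_case(table):
    """Return (function name, relative throughput) for the largest loss."""
    name = min(table, key=table.get)
    return name, table[name]


if __name__ == "__main__":
    name, rel = worst_case(TABLE_1)
    print(f"largest delta: {name}, {delta_percent(rel):.1f}%")
```

To reproduce the table from your own runs, pass the raw barebones and guest throughput for each function to `relative_throughput`; the largest loss in the published data is the Reed Solomon EC row.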

How to Set Up a System Using a Xen* Project Hypervisor for Performance Tests

  1. Install SLES 11 SP3, using default settings wherever possible. During the software selection step, add Xen* as the primary function and add the “C++ compiler” package group.
  2. Boot into SLES and open YaST. Set the Xen* Project hypervisor as the default (under Boot Loader configuration).
  3. Reboot into SLES on Xen*.
  4. Create a new virtual machine (VM) following the steps documented at https://www.suse.com/documentation/sles11/singlehtml/book_xen/book_xen.html#sec.xen.vm.create

    Note: No special configuration is necessary to enable AVX/AVX2, but see step 10.
     
  5. Specify 1 core, 2GB RAM, and “Ubuntu-Other” as the operating system.
  6. Provide an Ubuntu 14.04 ISO as the install disk.
  7. Launch the Virtual Machine Manager.
  8. In the Virtual hardware details, set the Machine Type to "xenpv".
  9. Start Ubuntu by double clicking the Ubuntu icon.
  10. Once Ubuntu has booted under Xen*, run “cat /proc/cpuinfo | grep avx” to check the CPU flags. If “avx” and “avx2” are not listed, you will have to fix the VM configuration or use a different processor.


    Figure 1: Confirm the CPU configuration
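The flag check in step 10 can also be scripted. The following is a minimal Python sketch, assuming a Linux guest that exposes /proc/cpuinfo; the helper name `missing_cpu_flags` is hypothetical. Unlike a plain grep for "avx", it matches whole flag names, so "avx2" is not mistaken for "avx".

```python
from pathlib import Path

# Flags named in step 10 of the setup instructions.
REQUIRED_FLAGS = {"avx", "avx2"}


def missing_cpu_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags that no 'flags' line in cpuinfo_text lists."""
    present = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Each flag is a whitespace-separated token.
            present.update(value.split())
    return set(required) - present


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_cpu_flags(cpuinfo.read_text())
        print("missing flags:", sorted(missing) if missing else "none")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```

Run it inside the guest; any flag reported as missing means the AVX/AVX2 code paths of the library will not be exercised there.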

How to Set Up Intel® ISA-L (with or without a VMM)

  1. To access the full suite of Intel® Intelligent Storage Acceleration Library functions, please fill out and submit this request form.
    You will receive an email that provides information on how to get the complete ISA-L zip file.
  2. Download and unzip the library source into the OS. The folder should contain the following:


    Figure 2: Unzipped ISA-L directory listing
     
  3. Read the ISA-L_Getting_Started.pdf and Release_notes.txt supplied with the source. Follow the instructions in the Getting Started guide to build the source according to your needs.

How to Run the Benchmarks

  1. Run "make perfs". This builds all unit function tests for cache cold conditions (a larger data set that exceeds the LLC size).

    Note: For warm cache, run "make perfs D=CACHED_TEST" (a smaller data set that fits within the cache).
     
  2. Run "make perf_report". This runs each unit test supported by the platform architecture and prints the performance results to the console.
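If you run the two steps repeatedly, they can be wrapped in a small driver. The Python sketch below only uses the make targets and the D=CACHED_TEST option quoted in the steps; the function names and the `source_dir` parameter are hypothetical, and it assumes the commands are issued from the top of the unzipped ISA-L source tree.

```python
import subprocess


def benchmark_commands(warm_cache=True):
    """Return the make invocations from the two steps, in order."""
    build = ["make", "perfs"]
    if warm_cache:
        # Warm cache build option quoted in the note to step 1.
        build.append("D=CACHED_TEST")
    return [build, ["make", "perf_report"]]


def run_benchmarks(source_dir, warm_cache=True):
    """Run each command in source_dir and stop at the first failure."""
    for cmd in benchmark_commands(warm_cache):
        subprocess.run(cmd, cwd=source_dir, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for cmd in benchmark_commands(warm_cache=True):
        print(" ".join(cmd))
```

Calling `run_benchmarks` with the path to the source tree executes the same commands; results still go to the console, as described in step 2.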

Optional: Run “make other”. This builds additional functions, including the compression functions and their unit tests. The compression tests (igzip_file_perf and igzip_stateless_file_perf) are run using each file of a standard corpus, the Calgary Corpus, as input. It is available here.

ISA-L Performance Reporting Categories:

The following unit test results are reported in the overview data in Table 1:

  1. aes_xts_128_dec_warm
  2. aes_xts_256_dec_warm
  3. crc16_t10dif_warm
  4. crc32_ieee_warm
  5. crc32_iscsi_warm
  6. erasure_code_decode_warm
  7. md5_mb_avx2_warm
  8. pq_gen_avx2_warm
  9. sha1_avx2_mb_warm
  10. sha256_avx2_warm
  11. sha512_avx2_warm
  12. xor_gen_avx_warm
  13. igzip_file_perf with Calgary Corpus* as an input
  14. igzip_stateless_file_perf with Calgary Corpus* as an input

Note: The actual unit tests run and the data reported depend on the architecture and the instructions it supports.

Each unit test will report results in MB/s. For normalization across platforms, cycles/byte is reported based on the throughput and system frequency.
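The conversion from MB/s to cycles/byte is a single division. The Python sketch below makes two assumptions that the article does not state: that 1 MB means 10^6 bytes, and that the nominal core frequency (2.6 GHz in Table 2) is the right clock to use. The function name `cycles_per_byte` and the throughput in the example are illustrative only.

```python
def cycles_per_byte(throughput_mb_s, cpu_freq_ghz, bytes_per_mb=1_000_000):
    """Convert a throughput in MB/s to cycles per byte.

    cycles/byte = (cycles per second) / (bytes per second)

    bytes_per_mb is an assumption; pass 1 << 20 if the tool reports MiB/s.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    bytes_per_second = throughput_mb_s * bytes_per_mb
    cycles_per_second = cpu_freq_ghz * 1e9
    return cycles_per_second / bytes_per_second


if __name__ == "__main__":
    # Hypothetical measurement on the 2.6 GHz processor from Table 2.
    print(f"{cycles_per_byte(2600.0, 2.6):.2f} cycles/byte")
```

Lower is better for cycles/byte, and because the clock is divided out, results from machines with different frequencies become comparable.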

Compression test results are reported as a weighted average of the results from each file being compressed. The averaged throughput is then also converted to cycles/byte.
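The article does not say which weights are used for the compression average. The Python sketch below assumes each file's throughput is weighted by its size in bytes, which is one common choice; the helper name `weighted_average` and all numbers in the example are hypothetical.

```python
def weighted_average(results):
    """Return sum(value * weight) / sum(weight) for (value, weight) pairs.

    Here each value is a per-file throughput and each weight is that
    file's size in bytes (an assumption, not confirmed by the article).
    """
    results = list(results)
    total_weight = sum(weight for _, weight in results)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(value * weight for value, weight in results) / total_weight


if __name__ == "__main__":
    # Hypothetical (throughput in MB/s, file size in bytes) per corpus file.
    per_file = [(400.0, 100_000), (600.0, 300_000)]
    print(f"{weighted_average(per_file):.1f} MB/s")
```

With size weights, large files dominate the average, so a slow result on a small file moves the reported number only slightly.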

Platform Configuration

Table 2: Configuration Used for Testing

CPU & Chipset

Intel® Xeon® processor E5-2697 v3, 2.6 GHz

  • # of cores per chip: 14 (only a single core was used)
  • # of sockets: 2
  • Chipset: Intel® C610 (code named "Wellsburg"), QS (B-1 step)
  • System bus: 9.6GT/s QPI

Platform

Platform: Intel baseboard system (code named Wildcat Pass)

  • BIOS: GRNDSDP1.86B.0046.R00.1502111331 BMC 0.20.6013 FRUSDR 0.10
  • DIMM slots: 24
  • Power supply: 1x1100W

Memory

Memory size: 128GB (8x16GB) DDR4 2133P

Brand/model: Samsung M393A2G40DB0-CPB, NUMA Memory Configuration

Storage

Brand & model: 80GB Western Digital* Caviar Blue (WD800AAJS)

Operating system

SLES* 11 SP3 64-bit OS,
Ubuntu version 14.04.1 LTS running under Xen* Version 1.4.2.
Hypervisor: Xen*, Arch: x86_64, Emulator: qemu-kvm.
Kernel 3.13.0-36-generic.
Compiled under Intel® C Compiler 15.0.2 and yasm 1.3.0

Test functions

Functions run from user space. Functions average multiple cycles. Functions run under “cache cold” conditions. For some functions, “cache warm” conditions may result in higher performance.
