By Quoc-Thai V Le
Published: 05/11/2015 Last Updated: 05/11/2015
Intel® Intelligent Storage Acceleration Library (Intel® ISA-L) provides the tools to help accelerate and optimize storage on Intel architecture (IA) for everything from small office NAS appliances to enterprise storage systems. Intel® ISA-L can run on a variety of Intel® server processors and provides operation acceleration via the following instruction sets:
Xen* project hypervisor is an open source hypervisor (or virtual machine manager – VMM) using a microkernel design, providing services that allow multiple computer operating systems to execute concurrently on the same computer hardware.
In today’s private or public cloud infrastructure, software defined storage (SDS) uses virtualization. To answer storage developers’ questions about how Intel® ISA-L performs under a VMM, the Intel team ran Intel® ISA-L using a single threaded test suite (warm cache) on both a barebones system and on a Xen* project hypervisor. This article captures the performance data and lists the setup instructions for developers interested in reproducing this experiment in their own environment.
The warm cache performance data in Table 1 shows the results (delta of at most ~9%) from running Intel® ISA-L single threaded on a single core, both on a barebones system and under a VMM. The raw data (GB/s) has been normalized: the barebones system running SLES is set to a relative throughput of 1.0, and the Xen-Ubuntu* results show the throughput relative to that baseline:
Table 1: Running Intel® ISA-L on a barebones system vs. under a virtual machine
Intel ISA-L function | SLES (barebones system) | Ubuntu (under Xen*) |
---|---|---|
PQ Gen (16+2) | 1 | 0.92 |
XOR Gen (16+1) | 1 | 0.96 |
Reed Solomon EC (10+4) | 1 | 0.91 |
Multibuffer SHA-1 & 256 | 1 | 0.99 |
Multibuffer SHA-512 | 1 | 0.98 |
Multibuffer MD5 | 1 | 0.97 |
AES-XTS 128 | 1 | 0.99 |
AES-XTS 256 | 1 | 0.98 |
CRC T10 | 1 | 1.0 |
CRC IEEE (802.3) | 1 | 0.97 |
CRC32 iSCSI | 1 | 1.0 |
Compress “Deflate” & Compress “Stateless” | 1 | 0.98 |
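As a quick check of the "~9%" summary figure, the percentage delta for each function can be derived from the relative throughputs in Table 1. The following is a minimal sketch in Python. The names `RELATIVE_THROUGHPUT` and `percent_delta` are illustrative and are not part of Intel® ISA-L or its test suite.

```python
# Relative throughput measured under Xen (Ubuntu guest), copied from Table 1.
# The barebones SLES baseline is 1.0 for every function.
RELATIVE_THROUGHPUT = {
    "PQ Gen (16+2)": 0.92,
    "XOR Gen (16+1)": 0.96,
    "Reed Solomon EC (10+4)": 0.91,
    "Multibuffer SHA-1 & 256": 0.99,
    "Multibuffer SHA-512": 0.98,
    "Multibuffer MD5": 0.97,
    "AES-XTS 128": 0.99,
    "AES-XTS 256": 0.98,
    "CRC T10": 1.0,
    "CRC IEEE (802.3)": 0.97,
    "CRC32 iSCSI": 1.0,
    "Compress Deflate & Stateless": 0.98,
}


def percent_delta(relative, baseline=1.0):
    """Return the throughput loss versus the baseline, in percent (1 decimal)."""
    return round((baseline - relative) / baseline * 100.0, 1)


# Per-function delta and the function with the largest loss under the VMM.
deltas = {name: percent_delta(value) for name, value in RELATIVE_THROUGHPUT.items()}
worst_function = max(deltas, key=deltas.get)

if __name__ == "__main__":
    for name, delta in deltas.items():
        print(f"{name:32s} {delta:4.1f}%")
    print(f"Largest delta: {worst_function} ({deltas[worst_function]}%)")
```

With the Table 1 values, the largest delta comes from Reed Solomon EC (10+4), which matches the roughly 9% upper bound quoted above.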
1. Run "make perfs". This builds all unit function tests for 'cache cold' conditions (larger data set that exceeds the LLC size). Alternatively, run "make perfs D=CACHED_TEST" to build the tests for 'cache warm' conditions (smaller data set that fits within the cache).
2. Run "make perf_report". This runs each unit test supported by the platform architecture. Performance results are output to the console.
3. Optional: Run "make other". This builds additional functions, including the compression functions and their unit tests. The compression tests (igzip_file_perf and igzip_stateless_file_perf) are run using each file of a standard corpus, the Calgary Corpus, as input. The corpus is publicly available.
The following unit test results are reported in the above snapshot/overview data Table 1:
Note: The actual unit tests run and the data reported depend on the architecture and the instructions it supports.
Each unit test will report results in MB/s. For normalization across platforms, cycles/byte is reported based on the throughput and system frequency.
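The conversion from throughput to cycles/byte can be sketched as follows. This is an illustrative Python helper, not code from the Intel® ISA-L test suite. It assumes that 1 MB means 10^6 bytes and that the nominal core frequency is used. The function name `cycles_per_byte` and the 5200 MB/s sample value are hypothetical. The 2.6 GHz frequency is taken from Table 2.

```python
def cycles_per_byte(throughput_mb_s, frequency_ghz, bytes_per_mb=1_000_000):
    """Convert a throughput in MB/s to cycles/byte for a given core frequency.

    Assumption: 1 MB = 10**6 bytes. Pass bytes_per_mb=1 << 20 if the test
    output uses binary megabytes instead.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    bytes_per_second = throughput_mb_s * bytes_per_mb
    cycles_per_second = frequency_ghz * 1e9
    return cycles_per_second / bytes_per_second


if __name__ == "__main__":
    # Hypothetical sample value: 5200 MB/s measured on a 2.6 GHz core.
    print(f"{cycles_per_byte(5200.0, 2.6):.3f} cycles/byte")
```

Because the result divides out the clock frequency, two platforms running at different frequencies can be compared on the same scale; a lower cycles/byte value means more efficient processing.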
Compression test results are reported as a weighted average of the results from each file being compressed. The averaged throughput is then also converted to cycles/byte.
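The article does not state which weights are used for the average. The sketch below assumes each file's throughput is weighted by its size in bytes, which is one common choice; the actual test reports may weight differently. The function names and all sample sizes and throughputs are hypothetical and are not measured data.

```python
def weighted_average_throughput(results):
    """Size-weighted mean throughput in MB/s.

    `results` is a list of (file_size_bytes, throughput_mb_s) pairs.
    Assumption: the weight of each file is its size in bytes.
    """
    total_size = sum(size for size, _ in results)
    if total_size <= 0:
        raise ValueError("total file size must be positive")
    return sum(size * throughput for size, throughput in results) / total_size


def to_cycles_per_byte(throughput_mb_s, frequency_ghz):
    """Convert MB/s to cycles/byte, assuming 1 MB = 10**6 bytes."""
    return (frequency_ghz * 1e9) / (throughput_mb_s * 1e6)


if __name__ == "__main__":
    # Hypothetical per-file results: (size in bytes, throughput in MB/s).
    sample_results = [(100, 400.0), (300, 800.0)]
    average = weighted_average_throughput(sample_results)
    print(f"weighted average: {average:.1f} MB/s")
    print(f"cycles/byte at 2.6 GHz: {to_cycles_per_byte(average, 2.6):.3f}")
```

Weighting by size keeps a very small file from skewing the aggregate figure as much as a large one would.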
Table 2: Configuration Used for Testing
Component | Configuration |
---|---|
CPU & Chipset | Intel® Xeon® processor E5-2697 v3, 2.6 GHz |
Platform | Intel baseboard system (code named Wildcat Pass) |
Memory | Memory size: 128 GB (8x16 GB) DDR4 2133P. Brand/model: Samsung M393A2G40DB0-CPB. NUMA memory configuration |
Storage | Brand & model: 80 GB Western Digital* Caviar Blue (WD800AAJS) |
Operating system | SLES* 11 SP3 64-bit OS |
Test functions | Functions run from user space. Functions average multiple cycles. Functions run under “cache cold” conditions. For some functions, “cache warm” conditions may result in higher performance. |
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.