Boosting OpenSSL AES Encryption with Intel® IPP

•  Introduction
•  Hardware and Software Requirements
•  Building the Application: Windows* build and Linux* build
•  Performance Comparison: Intel® AES-NI benefit and Intel® IPP multithreading benefit
•  Known Issues and Limitations
•  Case Study

Intel® AES instructions are a new set of instructions available on the 32nm Intel® microarchitecture (formerly codenamed Westmere-EP). These instructions enable fast and secure data encryption and decryption using the Advanced Encryption Standard (AES), which is defined by FIPS Publication number 197 and is widely used today in secure commerce, database and full-disk encryption.

Intel® AES-NI consists of seven instructions. Six of them offer full hardware support for AES: four support AES encryption and decryption, and the other two support AES key expansion. The seventh aids in carry-less multiplication. The AES instructions are flexible enough to support all usages of AES, including all standard key lengths, the standard modes of operation, and even some nonstandard or future variants. They offer a significant increase in performance compared to the current pure-software implementations.

See the details in:
Intel® Advanced Encryption Standard (AES) Instructions Set - Rev 3
Securing the Enterprise with Intel® AES-NI
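
As a rough illustration of how an application might check for these instructions at run time, the C sketch below queries CPUID leaf 1 and tests ECX bit 25 (AES) and ECX bit 1 (PCLMULQDQ, the carry-less multiply). It assumes GCC or Clang and their <cpuid.h> header; the function names are illustrative and are not part of Intel IPP or OpenSSL*.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* Returns the ECX feature word of CPUID leaf 1, or 0 if it is unavailable. */
static unsigned int cpuid_leaf1_ecx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return ecx;
#else
    return 0;        /* not an x86 build: report no support */
#endif
}

/* 1 if the processor reports the AES instructions (ECX bit 25), else 0. */
int cpu_has_aesni(void)
{
    return (int)((cpuid_leaf1_ecx() >> 25) & 1u);
}

/* 1 if the processor reports PCLMULQDQ (ECX bit 1), else 0. */
int cpu_has_pclmulqdq(void)
{
    return (int)((cpuid_leaf1_ecx() >> 1) & 1u);
}
```

A caller could log the result of cpu_has_aesni() at start-up to confirm that a test machine is expected to benefit from the hardware path.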

Intel® Integrated Performance Primitives (Intel® IPP) 6.1 update 2 and later versions support AES-NI instructions in the Intel IPP Cryptography library and the Intel IPP data compression domain. See AES-NI support in Intel® IPP. The IPP Cryptography implementation allows users to migrate their original application to a new platform without any code changes and to gain immediate performance benefits on new processors.

OpenSSL* is one of the most popular cryptographic libraries for security applications. Intel IPP provides an example in the cryptography sample package that uses IPP functions to replace the corresponding functions in OpenSSL*. The sample supports three popular IPP-optimized cryptographic algorithms: the Advanced Encryption Standard (AES), the Secure Hash Standard (SHA) and the widely used public key cryptosystem RSA.

This paper focuses on AES: it illustrates a way to use the IPP AES API to replace the basic AES routines in OpenSSL* and examines the performance benefit.

Hardware and Software Requirements

Intel IPP main package
•  Windows*: Intel® Parallel Studio - Intel® Parallel Composer update 5, composer_update5_setup.exe
•  Linux*: l_ipp_em64t_p_6.1.4.059.tar.gz (get the latest version from the Intel® IPP home page)

Intel IPP crypto package (separate download)
Note: Because of export regulations, the Intel IPP crypto package is released separately from the main package. See the KB article "Where Do I Download The Intel® IPP Cryptography Libraries?" to get the corresponding crypto package, and make sure that the version numbers of the crypto package and the main Intel IPP package are exactly the same.
•  Windows*: composer_crypto_ipp_update5_intel64_setup.exe
•  Linux*: l_crypto_ipp_em64t_p_6.1.4.059.tar.gz

•  Intel IPP crypto sample: l/w_ipp-samples-cryptography_p_6.1.4.065.tgz
•  OpenSSL* package: openssl-0.9.8j.tar.tar (Download from the )

Hardware: Intel® Xeon® processor 5600 series (formerly codenamed Westmere-EP), which supports Intel® AES-NI.

This application note was tested on two 2.93 GHz Intel Xeon processors (Westmere-EP, QPI 6.4, 1333, 6 cores and 12 MB cache each) with NUMA enabled and HT disabled, for a total of 12 cores.

Software:
•  Windows* Server 2003, Windows* XP, Windows* Vista, Windows* Server 2008; Red Hat* EL 5.x, SUSE* 9.x
•  Microsoft* Visual C++* 2005/2008
•  Intel Parallel Composer / Intel® C/C++ Compiler 11.1 / GCC 4.1.x
This application note applies to the GCC 4.1.2 compiler and Red Hat* EL 5.4 for x64, kernel version 2.6.18-164.el5.

Building the Application

Under Windows*:
Step 1: Install the Intel IPP main package and then the Intel IPP crypto package.
Intel Parallel Studio is a suite of tools that Intel provides to make parallel programming easier for Windows* users. Intel IPP is one of the key components of Intel Parallel Composer. If you installed Intel Parallel Composer, the IPP library is installed by default in the directory:
C:\Program Files (x86)\Intel\Parallel Studio\Composer\ipp\em64t

Step 2: Unpack the Intel IPP crypto sample and the OpenSSL* package.
Assume that the work directory is as below

Step 3: Patch OpenSSL* with the Intel IPP API and build it for a target system:
The Windows* build requires the software below, which is assumed to be installed on the test machine:
-  Cygwin* or MinGW
-  Microsoft* Macro Assembler Version 6.15
-  Microsoft* Platform SDK
-  Perl
In a Cygwin* or MinGW command window:
$ cd ./openssl-0.9.8j
$ patch -p 1 -i ../patch-0.9.8j_win

In the Intel Parallel Studio Intel® 64 Microsoft* Visual Studio 2005 command prompt (open the window from Start => All Programs => Intel Parallel Studio => Command Prompt):
$ cd ..
$ winem64t
The resulting libraries will be copied into the .\usr\local\ssl_ipp\em64t\bin directory.

Step 4: Build the performance test program
$ winem64t
The resulting executables timing_aes.exe, timing_rsa.exe and timing_sha.exe will be created in ./bin/winem64t.

Under Linux*:
Step 1: Unpack the Intel IPP main package and the Intel IPP crypto package and install them in order:
$ tar -xzvf l_ipp_em64t_p_6.1.4.059.tar.gz
$ ./l_ipp_em64t_p_6.1.4.059/
(The root password may be required if the library is installed to the default directory /opt/intel/ipp.)
$ tar -xzvf l_crypto_ipp_em64t_p_6.1.4.059.tar.gz
$ ./l_crypto_ipp_em64t_p_6.1.4.059/

Step 2: Unpack the Intel IPP crypto sample and the OpenSSL* package.
Assume that the two tar files are in the work directory /home/Mywork/:
$ tar -xzvf  l_ipp-samples-cryptography_p_6.1.4.065.tgz
$ cd /home/Mywork/ipp-samples/cryptography/openssl-ipp
$ tar -xvf ../../../openssl-0.9.8j.tar.tar

Step 3: Patch OpenSSL* with the Intel IPP API and build it for a target Linux*-based system:
$ cd ./openssl-0.9.8j
$ patch -p1 <../patch-0.9.8j_lin
$ cd ..
$ ./ openssl-0.9.8j linuxem64t
The resulting libraries will be copied into the ./usr/local/ssl_ipp/bin | include | lib directories.

$ ./ openssl-0.9.8j dynamic
The resulting libraries will be copied into the ./usr/local/ssl/bin | include | lib directories.

Step 4: Build the performance test program
$. ./ linuxem64t
The resulting executables timing_aes, timing_rsa and timing_sha will be created in ./bin.

Performance Comparison
Intel AES-NI benefit:
Because the AES computations are performed in hardware, AES-NI offers a significant increase in performance compared to the current pure-software implementations.

The Intel IPP cryptography library includes AES-NI optimizations in IPP 6.1 update 2 and later. It provides the following API functions for AES operations:
ippsRijndael128{Encrypt|Decrypt}{ECB|CBC|CFB|OFB|CTR}
ippsRijndael192{Encrypt|Decrypt}{ECB|CBC|CFB|OFB|CTR}
ippsRijndael256{Encrypt|Decrypt}{ECB|CBC|CFB|OFB|CTR}

The name of each Intel IPP function shows which crypto mode and which key length it supports, so it is easy to replace the associated AES routines in OpenSSL*. For example, to encrypt in AES CBC mode, call the Intel IPP API ippsRijndael128EncryptCBC() in place of the call to AES_encrypt() in the function AES_cbc_encrypt() in the file openssl-0.9.8j\crypto\aes\aes_cbc.c.

Figure 1 shows the replacement.

(See the patch file for the other replacements.)
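
To make the shape of the replaced code concrete, here is a minimal C sketch of a CBC encryption loop in which a one-block routine is called once per 16-byte block. It is neither the OpenSSL* source nor the patch: block_fn, toy_encrypt_block and cbc_encrypt are hypothetical names, and the toy block routine (a plain XOR with the key) only stands in for AES_encrypt() so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 16  /* AES block size in bytes */

/* Hypothetical callback type standing in for a one-block cipher routine. */
typedef void (*block_fn)(const unsigned char in[BLOCK],
                         unsigned char out[BLOCK], const void *key);

/* Toy stand-in for a block cipher: XOR with a 16-byte key. NOT AES, NOT secure. */
void toy_encrypt_block(const unsigned char in[BLOCK],
                       unsigned char out[BLOCK], const void *key)
{
    const unsigned char *k = (const unsigned char *)key;
    for (int i = 0; i < BLOCK; i++)
        out[i] = in[i] ^ k[i];
}

/* CBC encryption: C[i] = E(P[i] XOR C[i-1]), with C[-1] = IV.
 * Each block needs the previous ciphertext block, so the loop is serial. */
void cbc_encrypt(const unsigned char *in, unsigned char *out, size_t len,
                 const void *key, const unsigned char iv[BLOCK], block_fn enc)
{
    unsigned char chain[BLOCK], tmp[BLOCK];

    assert(len % BLOCK == 0);          /* no padding in this sketch */
    memcpy(chain, iv, BLOCK);
    for (size_t off = 0; off < len; off += BLOCK) {
        for (int i = 0; i < BLOCK; i++)
            tmp[i] = in[off + i] ^ chain[i];   /* mix in previous ciphertext */
        enc(tmp, out + off, key);              /* one block-cipher call */
        memcpy(chain, out + off, BLOCK);       /* becomes the next chain value */
    }
}
```

In the patched build, the work of a loop like this is handed to ippsRijndael128EncryptCBC(), which is why the change to the OpenSSL* routine stays small.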

Run the performance test with the command: $  linuxem64t  ssl
This collects AES performance data based on the original OpenSSL* code for the Intel® 64 target platform. The data are stored in ssl_perf_aes_linuxem64.csv files.

Run: $  linuxem64t  ssl_ipp
This collects similar performance data based on the patched OpenSSL* code for the target platform. In this case the data are stored in ssl_ipp_perf_aes_linuxem64.csv files.

The performance test encrypts a piece of plaintext and then decrypts it, and measures the performance as the number of CPU cycles needed to process each byte. The input plaintext consists of random 8-bit numbers. The plaintext lengths are 256, 512, 1024, 2048, 4096, 8192 and 16384 bytes, and each operation is averaged over 1000 loops. The thread number is set with OMP_NUM_THREADS=1. The AES performance data are summarized below.

Figure 2:

The algorithm and implementation of AES are highly optimized in Intel IPP. The table above shows that, on one core with AES-NI, Intel IPP based AES encryption is 3x faster, and decryption over 10x faster, than the OpenSSL* library (a pure-software implementation).
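
The metric used above can be written down in a few lines of C. This sketch assumes that the total cycle count for all loops has already been measured (for example with a time-stamp counter); the function names are illustrative and are not taken from the sample's timing programs.

```c
#include <assert.h>
#include <stddef.h>

/* Average CPU cycles spent per byte, given the total cycles measured over
 * `loops` repetitions of an operation on a buffer of `bytes` bytes. */
double cycles_per_byte(unsigned long long total_cycles,
                       unsigned int loops, size_t bytes)
{
    assert(loops > 0 && bytes > 0);
    return (double)total_cycles / ((double)loops * (double)bytes);
}

/* Speed-up of a candidate over a baseline, both expressed in cycles per
 * byte (lower is better), so the ratio is baseline / candidate. */
double speedup(double baseline_cpb, double candidate_cpb)
{
    assert(candidate_cpb > 0.0);
    return baseline_cpb / candidate_cpb;
}
```

For instance, 4,096,000 cycles measured over 1000 loops on a 1024-byte buffer give 4.0 cycles per byte, and a baseline of 12 cycles per byte against 4 cycles per byte is a speed-up of 3.0. These numbers are arbitrary and only illustrate the arithmetic.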

IPP internal multithreading benefit:
In addition to the architecture-specific optimization, some IPP functions are threaded internally. According to the ThreadedFunctionsList.txt file under /opt/intel/ipp/, some AES functions are among them.

For more information, see Threading and Intel® Integrated Performance Primitives.

AES CBC and CFB decryption are threaded, so we selected them for this test. To outweigh the multithreading overhead, we used the heaviest workload, a plaintext length of 16384 bytes, as the test input. The same test was run on 12 cores with the OpenMP thread number T = 1, 2, 4, 6, 8 and 12. The results are shown below.

Figure 3: (surviving table labels: IPP T=12, Max Speed-up)

Because of its internal threading, IPP AES benefits naturally from multi-core architectures. The test demonstrates that a 3x reduction and a maximum 40x speed-up compared with OpenSSL* can be achieved when decrypting 16384 bytes of data with IPP internal threading turned on. The feature is available by default in the IPP dynamic library and in the IPP static threaded library.
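
One reason CBC decryption lends itself to threading is that each plaintext block depends only on two ciphertext blocks, P[i] = D(C[i]) XOR C[i-1], all of which are known up front, whereas CBC encryption needs the previous output block first. The C sketch below shows the idea with an OpenMP loop. It is an illustration under stated assumptions, not Intel IPP's implementation: the names are hypothetical and the toy block routine (XOR with the key) merely stands in for an AES block decryption.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 16  /* AES block size in bytes */

/* Toy stand-in for a one-block decryption: XOR with the key. NOT AES. */
void toy_decrypt_block(const unsigned char in[BLOCK],
                       unsigned char out[BLOCK],
                       const unsigned char key[BLOCK])
{
    for (int i = 0; i < BLOCK; i++)
        out[i] = in[i] ^ key[i];
}

/* CBC decryption: P[i] = D(C[i]) XOR C[i-1], with C[-1] = IV.
 * Every iteration reads only ciphertext, so blocks can be processed
 * in any order and by any number of threads. */
void cbc_decrypt_parallel(const unsigned char *in, unsigned char *out,
                          size_t len, const unsigned char key[BLOCK],
                          const unsigned char iv[BLOCK])
{
    long nblocks = (long)(len / BLOCK);

    assert(len % BLOCK == 0);   /* no padding in this sketch */
    assert(in != out);          /* in-place would overwrite C[i-1] too early */

    #pragma omp parallel for
    for (long b = 0; b < nblocks; b++) {
        const unsigned char *cur  = in + (size_t)b * BLOCK;
        const unsigned char *prev = (b == 0) ? iv : cur - BLOCK;
        unsigned char tmp[BLOCK];            /* private to each iteration */

        toy_decrypt_block(cur, tmp, key);
        for (int i = 0; i < BLOCK; i++)
            out[(size_t)b * BLOCK + i] = tmp[i] ^ prev[i];
    }
}
```

Compile with OpenMP enabled (for example -fopenmp with GCC) to run the loop in parallel; without it, the pragma is ignored and the loop runs serially with the same result.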

Note that more threads are not always better. In fact, for this 16384-byte workload, the best performance is reached with 6 threads. This is a trade-off between workload and threading overhead. The default maximum number of OpenMP threads used by the multithreaded IPP primitives equals the number of hardware threads in the system. You may control OpenMP threading in the Intel IPP primitives manually according to the workload. Please see OpenMP and the Intel® IPP Library.

In summary, Intel IPP offers significant performance improvements by adopting AES-NI and by threading within functions. When the IPP AES API is used on multicore Westmere processors, it achieves a speed-up of about 3x in CBC encryption and 10x in CBC decryption, and more than 40x in the parallel modes of AES decryption, compared with the pure-software, serial AES implementation in OpenSSL*.

*The tests in this application note were run with the latest IPP 6.1 update 4 and openssl-0.9.8j on 2 x 2.93GHz, 12 core Xeon processors running Red Hat* EL 5.4 for x64, kernel version 2.6.18-164.el5.

Here is some more performance benchmark data with IPP 6.1 update 3 and openssl-0.9.8.

The performance on Windows* is consistent with these results.

Known Issues and Limitations
•  Lower performance when HT is enabled.
Please see IPP Crypto Sample Performance for OpenSSL too Slow on Hyper-Threading Systems.

•  Too many threads should not be used when the workload is small.
As Figure 3 shows, when the work granularity is small, it is not worthwhile to distribute the workload across several threads. In fact, too many threads have a negative effect on performance, because the threads themselves carry overhead. This is true of any multithreaded application. IPP provides the function ippSetNumThreads() to control the number of threads. Please see OpenMP* and the Intel® IPP Library.

Case Study

The IPP AES API gives users an automatic boost from new silicon: no extra work and no changes to their software are required to support it. Many customers love it.
•  Here is one case study from Giant in Chinese and in English

Further AES optimizations are introduced in newer IPP versions. The latest one is in Intel Parallel Studio 2011. See Getting Intel® IPP from stand-alone version or bundled products?

For more complete information about compiler optimizations, see our Optimization Notice.