Performance Impact of Intel® Secure Key on OpenSSL

Introduction

The goal of this paper is to demonstrate the performance gains obtained when using Intel® Secure Key in applications that depend on OpenSSL* for cryptographically secure random numbers. We examine three scenarios:

  1. Raw random number generation
  2. A client application that consumes large quantities of randomness
  3. A server application that depends on cryptographically secure randomness for encrypted sessions

Intel® Secure Key Features

At the heart of Intel® Secure Key is a high-quality, high-performance entropy source and digital random number generator, or DRNG, which has been added to the Intel® 64 and IA-32 Architecture instruction set beginning with Core i5 and i7 processors in the 3rd generation Intel® Core™ processor family. Random numbers are obtained from the DRNG using the RDRAND instruction.

The DRNG can be visualized as three logical components:

  1. A digital entropy source produces random bits from a nondeterministic hardware circuit that is based on thermal noise within the silicon.
  2. An entropy conditioner uses AES in CBC-MAC mode to distill entropy into high-quality, nondeterministic random numbers.
  3. A cryptographically secure pseudo-random number generator (PRNG) that is compliant with NIST SP800-90A. The specific deterministic random bit generator, or DRBG, chosen is CTR_DRBG, using an AES block cipher.

The DRNG autonomously reseeds itself in a manner that is both unpredictable and transparent to the RDRAND caller. No more than 1022 sequential random numbers will be generated from the same seed value.

The DRNG also contains a number of self-validation processes that include Online Health Tests and Built-In Self Tests, designed to ensure the proper functioning of the entropy source. The end result is a robust, high-performance random number generator that is compliant with NIST SP800-90A, FIPS-140-2 certifiable, and effectively non-deterministic.

Audience

Software developers and system administrators can use this document to understand the performance impact of the RDRAND instruction on operations that require cryptographically secure random numbers.

System Setup and Configuration

For all of these tests, the hardware components shown in Table 1 were used.

Table 1. Hardware components

Component | Details
Processor | Pre-release 3rd generation Intel® Core™ processor, 2.2 GHz, 4 cores, Hyper-Threading disabled
Chipset   | Intel® Q77 Express Chipset
Memory    | 4 GB (2 x 2 GB) DDR3-1333
Storage   | 250 GB Intel® SSD 510 Series (maximum rated sequential write throughput: 315 MB/sec)

Software Setup

All of the software tests revolve around OpenSSL, an open source toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols, as well as a general purpose cryptographic library. OpenSSL was chosen because it is a popular library in the UNIX* software environment for cryptographic needs, and includes a PRNG that is cryptographically secure. The software components used in these tests are shown in Table 2.

Table 2. Software components

Component        | Details
Operating System | Ubuntu* Server 10.04 LTS 64-bit
Libraries        | OpenSSL* 1.0.1a
Applications     | OpenSSL 1.0.1a, Nginx* 1.0.14, cryptsetup* 1.1.3

In v1.0.1, OpenSSL added Intel Secure Key to the ENGINE API, and the RAND_bytes() function directly executes the RDRAND instruction. The Intel® DRNG becomes a drop-in replacement for OpenSSL's own PRNG, and software built around OpenSSL will inherit Intel Secure Key if it is rebuilt against the new library. Note, however, that the DRNG does not need to be seeded via the RAND_seed() function because the DRNG is self-seeding: for optimal performance, code that is aware of the underlying random engine can dispense with gathering entropy for this purpose.
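As a minimal sketch of what "drop-in" means for calling code, the Python standard library's ssl module exposes OpenSSL's RAND_bytes() directly. The helper name random_session_key is hypothetical and chosen only for illustration. Whether the bytes ultimately come from RDRAND depends on how the linked OpenSSL library was built and configured, which this snippet does not check. Note that no seeding call appears anywhere in the caller.

```python
import ssl


def random_session_key(num_bytes: int = 32) -> bytes:
    """Return num_bytes of cryptographically strong random data.

    ssl.RAND_bytes() is a thin wrapper around OpenSSL's RAND_bytes();
    the caller neither seeds nor otherwise manages the generator.
    """
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    return ssl.RAND_bytes(num_bytes)


if __name__ == "__main__":
    key = random_session_key(32)
    print(len(key), "random bytes:", key.hex())
```

Because the application only ever calls RAND_bytes(), swapping the underlying random engine requires no change to code like this.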

To determine the performance impact of the RDRAND instruction on application software, the OpenSSL suite was built twice: once using the default configuration, which enables use of the RDRAND instruction, and once with the source modified to disable it. The modification was made in crypto/engine/eng_rdrand.c, where ENGINE_load_rdrand was redefined as an empty function:

void ENGINE_load_rdrand (void) {}

This allowed a direct comparison of software using the same version of OpenSSL on the same hardware, both with and without RDRAND support.

OpenSSL was compiled from the source distribution using gcc v4.6.1, configured for 64-bit Linux* with the default configuration options.

% ./Configure linux-x86_64

Tests

Several software tests were made to assess the performance of RDRAND-enabled applications. Each test examines a different usage model of RDRAND.

Test 1: OpenSSL* Random Number Performance

This first test examined the raw performance of OpenSSL's random number generator, with and without RDRAND support. For these tests, Expect* was used to run OpenSSL in interactive mode so that the load and startup of the OpenSSL binary could be excluded. Several hundred iterations were run over the course of multiple days to obtain an average execution time. Each request was for 1 GB of random data using the following interactive command:

OpenSSL> rand -out /dev/null 1073741824

The RDRAND-enabled version of OpenSSL consistently outperformed the non-RDRAND version by an order of magnitude, as shown in Figure 1 and Figure 2.
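The measurement itself can be approximated with a short timing loop. The following is a rough sketch, not the Expect-based harness used for the results above: it calls OpenSSL's RAND_bytes() through Python's ssl module and discards the output, much as the rand command does when writing to /dev/null. The function name and the chunk size are assumptions, and the figures it reports include Python call overhead, so they are not directly comparable with Figure 1 and Figure 2.

```python
import ssl
import time


def measure_rand_throughput(total_bytes: int, chunk_size: int = 1 << 20):
    """Generate total_bytes of random data and time it.

    Returns (elapsed_seconds, megabytes_per_second). The generated
    bytes are discarded, so only generation cost is measured.
    """
    if total_bytes <= 0 or chunk_size <= 0:
        raise ValueError("total_bytes and chunk_size must be positive")
    remaining = total_bytes
    start = time.perf_counter()
    while remaining > 0:
        n = min(chunk_size, remaining)
        ssl.RAND_bytes(n)          # output intentionally thrown away
        remaining -= n
    elapsed = time.perf_counter() - start
    megabytes = total_bytes / (1024 * 1024)
    rate = megabytes / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


if __name__ == "__main__":
    # 64 MB keeps the demonstration short; the paper used 1 GB per request.
    seconds, mb_per_s = measure_rand_throughput(64 * 1024 * 1024)
    print(f"{seconds:.3f} s, {mb_per_s:.1f} MB/s")
```

Running the same loop against two builds of the library, one with and one without RDRAND support, is the kind of comparison this test describes.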

Test 2: Encrypted Storage Volume Initialization

The second test examined the performance of a typical client and server application: writing random numbers to a disk volume in the post-initialization step of creating an encrypted disk volume. This step is essential for strong security in an encrypted volume as it makes it impossible for an attacker to differentiate between real data and free disk space.

For this test, a 1 GB dm-crypt volume was created using cryptsetup, which was installed as a pre-built binary from the Ubuntu* distribution. The chosen cipher was AES-XTS-plain with a 256-bit key:

% cryptsetup -y --cipher aes-xts-plain --key-size 256 luksFormat /dev/sda4



Figure 1. Execution times for OpenSSL*'s rand command



Figure 2. Throughput from OpenSSL*'s rand command

The post-initialization was performed by opening the volume and using OpenSSL and dd to write cryptographically strong random numbers to the drive:

% cryptsetup luksOpen /dev/sda4 testfs

% openssl rand 1073741824 | dd of=/dev/mapper/testfs bs=1M

This operation consumes random numbers in bulk, so we expect to see a significant performance difference between the RDRAND and non-RDRAND operations. Because the disk write time is a fixed cost, however, we also expect the overall performance gain to be smaller than in the ideal case of Test 1.

Multiple runs were made over the course of several hours to obtain an average result for each configuration. The results are shown in Figure 3 and Figure 4. As expected, the time spent writing to disk and managing the pipeline limits the effective disk throughput. While there is some overlap from the parallelism inherent in the operation (dd can write to disk while OpenSSL continues to generate random numbers), about two seconds are still lost to overhead in both cases.
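The reasoning above can be captured in a simplified model, offered here only as an illustration: if generation and disk writes overlap perfectly, the slower of the two stages sets the pace, and a fixed overhead is added on top. The disk rate comes from Table 1 and the overhead from the paragraph above. The two generator rates are hypothetical placeholders, not measurements from this paper.

```python
def estimated_fill_time(size_mb: float, gen_mb_per_s: float,
                        disk_mb_per_s: float, overhead_s: float = 0.0) -> float:
    """Estimate seconds needed to fill a volume with random data.

    Assumes the generator and the disk writer run fully in parallel,
    so the slower stage is the bottleneck for the whole pipeline.
    """
    if min(size_mb, gen_mb_per_s, disk_mb_per_s) <= 0:
        raise ValueError("size and rates must be positive")
    bottleneck = min(gen_mb_per_s, disk_mb_per_s)
    return size_mb / bottleneck + overhead_s


if __name__ == "__main__":
    DISK_MB_PER_S = 315.0   # rated sequential write throughput (Table 1)
    OVERHEAD_S = 2.0        # approximate pipeline overhead noted above
    for gen_rate in (50.0, 500.0):   # hypothetical generator rates
        t = estimated_fill_time(1024, gen_rate, DISK_MB_PER_S, OVERHEAD_S)
        print(f"generator at {gen_rate:.0f} MB/s: about {t:.1f} s")
```

In this model, once the generator is faster than the disk, further gains in random number throughput no longer shorten the fill time, which is why the improvement here is expected to be smaller than in Test 1.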



Figure 3. Time to fill a LUKS Volume with Random Data



Figure 4. Effective Disk Throughput when Filling a LUKS Volume with Random Data

Test 3: Secure Web Server

The third test looked at the performance impact on an SSL web server configured to accept only strong ciphers. For our purposes, a strong cipher was defined as OpenSSL's “HIGH” cipher suite, and at the time the tests were conducted this referred to ciphers with key lengths larger than 128 bits, and some cipher suites with 128-bit keys. The web server chosen was Nginx, and it was built from source against OpenSSL both with and without RDRAND support.

Nginx was configured to use four worker processes (one worker process per core) and accept SSLv3 and TLSv1 protocols. Excerpts from the Nginx configuration file are shown in Figure 5.

In this application, random numbers are a critical component of the initial session setup between a new client and the server, but no further randomness is required once the session has been established. Since the primary concern in this test is the impact of the DRNG on OpenSSL-enabled applications, the goal was to determine the maximum rate of new connections that the web server could handle. This test can be thought of as measuring the server's ability to respond to a sudden rush of client requests, whether those come from legitimate clients or from malicious connections, such as during a Distributed Denial of Service (DDoS) attack.

worker_processes	 4;

events {
	worker_connections 10240;
}

…

server {

	ssl	 on;

	ssl_session_timeout 5m;

	ssl_protocols SSLv3 TLSv1;
	ssl_ciphers HIGH:!aNULL:!MD5;

	ssl_prefer_server_ciphers on;

	…
}

Figure 5. Excerpts from the Nginx* configuration

The tests were carried out using five client systems, each running httperf to simultaneously generate SSL connections to the test server at a constant rate for a full minute. Each client was monitored to ensure that the individual client systems were not saturated by their httperf runs, so that any connection errors could be attributed to server rather than client limitations. The httperf runs were repeated, gradually increasing the connection rate until the server was no longer able to respond to clients at the same rate at which connections were coming in. This is the point where the server falls behind the clients, and is no longer able to catch up.

To maximize the stress on the server's ability to establish new connections and to eliminate the impact of managing a sustained connection on the results, each client session was a single request for 512 bytes of static data.
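The ramp procedure described above can be sketched as a simple search loop. This is an illustration of the method only, not the scripts used for these tests: run_trial is a hypothetical callback that would drive the load generators at a given offered rate and return the reply rate the server actually achieved, and the 99% tolerance is an assumed threshold for "keeping up".

```python
from typing import Callable


def find_max_sustainable_rate(run_trial: Callable[[int], float],
                              start_rate: int, step: int, max_rate: int,
                              tolerance: float = 0.99) -> int:
    """Increase the offered connection rate until the server falls behind.

    Returns the highest offered rate (connections/sec) at which the
    achieved rate stayed within the tolerance, or 0 if none did.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    best = 0
    rate = start_rate
    while rate <= max_rate:
        achieved = run_trial(rate)
        if achieved < tolerance * rate:
            break                  # server can no longer keep pace
        best = rate
        rate += step
    return best


if __name__ == "__main__":
    # Stand-in for a real server: replies saturate at a made-up capacity.
    fake_server = lambda offered: min(float(offered), 1000.0)
    print(find_max_sustainable_rate(fake_server, 100, 100, 2000))
```

In a real run, each trial would last a full minute and aggregate the results from all five clients before deciding whether to continue.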

The results of the test are shown in Table 3. As expected, there is a small improvement, about 1%, in the number of connections per second that the RDRAND-enabled server can handle. Even though this scenario places a high demand on random numbers, random number generation is just one of many steps in establishing an SSL session. To produce visible gains in a high-level application, one needs either to improve the performance of the system as a whole or to make large improvements to an individual subsystem, as has been done here.

Table 3. Sustainable connection rates for Nginx*

Configuration | Maximum Connect Rate (connections/sec)
Non-DRNG      | 1264
DRNG          | 1279

The larger benefit to the web server is that the DRNG-enabled system has a source of high-quality entropy at its disposal, and the DRNG can deliver it fast enough to provide a small, but measurable, performance boost.
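For reference, the size of that boost follows directly from the two rates in Table 3. The short calculation below uses a hypothetical helper name, percent_improvement, purely for illustration.

```python
def percent_improvement(baseline: float, improved: float) -> float:
    """Return the relative gain of improved over baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Connection rates from Table 3: non-DRNG 1264, DRNG 1279.
    gain = percent_improvement(1264, 1279)
    print(f"{gain:.2f}%")   # prints 1.19%
```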

Also interesting is the system activity during the test runs. In both the DRNG and non-DRNG cases the server's CPU was between 98% and 100% busy for the duration of the tests, but in the non-DRNG case the number of context switches was, on average, 20% greater than in the DRNG case, as shown in Figure 6.



Figure 6. Context switches for Nginx* under maximum connection rate saturation
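This paper does not state how the context switch counts were collected. One way to gather comparable numbers on Linux, shown here as a sketch under that assumption, is to sample the system-wide "ctxt" counter in /proc/stat at two points in time and divide the difference by the interval. The function names are hypothetical.

```python
import time


def parse_context_switches(proc_stat_text: str) -> int:
    """Extract the cumulative context switch count from /proc/stat text."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def context_switch_rate(first: int, second: int, interval_s: float) -> float:
    """Return context switches per second between two counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (second - first) / interval_s


if __name__ == "__main__":
    # Linux only: sample the counter twice, one second apart.
    with open("/proc/stat") as f:
        before = parse_context_switches(f.read())
    time.sleep(1.0)
    with open("/proc/stat") as f:
        after = parse_context_switches(f.read())
    print(f"{context_switch_rate(before, after, 1.0):.0f} context switches/sec")
```

Sampling in this way during the DRNG and non-DRNG runs would give per-second rates that can be compared directly.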

Conclusion

Intel Secure Key provides a significant performance boost to OpenSSL's random number generator and those improvements carry through to applications that rely on it. It is not surprising that the most significant gains are seen in applications that consume random numbers in bulk, but measurable savings are observed even in server applications where random number generation is only a small part of a complex system. The reduction in context switching is particularly beneficial, since that is CPU time that is lost completely from the application's point of view.

Terminology

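Term           | Description
AES            | Advanced Encryption Standard
CBC-MAC        | Cipher block chaining message authentication code
DDoS           | Distributed denial of service attack
DRNG           | Digital random number generator
FIPS-PUB 140-2 | Federal Information Processing Standards Publication 140-2, Security Requirements for Cryptographic Modules (/sites/default/files/m/c/c/5/fips1402.pdf)
NIST SP800-90A | NIST Special Publication 800-90A, Recommendation for Random Number Generation Using Deterministic Random Bit Generators (/sites/default/files/m/4/6/9/DRBGVS.pdf)
PRNG           | Pseudo-random number generator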

About the Author

John Mechalas lives just outside of Beaverton, Oregon with his wife and their dogs, currently numbering two Irish wolfhounds, and a greyhound. He works in the Developer Relations Division of the Software and Services Group and has been with Intel since 1994. In his spare time John performs improvisational comedy with a number of troupes in the Portland area, and enjoys photography, hiking, and paying someone else to do the yard work.

 

Notices

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license.

Intel, Core and the Intel logo are trademarks of Intel Corporation in the US and/or other countries.

Copyright © 2012 Intel Corporation. All rights reserved.

*Other names and brands may be claimed as the property of others.

For more complete information about compiler optimizations, see our Optimization Notice.